Televiz#

Televiz (isaacteleop.viz) is a lightweight compositor for Isaac Teleop. It composites camera and sensor feeds, plus 3D rendered content (gsplat, nvblox, neural reconstruction), into an XR headset, a desktop window, or an offscreen buffer, integrating directly with the device-tracking and retargeting pipeline.

It is a compositor, not a capture or streaming layer: it consumes GPU frames and assembles them into a final image. Camera capture, decode, and network transport live in the application (see Camera Streaming).

The compositor is implemented in C++ (namespace viz, built on Vulkan + OpenXR + CUDA with no external rendering-framework dependency) and exposed through a pybind11 binding. This page uses the Python API, which mirrors the C++ names one-to-one — see C++ API to link against the library directly.

Installation #

Televiz ships inside the isaacteleop wheel — install it from PyPI:

pip install isaacteleop

The published wheels (Linux x86_64 / aarch64, CPython 3.10–3.13) bundle the compiled isaacteleop.viz module, so no source build is required. Verify with:

import isaacteleop.viz as televiz

You only need to build from source when developing Isaac Teleop itself — that build enables Televiz automatically when Vulkan, the CUDA Toolkit, and glslangValidator are present.

Overview #

The central object is VizSession, which owns the Vulkan context, the display target, the OpenXR session (in XR mode), and a registry of layers. Content producers submit GPU buffers to layers; the session composites every layer into one frame each time you call render().

Two layer types are available:

QuadLayer — a CUDA-fed 2D texture plane (mono or stereo), optionally placed in 3D space. Use it for camera feeds.
ProjectionLayer — a full-view RGBD layer for external renderers (gsplat, nvblox, neural reconstruction) that produce per-view (color, depth) buffers. Use it to present a rendered 3D scene from the current head pose.

A session holds either one ProjectionLayer or any number of QuadLayer s, not both: quads composite into a shared render target, while a projection layer is presented directly (see ProjectionLayer).

All symbols are imported from the top-level module:

import isaacteleop.viz as televiz

Display modes #

A session runs in exactly one display mode, set on the config:

`DisplayMode`	Behavior
`kXr`	OpenXR + Vulkan. Per-eye swapchains, stereo rendering, depth composition layer. Requires a running OpenXR runtime (e.g. CloudXR).
`kWindow`	GLFW desktop window. Layers are aspect-fit tiled; stereo layers show the left eye.
`kOffscreen`	No display. Composite to an internal target and pull pixels back with `readback_to_host()`. Useful for tests and headless rendering.

Quick start #

A minimal offscreen render-and-readback (no GPU display, no headset):

import cupy as cp
import isaacteleop.viz as televiz

viz_cfg = televiz.VizSessionConfig()
viz_cfg.mode = televiz.DisplayMode.kOffscreen
viz_cfg.window_width = 1024
viz_cfg.window_height = 1024

session = televiz.VizSession.create(viz_cfg)

layer_cfg = televiz.QuadLayerConfig()
layer_cfg.name = "cam"
layer_cfg.resolution = televiz.Resolution(1024, 1024)
layer = session.add_quad_layer(layer_cfg)

# Any __cuda_array_interface__ array (CuPy / PyTorch / Numba) or a VizBuffer.
frame = cp.zeros((1024, 1024, 4), dtype=cp.uint8)   # RGBA8
layer.submit(frame)

info = session.render()                # wait + composite + present
img = session.readback_to_host()       # HostImage; numpy.asarray(img) for pixels

session.destroy()

For a window or headset, set mode to DisplayMode.kWindow or DisplayMode.kXr instead. The layer setup is identical; you just drive a frame loop (see Frame loop) rather than the one-shot readback_to_host(), which is offscreen-only.

Session configuration #

VizSessionConfig fields:

Field	Default	Description
`mode`	—	`DisplayMode.kXr` / `kWindow` / `kOffscreen`. Required.
`window_width` / `window_height`	—	Render size for window and offscreen modes. Ignored in XR (the runtime dictates per-eye resolution; query it with `get_recommended_resolution()`).
`app_name`	`"televiz"`	OpenXR application name.
`required_extensions`	`[]`	Extra OpenXR instance extensions to enable when Televiz hosts the session and downstream components (e.g. `TeleopSession` trackers) need them. Televiz already enables its own rendering extensions. See Sharing the XR session.
`xr_near_z` / `xr_far_z`	—	Near / far planes for the XR projection.
`xr_system_wait_seconds`	—	How long to wait for the OpenXR system (headset) to become available at create time.
`clear_color`	—	Background color as an `(r, g, b, a)` sequence in `[0, 1]`.
`gpu_timing`	—	Enable GPU timestamp queries, surfaced via `get_gpu_timing()`.

Construct the session with the factory; never call the class directly:

session = televiz.VizSession.create(cfg)

Layers #

Layers render in insertion order — the first added renders first (underneath). A layer is owned by the session; add_quad_layer returns a non-owning handle, so don’t keep it past the session’s lifetime.

QuadLayer #

A 2D plane fed by a CUDA buffer. Configure it with QuadLayerConfig:

Field	Default	Description
`name`	—	Layer name (used as the placement key in app config).
`resolution`	—	Source texture size, a `Resolution`. Submitted buffers must match it.
`format`	—	`PixelFormat` of the source (typically `kRGBA8`).
`placement`	—	Optional `QuadLayerPlacement` (`pose` + `size_meters`) for 3D placement in XR.
`stereo`	`False`	Per-eye stereo. When `True`, `submit` requires both eyes’ buffers; view 0 (left) samples the left buffer, view 1 (right) the right. Memory doubles.
`stereo_baseline_mm`	`0`	Horizontal disparity between the left/right planes (mm), along the placement’s local +x axis. `0` → both eyes see the same world quad. XR + stereo only.
`generate_mipmaps`	`True`	Allocate + regenerate a capped mip chain each frame; sampler uses trilinear filtering.

Submit and place a frame:

layer = session.add_quad_layer(layer_cfg)

# Mono: pass exactly one buffer. Stereo: layer.submit(left, right).
layer.submit(rgba_array)            # optional: stream=<cuda stream ptr>

# 3D placement (XR). Pose is OpenXR stage space: position (x,y,z),
# orientation quaternion (w,x,y,z). size_meters is (width, height).
placement = televiz.QuadLayerPlacement(
    televiz.Pose3D(position=(0.0, 0.0, -1.5), orientation=(1.0, 0.0, 0.0, 0.0)),
    size_meters=(1.0, 0.5625),
)
layer.set_placement(placement)
layer.set_visible(True)

submit(left, right=None, stream=0) accepts a VizBuffer or any __cuda_array_interface__ object; the binding converts it and releases the GIL across the copy. For a stereo layer both buffers are copied on the same stream and signaled together, so the renderer never sees a half-matched pair. Lock-mode placement strategies (world / head / lazy) are application policy and ship in the sample, not in the module.

ProjectionLayer #

A full-view RGBD layer for in-loop renderers — gsplat, nvblox, or neural reconstruction engines that produce per-view (color, depth) buffers. Configure it with ProjectionLayerConfig:

Field	Description
`name`	Layer name.
`view_resolution`	Per-view render resolution. Must equal `session.get_recommended_resolution()` — the layer’s images are copied 1:1 into the presentation swapchains (per-eye in XR). A mismatch is rejected by `add_projection_layer`.
`color_format`	`PixelFormat.kRGBA8`.
`depth_format`	`PixelFormat.kD32F` (default) so the depth reaches the XR runtime for positional reprojection, or `None` to present color only.
`stereo`	`True` for per-eye buffers. A stereo (XR) display requires a stereo layer; a mono layer is rejected at `add_projection_layer`.

Unlike QuadLayer, a projection layer is direct-present: each view’s (color, depth) is copied straight into the presentation swapchains (no shared render target). Because of that a session holds either one ProjectionLayer or any number of QuadLayer s, never both.

The renderer runs in-loop with the frame loop: read the predicted view poses from the FrameInfo returned by begin_frame(), render against them, then submit() before end_frame():

cfg = televiz.ProjectionLayerConfig()
cfg.view_resolution = session.get_recommended_resolution()
cfg.stereo = session.is_xr_mode()
layer = session.add_projection_layer(cfg)

while running:
    info = session.begin_frame()
    if info.should_render:
        # Render against THIS frame's per-eye poses (info.views[i].pose + .fov).
        color, depth = renderer.render(info.views)        # RGBA8 + D32F CUDA buffers
        if layer.stereo:
            layer.submit(left_color, left_depth, right_color, right_depth, stream=cuda_stream)
        else:
            layer.submit(color, depth, stream=cuda_stream)
    session.end_frame()

If the renderer is slower than display rate, the runtime / CloudXR paces the app via xrWaitFrame and reprojects the last submitted frame at display rate. In XR, a visible layer that does not submit for a frame presents nothing (the swapchains are cleared) rather than reproject stale RGBD under a new pose.

Frame loop #

Two API levels drive the frame loop. Both release the GIL during blocking waits.

Convenience — render() does wait + composite + present in one call and returns a FrameInfo. Internally it checks should_render and skips the GPU pass when the runtime says the frame won’t be visible; producers’ submit writes still land in the back buffer.

while running:
    cam_layer.submit(camera_frame)
    info = session.render()

Explicit — begin_frame() / end_frame(), when the app needs the FrameInfo before submitting (e.g. to read the predicted view poses before rendering, or to skip expensive decode when not visible):

while running:
    info = session.begin_frame()
    if info.should_render:
        cam_layer.submit(decode_camera())   # skip decode when not visible
    session.end_frame()

FrameInfo carries frame_index, predicted_display_time (XR time in ns; 0 outside XR), delta_time (CPU wall-clock seconds — usable without any XR knowledge), should_render, resolution, and views. Each ViewInfo in views has viewport, fov, and pose — 2 entries in XR stereo, 1 (identity pose) in window / offscreen.

Session state #

A session moves through SessionState:

kUninitialized → kReady → kRunning → kStopping → kLost → kDestroyed

kReady after create — add layers and submit content.
kRunning once the frame loop is active.
kStopping (XR) — the runtime is stopping; end_frame submits empty frames.
kLost (XR) — the session was lost; render / begin_frame raise. Destroy and recreate the VizSession (Televiz supports clean in-process recreation).
kDestroyed after destroy.

Query it with get_state(); in window mode should_close() reports the window-close request. OpenXR events are polled inside begin_frame, which drives the XR-specific transitions.

API reference #

VizSession #

create(config) -> VizSession (static) — validate config + initialize Vulkan / display backend.
render() -> FrameInfo — wait + composite + present.
begin_frame() -> FrameInfo / end_frame() — explicit two-phase frame loop.
add_quad_layer(config) -> QuadLayer — construct + register a layer; returns a non-owning handle.
readback_to_host() -> HostImage — most recent frame as RGBA8 host pixels (kOffscreen only).
get_state() -> SessionState, should_close() -> bool, is_xr_mode() -> bool.
get_recommended_resolution() -> Resolution — runtime per-eye resolution (XR).
head_pose_now() -> Optional[Pose3D] — current head pose (kXr only; None on tracking loss).
get_oxr_handles() -> Optional[tuple] — (instance, session, space, proc_addr) as raw uint64.
get_frame_timing_stats() -> FrameTimingStats / get_gpu_timing() -> GpuFrameTiming.
destroy() — release all resources (idempotent).
Properties vk_device / vk_physical_device / vk_queue_family_index — raw handles for wiring Televiz into a foreign Vulkan app. Most users won’t touch these.

QuadLayer #

submit(left, right=None, stream=0) — submit a frame (mono: left only; stereo: both).
set_placement(placement) / placement() — 3D placement (None → fullscreen, window mode).
set_visible(visible) / is_visible().
Properties resolution, format, aspect_ratio, name.

Data types #

VizBuffer — non-owning 2D pixel buffer descriptor. Device buffers expose __cuda_array_interface__ (cupy.asarray(buf)); host buffers expose __array_interface__ (numpy.asarray(buf)).
HostImage — owning host pixel buffer returned by readback_to_host; wrap with numpy.asarray.
Resolution (width, height), Pose3D (position, orientation as (w, x, y, z)), Fov, ViewInfo.
Enums DisplayMode, PixelFormat (kRGBA8 / kD32F), MemorySpace, SessionState.

C++ API #

Televiz is a C++ library; isaacteleop.viz is a thin pybind11 binding over it. The Python and C++ APIs share the same type and method names (VizSession, QuadLayer, submit, set_placement, DisplayMode::kXr, …), so everything on this page maps directly to C++. All symbols live in namespace viz, and headers use nested include paths:

#include <viz/session/viz_session.hpp>
#include <viz/layers/quad_layer.hpp>
#include <viz/core/viz_buffer.hpp>

BUILD_VIZ auto-enables when Vulkan, the CUDA toolkit, and glslangValidator are detected (force it with -DBUILD_VIZ=ON / -DBUILD_VIZ=OFF); link the relevant CMake target:

Target	Alias	Provides
`viz_core`	`viz::core`	Core types (`VizBuffer`, `Pose3D`, `HostImage`, `DeviceImage`) and Vulkan / CUDA infrastructure
`viz_layers`	`viz::layers`	`LayerBase` and the built-in layers (`QuadLayer`, …)
`viz_session`	`viz::session`	`VizSession`, the compositor, `FrameInfo`, window / offscreen backends
`viz_xr`	`viz::xr`	OpenXR backend — per-eye swapchains, depth composition layer

Public headers live under src/viz/<module>/cpp/inc/viz/. One difference from the Python bindings: in C++, layers are added with a single templated VizSession::add_layer<L>(args...) method, which also accepts your own LayerBase subclasses — the route for plugging in a custom renderer. See Build System for build options and output locations.

More information #

Camera Streaming — the reference camera_viz sample built on Televiz
Teleop Session — how TeleopSession works and how to share its OpenXR session
src/viz/ — module source, organized as core / layers / session / xr / shaders / python sub-modules