NVIDIA OmniDreams#

OmniDreams is an HDMap-conditioned world model for single-view and multi-view driving generation, with presets that balance visual fidelity and runtime throughput.

Teaser video source: OmniDreams project page.

Requirements#

  • Minimum VRAM: ~48 GB.

  • PyTorch: >= 2.11.
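Both requirements can be checked before installing. The following is a minimal sketch, not part of OmniDreams: the helper names are hypothetical, and reading "~48 GB" as 48 × 10^9 bytes is an assumption. Versions are compared as integer tuples because a plain string comparison would rank "2.9" above "2.11".

```python
import re

MIN_TORCH = (2, 11)     # from the requirements list above
MIN_VRAM_BYTES = 48e9   # ASSUMPTION: "~48 GB" read as decimal gigabytes


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '2.11.0+cu128'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_min_torch(version: str) -> bool:
    """True if the version is at least MIN_TORCH (numeric comparison)."""
    return parse_major_minor(version) >= MIN_TORCH


def meets_min_vram(total_bytes: int) -> bool:
    """True if the reported device memory reaches the approximate minimum."""
    return total_bytes >= MIN_VRAM_BYTES
```

With PyTorch installed, torch.__version__ and torch.cuda.get_device_properties(0).total_memory supply the two inputs.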

Installation#

# from the repo root
uv sync --project integrations/omnidreams

Running the method#

To run OmniDreams, launch one of the registered runner slugs. For example:

uv run --project integrations/omnidreams \
    flashdreams-run \
    omnidreams-sv-2steps-chunk2-loc6-lightvae-lighttae-perf \
    --example-data True \
    --example_data_uuid "239560dc-33d1-11ef-9720-00044bcbccac" \
    --total-blocks 20

Sample example-data UUIDs for the inference script are available in the nvidia/omni-dreams-samples Hugging Face dataset.

We provide the following variants:

Method                                                    Description
omnidreams-sv-2steps-chunk2-loc6-lightvae-lighttae-perf   Single-view 2-step HDMap-conditioned I2V.

For multi-GPU inference, use:

uv run --project integrations/omnidreams \
    torchrun --nproc_per_node=4 --no-python flashdreams-run \
    omnidreams-sv-2steps-chunk2-loc6-lightvae-lighttae-perf \
    --example-data True \
    --example_data_uuid "239560dc-33d1-11ef-9720-00044bcbccac" \
    --total-blocks 20

To inspect all supported CLI arguments and their default values, run:

uv run --project integrations/omnidreams \
    flashdreams-run \
    omnidreams-sv-2steps-chunk2-loc6-lightvae-lighttae-perf \
    --help

Some generated samples from the above commands:

example_data_uuid: "239560dc-33d1-11ef-9720-00044bcbccac"
example_data_uuid: "24b84744-4156-11ef-b27d-00044bf655de"

Launch the interactive demo#

interactive-drive runs the OmniDreams single-view pipeline in a single process and streams the camera view to your browser. The demo machine only needs a CUDA-capable GPU – no graphics-capable GPU, display server, or Vulkan support is required.

The demo requires access to NVIDIA/flashdreams and an HF_TOKEN with read access to nvidia/omni-dreams-scenes (scene USDZs) and nvidia/omni-dreams-models (checkpoints).

First-time setup:

git clone https://github.com/NVIDIA/flashdreams.git
cd flashdreams
export HF_TOKEN=<your-hf-token>
uv sync --package flashdreams-omnidreams --extra interactive-drive

Optionally, pre-download scenes and checkpoints so the first launch isn’t blocked on network I/O:

uv run --package flashdreams-omnidreams omnidreams-prepare

Run the demo and stream to your browser:

uv run --package flashdreams-omnidreams interactive-drive --stream-mjpeg :8080

Then open http://<server-ip>:8080/ in any browser on the same network and pick a scene from the picker in the bottom-right.

Note

The first launch is slow. The first time you start the demo, the world model spends several minutes in a one-time optimization pass – checkpoint loading, torch.compile / CUDA-graph capture, and Triton autotuning – before the view becomes interactive. The on-screen indicator shows Loading world model... during warmup and then Optimizing world model... while the first generated chunk is autotuned; this phase is longest on the perf manifest. Subsequent launches are much faster because the compiled kernels and CUDA graphs are cached and reused.

Note

Add --offload-text-encoder to reduce peak VRAM usage by ~15 GB:

uv run --package flashdreams-omnidreams interactive-drive \
    --stream-mjpeg :8080 \
    --offload-text-encoder

The text and first-frame encoders are run once per scene and freed before the diffusion pipeline is built, and the resulting embeddings are cached and reused across world-model resets.

Trade-off: the world model is rebuilt on each scene load instead of staying resident, so the first load and scene/variant switches are slower. Prefer it when VRAM-constrained; otherwise leave it off for faster switching.

On a consumer NVIDIA GPU that exposes a graphics stack, omit the --stream-mjpeg flag to open the demo in a local Vulkan window instead:

uv run --package flashdreams-omnidreams interactive-drive

The local window’s HUD adds a weather-variant selector (clear, rain, snow) next to the scene picker, so the same scene can be switched between conditions.

Note

The local window requires a display server and the system OpenGL / Vulkan client libraries. On Debian/Ubuntu:

sudo apt install -y libx11-6 libxcb1 libgl1 libglx-mesa0 libvulkan1

A Failed to initialize GLFW error indicates that the display or one of these libraries is missing.
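When that error appears, it can help to find out which prerequisite is absent. A rough sketch using only the Python standard library follows. The helper names are hypothetical, the short library names are an assumed mapping from the Debian packages listed above to their shared objects, and checking DISPLAY / WAYLAND_DISPLAY is only a heuristic for whether a display server is reachable.

```python
import ctypes.util
import os

# ASSUMPTION: short linker names for the shared objects shipped by the
# Debian/Ubuntu packages listed above (package name -> library name).
PACKAGE_TO_LIBRARY = {
    "libx11-6": "X11",
    "libxcb1": "xcb",
    "libgl1": "GL",
    "libglx-mesa0": "GLX_mesa",
    "libvulkan1": "vulkan",
}


def missing_packages(mapping=PACKAGE_TO_LIBRARY):
    """Return packages whose library the dynamic linker cannot locate."""
    return [pkg for pkg, lib in mapping.items()
            if ctypes.util.find_library(lib) is None]


def has_display(env=None):
    """Heuristic: an X11 or Wayland session exports one of these variables."""
    env = os.environ if env is None else env
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
```

An empty list from missing_packages() together with has_display() returning True suggests the cause lies elsewhere, for example in the GPU driver's Vulkan support.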

Steering wheel and game controller#

A steering wheel or game controller can drive the demo in local-window mode. Any device that Ubuntu detects as a standard game controller or joystick is suitable. We provide a configuration tool to calibrate it:

uv run --package flashdreams-omnidreams interactive-drive-configuration

The demo auto-loads your default profile on subsequent launches. When you have more than one profile, the configuration tool’s start screen lists them with Make default (plus Edit and Delete) buttons – re-run the tool to choose which profile interactive-drive loads by default, tweak a profile (steering sensitivity, deadzone, buttons, force feedback), or remove one.

Multiple devices. A profile can bind controls across several devices – for example a wheel base plus a separately-connected or different-brand pedal set. Ctrl+click to select more than one device on the configuration tool’s device page; each control binds to whichever selected device it moves on.

Force feedback. The method is auto-detected per wheel: a driver-managed autocenter spring (Thrustmaster, Logitech) or a self-rendered constant force (Fanatec, which has no autocenter). FFB needs the vendor’s Linux driver and write access to /dev/input/* (add your user to the input group):

Vendor         Driver
Thrustmaster   Out-of-tree hid-tmff2 plus a wheel-mode init (hid-tminit, or tmdrv for TX / TS-XW), for modern wheels (T300RS, T248, TX, T-GT II, TS-PC, TS-XW, …).
Fanatec        hid-fanatecff with the base in PC mode (CSL DD, ClubSport, Podium, DD Pro).
Logitech       In-kernel hid-lg4ff or new-lg4ff (G29, G27, G923 PS); the G920 and Xbox/PC G923 use the HID++ driver (kernel 6.3+).
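Because force feedback needs write access to /dev/input/*, a quick permission check can save a debugging round. The following is a minimal sketch: the function name is hypothetical, and it only tests file permissions, not whether the vendor driver is loaded.

```python
import glob
import os


def unwritable_input_nodes(pattern="/dev/input/*"):
    """List device nodes matching `pattern` that this user cannot write to."""
    # Skip sub-directories such as by-id/ and by-path/; they only hold symlinks.
    nodes = [p for p in sorted(glob.glob(pattern)) if not os.path.isdir(p)]
    return [p for p in nodes if not os.access(p, os.W_OK)]
```

Calling unwritable_input_nodes() with no arguments inspects the real device directory. A non-empty result points at the missing input-group membership mentioned above; group changes take effect on the next login.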

Native acceleration (perf manifest)#

The bundled example_world_model_perf.yaml manifest runs the DiT and LightVAE through the OmniDreams single-view CUDA extension (native_dit_acceleration: required), which is faster than the default PyTorch path. The extension builds against pinned checkouts of CUTLASS, SageAttention, SpargeAttn, and cudnn-frontend that are not vendored in the repo. omnidreams-prepare --perf clones them at their pinned commits into integrations/omnidreams/omnidreams_singleview/3rdparty/:

uv run --package flashdreams-omnidreams omnidreams-prepare --perf

This step only syncs sources; the extension itself compiles on the first launch that uses the manifest (one-time, a few minutes). It requires a Blackwell-class GPU (SM 12.0) or newer, a source checkout (the omnidreams_singleview sources ship only in the git tree, not the wheel), git, and a CUDA toolchain (nvcc) matching your PyTorch build. Then point the demo at the perf manifest:

uv run --package flashdreams-omnidreams interactive-drive \
    --manifest example_world_model_perf.yaml

native_dit_acceleration: required makes the manifest fail loudly if the extension can’t build or load, rather than silently falling back to PyTorch.
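The hardware and toolchain prerequisites for the perf manifest can be checked up front rather than discovered at compile time. The sketch below is illustrative only: the function names are hypothetical, the SM 12.0 threshold is taken verbatim from the text above, and interpreting "matching" as equal CUDA major.minor versions is an assumption, not a documented rule.

```python
import re

MIN_COMPUTE_CAPABILITY = (12, 0)  # "SM 12.0 or newer", from the text above


def parse_nvcc_release(nvcc_output: str) -> tuple[int, int]:
    """Pull (major, minor) out of the 'release X.Y' line of `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def perf_manifest_blockers(compute_capability, nvcc_output, torch_cuda):
    """Return human-readable reasons the perf extension is unlikely to build."""
    blockers = []
    major, minor = compute_capability
    if (major, minor) < MIN_COMPUTE_CAPABILITY:
        blockers.append(f"GPU is SM {major}.{minor}, need 12.0 or newer")
    if torch_cuda is None:
        blockers.append("PyTorch build has no CUDA support")
    else:
        # ASSUMPTION: "matching" means the same CUDA major.minor version.
        torch_release = tuple(int(part) for part in torch_cuda.split(".")[:2])
        if parse_nvcc_release(nvcc_output) != torch_release:
            blockers.append(f"nvcc CUDA differs from PyTorch CUDA {torch_cuda}")
    return blockers
```

On a live system the three inputs would come from torch.cuda.get_device_capability(), the output of nvcc --version, and torch.version.cuda. An empty list means none of these particular checks failed; the build can still fail for other reasons.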

Alternative: WebRTC server#

For deployments that require a richer browser frontend with WebRTC’s lower video-delivery latency and a streaming gRPC service for multi-client setups, the standalone server at omnidreams.webrtc.server ships a polished HTML5 client on top of the same OmniDreams pipeline. The MJPEG path above is the recommended starting point for most users; consider WebRTC if you need bidirectional camera-control APIs or are already integrating the gRPC service into a larger product.

# from the repo root
uv run --package flashdreams-omnidreams torchrun --nproc_per_node 1 \
    -m omnidreams.webrtc.server \
    --host 0.0.0.0 --port 8089 \
    --pipeline_config_name omnidreams-sv-2steps-chunk2-loc6-lightvae-lighttae-perf \
    --scene-uuid "0d404ff7-2b66-498c-b047-1ed8cded60d4"

Sample scene UUIDs for the interactive server are available in the nvidia/omni-dreams-scenes Hugging Face dataset. Each scene ships clear, rain, and snow weather variants as sibling archives; add --scene-variant rain (or snow) to serve a specific one (the default is the clear-weather scene).

The server may take a few minutes to warm up. Once ready, it prints Connect via http://<server-ip>:8089/request_session. Here, <server-ip> is the IP address of the server you are connecting to (use localhost when running locally).

Note

On a remote or cloud GPU instance (e.g. Brev), the server port is usually not reachable at the host IP directly. Forward it to your local machine first, then open http://localhost:8089/request_session:

# Brev
brev port-forward <instance> -p 8089:8089
# or plain SSH
ssh -L 8089:localhost:8089 <user>@<host>
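Before opening the browser, it may be worth confirming that the forwarded port accepts connections. A small sketch with the Python standard library follows; the function name is hypothetical. A successful connect only shows that something is listening on the local end of the forward, not that the server has finished warming up.

```python
import socket


def port_accepts_connections(host="localhost", port=8089, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

If this returns False, re-check the port-forward command before troubleshooting the browser.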

Once successfully connected, the browser-based UI looks like this:

Note

If /request_session loads but the video never appears, the browser is likely obfuscating local IPs in WebRTC ICE candidates (replacing them with mDNS .local hostnames), which prevents the peer connection from completing. Disable the setting and reload:

  • Chrome / Edge: set chrome://flags/#enable-webrtc-hide-local-ips-with-mdns to Disabled, then restart the browser.

  • Brave: in brave://settings/privacy/security, set WebRTC IP handling policy to Default public and private interfaces.

  • Firefox: in about:config, set media.peerconnection.ice.obfuscate_host_addresses to false.

Performance table#

Single-view latency on NVIDIA GB300 at 704 x 1280 resolution.

Stage             1x GPU   2x GPU   4x GPU   8x GPU
HDMap Encoder     28 ms    26 ms    26 ms    26 ms
Diffusion DiT     84 ms    71 ms    49 ms    47 ms
VAE Decoder       6 ms     5 ms     5 ms     5 ms
KV-cache Update   42 ms    34 ms    23 ms    22 ms
Total             118 ms   102 ms   80 ms    78 ms
Effective FPS     68       78       100      103

KV-cache Update is off the hot path and excluded from Total.
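The Total and Effective FPS rows follow from the per-stage numbers. The sketch below reproduces them. FRAMES_PER_CHUNK = 8 is an assumption inferred from the table (8 frames / 0.118 s ≈ 68 FPS) rather than a value stated on this page, and the names are illustrative.

```python
# Hot-path stage latencies in milliseconds, copied from the table above and
# keyed by GPU count. KV-cache Update is left out because it is off the hot path.
HOT_PATH_MS = {
    1: {"hdmap_encoder": 28, "diffusion_dit": 84, "vae_decoder": 6},
    2: {"hdmap_encoder": 26, "diffusion_dit": 71, "vae_decoder": 5},
    4: {"hdmap_encoder": 26, "diffusion_dit": 49, "vae_decoder": 5},
    8: {"hdmap_encoder": 26, "diffusion_dit": 47, "vae_decoder": 5},
}

FRAMES_PER_CHUNK = 8  # ASSUMPTION: inferred from the table, not documented


def total_ms(num_gpus: int) -> int:
    """Sum of the hot-path stages for one generated chunk."""
    return sum(HOT_PATH_MS[num_gpus].values())


def effective_fps(num_gpus: int) -> float:
    """Frames produced per second if each chunk takes total_ms()."""
    return FRAMES_PER_CHUNK * 1000 / total_ms(num_gpus)


for gpus in sorted(HOT_PATH_MS):
    print(f"{gpus}x GPU: {total_ms(gpus)} ms, {effective_fps(gpus):.0f} FPS")
```

Under that assumption the computed values agree with the published rows after rounding, and they show that going from one to eight GPUs shortens the hot path by about a third (118 ms to 78 ms), mostly in the Diffusion DiT stage.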

Further reading#

  • Interactive-drive latency tuning covers the supported interactive-drive latency knobs: model and backend choice, resolution, chunk-size constraints, FP8 and native acceleration, transport, and the validated GB300 reference.

Citation#

If you use OmniDreams, please cite the original work:

@misc{nvidia2026omnidreams,
  title={OmniDreams: Real-Time Generative Closed-Loop Autonomous Vehicle Simulation Built on NVIDIA Cosmos},
  author={Basant, Aarti and Kar, Amlan and Paschalidou, Despoina and Garcia Cobo, Guillermo and Turki, Haithem and Ling, Huan and Seo, Jaewoo and Wang, Jialiang and Lucas, James and Wu, Jay and Lorraine, Jonathan and Gao, Jun and He, Kai and Tothova, Katarina and Xie, Kevin and Tyszkiewicz, Michal and Wu, Qi and de Lutio, Riccardo and Li, Ruilong and Fidler, Sanja and Kim, Seung Wook and Shen, Tianchang and Cao, Tianshi and Pfaff, Tobias and Lew, William and Ren, Xuanchi and Lu, Yifan and Gojcic, Zan and Wang, Zian},
  year={2026},
  note={Technical report},
  howpublished={\url{https://research.nvidia.com/labs/sil/projects/omnidreams-blog/paper.pdf}}
}