Server runtime#
The server-runtime/ package hosts the XR-Media-Hub — the single process
clients connect to and agents fan out from. It owns the internal LiveKit
transport, the shared-memory + ZMQ IPC boundary to agents, the per-participant
return path, and the same-origin wss:// proxy that fronts LiveKit signaling.
It runs as one process:
uv run xr_media_hub # auto-discovers ./xr_media_hub.yaml
uv run xr_media_hub --config path.yaml # explicit config
python -m xr_media_hub # equivalent module form
Configuration comes from a xr_media_hub.yaml file (defaults are used when
none is found). server-runtime/xr_media_hub.yaml is the reference copy
documenting every field; each sample ships its own copy under its yaml/
directory. Relative paths inside the YAML (such as web_client_dir) resolve
against the YAML file’s own directory, not the working directory.
For where the hub sits in the wider system, refer to Architecture.
XR-Media-Hub#
The hub is one hub, many clients, many agents. A single instance fans the inbound media stream out to every connected agent and routes any return traffic back to the originating client only.
On startup, __main__.py constructs a HubEndpoint (the IPC server),
registers hub-local callbacks (on_frame, on_audio, on_data,
on_participant), loads the configuration, and brings up the LiveKitConnector.
The hub task and the connector task then run concurrently until SIGINT /
SIGTERM. A periodic stats loop logs per-participant video, audio, and data
rates.
Isolation contract#
The hub is not a routing switch between participants. There is no supported path for participant A’s media or data to reach participant B. The only supported flow is:
participant → hub → consumer (agent) → hub → same participant
This is enforced at several layers:
send_return_audio,send_return_data, andsend_return_audio_flushvalidate that the target participant is currently connected; messages for unknown participants are dropped with a warning.Return-traffic topics (
return_audio.*,return_audio_flush.*,return_data.*) are connector-only — an agent’s default subscription excludes them.On the LiveKit side, return audio is published as one track per participant with subscribe permissions restricted to that participant, and return data is addressed with
destination_identities(refer to the per-participant return path).
Note
This isolation is a property of the hub’s routing, not a limitation of the transport. LiveKit natively supports client-to-client communication, and an application is free to use those native features directly for peer-to-peer media or data. Doing so is outside the scope of XR AI: the hub neither routes nor guarantees that traffic, and because the transport is an implementation detail, a future streaming backend may not offer the same client-to-client capability. Build on the hub’s participant ↔ agent contract for behavior that ports across backends.
Internal LiveKit transport#
LiveKit is an internal transport implementation detail. It is not exposed to the agent or MCP layer — agents only ever speak the IPC protocol below, and never need to know which transport carries the media.
LiveKitConnector (transport/livekit/) owns the transport lifecycle:
Starts the LiveKit server in a Docker container, listening on the loopback interface only (signaling
ws://127.0.0.1:7880, plus WebRTC TCP/UDP media ports 7881/7882).Optionally starts the browser-facing web server and/or token server.
Registers itself as a
ConnectorEndpointwith the IPC layer.Connects a Python
RoomClientto the LiveKit room. The room client is subscribe-only — it never publishes media of its own except per-participant return-audio tracks.
The connector translates LiveKit room events into IPC messages: it pushes decoded frames, audio chunks, and data into the hub, and emits participant join and leave events.
Note
The LiveKit connector requires NVENC and NVDEC hardware video codecs, which it checks at startup.
IPC boundary to agents#
The hub and its producers and consumers communicate over ZMQ using msgpack-encoded
messages. The layer lives in server-runtime/xr_media_hub/ipc/ and defines
three endpoints:
Endpoint |
Role |
Who |
|---|---|---|
|
producer + return-traffic receiver |
LiveKit connector process |
|
server: dispatch + fan-out |
XR-Media-Hub process |
|
subscriber + publisher |
agents, analytics, downstream processors |
The hub binds two sockets (defaults shown):
PULLonipc:///tmp/xr_hub_in— connectorsPUSHinbound media here.PUBonipc:///tmp/xr_hub_pub— consumersSUBhere for the fanned-out stream.
connector_A ──PUSH──┐
connector_B ──PUSH──┤─► PULL HubEndpoint PUB ──SUB──► consumers (agents)
connector_N ──PUSH──┘ ↓ dispatch
on_frame / on_audio / on_data / on_participant
Each connector owns and creates its own shared-memory ring buffer and
announces it to the hub with a ConnectorRegistration message; the hub opens
that buffer on demand. Video frames travel zero-copy through the ring buffer:
the connector writes pixels into a slot and pushes a lightweight
FRAME_SIGNAL (metadata) at full frame rate; consumers that want the pixels
issue a FRAME_REQUEST, and the hub replies with the held slot’s
FRAME_DATA. Audio, data, and participant events are carried inline as
msgpack payloads.
Messages are tagged with a MsgType and routed by topic. Topics follow the
"<type>.<participant_id>.<track_or_topic>" convention, and ZMQ’s byte-prefix
subscription lets a consumer subscribe at any granularity:
audio — all audio, all participants
audio.alice — all of alice's audio tracks
audio.alice.TR_mic_001 — alice's specific mic track
data.alice.chat — alice's "chat" data channel only
participant — join/leave events
The message types and codec are extensible: new MsgType IDs can be
registered at import time via register_encoder and register_decoder.
Note
Agent code should import the IPC types and ProcessorEndpoint from
xr_ai_agent directly, not from xr_media_hub.ipc. The agent SDK’s only
runtime dependencies are pyzmq and msgpack — importing from the agent SDK
avoids pulling in the full server-runtime dependency tree (LiveKit, FastAPI,
uvicorn, GPU codecs). xr_media_hub.ipc re-exports the same names for the
server side.
Per-participant return path#
Agents send audio, data, and flush signals back toward a specific participant through the same IPC channel. The hub guards every return path by participant id:
send_return_audiopublishes on topicreturn_audio.{pid}., dropping the chunk if{pid}is not connected.send_return_datapublishes onreturn_data.{pid}.{topic}, with the same connectivity guard.send_return_audio_flushpublishes onreturn_audio_flush.{pid}.so a processor can cleanly interrupt the agent’s own audio playback.
The trailing . after the participant id terminates the pid segment so that a
subscription for alice does not byte-prefix-match a topic addressed to
alice2; the connector subscribes with the identical delimiter when a
participant joins, and unsubscribes when they leave.
On the LiveKit side the return path maps to per-participant resources: the room
client lazily publishes one xr-hub-return-{pid} audio track per participant
and refreshes subscribe permissions so each participant may subscribe only to
their own return track. Return data is sent with destination_identities set
to the target participant, so it is never broadcast to peers. Return audio is
paced into LiveKit at audio rate by a per-participant pipe, which a flush can
drain to interrupt playback.
Same-origin wss proxy#
LiveKit server itself runs plain ws:// on the loopback interface
(127.0.0.1:7880) and nothing reaches that port from off-box. External
clients — browser, web-xr, Android, iOS, visionOS — connect only to a
same-origin wss:// URL exposed by the hub’s web server.
When web_server_tls is enabled (the default), the web server
(_web_server.py) terminates TLS on web_server_port (8080 by default) and
mounts a /rtc route that proxies LiveKit signaling bidirectionally to the
internal ws://127.0.0.1:7880 (_lk_proxy.py). A self-signed certificate is
auto-generated on first run; supply cert_file and key_file to use your own.
The proxy forwards end-to-end headers so SDK authentication (such as the LiveKit
Swift SDK’s Authorization: Bearer) reaches the server, and handles both the
versioned (/rtc/v1) and legacy (/rtc) signaling paths.
The web server’s /token endpoint returns a signed LiveKit JWT together with
the URL the client should use. With TLS on, that URL is the same-origin
wss://<host>:<web_server_port> — so the client SDK never needs a
per-deployment toggle. A /cert endpoint serves the active certificate as an
installable iOS profile.
Set web_server_tls: false for the two cases where the hub should not
terminate TLS itself: a TLS-terminating reverse proxy (nginx, Caddy,
Cloudflare Tunnel) sits in front and speaks plain http:// + ws:// to the
hub on the loopback, or localhost-only development where browsers already grant
camera and microphone access on http://localhost. In that mode /token
returns a plain
ws:// URL and the proxy carries plain WebSocket.
For runtime symptoms and fixes, refer to Troubleshooting.