Spatius does not stream digital human video to the client. Instead, the cloud transmits only motion data, and the client renders the Avatar locally in real time.

Overall Flow

For one Avatar response, the flow has four steps: the audio the Avatar should speak is sent to Motion Server; Motion Server generates motion data from it; AvatarKit plays the audio locally while rendering synchronized lip and body motion; and your app observes the process through state callbacks.
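The flow for one response can be modeled as a small state machine. This is a minimal sketch, not the AvatarKit API: the state names, the `AvatarResponseFlow` class, and the `generate_motion` callable standing in for Motion Server are all hypothetical, chosen only to illustrate the order of events and how an app might observe them through a callback.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class AvatarState(Enum):
    # Hypothetical state names; the real SDK callbacks may differ.
    IDLE = "idle"
    SENDING_AUDIO = "sending_audio"
    RECEIVING_MOTION = "receiving_motion"
    SPEAKING = "speaking"


@dataclass
class AvatarResponseFlow:
    """Toy model of one Avatar response (illustrative only)."""

    on_state: Callable[[AvatarState], None]  # app-level state callback
    state: AvatarState = AvatarState.IDLE

    def _set(self, state: AvatarState) -> None:
        self.state = state
        self.on_state(state)

    def run(self, audio: bytes, generate_motion: Callable[[bytes], bytes]) -> bytes:
        # 1. The audio the Avatar should speak is sent to Motion Server.
        self._set(AvatarState.SENDING_AUDIO)
        # 2. Motion Server generates motion data (stubbed by generate_motion).
        self._set(AvatarState.RECEIVING_MOTION)
        motion = generate_motion(audio)
        # 3. A real client would play the audio and render the motion here.
        self._set(AvatarState.SPEAKING)
        # 4. Playback finished; the app sees the return to idle.
        self._set(AvatarState.IDLE)
        return motion
```

Passing `on_state=print` is enough to watch the transitions in order; a real app would update its UI from the equivalent SDK callback instead.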

Core Components

AvatarKit Client SDK

The client SDK covers Web, iOS, Android, and Flutter. It renders the Avatar on the client and keeps the audio and digital human visuals synchronized. Reference: Web | iOS | Android | Flutter

Motion Server

Cloud service provided by Spatius.
  • Input: the audio the digital human should speak, usually the TTS output of an ASR -> LLM -> TTS pipeline.
  • Output: motion data at about 20 KB/s, far less bandwidth than a video stream.
  • Role: drives AvatarKit on the client to produce the digital human speaking visuals. Audio-video synchronization is handled by the SDK, so application developers don’t need to manage it.
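To put the 20 KB/s figure in perspective, the arithmetic below converts it to kbit/s and compares it with a video stream. The 2000 kbit/s video bitrate is an assumption picked for illustration, not a number from Spatius, and 1 KB is taken as 1000 bytes.

```python
MOTION_KB_PER_S = 20             # from the text: "about 20 KB/s"
ASSUMED_VIDEO_KBIT_PER_S = 2000  # assumption: an illustrative video bitrate


def motion_kbit_per_s(kb_per_s: float = MOTION_KB_PER_S) -> float:
    """Convert the motion data rate from kilobytes/s to kilobits/s."""
    return kb_per_s * 8


def data_for_duration_kb(seconds: float, kb_per_s: float = MOTION_KB_PER_S) -> float:
    """Total motion data, in KB, for a response of the given length."""
    return seconds * kb_per_s


def video_to_motion_ratio(video_kbit_per_s: float = ASSUMED_VIDEO_KBIT_PER_S) -> float:
    """How many times more bandwidth the assumed video stream would need."""
    return video_kbit_per_s / motion_kbit_per_s()
```

Under these assumptions, motion data runs at 160 kbit/s, a one-minute response transfers about 1200 KB, and the assumed video stream would need 12.5 times the bandwidth.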

Integration Modes

Different integration modes match different development scenarios.
Integration | Description
Direct Mode | The simplest integration path. The client SDK connects directly to Motion Server, suitable for quickly integrating Web, iOS, Android, or Flutter apps.
LiveKit Agents Integration | For projects already using the LiveKit Agents framework. Add livekit-plugins-spatius in a few lines.
Agora Convo AI Integration | For projects already using Agora Convo AI. Configure Spatius as the Convo AI avatar provider, or use spatius_avatar_python when running a TEN Framework graph directly.
Backend Mode | A custom integration path. Your backend uses the Server SDK and chooses a downstream transport such as your own WebSocket, LiveKit Room, or Agora channel.
For mode selection, see How to Integrate.
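As a rough illustration of how the four modes map to project circumstances, the sketch below encodes the descriptions above as a decision function. The function name, its parameters, and the order in which the checks are applied are assumptions made for this example; the How to Integrate guide remains the authoritative source.

```python
from enum import Enum


class IntegrationMode(Enum):
    DIRECT = "Direct Mode"
    LIVEKIT_AGENTS = "LiveKit Agents Integration"
    AGORA_CONVO_AI = "Agora Convo AI Integration"
    BACKEND = "Backend Mode"


def choose_mode(
    uses_livekit_agents: bool = False,
    uses_agora_convo_ai: bool = False,
    needs_custom_backend: bool = False,
) -> IntegrationMode:
    """Pick an integration mode from project traits (hypothetical helper)."""
    # Existing framework integrations are checked first (assumed precedence).
    if uses_livekit_agents:
        return IntegrationMode.LIVEKIT_AGENTS
    if uses_agora_convo_ai:
        return IntegrationMode.AGORA_CONVO_AI
    # A custom backend with its own downstream transport.
    if needs_custom_backend:
        return IntegrationMode.BACKEND
    # Otherwise the client SDK connects directly to Motion Server.
    return IntegrationMode.DIRECT
```

For example, `choose_mode()` with no arguments falls through to Direct Mode, the simplest path.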