Overall Flow
For one Avatar response: the audio the Avatar should speak is sent to Motion Server, Motion Server generates motion data, AvatarKit plays the audio locally while rendering synchronized lip and body motion, and your app observes the process through state callbacks.
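The flow above can be sketched as a small simulation. This is only an illustration of the order of events: every class, method, and state name below (`MotionServerStub`, `AvatarPlayerStub`, `generate_motion`, `"playing"`, `"ended"`) is hypothetical and is not taken from the AvatarKit SDK or the Motion Server API.

```python
from typing import Callable, List


class MotionServerStub:
    """Hypothetical stand-in for Motion Server: turns audio into motion data."""

    def generate_motion(self, audio: bytes) -> bytes:
        # Placeholder payload; the 100:1 ratio is made up for this sketch.
        return bytes(len(audio) // 100)


class AvatarPlayerStub:
    """Hypothetical stand-in for the client SDK: plays audio with motion."""

    def __init__(self, on_state: Callable[[str], None]) -> None:
        self.on_state = on_state

    def play(self, audio: bytes, motion: bytes) -> None:
        self.on_state("playing")  # assumed state name
        # A real client would play the audio locally here and render
        # lip and body motion in sync with it.
        self.on_state("ended")  # assumed state name


def run_response(audio: bytes, server: MotionServerStub, player: AvatarPlayerStub) -> None:
    """One Avatar response: audio -> motion data -> synchronized playback."""
    motion = server.generate_motion(audio)
    player.play(audio, motion)


# The app observes the process through the state callback.
states: List[str] = []
run_response(b"\x00" * 1000, MotionServerStub(), AvatarPlayerStub(states.append))
print(states)
```

In a real integration the state callback would typically drive UI, for example enabling the microphone again once playback has finished.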
Core Components
AvatarKit Client SDK
The client SDK covers Web, iOS, Android, and Flutter. It renders the Avatar on the client and keeps the audio and digital human visuals synchronized. Reference: Web | iOS | Android | Flutter
Motion Server
Cloud service provided by Spatius.
- Input: the audio the digital human should speak, usually the TTS output of an ASR -> LLM -> TTS pipeline.
- Output: motion data at about 20 KB/s, which needs much less bandwidth than a video stream.
- Role: drives AvatarKit on the client to produce the digital human speaking visuals. Audio-video synchronization is handled by the SDK, so application developers don’t need to manage it.
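To put the 20 KB/s figure in perspective, the arithmetic below converts it to kbit/s and to data volume per minute. The video bitrate used for comparison (1,500 kbit/s) is an assumed, illustrative value, not a number from Spatius.

```python
MOTION_KB_PER_S = 20  # motion data rate stated above
ASSUMED_VIDEO_KBIT_PER_S = 1500  # assumption: illustrative video bitrate only


def kb_per_s_to_kbit_per_s(rate_kb_per_s: float) -> float:
    """Convert kilobytes per second to kilobits per second (8 bits per byte)."""
    return rate_kb_per_s * 8


def data_volume_kb(seconds: float, rate_kb_per_s: float) -> float:
    """Total data transferred over a duration at a constant rate."""
    return seconds * rate_kb_per_s


motion_kbit_per_s = kb_per_s_to_kbit_per_s(MOTION_KB_PER_S)
motion_kb_per_minute = data_volume_kb(60, MOTION_KB_PER_S)
video_to_motion_ratio = ASSUMED_VIDEO_KBIT_PER_S / motion_kbit_per_s

print(f"motion stream: {motion_kbit_per_s} kbit/s")
print(f"one minute of speech: {motion_kb_per_minute} KB")
print(f"assumed video stream is {video_to_motion_ratio:.1f}x larger")
```

At 20 KB/s, one minute of speech comes to about 1,200 KB of motion data, or 160 kbit/s on the wire. Under the assumed video bitrate, the video stream would be roughly nine times larger.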
Integration Modes
Different integration modes match different development scenarios.

| Integration | Description |
|---|---|
| Direct Mode | The simplest integration path. The client SDK connects directly to Motion Server, suitable for quickly integrating Web, iOS, Android, or Flutter apps. |
| LiveKit Agents Integration | For projects already using the LiveKit Agents framework. Add livekit-plugins-spatius with a few lines of code. |
| Agora Convo AI Integration | For projects already using Agora Convo AI. Configure Spatius as the Convo AI avatar provider, or use spatius_avatar_python when running a TEN Framework graph directly. |
| Backend Mode | A custom integration path. Your backend uses the Server SDK and chooses a downstream transport such as your own WebSocket, LiveKit Room, or Agora channel. |
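
One way to read the table is as a simple decision rule. The helper below is a hypothetical sketch, not part of any Spatius SDK; the function name, its parameters, and the order in which the checks run are assumptions made for illustration.

```python
def choose_integration_mode(
    uses_livekit_agents: bool = False,
    uses_agora_convo_ai: bool = False,
    needs_custom_backend: bool = False,
) -> str:
    """Map a project's existing stack to one of the integration modes above.

    The priority order is an assumption: an existing framework wins first,
    then a custom backend, and Direct Mode is the default.
    """
    if uses_livekit_agents:
        return "LiveKit Agents Integration"
    if uses_agora_convo_ai:
        return "Agora Convo AI Integration"
    if needs_custom_backend:
        return "Backend Mode"
    return "Direct Mode"


print(choose_integration_mode())
print(choose_integration_mode(uses_livekit_agents=True))
print(choose_integration_mode(needs_custom_backend=True))
```

A project with none of these constraints falls through to Direct Mode, which matches its description as the simplest path.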