Audio is the source of the Avatar’s speech - the audio that the digital human should “speak”.
Key Facts
- It is not user microphone audio: In voice-agent scenarios, user speech usually goes through ASR -> LLM -> TTS. The TTS output is the avatar speech audio. Spatius does not consume user microphone audio by default.
- It is sent to Motion Server: Avatar speech audio is the input to Motion Server. Motion Server uses it to generate synchronized motion data.
- Who sends it: In Basic Mode, the client sends it. In LiveKit Plugin, it is handled automatically. In Custom Mode, the developer maintains the flow.
Format
Motion Server accepts mono 16-bit PCM (s16le). Choose one of the following sample rates and configure it during session initialization:
8000 / 16000 / 22050 / 24000 / 32000 / 44100 / 48000 Hz
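As a rough illustration of these constraints, the sketch below checks a sample rate against the supported list and works out how many bytes a chunk of mono 16-bit PCM occupies. The names `SUPPORTED_SAMPLE_RATES`, `validate_sample_rate`, and `pcm_bytes_for_duration` are hypothetical helpers written for this example; they are not part of any Spatius SDK.

```python
# Hypothetical helpers for illustration only; not part of the Spatius SDK.
SUPPORTED_SAMPLE_RATES = (8000, 16000, 22050, 24000, 32000, 44100, 48000)
BYTES_PER_SAMPLE = 2  # 16-bit PCM (s16le)
CHANNELS = 1          # mono


def validate_sample_rate(sample_rate: int) -> int:
    """Return sample_rate if it is in the supported list, else raise ValueError."""
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(
            f"Unsupported sample rate {sample_rate} Hz; "
            f"expected one of {SUPPORTED_SAMPLE_RATES}"
        )
    return sample_rate


def pcm_bytes_for_duration(sample_rate: int, duration_ms: int) -> int:
    """Size in bytes of duration_ms of mono 16-bit PCM at sample_rate."""
    validate_sample_rate(sample_rate)
    frames = sample_rate * duration_ms // 1000
    return frames * BYTES_PER_SAMPLE * CHANNELS
```

For example, 20 ms of audio at 16000 Hz is 320 frames, or 640 bytes. A check like this can catch a mismatched TTS output rate before any audio is sent.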
Audio is not resampled automatically. If the source audio does not match the configured sample rate or the mono 16-bit PCM format, convert it before sending. For details, see FAQ - Supported Audio Format.
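The conversion step might look like the following minimal sketch, which assumes the source audio is available as interleaved float samples in the range -1.0 to 1.0. It downmixes to mono, resamples with plain linear interpolation, and packs the result as s16le bytes. All function names here are hypothetical, and linear interpolation applies no anti-aliasing filter, so a dedicated resampling library is likely a better fit for production audio.

```python
import struct


def downmix_to_mono(samples, channels):
    """Average interleaved channels into a list of mono float samples."""
    if channels == 1:
        return list(samples)
    return [
        sum(samples[i:i + channels]) / channels
        for i in range(0, len(samples) - channels + 1, channels)
    ]


def resample_linear(samples, src_rate, dst_rate):
    """Resample mono float samples by linear interpolation (no filtering)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = len(samples) * dst_rate // src_rate
    out = []
    for n in range(out_len):
        pos = n * src_rate / dst_rate      # position in the source signal
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


def float_to_s16le(samples):
    """Clamp float samples to the 16-bit range and pack as little-endian PCM."""
    ints = [max(-32768, min(32767, round(s * 32767))) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)


def to_mono_s16le(samples, channels, src_rate, dst_rate):
    """Full conversion: downmix, resample, then pack as mono s16le bytes."""
    mono = downmix_to_mono(samples, channels)
    return float_to_s16le(resample_linear(mono, src_rate, dst_rate))
```

A call such as `to_mono_s16le(stereo_samples, 2, 48000, 16000)` would produce bytes suitable for a session configured at 16000 Hz, where `stereo_samples` stands in for whatever the TTS engine produced.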
Reference: Web Configuration | iOS AudioFormat | Android AudioFormat
