
Audio is the source of the Avatar’s speech: the audio that the digital human should “speak”.

Key Facts

  • It is not user microphone audio: In voice-agent scenarios, user speech usually goes through ASR -> LLM -> TTS. The TTS output is the avatar speech audio. Spatius does not consume user microphone audio by default.
  • It is sent to Motion Server: Avatar speech audio is the input to Motion Server. Motion Server uses it to generate synchronized motion data.
  • Who sends it: In Basic Mode, the client sends it. With the LiveKit Plugin, it is handled automatically. In Custom Mode, the developer manages the flow.
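The sketch below illustrates one voice-agent turn and shows which audio reaches Motion Server. It is a minimal illustration, not the Spatius SDK. `VoiceAgentTurn`, `asr`, `llm`, `tts`, and `send_to_motion_server` are hypothetical names. The ASR, LLM, and TTS engines are stand-in callables, and the transport to Motion Server is a placeholder for whatever your mode provides.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VoiceAgentTurn:
    """Hypothetical voice-agent turn; the engines are caller-supplied stand-ins."""
    asr: Callable[[bytes], str]   # user microphone audio -> transcript
    llm: Callable[[str], str]     # transcript -> reply text
    tts: Callable[[str], bytes]   # reply text -> avatar speech PCM
    sent_to_motion_server: List[bytes] = field(default_factory=list)

    def send_to_motion_server(self, pcm: bytes) -> None:
        # Hypothetical placeholder for the real transport: the client in Basic Mode,
        # the plugin in LiveKit, or your own code in Custom Mode.
        self.sent_to_motion_server.append(pcm)

    def handle_user_audio(self, mic_pcm: bytes) -> None:
        text = self.asr(mic_pcm)
        reply = self.llm(text)
        speech = self.tts(reply)
        # Only the TTS output (avatar speech audio) is sent to Motion Server.
        # The user's microphone audio is consumed by ASR and never forwarded.
        self.send_to_motion_server(speech)
```

For example, with toy engines `VoiceAgentTurn(asr=lambda b: "hi", llm=str.upper, tts=str.encode)`, calling `handle_user_audio(b"mic")` forwards only `b"HI"`.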

Format

Motion Server accepts mono 16-bit PCM (s16le). Choose one of the following sample rates and configure it during session initialization: 8000 / 16000 / 22050 / 24000 / 32000 / 44100 / 48000 Hz.

Audio is not resampled automatically. If the source audio does not match the configured format, convert it before sending. For details, see FAQ - Supported Audio Format.

Reference: Web Configuration | iOS AudioFormat | Android AudioFormat
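Because Motion Server does not resample, TTS output in another layout (for example float samples, stereo, or a different rate) must be converted on your side. The sketch below shows one way to do this in plain Python. It assumes float input samples in [-1.0, 1.0], and it uses a naive linear-interpolation resampler with no anti-aliasing filter, so a proper DSP library is a better choice in production. All function names are hypothetical helpers, not part of the Spatius SDKs.

```python
import struct

# Sample rates listed on this page; the configured rate must be one of these.
SUPPORTED_SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100, 48000}
BYTES_PER_SAMPLE = 2  # 16-bit PCM, mono: one 2-byte sample per frame


def downmix_to_mono(left, right):
    """Average two channels into one (simple stereo-to-mono downmix)."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(out_len):
        pos = i * src_rate / dst_rate          # position in the source signal
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append(samples[j] * (1.0 - frac) + nxt * frac)
    return out


def float_to_s16le(samples):
    """Clamp floats to [-1.0, 1.0], scale by 32767, pack as little-endian int16."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)


def to_motion_server_pcm(samples, src_rate, dst_rate=16000):
    """Convert mono float samples to s16le at a supported rate (hypothetical helper)."""
    if dst_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {dst_rate}")
    return float_to_s16le(resample_linear(samples, src_rate, dst_rate))


def pcm_duration_seconds(num_bytes, sample_rate):
    """Duration of a mono s16le buffer: bytes / (rate * 2 bytes per sample)."""
    return num_bytes / (sample_rate * BYTES_PER_SAMPLE)
```

As a sanity check on the arithmetic, one second of mono s16le at 16000 Hz is 16000 × 2 = 32000 bytes. Whatever target rate you use here must match the rate configured during session initialization.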