# Audio

> Audio input rules for Spatius: source timing, PCM format, buffering, interruption, and lip-sync guidance.

export const AudioSendTimingDiagram = () => {
  return <div className="spatius-diagram spatius-audio-timing-diagram not-prose" aria-label="Comparison of generated audio and paced audio for client playback buffering">
      <svg viewBox="0 0 980 980" role="img">
        <defs>
          <marker id="spatius-audio-timing-good" viewBox="0 0 12 10" refX="10.5" refY="5" markerWidth="4.5" markerHeight="6" orient="auto-start-reverse">
            <path d="M0 0L12 5L0 10Z" fill="var(--spatius-diagram-ink)" />
          </marker>
          <marker id="spatius-audio-timing-bad" viewBox="0 0 12 10" refX="10.5" refY="5" markerWidth="4.5" markerHeight="6" orient="auto-start-reverse">
            <path d="M0 0L12 5L0 10Z" fill="var(--spatius-diagram-warning)" />
          </marker>
        </defs>

        <text x="305" y="54" textAnchor="middle" fill="var(--spatius-diagram-red)" fontSize="24" fontWeight="600">Generated Audio Path</text>
        <text x="770" y="54" textAnchor="middle" fill="var(--spatius-diagram-brand)" fontSize="24" fontWeight="600">Inference + Playback</text>

        <rect x="40" y="100" width="530" height="340" rx="18" fill="var(--spatius-diagram-owned)" stroke="var(--spatius-diagram-stroke)" strokeWidth="3" />
        <text x="305" y="136" textAnchor="middle" fill="var(--spatius-diagram-red)" fontSize="18" fontWeight="600">Works: send new TTS chunks immediately</text>

        <rect x="80" y="180" width="170" height="102" rx="12" fill="var(--spatius-diagram-node)" stroke="var(--spatius-diagram-stroke)" strokeWidth="3" />
        <text x="165" y="222" textAnchor="middle" fill="var(--spatius-diagram-ink)" fontSize="28" fontWeight="500">TTS</text>
        <text x="165" y="254" textAnchor="middle" fill="var(--spatius-diagram-muted)" fontSize="18" fontWeight="500">new chunks</text>

        <rect x="360" y="180" width="170" height="102" rx="12" fill="var(--spatius-diagram-node)" stroke="var(--spatius-diagram-stroke)" strokeWidth="3" />
        <text x="445" y="220" textAnchor="middle" fill="var(--spatius-diagram-ink)" fontSize="25" fontWeight="500">SDK send</text>
        <text x="445" y="252" textAnchor="middle" fill="var(--spatius-diagram-muted)" fontSize="18" fontWeight="500">immediate</text>

        <polyline points="262,231 348,231" className="spatius-diagram-flow" fill="none" stroke="var(--spatius-diagram-ink)" strokeWidth="4" strokeLinecap="round" markerEnd="url(#spatius-audio-timing-good)" />

        <text x="85" y="334" fill="var(--spatius-diagram-muted)" fontSize="17" fontWeight="600">generation time</text>
        <line x1="85" y1="360" x2="525" y2="360" stroke="var(--spatius-diagram-stroke)" strokeWidth="4" strokeLinecap="round" />
        <g className="spatius-timing-fast-chunks">
          <rect x="255" y="336" width="48" height="48" rx="8" fill="var(--spatius-diagram-managed)" stroke="var(--spatius-diagram-brand)" strokeWidth="2.5" />
          <text x="279" y="367" textAnchor="middle" fill="var(--spatius-diagram-ink)" fontSize="18" fontWeight="700">1</text>
          <rect x="318" y="336" width="48" height="48" rx="8" fill="var(--spatius-diagram-managed)" stroke="var(--spatius-diagram-brand)" strokeWidth="2.5" />
          <text x="342" y="367" textAnchor="middle" fill="var(--spatius-diagram-ink)" fontSize="18" fontWeight="700">2</text>
          <rect x="381" y="336" width="48" height="48" rx="8" fill="var(--spatius-diagram-managed)" stroke="var(--spatius-diagram-brand)" strokeWidth="2.5" />
          <text x="405" y="367" textAnchor="middle" fill="var(--spatius-diagram-ink)" fontSize="18" fontWeight="700">3</text>
        </g>
        <text x="305" y="416" textAnchor="middle" fill="var(--spatius-diagram-muted)" fontSize="17" fontWeight="500">chunks arrive ahead of playback</text>

        <rect x="620" y="100" width="300" height="340" rx="18" fill="var(--spatius-diagram-managed)" stroke="var(--spatius-diagram-brand)" strokeWidth="3" />
        <rect x="670" y="158" width="200" height="120" rx="12" fill="var(--spatius-diagram-node)" stroke="var(--spatius-diagram-brand)" strokeWidth="3" />
        <text x="770" y="203" textAnchor="middle" fill="var(--spatius-diagram-ink)" fontSize="28" fontWeight="500">
          <tspan x="770">Motion</tspan>
          <tspan x="770" dy="36">Server</tspan>
        </text>
        <g className="spatius-timing-window-pulse">
          <rect x="682" y="318" width="52" height="46" rx="8" fill="var(--spatius-diagram-node)" stroke="var(--spatius-diagram-brand)" strokeWidth="2.5" />
          <rect x="744" y="318" width="52" height="46" rx="8" fill="var(--spatius-diagram-node)" stroke="var(--spatius-diagram-brand)" strokeWidth="2.5" />
          <rect x="806" y="318" width="52" height="46" rx="8" fill="var(--spatius-diagram-node)" stroke="var(--spatius-diagram-brand)" strokeWidth="2.5" />
        </g>
        <text x="770" y="396" textAnchor="middle" fill="var(--spatius-diagram-muted)" fontSize="17" fontWeight="500">buffer stays ahead</text>

        <polyline points="540,231 658,231" className="spatius-diagram-flow" fill="none" stroke="var(--spatius-diagram-ink)" strokeWidth="4" strokeLinecap="round" markerEnd="url(#spatius-audio-timing-good)" />
        <circle className="spatius-timing-packet spatius-timing-packet-good" cx="550" cy="231" r="9" fill="var(--spatius-diagram-brand)" />
        <circle className="spatius-timing-packet spatius-timing-packet-good spatius-timing-delay-1" cx="550" cy="231" r="9" fill="var(--spatius-diagram-brand)" />
        <circle className="spatius-timing-packet spatius-timing-packet-good spatius-timing-delay-2" cx="550" cy="231" r="9" fill="var(--spatius-diagram-brand)" />
        <text x="600" y="207" textAnchor="middle" fill="var(--spatius-diagram-muted)" fontSize="17" fontWeight="600">fast</text>

        <text x="305" y="526" textAnchor="middle" fill="var(--spatius-diagram-warning)" fontSize="24" fontWeight="600">Paced Audio Path</text>
        <text x="770" y="526" textAnchor="middle" fill="var(--spatius-diagram-warning)" fontSize="24" fontWeight="600">Playback Buffer</text>

        <rect x="40" y="570" width="530" height="340" rx="18" fill="var(--spatius-diagram-warning-surface)" stroke="var(--spatius-diagram-warning-stroke)" strokeWidth="3" />
        <text x="305" y="606" textAnchor="middle" fill="var(--spatius-diagram-warning)" fontSize="18" fontWeight="600">Avoid: send 1x playback-speed output</text>

        <rect x="80" y="650" width="170" height="102" rx="12" fill="var(--spatius-diagram-node)" stroke="var(--spatius-diagram-stroke)" strokeWidth="3" />
        <text x="165" y="690" textAnchor="middle" fill="var(--spatius-diagram-ink)" fontSize="24" fontWeight="500">Paced audio</text>
        <text x="165" y="722" textAnchor="middle" fill="var(--spatius-diagram-muted)" fontSize="18" fontWeight="500">already heard</text>

        <rect x="360" y="650" width="170" height="102" rx="12" fill="var(--spatius-diagram-node)" stroke="var(--spatius-diagram-stroke)" strokeWidth="3" />
        <text x="445" y="690" textAnchor="middle" fill="var(--spatius-diagram-ink)" fontSize="25" fontWeight="500">Decode</text>
        <text x="445" y="722" textAnchor="middle" fill="var(--spatius-diagram-muted)" fontSize="18" fontWeight="500">paced at 1x</text>

        <polyline points="262,701 348,701" className="spatius-diagram-flow spatius-timing-flow-bad" fill="none" stroke="var(--spatius-diagram-warning)" strokeWidth="4" strokeLinecap="round" markerEnd="url(#spatius-audio-timing-bad)" />

        <text x="85" y="804" fill="var(--spatius-diagram-muted)" fontSize="17" fontWeight="600">playback time</text>
        <line x1="85" y1="830" x2="525" y2="830" stroke="var(--spatius-diagram-stroke)" strokeWidth="4" strokeLinecap="round" />
        <g className="spatius-timing-slow-chunks">
          <rect x="225" y="806" width="48" height="48" rx="8" fill="var(--spatius-diagram-warning-surface)" stroke="var(--spatius-diagram-warning)" strokeWidth="2.5" />
          <text x="249" y="837" textAnchor="middle" fill="var(--spatius-diagram-ink)" fontSize="18" fontWeight="700">1</text>
          <rect x="347" y="806" width="48" height="48" rx="8" fill="var(--spatius-diagram-warning-surface)" stroke="var(--spatius-diagram-warning)" strokeWidth="2.5" />
          <text x="371" y="837" textAnchor="middle" fill="var(--spatius-diagram-ink)" fontSize="18" fontWeight="700">2</text>
          <rect x="469" y="806" width="48" height="48" rx="8" fill="var(--spatius-diagram-warning-surface)" stroke="var(--spatius-diagram-warning)" strokeWidth="2.5" />
          <text x="493" y="837" textAnchor="middle" fill="var(--spatius-diagram-ink)" fontSize="18" fontWeight="700">3</text>
        </g>
        <text x="305" y="886" textAnchor="middle" fill="var(--spatius-diagram-muted)" fontSize="17" fontWeight="500">chunks arrive only when listeners hear them</text>

        <rect x="620" y="570" width="300" height="340" rx="18" fill="var(--spatius-diagram-warning-surface)" stroke="var(--spatius-diagram-warning)" strokeWidth="3" />
        <rect x="670" y="625" width="200" height="118" rx="12" fill="var(--spatius-diagram-node)" stroke="var(--spatius-diagram-warning)" strokeWidth="3" />
        <text x="770" y="670" textAnchor="middle" fill="var(--spatius-diagram-ink)" fontSize="28" fontWeight="500">
          <tspan x="770">Motion</tspan>
          <tspan x="770" dy="36">Server</tspan>
        </text>
        <text x="770" y="772" textAnchor="middle" fill="var(--spatius-diagram-warning)" fontSize="20" fontWeight="700">motion returned</text>
        <text x="770" y="884" textAnchor="middle" fill="var(--spatius-diagram-muted)" fontSize="18" fontWeight="500">playback can stall</text>

        <polyline points="540,701 658,701" className="spatius-diagram-flow spatius-timing-flow-bad" fill="none" stroke="var(--spatius-diagram-warning)" strokeWidth="4" strokeLinecap="round" markerEnd="url(#spatius-audio-timing-bad)" />
        <circle className="spatius-timing-packet spatius-timing-packet-bad" cx="550" cy="701" r="9" fill="var(--spatius-diagram-warning)" />
        <text x="600" y="677" textAnchor="middle" fill="var(--spatius-diagram-muted)" fontSize="17" fontWeight="600">1x</text>

        <g className="spatius-timing-window-pulse">
          <rect x="682" y="802" width="52" height="46" rx="8" fill="var(--spatius-diagram-node)" stroke="var(--spatius-diagram-warning)" strokeWidth="2.5" />
          <rect x="744" y="802" width="52" height="46" rx="8" fill="var(--spatius-diagram-node)" stroke="var(--spatius-diagram-warning)" strokeWidth="2.5" />
          <rect x="806" y="802" width="52" height="46" rx="8" fill="var(--spatius-diagram-node)" stroke="var(--spatius-diagram-warning)" strokeWidth="2.5" />
        </g>
        <g className="spatius-timing-blocker">
          <line x1="742" y1="795" x2="798" y2="855" stroke="var(--spatius-diagram-warning)" strokeWidth="8" strokeLinecap="round" />
          <line x1="798" y1="795" x2="742" y2="855" stroke="var(--spatius-diagram-warning)" strokeWidth="8" strokeLinecap="round" />
        </g>

      </svg>
    </div>;
};

**Audio** is the source of the avatar's speech: the audio that the digital human should "speak".

## Key Facts

* **It is not user microphone audio**: In voice-agent scenarios, user speech usually goes through ASR -> LLM -> TTS. **The TTS output** is the avatar speech audio. Spatius does not consume user microphone audio by default.
* **It is sent to Motion Server**: Avatar speech audio is the input to Motion Server. Motion Server uses it to generate synchronized motion data.
* **Who sends it**: In [Direct Mode](/direct-mode/overview), the client sends it. In [LiveKit Agents Integration](/livekit-agents/overview), it is handled by `livekit-plugins-spatius`. In [Agora Convo AI Integration](/agora-convoai/overview), it is handled by the Spatius avatar provider or `spatius_avatar_python` for direct TEN Framework graphs. In [Backend Mode](/backend-mode/overview), the developer maintains the flow.
* **Do not pace it like playback audio**: Send avatar speech audio when it is generated. Do not wait for real-time playback timing or feed paced playback output, such as RTC or WebSocket audio that arrives at 1x speed, back into Spatius.

## Send timing

Send avatar speech audio to Spatius when the speech audio is produced, not when the audio would be heard during playback.

Motion Server is an audio-to-motion converter. It performs inference over buffered audio windows and can generate motion data from any valid audio you send, even if the audio arrives slowly. The problem with 1x playback-speed input happens later, when AvatarKit tries to play audio and motion data in sync.

Avatar playback consumes ready audio and motion continuously. Motion Server, however, returns motion only after it has buffered enough audio for the next inference window. The first inference window is optimized for fast startup, while later windows are larger. If your input arrives only at playback speed, filling each window takes at least as long as the audio it contains, plus inference time. The client may start playback from the first ready segment and consume it before the next, larger motion segment is ready. The result is a playback buffer stall.

For TTS output, send chunks at **TTS generation speed**, which is usually faster than real-time playback speed. This gives Motion Server enough audio ahead of playback so AvatarKit can keep a healthy audio + motion buffer.
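The toy model below makes the stall described above concrete. It is a simplified sketch, not how Motion Server or AvatarKit are implemented: the window sizes (0.5 s first, 1.5 s later) and the 0.1 s inference time are invented numbers, and network latency is ignored. `send_speed` is the number of audio seconds delivered per wall-clock second, so `1.0` models paced input and `4.0` models TTS that generates audio four times faster than real time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float     # offset in the speech audio, in seconds
    duration: float  # seconds of audio covered by one inference window


def split_windows(total: float, first_window: float, later_window: float) -> List[Segment]:
    """Split `total` seconds of audio into a small first window and larger later ones."""
    segments: List[Segment] = []
    pos, size = 0.0, first_window
    while pos < total - 1e-9:
        duration = min(size, total - pos)
        segments.append(Segment(pos, duration))
        pos += duration
        size = later_window
    return segments


def total_stall(total: float, send_speed: float, first_window: float = 0.5,
                later_window: float = 1.5, inference: float = 0.1) -> float:
    """Return the seconds playback spends waiting for the next audio + motion segment.

    All timing values are hypothetical and only illustrate the shape of the problem.
    """
    player_free_at = 0.0  # wall-clock time when the current segment finishes playing
    stalled = 0.0
    for i, seg in enumerate(split_windows(total, first_window, later_window)):
        # A window can be inferred only after its last sample has been sent.
        ready_at = (seg.start + seg.duration) / send_speed + inference
        if i == 0:
            player_free_at = ready_at  # playback starts with the first ready segment
        elif ready_at > player_free_at:
            stalled += ready_at - player_free_at  # buffer ran dry: playback stalls
            player_free_at = ready_at
        player_free_at += seg.duration  # play this segment
    return stalled


print(total_stall(5.0, send_speed=1.0))  # paced (1x) input
print(total_stall(5.0, send_speed=4.0))  # faster-than-real-time TTS
```

Under these assumptions, paced input stalls for about one second when the second window is due, while the 4x input never stalls. At 1x, any later window larger than the first leaves the player waiting, because filling the larger window takes longer than playing the smaller one.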

<Warning>
  Do not pace audio sends by wall-clock playback time, and do not capture paced playback output and feed it back into AvatarKit as "new" audio. This includes RTC output and WebSocket APIs that return audio at 1x playback speed. Motion Server can still infer motion from that audio, but AvatarKit playback is likely to stall because the next synchronized audio + motion segment may not be ready before the current one is consumed.
</Warning>

Use this rule of thumb:

* **Good**: TTS provider emits a PCM chunk -> send that chunk to AvatarKit or the Server SDK immediately.
* **Good**: The final TTS chunk is sent with the SDK's end-of-input flag (`end: true`, `end=True`, or platform equivalent).
* **Avoid**: Receiving paced audio at playback speed, decoding it, and sending it to AvatarKit as the source audio.
* **Avoid**: Sleeping between chunks to match the chunk's audio duration.
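A minimal sketch of the "Good" pattern, assuming an async TTS stream that yields PCM chunks as `bytes`. `forward_tts` and the `send_audio(chunk, end)` callback are hypothetical names, not a Spatius API; map `send_audio` to your SDK's actual send method and end-of-input flag. The sketch holds back only the most recent chunk so that the last one can carry the end flag.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Optional

# Hypothetical signature: adapt to your SDK's send method and its end-of-input
# flag (`end: true`, `end=True`, or the platform equivalent).
SendAudio = Callable[[bytes, bool], Awaitable[None]]


async def forward_tts(chunks: AsyncIterator[bytes], send_audio: SendAudio) -> int:
    """Forward TTS chunks at generation speed; return how many were sent."""
    sent = 0
    pending: Optional[bytes] = None
    async for chunk in chunks:
        if pending is not None:
            # Good: send as soon as the next chunk shows this one is not the last.
            await send_audio(pending, False)
            sent += 1
        # Avoid: `await asyncio.sleep(duration_of(chunk))` to mimic playback timing.
        pending = chunk
    if pending is not None:
        await send_audio(pending, True)  # the final chunk carries the end flag
        sent += 1
    return sent


async def _demo() -> None:
    async def fake_tts() -> AsyncIterator[bytes]:
        for i in range(3):
            yield bytes([i]) * 960  # 20 ms of 24000 Hz mono s16le audio

    async def print_send(chunk: bytes, end: bool) -> None:
        print(len(chunk), end)

    await forward_tts(fake_tts(), print_send)


asyncio.run(_demo())
```

Holding back one chunk delays each send only by the gap until the next chunk is generated, which is shorter than the chunk's playback duration when TTS runs faster than real time.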

<Frame>
  <AudioSendTimingDiagram />
</Frame>

## If you only have paced audio

If your only available source is paced playback-speed audio, add a **pre-buffer** before sending it to AvatarKit or the Server SDK.

Instead of forwarding each 1x chunk immediately, accumulate enough audio locally to cover Motion Server's startup window and give the next inference window time to complete. Then start sending from that local buffer while continuing to fill it from the paced source. This adds startup latency, but it gives AvatarKit a much better chance of keeping synchronized audio + motion data available during playback.

Use this pre-buffer at the start of **every turn**. Do not only pre-buffer the first response in a session; each new avatar speech turn needs its own buffer before playback begins.

As a practical starting point, pre-buffer **3.5 seconds** of audio before sending it. If you want a safer value, use **4 seconds**. Treat this duration as an application-level tuning value rather than a fixed SDK constant. Increase it if you still observe playback stalls, and clear the buffer when you interrupt or start a new turn.
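A sketch of per-turn pre-buffering for paced sources. `TurnPreBuffer` is a hypothetical helper, not part of any Spatius SDK; it returns `(chunk, end)` pairs that you would pass to your SDK's send call. The threshold is computed in bytes for mono `s16le` (2 bytes per sample), so 3.5 seconds at 24000 Hz is 24000 × 2 × 3.5 = 168000 bytes. Once the threshold is reached, the held audio is released in one burst so Motion Server starts with a lead, and later chunks are forwarded as they arrive.

```python
from typing import List, Tuple

BYTES_PER_SAMPLE = 2  # mono s16le: one 16-bit sample per frame

Send = Tuple[bytes, bool]  # (chunk, end) pair for your SDK's send call


class TurnPreBuffer:
    """Hypothetical helper that holds paced (1x) audio until a lead is built up."""

    def __init__(self, sample_rate: int, prebuffer_seconds: float = 3.5) -> None:
        # Application-level tuning value, not an SDK constant; try 4.0 for more margin.
        self.threshold = int(sample_rate * BYTES_PER_SAMPLE * prebuffer_seconds)
        self.reset()

    def reset(self) -> None:
        """Drop held audio; call on interruption and before every new turn."""
        self._held: List[bytes] = []
        self._held_bytes = 0
        self._started = False

    def push(self, chunk: bytes) -> List[Send]:
        """Accept one paced chunk; return the chunks to send right now."""
        if self._started:
            return [(chunk, False)]  # lead already built: forward immediately
        self._held.append(chunk)
        self._held_bytes += len(chunk)
        if self._held_bytes < self.threshold:
            return []  # still filling the pre-buffer
        self._started = True
        out = [(c, False) for c in self._held]  # release the whole lead in one burst
        self._held, self._held_bytes = [], 0
        return out

    def finish(self, final_chunk: bytes) -> List[Send]:
        """Flush anything still held, mark the final chunk with end=True, and reset."""
        out = [(c, False) for c in self._held] + [(final_chunk, True)]
        self.reset()
        return out
```

Call `reset()` when the user interrupts, and pass the last chunk of each turn to `finish()` so the next turn starts with an empty pre-buffer. If your paced source only signals the end after its last chunk has arrived, keep the most recent chunk back so it can be sent with the end flag.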

## Format

Motion Server accepts **mono 16-bit PCM (`s16le`)**, that is, signed 16-bit samples in little-endian byte order. Choose one of the following sample rates and configure it during session initialization:

`8000` / `16000` / `22050` / `24000` / `32000` / `44100` / `48000` Hz

Audio is not resampled automatically. If your source uses a different sample rate, channel layout, or sample format, convert it before sending. For details, see [FAQ - Supported Audio Format](/resources/faq#supported-audio-format).

Reference: [Web `Configuration`](/sdk-reference/web-sdk/reference#configuration) | [iOS `AudioFormat`](/sdk-reference/ios-sdk/api-reference#audioformat) | [Android `AudioFormat`](/sdk-reference/android-sdk/api-reference#audioformat) | [Flutter `AudioFormat`](/sdk-reference/flutter-sdk/api-reference#initialize)
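Because audio is not resampled for you, a small pre-send check can catch mismatches early. The sketch below uses only the Python standard library: it validates a sample rate against the supported list above and converts interleaved float samples in `[-1.0, 1.0]` (a common decoder output) into mono `s16le` bytes. The function names are hypothetical, and the sketch deliberately does not resample; if your rate is unsupported, resample with a dedicated audio library first.

```python
import struct
from typing import List, Sequence

# Sample rates Motion Server accepts, per the list above.
SUPPORTED_RATES = (8000, 16000, 22050, 24000, 32000, 44100, 48000)


def check_sample_rate(sample_rate: int) -> None:
    """Raise if the rate is not supported (there is no automatic resampling)."""
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(f"{sample_rate} Hz is unsupported; resample to one of {SUPPORTED_RATES}")


def float_to_s16le_mono(samples: Sequence[float], channels: int) -> bytes:
    """Convert interleaved float samples in [-1.0, 1.0] to mono s16le bytes.

    Channels are averaged into one (a simple downmix); out-of-range values are clipped.
    """
    if channels < 1 or len(samples) % channels != 0:
        raise ValueError("sample count must be a multiple of the channel count")
    mono: List[int] = []
    for i in range(0, len(samples), channels):
        value = sum(samples[i:i + channels]) / channels
        value = max(-1.0, min(1.0, value))
        mono.append(round(value * 32767))
    # "<" forces little-endian, standard-size 2-byte signed shorts on every platform.
    return struct.pack(f"<{len(mono)}h", *mono)


check_sample_rate(24000)
pcm = float_to_s16le_mono([0.0, 0.25, -0.25, 0.5], channels=2)  # 2 stereo frames
```

Averaging channels is only one downmix choice; keep whatever your audio pipeline already uses if it produces mono `s16le` at a supported rate.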
