
What is Direct Mode Integration?

Direct Mode Integration maps to DrivingServiceMode.direct in SDK code. AvatarKit on the client establishes a WebSocket connection to Motion Server, sends avatar speech audio, receives motion data, and renders the avatar locally.

At a glance

| Dimension | Direct Mode Integration |
|---|---|
| Dev effort | 🟢 Low |
| Latency profile | 🕒 Moderate; the client sends avatar speech audio to Motion Server directly. |
| You build | 🔑 A small Session Token endpoint and 🧩 the client AvatarKit integration. |
| You do not build | 🚫 A runtime relay server, Server SDK pipeline, or RTC transport. |
| Best first demo | 🌐 Web SDK quickstart, then native or Flutter quickstarts as needed. |
| Client support | 🌐 Web / 🍎 iOS / 🤖 Android / 📱 Flutter |

When to use

  • You already have avatar speech audio — your own TTS, a TTS provider, or prerecorded audio.
  • Smallest backend footprint — your backend only mints Session Tokens; ASR, LLM, and TTS run wherever you already host them.
  • Cross-platform — Web, iOS, Android, and Flutter clients use the same Direct Mode model.

What the token server does

Direct Mode has a small backend requirement because SPATIUS_API_KEY is a server-side secret. Your client asks your backend for a Session Token; your backend calls the Console API with the API Key; the client then uses that short-lived Session Token to open the Motion Server WebSocket. This token server is not part of the avatar runtime. It does not send audio, receive motion data, run ASR / LLM / TTS, or proxy the Motion Server connection.
| Path | Backend responsibility | Client responsibility |
|---|---|---|
| Direct Mode | Mint Session Tokens only. | Connect to Motion Server, send avatar speech audio, receive motion data, and render locally. |
| Backend Mode | Run the Server SDK pipeline, connect to Motion Server, and transport encoded audio payloads + motion data payloads to clients. | Receive encoded payloads from your backend and render locally. |
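The Direct Mode row above reduces the backend to one job. A minimal TypeScript sketch of that job, assuming a Node-style server: createTokenHandler, MintSessionToken, and the { sessionToken, expiresAt } response shape are hypothetical names for illustration, and mint stands in for the Console API call described in the Session token API. What it shows is that the API key is used only on the server and only the short-lived token goes back to the client.

```typescript
// Hypothetical response shape of your own token endpoint (not defined by AvatarKit).
interface TokenResponse {
  sessionToken: string;
  expiresAt: number; // Unix epoch milliseconds
}

// Hypothetical wrapper around the Spatius Console API request; the real request
// and response formats are described in the Session token API, not here.
type MintSessionToken = (apiKey: string) => Promise<TokenResponse>;

// Builds the handler behind your token endpoint. The API key stays in this
// server-side closure and is never copied into the response.
function createTokenHandler(apiKey: string, mint: MintSessionToken): () => Promise<TokenResponse> {
  if (!apiKey) {
    throw new Error("SPATIUS_API_KEY is not set on the server");
  }
  return async () => {
    const minted = await mint(apiKey);
    // Copy only what the client needs: the short-lived token and its expiry.
    return { sessionToken: minted.sessionToken, expiresAt: minted.expiresAt };
  };
}
```

In a real server you would read the key from the environment (for example process.env.SPATIUS_API_KEY), put your own user authentication in front of the route, and serialize the result as JSON.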

Requirements

| Requirement | Description |
|---|---|
| App ID | Obtained from Spatius Studio. |
| Session Token | Issued from your backend (max 24 h validity). See Credentials. |
| Audio format | PCM16, mono, configurable sample rate (default 16 kHz). See Audio. |
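Many capture and TTS pipelines produce 32-bit float samples in [-1, 1], while Direct Mode expects PCM16 mono. The conversion below is a generic sketch, not an AvatarKit API: floatToPcm16 and pcm16ChunkBytes are hypothetical helpers, the output is assumed to be little-endian (the usual PCM16 byte order), and the input is assumed to already be mono at the configured sample rate, so nothing is resampled or downmixed. Check Audio for the exact type send() accepts on your platform.

```typescript
// Converts mono Float32 samples in [-1, 1] to 16-bit signed little-endian PCM.
// Assumption: input is already mono and at the sample rate configured for the SDK.
function floatToPcm16(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2); // 2 bytes per 16-bit sample
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp so out-of-range input saturates instead of wrapping around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative full scale is -32768, positive full scale is 32767.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return buffer;
}

// Bytes in a PCM16 mono chunk of the given duration: samples × 2 bytes × 1 channel.
function pcm16ChunkBytes(durationMs: number, sampleRate = 16000): number {
  return Math.round((sampleRate * durationMs) / 1000) * 2;
}
```

At the default 16 kHz, one second of PCM16 mono audio is 16,000 samples × 2 bytes = 32,000 bytes, so a 20 ms chunk is 640 bytes.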
Authentication flow:
Your Client → Your token endpoint → Spatius Console API → Session Token (24 h max)
The Session Token must be set before start(). Keep SPATIUS_API_KEY in the token endpoint only; never ship it in Web, mobile, or Flutter client code. See the Session token API for backend implementation.
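Because a Session Token is valid for at most 24 h, a long-running client should make sure it holds an unexpired token before calling start(). A minimal TypeScript sketch of a client-side cache, assuming your token endpoint returns a hypothetical { sessionToken, expiresAt } object; SessionTokenCache and the 5-minute refresh margin are illustrative choices, and how the token is handed to the SDK before start() is platform-specific and not shown.

```typescript
// Hypothetical shape returned by your token endpoint.
interface SessionToken {
  sessionToken: string;
  expiresAt: number; // Unix epoch milliseconds
}

// Caches the current Session Token and fetches a new one when it is missing or
// within refreshMarginMs of expiring. Concurrent callers share one request.
class SessionTokenCache {
  private current: SessionToken | null = null;
  private inflight: Promise<SessionToken> | null = null;
  private readonly fetchToken: () => Promise<SessionToken>;
  private readonly refreshMarginMs: number;
  private readonly now: () => number;

  constructor(
    fetchToken: () => Promise<SessionToken>,
    refreshMarginMs = 5 * 60 * 1000, // refresh 5 minutes early; an arbitrary default
    now: () => number = () => Date.now(),
  ) {
    this.fetchToken = fetchToken;
    this.refreshMarginMs = refreshMarginMs;
    this.now = now;
  }

  async get(): Promise<string> {
    const cached = this.current;
    if (cached !== null && cached.expiresAt - this.refreshMarginMs > this.now()) {
      return cached.sessionToken; // still comfortably valid
    }
    let pending = this.inflight;
    if (pending === null) {
      // Start one fetch and clear the slot once it settles (success or failure).
      pending = this.fetchToken().finally(() => {
        this.inflight = null;
      });
      this.inflight = pending;
    }
    const token = await pending;
    this.current = token;
    return token.sessionToken;
  }
}
```

Injecting the clock (now) keeps the expiry logic testable, and sharing the in-flight request avoids minting several tokens when multiple components ask at once.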

Platform comparison

| Feature | Web | iOS | Android | Flutter |
|---|---|---|---|---|
| Package | @spatius/avatarkit | AvatarKit.xcframework / SPM | ai.spatius:avatarkit | spatius |
| Rendering | WebGL / WebGPU | Metal | Vulkan | Native iOS / Android rendering through Flutter |
| UI Framework | DOM Canvas | UIKit + SwiftUI wrapper | Android View + Compose wrapper | Flutter widget |
| Audio init | initializeAudioContext() in a user gesture | Automatic | Automatic | Automatic |
| Build config | Vite plugin / Next.js wrapper required | Xcode linker flags | Gradle dependency | Flutter pub package + platform build setup |

Key concepts

Fallback mechanism

If the WebSocket connection fails within 15 seconds, the SDK enters audio-only fallback: audio continues to play without animation, so playback stays uninterrupted even when Motion Server is unreachable.
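The SDK implements this fallback itself, so you do not need to write it. To make the behavior concrete, the sketch below races a connection attempt against a 15-second timer; resolvePlaybackMode, connect, and the "animated" / "audio-only" labels are hypothetical names, not AvatarKit identifiers, and the sketch assumes an outright connection error is treated the same as a timeout.

```typescript
type PlaybackMode = "animated" | "audio-only";

// Illustrative only: races a (hypothetical) Motion Server connection attempt
// against a timer. Audio playback never waits on this; only animation does.
async function resolvePlaybackMode(
  connect: () => Promise<void>,
  timeoutMs = 15_000,
): Promise<PlaybackMode> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<PlaybackMode>((resolve) => {
    timer = setTimeout(() => resolve("audio-only"), timeoutMs);
  });
  const connected = connect().then(
    (): PlaybackMode => "animated",
    (): PlaybackMode => "audio-only", // a failed handshake also falls back
  );
  try {
    return await Promise.race([connected, timeout]);
  } finally {
    clearTimeout(timer); // stop the timer once either side has won
  }
}
```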

ConversationId

Every send() call returns a conversationId that identifies the current conversation round. Passing end: true marks the end of audio input for that round: the avatar finishes playing the remaining animation and then automatically returns to idle, which is reported through onConversationState. Sending new audio after end: true starts a new round and interrupts any playback still in progress. For audio source and timing guidance, see Audio.
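To reason about your own send logic, the round rules can be modelled as a small state machine. This is a simplified, hypothetical model (ConversationRounds, playbackFinished, and the round-N IDs are invented for illustration), not the SDK's implementation; it assumes every send() within a round returns the same conversationId and that audio sent after end: true opens the next round.

```typescript
type RoundState = "idle" | "streaming" | "finishing";

// Hypothetical model of the round rules; not the AvatarKit API.
class ConversationRounds {
  state: RoundState = "idle";
  interrupted: string[] = []; // rounds whose remaining animation was cut off
  private nextId = 1;
  private currentId: string | null = null;

  // Models one send() call; returns the round's conversationId.
  send(end: boolean): string {
    let id = this.currentId;
    if (this.state !== "streaming" || id === null) {
      // From idle, or after end: true, new audio opens a new round.
      if (this.state === "finishing" && id !== null) {
        this.interrupted.push(id); // the previous round's playback is interrupted
      }
      id = `round-${this.nextId++}`;
      this.currentId = id;
    }
    this.state = end ? "finishing" : "streaming";
    return id;
  }

  // Models the remaining animation finishing after end: true.
  playbackFinished(): void {
    if (this.state === "finishing") this.state = "idle";
  }
}
```

In a real integration the return to idle is reported by onConversationState rather than by a method you call.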

Next steps

Pick the platform you want to integrate on:

Web

Direct Mode integration for the browser with @spatius/avatarkit.

iOS

Direct Mode integration for iOS with AvatarKit.xcframework.

Android

Direct Mode integration for Android with ai.spatius:avatarkit.

Flutter

Direct Mode integration for Flutter with the spatius package.