Stack Voice · vision · text chat · TLS
JPEG /ws/mobile-voice-vision Browser Assistant Console PCM + JSON TLS proxy :7860 → :7861 tls_tcp_proxy bot_vllm.py FastAPI + Pipecat /ws /api/* Dialogue LLM llama :8000 or Ollama :11434 OpenAI /v1 Nemotron ASR WebSocket Parakeet :8080 XTTS HTTP synthesis :80 audio → bot Ollama VLM Vision captions /api/chat VISION_OLLAMA_* + text augment path vision_session_store Last JPEG + caption / augment
Text chat: POST /api/text-chat/completions → same dialogue LLM; if “Camera for text chat” is on, the proxy augments the last user message using the vision buffer + Ollama caption. Discovery: /api/mobile-voice, /api/mobile-voice-vision.
RAG Ingest · FTS5 · retrieval · grounding

RAG architecture flow

Offline ingest, chunking (SQLite FTS5), retrieval, and snippet injection for text and voice paths. Use Jump to component in the RAG inspector beside this diagram; details appear below it. Standalone page: GET /rag-flow.