Workspace configuration
Voice / vision controls
Vision / routing: loading…
Text camera & debug toggles
The text camera attaches vision_session_id to text chat requests. Use Voice+Video for spoken questions such as “what do you see?”
Core session & instructions
Session memory actions
LLM generation (expand)
Sampling settings apply to text chat and to the next voice connect (toggle voice off/on to re-send them).
Model context length is set when the stack starts (e.g. local llama's --ctx-size in start_current_stack.sh, overridable via LOCAL_LLAMA_CTX_SIZE). Text history lines limits how many prior chat lines are included; Text chat max prompt tokens/chars add separate caps on total prompt size before the request is sent.
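A minimal sketch of how the startup script might resolve the context size, assuming an env-var override; the default of 8192 and the variable name CTX_SIZE are illustrative, not the stack's actual values:

```shell
# Sketch only (not the real start_current_stack.sh): LOCAL_LLAMA_CTX_SIZE,
# if set, overrides an assumed built-in default of 8192.
CTX_SIZE="${LOCAL_LLAMA_CTX_SIZE:-8192}"
echo "llama --ctx-size ${CTX_SIZE}"   # the flag passed to local llama at startup
```

Export LOCAL_LLAMA_CTX_SIZE before restarting the stack to change the window.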
Local vs Ollama: effective context is not the same. Long prompts can use more of the window on local llama than on Ollama, where the server may truncate inputs near a fixed token ceiling (large prompts were observed to cap around 32k prompt tokens, while the same character padding counted higher on local). Max tokens is only a completion cap: prompt + reply must still fit the active backend's context.
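The budget rule above is simple arithmetic; this illustrative check uses assumed numbers (32768-token window, 1024-token completion cap), not stack defaults:

```shell
# Illustrative only: prompt + max_tokens must fit the active context window.
CTX=32768            # active backend context window (tokens), assumed
MAX_TOKENS=1024      # completion cap from the UI, assumed
PROMPT_TOKENS=32000  # measured prompt size, assumed
if [ $((PROMPT_TOKENS + MAX_TOKENS)) -gt "$CTX" ]; then
  FIT=no   # the reply can be cut short, or the prompt truncated by the server
else
  FIT=yes
fi
echo "fits=$FIT"
```

With these numbers the request overflows by 256 tokens, so either shorten the prompt (history lines, prompt caps) or lower max tokens.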
Loading limits for the active model…
Voice only: if your max-tokens setting is below this number, the server raises it. Sent as max_tokens_floor when you connect voice.
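The clamp described above can be sketched as follows; the floor value of 256 and the variable names are assumptions for illustration, not what the server actually sends:

```shell
# Hypothetical server-side clamp on voice connect: a client max-tokens value
# below the floor is raised to the floor. 256 is an assumed floor.
MAX_TOKENS_FLOOR=256
CLIENT_MAX=64    # what the UI asked for
EFFECTIVE=$(( CLIENT_MAX < MAX_TOKENS_FLOOR ? MAX_TOKENS_FLOOR : CLIENT_MAX ))
echo "effective max_tokens: $EFFECTIVE"
```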
Local RAG index (expand)
Offline lexical index (SQLite FTS5). Text chat: enable Local RAG below with agent tools. Voice / voice+video: snippets are prepended when VOICE_LOCAL_RAG is on (stack default). Chat attachments are mirrored under from_chat/ for later questions.
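To make the "lexical index" concrete, here is a minimal SQLite FTS5 round trip using the sqlite3 CLI; the schema, table name, and temp-file path are illustrative only, since the stack's real index layout is internal:

```shell
# Minimal FTS5 demo (schema and path are assumptions, not the stack's index).
DB="$(mktemp)"
sqlite3 "$DB" "CREATE VIRTUAL TABLE docs USING fts5(body);"
sqlite3 "$DB" "INSERT INTO docs VALUES ('restart the stack with start_current_stack.sh');"
HIT="$(sqlite3 "$DB" "SELECT body FROM docs WHERE docs MATCH 'restart';")"
echo "$HIT"
rm -f "$DB"
```

MATCH does tokenized lexical search, which is why the index works fully offline with no embedding model.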
Tool Management (expand)
Offline only: tools run on this machine. HTTP connectors must use allowlisted hosts (default 127.0.0.1 / localhost; see TOOL_HTTP_ALLOW_* in stack docs). No public internet.
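A sketch of the kind of host check an allowlist implies; the variable names and the space-separated list format are assumptions (the real behavior is governed by TOOL_HTTP_ALLOW_* in the stack docs):

```shell
# Illustrative allowlist check only; not the stack's actual implementation.
ALLOWED="127.0.0.1 localhost"   # assumed default allowlist
HOST="localhost"                # host a tool's HTTP connector wants to reach
case " $ALLOWED " in
  *" $HOST "*) VERDICT=allowed ;;
  *)           VERDICT=blocked ;;
esac
echo "$HOST: $VERDICT"
```

Any host not on the list (i.e. anything on the public internet) would fall through to "blocked".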
Local RAG tool settings
LLM Routing (expand)
Vision: image captions always go through Ollama (VISION_OLLAMA_MODEL in the stack), independent of this dialogue route.
With Local dialogue, vision, and XTTS on one GPU, VRAM can spike. With Ollama handling both dialogue and captions, use smaller tags (e.g. gemma4:e2b-it-q4_K_M), and run ./offline_setup/lisa_stack.sh status to inspect the GPU rows.
Full stack restart (expand)
Runs offline_setup/start_current_stack.sh (ASR, XTTS, LLM, bot on :7861, HTTPS on :7860). Takes several minutes; this page may disconnect until the bot is back.