Enterprise Assistant Console
Stack architecture
Runtime: loading…
GPU Live: loading…
Power Consumption: loading…

Workspace configuration

Voice / vision controls
Live camera preview, shown when Voice+Video or “Camera for text chat” is enabled. Requires a secure context (HTTPS or localhost). This block stays in normal flow so it does not cover the saved instructions or the prompt below.
Camera off — enable Voice+Video or “Camera for text chat”
Vision & routing status
Live summary of the vision caption model (Ollama), the dialogue route (local vs Ollama), and VRAM guidance. The text below updates automatically from the running stack.

Vision / routing: loading…
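As a rough illustration of how that status line could be assembled, the minimal sketch below asks Ollama whether the configured caption model has been pulled. It assumes Ollama's default endpoint on 127.0.0.1:11434 and reuses the VISION_OLLAMA_MODEL variable mentioned later; the wording of the status string is illustrative only.

    # Sketch: report whether the Ollama vision caption model is available.
    # Assumes Ollama's default HTTP endpoint; status wording is illustrative.
    import json
    import os
    import urllib.request

    OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://127.0.0.1:11434")
    vision_model = os.environ.get("VISION_OLLAMA_MODEL", "")

    def vision_routing_status() -> str:
        try:
            with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=2) as resp:
                tags = {m["name"] for m in json.load(resp).get("models", [])}
        except OSError:
            return "Vision / routing: Ollama unreachable"
        ok = vision_model in tags
        return (f"Vision / routing: caption model {vision_model or '(unset)'} "
                f"{'available' if ok else 'NOT pulled'} on Ollama")

    print(vision_routing_status())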

Text camera & debug toggles
When checked, JPEG frames are sent with vision_session_id on text chat requests. Use Voice+Video for spoken questions such as “what do you see?”
Attaches a fresh camera caption on every user turn while vision is on (adds latency). Leave it off for faster chat; camera context is then merged only when your message matches vision phrases (“see”, “camera”, “what we’re doing”, …). With Voice+Video connected, toggling syncs immediately over the voice link; text+camera syncs on the next send.
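A minimal sketch of that phrase gate is shown below: camera context rides along with a text turn only when the message mentions a vision phrase, or when the always-attach toggle is on. The helper names and the (shortened) phrase list are illustrative, not the console's actual code.

    # Sketch of the phrase gate described above.
    VISION_PHRASES = ("see", "camera", "what we're doing")  # real list is longer

    def should_attach_camera_caption(message: str, always_attach: bool) -> bool:
        if always_attach:                      # "fresh caption every turn" toggle
            return True
        text = message.lower()
        return any(phrase in text for phrase in VISION_PHRASES)

    def build_chat_payload(message: str, vision_session_id: str | None,
                           always_attach: bool) -> dict:
        payload = {"message": message}
        if vision_session_id and should_attach_camera_caption(message, always_attach):
            payload["vision_session_id"] = vision_session_id  # server adds the JPEG caption
        return payload

    print(build_chat_payload("what do you see on the desk?", "sess-123", False))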
When enabled, extra technical detail from the assistant response (where available) is shown in the transcript for debugging.
When the model returns a separate thinking / reasoning section, this opens those blocks expanded by default in the chat log. Visibility still depends on the model and API returning that content.
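For illustration, the sketch below splits a reply into reasoning blocks and the visible answer, assuming the backend wraps reasoning in <think>…</think> tags; as noted above, whether such content exists at all depends on the model and API.

    # Sketch: separate reasoning blocks from the answer, assuming <think> tags.
    import re

    THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

    def split_reasoning(reply: str) -> tuple[list[str], str]:
        reasoning = [m.strip() for m in THINK_RE.findall(reply)]
        answer = THINK_RE.sub("", reply).strip()
        return reasoning, answer

    thoughts, answer = split_reasoning("<think>check the caption first</think>The mug is red.")
    print(thoughts, "|", answer)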
Core session & instructions
Session memory actions
LLM generation (expand)

Sampling settings apply to text chat and to the next voice connect (toggle voice off and on to re-send them). Model context length is set when the stack starts (e.g. the local llama --ctx-size in start_current_stack.sh, overridable via LOCAL_LLAMA_CTX_SIZE). “Text history lines” limits how many prior chat lines are included; “Text chat max prompt tokens/chars” add separate caps on total prompt size before the request is sent.
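The minimal sketch below shows the shape of those caps: keep only the last N history lines, then enforce a character ceiling before sending. Names and defaults are illustrative, and tokens are approximated by characters here because the exact tokenizer depends on the backend.

    # Sketch of the prompt caps described above (illustrative names/defaults).
    def build_prompt(history: list[str], user_msg: str,
                     max_history_lines: int = 40,
                     max_prompt_chars: int = 24000) -> str:
        lines = history[-max_history_lines:] + [f"User: {user_msg}"]
        prompt = "\n".join(lines)
        if len(prompt) > max_prompt_chars:
            prompt = prompt[-max_prompt_chars:]   # drop the oldest text first
        return prompt

    history = [f"User: message {i}\nAssistant: reply {i}" for i in range(200)]
    print(len(build_prompt(history, "summarise the last answer")))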

Local vs Ollama: effective context is not the same. Long prompts can use more of the window on local llama than on Ollama, where the server may truncate inputs near a fixed token ceiling (large prompts were observed to cap at around 32k prompt tokens, while the same character padding counted higher on local). Max tokens is only a completion cap; prompt + reply must still fit the active backend’s context.
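A sketch of that fit rule, assuming a crude 4-characters-per-token estimate (the ceilings shown are illustrative, not measured limits):

    # Sketch: prompt tokens plus the completion cap must fit the backend's window.
    def fits_context(prompt: str, max_tokens: int, ctx_size: int) -> bool:
        est_prompt_tokens = len(prompt) // 4          # crude heuristic
        return est_prompt_tokens + max_tokens <= ctx_size

    prompt = "x" * 120_000                            # ~30k estimated tokens
    print("ctx 64k:", fits_context(prompt, 1024, 65536))
    print("ctx 32k:", fits_context(prompt, 1024, 32768))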

Loading limits for the active model…

Voice only: if your max-tokens setting is below this number, the server raises it to this floor. The value is sent as max_tokens_floor when you connect voice.
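In other words, the effective cap on the voice path is simply the larger of the two values; a one-line sketch:

    # Sketch of the voice-only floor: never use a completion cap below the floor.
    def effective_max_tokens(ui_max_tokens: int, max_tokens_floor: int) -> int:
        return max(ui_max_tokens, max_tokens_floor)

    print(effective_max_tokens(256, 1024))   # raised to the floor
    print(effective_max_tokens(4096, 1024))  # kept as configured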

Local RAG index (expand)

Offline lexical index (SQLite FTS5). Text chat: enable Local RAG below with agent tools. Voice / voice+video: snippets are prepended when VOICE_LOCAL_RAG is on (stack default). Chat attachments are mirrored under from_chat/ for later questions.
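For reference, the sketch below runs an offline lexical lookup against an SQLite FTS5 table of the kind described above. The table name, schema, sample rows and snippet formatting are assumptions for illustration; the console's real index may differ.

    # Sketch: lexical retrieval from an SQLite FTS5 index (illustrative schema).
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE VIRTUAL TABLE chunks USING fts5(path, body)")
    con.executemany("INSERT INTO chunks VALUES (?, ?)", [
        ("docs/network.md", "The bot listens on port 7861 behind HTTPS on 7860."),
        ("from_chat/notes.txt", "Calendar sync runs locally every morning."),
    ])

    def rag_snippets(query: str, k: int = 3) -> list[tuple[str, str]]:
        rows = con.execute(
            "SELECT path, snippet(chunks, 1, '[', ']', ' … ', 8) "
            "FROM chunks WHERE chunks MATCH ? ORDER BY rank LIMIT ?",
            (query, k),
        )
        return list(rows)

    print(rag_snippets("port"))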

Dir paths in use: loading…
No. of files (indexable on disk)
File types
Total size
Files chunked / pending
FTS chunk rows

Tool Management (expand)

Offline only: tools run on this machine. HTTP connectors must use allowlisted hosts (default 127.0.0.1 / localhost; see TOOL_HTTP_ALLOW_* in stack docs). No public internet.
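A minimal sketch of that allowlist check is below: a connector target is accepted only when its host is on the list. The TOOL_HTTP_ALLOW_HOSTS variable name is an assumption standing in for the TOOL_HTTP_ALLOW_* settings referenced in the stack docs.

    # Sketch of the host allowlist rule above (env var name is an assumption).
    import os
    from urllib.parse import urlparse

    DEFAULT_ALLOWED = {"127.0.0.1", "localhost"}

    def allowed_hosts() -> set[str]:
        extra = os.environ.get("TOOL_HTTP_ALLOW_HOSTS", "")
        return DEFAULT_ALLOWED | {h.strip() for h in extra.split(",") if h.strip()}

    def connector_target_ok(url: str) -> bool:
        return urlparse(url).hostname in allowed_hosts()

    print(connector_target_ok("http://127.0.0.1:8080/automation"))  # True
    print(connector_target_ok("https://example.com/api"))           # False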

Local RAG tool settings
Local RAG (indexed docs)
Local calendar
Local automation connector
LLM Routing (expand)

Vision: image captions always go through Ollama (VISION_OLLAMA_MODEL in the stack), independent of this dialogue route. With local dialogue + vision + XTTS on one GPU, VRAM can spike; with Ollama for both dialogue and captions, use smaller tags (e.g. gemma3n:e2b-it-q4_K_M) or run ./offline_setup/lisa_stack.sh status to inspect the GPU rows.
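If you want to inspect VRAM directly, a sketch along the lines of those GPU rows is shown below, using nvidia-smi's standard query mode. Whether lisa_stack.sh uses exactly this command is an assumption.

    # Sketch: list per-GPU memory use via nvidia-smi's query mode.
    import subprocess

    def gpu_rows() -> list[str]:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=index,name,memory.used,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip().splitlines()

    for row in gpu_rows():
        print(row)   # e.g. "0, NVIDIA GeForce RTX 4090, 18211 MiB, 24564 MiB"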

Full stack restart (expand)

Runs offline_setup/start_current_stack.sh (ASR, XTTS, LLM, bot on :7861, HTTPS on :7860). Takes several minutes; this page may disconnect until the bot is back.
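If you are scripting around a restart, the sketch below simply polls the bot port until it accepts connections again. The port matches the :7861 mentioned above; the timeout values are illustrative.

    # Sketch: wait for the bot to come back after a full stack restart.
    import socket
    import time

    def wait_for_bot(host: str = "127.0.0.1", port: int = 7861,
                     timeout_s: float = 600.0) -> bool:
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            try:
                with socket.create_connection((host, port), timeout=2):
                    return True
            except OSError:
                time.sleep(5)
        return False

    print("bot back up" if wait_for_bot() else "still down after 10 minutes")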

Session: Disconnected | Conversation stream: voice + text
Switching LLM Routing
Preparing switch...