2025–2026
private build
Jarvis
Local-first agentic AI platform
Python · MLX · Ollama · MCP · SQLite · FastAPI
$ jarvis --status
live · route ▸ /chat · queue 1 · q3:14b
4 tiers · 31 MCP tools · 6 servers
reasoning · synthesis
─── 01
Four-tier model router
A deterministic, sub-millisecond classifier picks a tier for each query, trading cost against capability. Most queries resolve at the flash or standard tier. The agent tier is reserved for multi-step plans where tools must execute.
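A minimal sketch of what a deterministic router of this kind could look like, using plain pattern rules instead of a model call. The signal patterns, the word-count threshold, and the `route` function are assumptions made for illustration, not the project's actual rules. Only the three tiers named above appear here; the fourth tier is not named in the text, so it is left out.

```python
import re
from enum import Enum


class Tier(Enum):
    FLASH = "flash"
    STANDARD = "standard"
    AGENT = "agent"


# Hypothetical signals: verbs that suggest a tool has to run,
# and connectives that suggest a multi-step plan.
TOOL_VERBS = re.compile(
    r"\b(commit|run|fetch|write|delete|schedule|send)\b", re.IGNORECASE
)
MULTI_STEP = re.compile(r"\b(then|after that|step \d+)\b", re.IGNORECASE)

LONG_QUERY_WORDS = 40  # assumed threshold, not taken from the project


def route(query: str) -> Tier:
    """Pick a tier from fixed rules: same input, same tier, no model call."""
    needs_tools = bool(TOOL_VERBS.search(query))
    multi_step = bool(MULTI_STEP.search(query))

    # Multi-step plans that need tools go to the most capable tier.
    if needs_tools and multi_step:
        return Tier.AGENT
    # Long or tool-flavoured single-step queries get the middle tier.
    if needs_tools or len(query.split()) > LONG_QUERY_WORDS:
        return Tier.STANDARD
    # Everything else takes the cheapest path.
    return Tier.FLASH


if __name__ == "__main__":
    for q in ("What time is it?", "Fetch the changelog, then commit a summary"):
        print(q, "->", route(q).value)
```

Because the rules are compiled regexes and a word count, the routing cost is independent of any model and the decision is reproducible, which is the property the section relies on.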
─── 02
31 MCP tools across 6 servers
- Filesystem — read, write, glob, search.
- Git — status, log, diff, commit.
- Shell — governed; only a whitelisted command set.
- HTTP fetch — outbound requests with explicit allowlist.
- Calendar / mail.
- Workspace state — persistence, memory snapshots, run history.
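The governed shell and the allowlisted HTTP fetch above both come down to checking a request against a fixed set before anything runs. A rough sketch, assuming hypothetical allowlist contents and hypothetical helper names (`parse_allowed_command`, `is_url_allowed`); the real servers' rules are not shown in the source.

```python
import shlex
from typing import List, Optional
from urllib.parse import urlparse

# Hypothetical allowlists; the project's actual sets are not documented here.
ALLOWED_COMMANDS = frozenset({"ls", "cat", "git", "rg"})
ALLOWED_HOSTS = frozenset({"api.github.com"})


def parse_allowed_command(command_line: str) -> Optional[List[str]]:
    """Return argv if the executable is allowlisted, else None.

    The returned argv is meant to be executed WITHOUT a shell
    (e.g. subprocess.run(argv, shell=False)), so tokens such as
    '&&' or '|' are passed as plain arguments, never interpreted.
    """
    try:
        argv = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes and similar malformed input are rejected.
        return None
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        return None
    return argv


def is_url_allowed(url: str) -> bool:
    """Allow only https requests to an explicitly listed host."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS
```

Matching on the parsed executable and the parsed hostname, rather than on substrings of the raw input, keeps inputs like `ls; rm -rf /` or `https://evil.example/api.github.com` from slipping through.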
─── 03
Why local-first
The constraint was no cloud round-trips for anything that touches personal context. Models had to fit on Apple Silicon (Ollama, qwen3 family). Persistence had to be embedded (SQLite). Tool execution had to be operator-approved, not autonomous, which is handled by a Telegram approval flow: the agent proposes, the operator signs off, the tool runs.
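The propose, sign off, run sequence can be sketched as a small gate around tool execution, with each decision recorded in an embedded SQLite table. Everything named here is hypothetical: `ToolCall`, `run_with_approval`, and the `run_history` schema are illustrations, and the `approve` callback stands in for the Telegram round-trip, which is not shown.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ToolCall:
    tool: str  # hypothetical naming, e.g. "git.commit"
    args: str


def init_db(db: sqlite3.Connection) -> None:
    # Assumed schema for run history; not the project's real table.
    db.execute(
        "CREATE TABLE IF NOT EXISTS run_history ("
        "id INTEGER PRIMARY KEY, tool TEXT, args TEXT, status TEXT)"
    )


def run_with_approval(
    db: sqlite3.Connection,
    call: ToolCall,
    approve: Callable[[ToolCall], bool],  # stand-in for the operator's sign-off
    execute: Callable[[ToolCall], str],   # the tool itself
) -> Optional[str]:
    """Run the tool only if the operator approves; log the outcome either way."""
    approved = approve(call)
    result = execute(call) if approved else None
    with db:  # commits the insert on success
        db.execute(
            "INSERT INTO run_history (tool, args, status) VALUES (?, ?, ?)",
            (call.tool, call.args, "executed" if approved else "denied"),
        )
    return result


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    init_db(db)
    call = ToolCall("git.commit", "-m 'wip'")
    print(run_with_approval(db, call, lambda c: False, lambda c: "done"))
```

Passing the approver in as a callable keeps the gate independent of the messaging channel, so the same function can be exercised in tests with a stub.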
─── 04
What I learned
- Deterministic router beats LLM-as-router for cost, latency, and predictability.
- Tool-calling latency dominates once the agent tier engages. Chunk what you can.
- Operator-in-the-loop is a feature, not a constraint. Catches more bad calls than test coverage ever will.
