Savvy | Alenna C. Spiro

Overview

Savvy is a self-hosted AI secretary I built to manage the parts of life that get dropped when work picks up — calendar, email, tasks, and the small daily check-ins that help me notice patterns. It connects Claude to a Google account suite (Calendar, Gmail, Tasks) and to Signal for two-way texting, with all long-term storage staying on my machine. The goal is a tool that knows enough context to be genuinely useful without that context ever leaving my local hardware.

Architecture

The design separates three layers: a reasoning layer (Claude, accessed through the Anthropic API), a memory layer (local SQLite plus Ollama for embedding generation), and an actuation layer (Google APIs and a signal-cli-rest-api Podman container for external side effects).

You ↔ Savvy ─┬─ Anthropic API (reasoning + tool use)
             ├─ Ollama (local embeddings)
             ├─ SQLite (conversation + fact storage)
             ├─ Google APIs (Calendar, Gmail, Tasks)
             └─ Signal (via local Podman container)

The privacy model is straightforward: only the current user message, a system prompt, and a small locally-selected context window go to the API. Embeddings, conversation history, extracted facts, OAuth tokens, and diary entries all live on local disk and are never transmitted.

Capabilities

Savvy exposes 35+ tools to the model, covering most of what a human assistant would do with a calendar and inbox.

Calendar. Create timed and all-day events, parse natural-language event descriptions via Google’s quick-add, update or delete events, move events between sub-calendars, create new sub-calendars, query free/busy windows.

Email. Search Gmail with native query syntax, read full message bodies and threads, draft replies, send (only on explicit request, drafts are the default), star, archive, mark read, trash, manage labels.

Tasks. Create, list, complete, and delete Google Tasks across multiple lists.

Diary and notes. Store morning plans, evening reflections, and ad-hoc notes in a local store that gets pulled into context on later turns.

Multi-account. Personal and work Google accounts authenticated simultaneously, with per-tool defaults. Secretary reads my work account by default for the calendar, email reads from both.

Memory

Memory is the part of the system I spent the most time on. Each turn:

The user message is embedded locally via Ollama using nomic-embed-text.
The memory layer retrieves a hybrid set of past messages: the most recent N (recency window) plus the top-K most semantically similar from history (relevance window).
Stored facts — concrete commitments, deadlines, and preferences — are retrieved separately and prepended to the system prompt.
After the response, a second small model call extracts new facts from the exchange and writes them back to the local fact store.

The hybrid recency-plus-relevance retrieval is the part that makes the system feel like it actually remembers things. Pure recency loses anything from more than a few exchanges ago; pure relevance can miss the obvious “what we just talked about” context. Combining them gives you both. Embeddings stay on disk because they are generated by the local Ollama instance, so Anthropic never sees the index.

Tool gating

Every tool call passes through a confirmation gate before the underlying Python function runs. Each tool is tagged with one of three danger levels:

Read — read-only operations like checking the calendar or searching email. Auto-approved.
Internal — write operations on data I own, like creating an event or completing a task. Auto-approved by default but escalates to a human prompt if the model marks the call as high-importance, which it tends to do for unusual or destructive operations, or if I define a rule in memory for.
External — anything that crosses an external communication boundary, like sending email. Always prompts for confirmation before running.

If I deny a call, the gate returns "denied" to the model so it can adjust. Every call (approved, denied, or errored) is appended to a JSONL audit log, which has been useful for catching cases where the model hallucinates.

This is a small layer, but it is important for how comfortable I am pointing the system at my real accounts. The Anthropic API never executes anything - my code does.

Two interfaces, one brain

Savvy runs in two modes that share the same memory and tool registry:

Desktop REPL. Rich-formatted terminal interface with custom commands (/bod, /eod, /calendar, /memory, etc.) for ritualized daily check-ins and quick reference.
Signal bot. Two-way texting via signal-cli, so the same assistant is available from my phone with end-to-end encryption between phone and machine.

The Signal mode uses a more concise system prompt tuned for SMS-style replies, but it has access to the same tools and memory. Both run as part of a single launch script that brings up Ollama, the Signal bot, and the proactive scheduler in the background and drops into the REPL in the foreground.

Background scheduler

A separate process can send proactive Signal messages: reminders surfaced from upcoming calendar events, deadline check-ins, or noticing skipped commitments. It enforces quiet hours and a daily notification budget so it stays useful instead of becoming noise.

Personalization

There are two places to tell Savvy who I am and how I want it to act. The first is the system prompt itself, which covers identity, goals, and communication style — the things that change rarely. The second is a small savvy_rules.md file for operational quirks that change more often: don’t book over my standing therapy appointment, laundry takes 3-4 hours and I need to be home for it, prefer passive time over active appointments when scheduling reminders. Splitting the two means I can iterate on day-to-day rules without touching the prompt that defines the assistant’s character.

Why I built this

Off-the-shelf assistants either don’t have access to the right context or send everything to a vendor’s cloud. I wanted something that could read my calendar and email to give specific, informed answers, but where the long-term record of what we have talked about — and the embedding index over it — stays local. Building it also forced me to think carefully about what context actually helps a model give better answers (recency vs. relevance vs. structured facts), where the trust boundary should sit between the model and my real accounts, and how to keep behavior coherent across two interfaces and a background process.

Resources

Code on GitHub