Inbox AI: a private assistant that runs on your own machine
We built a local-first AI assistant for a private client's email and files. The language model runs on the client's own hardware, the inbox is searchable in plain language, replies come back as drafts, and the system is built so it cannot send or delete mail.
A private power user had years of email and thousands of document attachments, an inbox too large to search by hand and too noisy to keep up with. They wanted an assistant that could reason over all of it, brief them for meetings, triage incoming mail, and draft replies in their voice. The hard constraint shaped the whole design: none of that private content could be handed to a cloud AI service. Most off-the-shelf AI email tools fail that test on day one, because the model lives in someone else's data center.
The problem
The inbox had become two problems at once: an archive too large to search by hand, and a daily stream of mail too noisy to keep up with. The client wanted to ask questions of their own email and files in plain language, including the contents of attachments, and get fast answers with sources cited. They wanted to walk into meetings already briefed, with the prior history and the specific facts (prices, dates, commitments) pulled out automatically. They wanted routine mail triaged into clear categories, reply drafts written in their own voice, and a nudge outside the inbox only when something genuinely needed them.
All of that is straightforward to build if you are willing to ship the client's mail to a cloud model. They were not. That single requirement, keep the private content off third-party AI servers, is what made this an architecture problem rather than a wiring problem.
The approach: local-first
The assistant runs on the client's own Apple Silicon machine. The language model itself, an open-weights model, runs locally on that hardware using its GPU. Data is retrieved from the user's existing cloud accounts (their email, documents, calendar, and contacts) and then indexed and processed on the user's own machine. Three principles held throughout:
- Local-first. The model runs on the user's hardware. Inbox and document contents are never sent over the network to a cloud AI provider.
- Human-in-the-loop by default. Anything that would leave the mailbox requires the user to take the final action. The assistant proposes; the person decides.
- Transparent and reversible. Answers cite their sources, every action is written to an audit log, and nothing is destroyed. The most aggressive thing the system can do to a message is remove it from the inbox view, which is fully reversible.
That same local-first design is what makes the system practical to deploy for others. Because there is no shared cloud backend holding customer data, each install is a self-contained system on the customer's own machine.
The architecture
The system is a small set of cooperating services on one machine, not a single black box. The language model runs natively on the hardware for GPU access. Everything else runs in containers on a private internal network. A chat and retrieval interface talks to the model and to a set of tool servers over a standard tool-calling protocol; a workflow engine carries out the multi-step automations.
Notable design choices
- Hybrid retrieval. The email and document indexes combine semantic vector search with keyword search and fuse the two, so the assistant finds the right message whether you remember a concept or an exact phrase. A separate fast directory answers "what is this person's contact info" deterministically rather than leaning on ranked search.
- Per-user isolation. Each user of the machine gets their own chat interface, their own index, and their own configuration. They do not share data.
- A standard tool protocol. The interface and the tool servers speak a standard model tool-calling protocol, which is what lets the local model actually invoke email search, document search, and the draft workflow rather than just talking about them.
- Deterministic where it should be. Classification, contact lookup, and audit are deterministic pipelines, not freehand model orchestration. The model is used where judgment is needed and kept out of the loop where a rule is more reliable.
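The hybrid retrieval bullet above says the two result lists are fused but does not name the method. One common choice is reciprocal rank fusion, sketched here as an illustration only. The function name, the message IDs, and the constant `k=60` are assumptions for the example, not details from this engagement.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical message IDs returned by the two searches.
semantic_hits = ["msg-17", "msg-04", "msg-22"]
keyword_hits = ["msg-04", "msg-31", "msg-17"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

A message that ranks well in both lists rises above one that ranks first in only a single list, which is the behavior the bullet describes: the right message surfaces whether you remember a concept or an exact phrase.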
What makes it impressive
A genuinely private assistant over your real inbox
This is not a cloud product with a privacy policy. The model runs on the user's own hardware, the indexes live on that hardware, and the inbox and document contents are processed there. For a user whose objection to existing tools was precisely that they ship your mail to someone else's servers, that is the whole point. The precise claim, stated exactly and no stronger, is in the panel below.
A triage pipeline that is cheap and sensible
Incoming mail runs through a two-stage classifier. A cheap, deterministic keyword pre-filter runs first at zero model cost and handles a large share of routine mail on its own (newsletters, receipts, transactional notices, routine operational tickets). Only the ambiguous remainder goes to a single focused call to the local model, which returns a category, a confidence score, and a short rationale. Below a confidence floor, a message is downgraded to "needs review" rather than auto-categorized.
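A minimal sketch of that two-stage flow, under stated assumptions: the keyword rules, the category names, the `0.7` confidence floor, and the `model_call` signature are all hypothetical stand-ins, and the local model is replaced by a stub so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

CONFIDENCE_FLOOR = 0.7  # assumed value; the real floor is not stated

# Hypothetical keyword rules for the zero-cost first stage.
KEYWORD_RULES = {
    "newsletter": ("unsubscribe", "view in browser"),
    "receipt": ("order confirmation", "your receipt"),
}


@dataclass
class Verdict:
    category: str
    confidence: float
    rationale: str
    stage: str  # "keyword" or "model"


def keyword_prefilter(subject: str, body: str) -> Optional[Verdict]:
    """Stage 1: deterministic match, no model call."""
    text = f"{subject}\n{body}".lower()
    for category, phrases in KEYWORD_RULES.items():
        for phrase in phrases:
            if phrase in text:
                return Verdict(category, 1.0, f"matched '{phrase}'", "keyword")
    return None


def classify(
    subject: str,
    body: str,
    model_call: Callable[[str, str], Tuple[str, float, str]],
) -> Verdict:
    """Stage 2 runs only when stage 1 finds nothing."""
    hit = keyword_prefilter(subject, body)
    if hit is not None:
        return hit
    category, confidence, rationale = model_call(subject, body)
    if confidence < CONFIDENCE_FLOOR:
        # Low confidence is downgraded instead of auto-categorized.
        return Verdict("needs_review", confidence, rationale, "model")
    return Verdict(category, confidence, rationale, "model")


def fake_model(subject: str, body: str) -> Tuple[str, float, str]:
    """Stub standing in for the single call to the local model."""
    return ("action_required", 0.55, "intent is unclear")


routine = classify("Weekly update", "Click to unsubscribe", fake_model)
unclear = classify("Quick question", "Can you confirm?", fake_model)
```

Here `routine` is settled by the keyword stage without touching the model, while `unclear` reaches the model, falls under the floor, and lands in "needs review".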
Each category maps to a recommended action, and the default posture is to propose, not act. Out of the box the system only records what it would do and performs no change to the mailbox at all. The user opts an individual category into automatic handling only when they are ready, and automatic handling is allowed only for reversible, non-destructive actions.
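The propose-first default can be pictured as a small policy object. This is a sketch only: the category names, the action names, and the `TriagePolicy` class are hypothetical, and the set of actions eligible for automatic handling is an assumption made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical category -> recommended action mapping.
RECOMMENDED_ACTION = {
    "newsletter": "archive",
    "receipt": "label",
    "needs_review": "none",
}

# Assumed set of reversible, non-destructive actions.
AUTO_ELIGIBLE_ACTIONS = frozenset({"archive", "label", "file"})


@dataclass
class TriagePolicy:
    # Empty by default: out of the box, nothing is applied automatically.
    auto_categories: set = field(default_factory=set)

    def enable_auto(self, category: str) -> None:
        """Opt one category into automatic handling."""
        action = RECOMMENDED_ACTION[category]
        if action not in AUTO_ELIGIBLE_ACTIONS:
            raise ValueError(f"{action!r} cannot be automated")
        self.auto_categories.add(category)

    def decide(self, message_id: str, category: str) -> dict:
        """Return a record of what would happen; 'propose' changes nothing."""
        action = RECOMMENDED_ACTION.get(category, "none")
        mode = "apply" if category in self.auto_categories else "propose"
        return {
            "message_id": message_id,
            "category": category,
            "action": action,
            "mode": mode,
        }


policy = TriagePolicy()
before = policy.decide("m1", "newsletter")  # mode is "propose"
policy.enable_auto("newsletter")
after = policy.decide("m1", "newsletter")   # mode is "apply"
```

The design point is that opting in is per category and explicit, so the user widens automation one decision at a time.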
A safety gate that makes the dangerous actions structurally impossible
The system can draft, file, label, and archive. It cannot send, trash, or permanently delete. That is enforced in depth across four layers, not by a single policy line:
- The categories-and-actions configuration cannot even express a destructive action. Words like send, trash, and delete are rejected when the configuration loads.
- Every action the classifier is about to emit passes a validator that rejects anything outside an explicit safe allow-list. A destructive decision is impossible to return, not merely disallowed.
- The automations contain no node that sends and no node that deletes. The only write to the mailbox is creating a draft. Archiving removes a message from the inbox view only, and that is reversible.
- The credential the system holds is scoped to compose and modify only. Permission to send and permission to permanently delete are never requested, so even a hypothetical bug could not do either.
The result: the terminal state of a reply is "a draft is sitting in your Drafts, go review and send it." The human clicking Send is the human-in-the-loop. There is no "sent" state anywhere in the system.
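The first two layers of the gate, the load-time configuration check and the allow-list validator, might look roughly like the following. This is a sketch under assumptions: the action names, the `UnsafeActionError` class, and both function names are hypothetical. The other two layers (workflow nodes and credential scope) live outside code like this.

```python
SAFE_ACTIONS = frozenset({"draft", "file", "label", "archive", "none"})
FORBIDDEN_WORDS = ("send", "trash", "delete")


class UnsafeActionError(ValueError):
    """Raised when a configuration or decision names an unsafe action."""


def load_category_config(raw: dict) -> dict:
    """Layer 1: refuse to load a config that names a destructive action."""
    loaded = {}
    for category, action in raw.items():
        lowered = action.lower()
        if any(word in lowered for word in FORBIDDEN_WORDS):
            raise UnsafeActionError(f"{category}: forbidden action {action!r}")
        if lowered not in SAFE_ACTIONS:
            raise UnsafeActionError(f"{category}: unknown action {action!r}")
        loaded[category] = lowered
    return loaded


def validate_action(action: str) -> str:
    """Layer 2: only allow-listed actions may leave the classifier."""
    if action not in SAFE_ACTIONS:
        raise UnsafeActionError(f"action {action!r} is not on the allow-list")
    return action


config = load_category_config({"newsletter": "Archive", "receipt": "label"})
```

Checking against an allow-list, not a block-list, is what makes the second layer robust: an action the author never thought of is rejected by default.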
Engineering rigor under real constraints
- A hard token budget per request. A real production incident elsewhere taught us that returning full message bodies can blow a model's context limit and cause silent failures. Here, every request is character-budgeted and the whole prompt is hard-clamped well under the model's ceiling. Messages are classified one at a time, so the budget is an invariant, not a hope.
- Validation against real data, read-only. The triage pipeline was validated by running real historical mail through the live local model with no writes and no service disruption, then spot-checking the results category by category for sense.
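The character budget from the first bullet above can be illustrated with a two-level clamp: one cap on the message body, then a final hard cap on the assembled prompt. The limits, the truncation marker, and the function names below are assumptions for the sketch; the real figures are not given here.

```python
MAX_PROMPT_CHARS = 12_000   # assumed ceiling, well under a model's context
MAX_BODY_CHARS = 6_000      # assumed per-message body budget
MAX_SUBJECT_CHARS = 300     # assumed subject budget
TRUNCATION_MARK = "\n[truncated]"


def clamp(text: str, limit: int) -> str:
    """Return text cut to at most `limit` characters, marking any cut."""
    if len(text) <= limit:
        return text
    keep = max(limit - len(TRUNCATION_MARK), 0)
    return (text[:keep] + TRUNCATION_MARK)[:limit]


def build_prompt(instructions: str, subject: str, body: str) -> str:
    """Assemble a prompt for ONE message, never exceeding the hard cap."""
    prompt = (
        f"{instructions}\n\n"
        f"Subject: {clamp(subject, MAX_SUBJECT_CHARS)}\n\n"
        f"Body:\n{clamp(body, MAX_BODY_CHARS)}"
    )
    # Final clamp: holds even if the instructions themselves grow.
    return clamp(prompt, MAX_PROMPT_CHARS)


huge_body = "x" * 100_000
prompt = build_prompt("Classify this message.", "Invoice", huge_body)
```

Because each call carries exactly one message and the last step clamps the whole string, the length bound holds for any input, which is what makes the budget an invariant.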
What it does, in one list
Every capability below was committed to in the original scope, then delivered and verified against it.
- Plain-language questions over your own email and files, with sources cited.
- Meeting prep: prior history, relevant threads, and extracted facts pulled out for you.
- Automatic, category-based mail triage with a propose-first default.
- Multi-step assistant workflows you describe in plain language.
- Attachment handling: filed, named consistently, made searchable.
- Reply drafts in your own voice, saved to Drafts to review and send.
- Out-of-inbox notifications plus a periodic digest of what was handled.
- A single audit trail of everything the assistant did or proposed.
This case study describes a real, bespoke engagement, published with the client's permission and fully anonymized. There is no client name, no profession, no contact detail, no hostname, and no exact corpus figure anywhere on this page. Results are described qualitatively. Every factual claim traces to the anonymized source for this page; anything that could identify the individual has been removed rather than softened.