AI × CEO

Rebuilding Gmail for the AI future

EVAN REISER / MAR 2, 2026 / 7 MIN READ

Email is a context-assembly problem in a writing problem's costume. The system I built makes four or five judgments before any words get drafted.

Last post I argued that the gap between an AI-augmented CEO and a 100x CEO is mostly context assembly, and I committed to building it, starting with email.

Email is a context-assembly problem in a writing problem's costume. Every reply requires knowing who this person is, what they're asking, what we last discussed, what I'm waiting on, what they're waiting on, who else is in the loop, and what I've said in public about the topic this week. What actually makes the inbox feel like work is the constant loading and reloading of all of that. Every thread is a different relationship in a different state, and your head has to rebuild the whole picture from scratch each time you open one, before any words can happen. Most of what looks like "I'll get to this later" is really "I don't have the context loaded right now."

Email is a context-assembly problem in a writing problem's costume.

Once you frame email that way, the answer is the architecture from the last post. Pull the right ten thousand tokens out of a billion-token corpus, into a small window where they can be acted on.

So I rebuilt Gmail.

I know how that sounds. A year ago it would have been a multi-quarter engineering project with a team. Today, a single motivated person with the right AI tools on their laptop can ship a usable version in about two weeks. The hard work is deciding what the system should be deciding.

When most people picture "AI email," they picture Smart Compose: autocomplete with better suggestions and tab-to-accept. That's a typing assistant that speeds up the easiest part of email and ignores the actual cost. The system I'm running now isn't that, because typing was never the bottleneck. The bottleneck is the four or five judgments that happen before any words get drafted.

What People Picture as AI Email
- Smart Compose, autocomplete, tab-to-accept
- A typing assistant that speeds up the easiest part
- Words first, judgment never
- Ignores the actual cost of email

The System I Actually Built
- Triage, context assembly, voice, edit-loop
- Four or five judgments before any words get drafted
- Most emails never get a draft at all
- Tightens automatically every week from my edits
TRIAGE BEFORE DRAFT

Triage is the first decision

The first judgment is triage. Every email that lands in my inbox gets routed into one of six buckets before a draft is even considered:

- auto-respond, for acknowledgments, confirmations, and simple status updates
- delegate, when someone else on the team owns the thread
- escalate, when it's strategic, sensitive, board-level, or a crisis
- schedule, for calendar requests
- archive, for FYIs that need no action
- no-response, for the surprisingly common case where the right answer is to just not reply

Most emails never get a draft. They get archived, scheduled, or routed away, and the system reads, judges, and writes a one-line reasoning trace for each.
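
The routing step above can be sketched as a thin wrapper around a classifier call. The six bucket names come from this post; the function names, the confidence floor, and the `classify` interface are illustrative assumptions, not the actual implementation.

```python
# Hypothetical sketch of the triage step. classify() stands in for an
# LLM call that returns (bucket, self-assessed confidence, one-line trace).
from dataclasses import dataclass

BUCKETS = ("auto-respond", "delegate", "escalate",
           "schedule", "archive", "no-response")

@dataclass
class TriageResult:
    bucket: str        # one of BUCKETS
    confidence: float  # self-assessed, 0..1
    trace: str         # one-line reasoning trace

def triage(email: dict, classify) -> TriageResult:
    """Route an email into a bucket before any draft is considered."""
    bucket, confidence, trace = classify(email)
    assert bucket in BUCKETS
    return TriageResult(bucket, confidence, trace)

def needs_my_eyes(result: TriageResult, floor: float = 0.8) -> bool:
    # Only low-confidence calls and escalations reach the inbox owner.
    return result.bucket == "escalate" or result.confidence < floor
```

The design choice worth noting is that the draft never comes first: the bucket and the trace exist whether or not a draft is ever generated.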

This morning my inbox got 47 emails overnight. 38 were handled before I sat down with coffee, and almost none of those 38 were drafts I had to review. They were judgments the system made and acted on. I only saw the slice where the bucket landed at low confidence or where escalate was the right call.

38 of 47 overnight emails handled before I sat down with coffee. Most never became drafts at all — they were judgments the system made and acted on.
WHAT THE SYSTEM ALREADY KNOWS

Context is what makes it work

When something does need a real response, this is where the thoughtfulness lives. Drafts pull from the person's dossier, recent threads with them, commitments I made in our last 1:1, Gong moments where this account came up, Slack chatter the same week, and the Drive docs they're waiting on. The system also mines the transcripts of every meeting I've been in this week for positions I actually stated, frameworks I actually used, claims I actually made and defended. If I argued a specific position on agent security in Tuesday's staff meeting, the email I draft on Friday on that same topic carries that position. The drafts don't synthesize a generic Evan take. They reflect what I'm actually thinking right now, because the system has been listening to me think out loud all week.

A reply about a regional pipeline question doesn't just answer the question, it knows the deal moved to stage 3 last week, references the Gong moment where the customer raised the integration concern, and aligns with the position I took in Tuesday's deal review. I would never load all of that by hand for a single email, and the system loads it every time.
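
The shape of that assembly can be sketched as prioritized chunks greedily packed into a fixed token budget. The source labels mirror the ones named in this post; the priority scheme, the packing logic, and the token heuristic are illustrative assumptions, not the actual implementation.

```python
# Sketch: pull the right ~10k tokens out of a much larger corpus into
# one bounded window. Priorities and the 4-chars-per-token heuristic
# are assumptions for illustration.
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly four characters per token.
    return max(1, len(text) // 4)

def assemble_context(chunks, budget_tokens=10_000):
    """Greedily pack the highest-priority chunks into one window.

    chunks: (priority, label, text) tuples, lower priority number =
    more important. Labels might be things like dossier, recent
    threads, commitments, Gong moments, Slack, meeting positions.
    """
    window, used = [], 0
    for _, label, text in sorted(chunks, key=lambda c: c[0]):
        cost = estimate_tokens(text)
        if used + cost > budget_tokens:
            continue  # doesn't fit; keep trying smaller, lower-priority chunks
        window.append(f"[{label}]\n{text}")
        used += cost
    return "\n\n".join(window)
```

The point the sketch preserves is that the judgment happens after everything relevant is in the same window, not before.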

Where context assembly really earns its keep is when the email isn't an email at all. A real chunk of my inbox is documents sent to me for review. Proposals, decks, strategy memos, customer rollups, draft posts. For these, "draft a reply" is the wrong frame. The right frame is the one I'd run in my own head if I had the time. Is the call they're making actually the right one? Does the doc meet my bar? Did the author incorporate the feedback I gave on the prior version, or did they just resubmit?

So the system goes deeper. It pulls every prior version of the document and runs what I call a Feedback Incorporation Scorecard, which walks line by line through the comments I left last time and grades each one on whether the new version actually addressed it, partially addressed it, or quietly ignored it. It pulls every meeting where I've taken a position on the topic, with verbatim dated quotes. It pulls related dossiers for the recipient, the team, the product. Then it grades the doc against my actual standards on a dimension table that matches the kind of artifact this is, not generic best practices for "good writing."
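
The scorecard loop is simple to sketch. The three verdict labels follow the paragraph above; the `judge` call (an assumed LLM comparison of one prior comment against the new version) and everything else here is illustrative.

```python
# Feedback Incorporation Scorecard, sketched. judge(comment, new_version)
# is an assumed LLM call returning one of the three verdicts below.
VERDICTS = ("addressed", "partial", "ignored")

def scorecard(prior_comments, new_version, judge):
    """Walk line by line through last round's comments and grade each."""
    rows = []
    for comment in prior_comments:
        verdict = judge(comment, new_version)
        assert verdict in VERDICTS
        rows.append((comment, verdict))
    addressed = sum(1 for _, v in rows if v == "addressed")
    incorporation_rate = addressed / max(1, len(rows))
    return rows, incorporation_rate
```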

The output is a critique I can scan in 30 seconds: overall grade, strengths with evidence, gaps by severity, suggested additions, and a copy-paste reply I can send back. This is the same context-assembly architecture as the rest of the inbox, applied harder. The reply is correct because the upstream judgment was correct, and the upstream judgment was correct because all of the relevant context lived in the same window when the judgment was made.

VOICE HAS TO BE LOAD-BEARING

Voice has to be load-bearing

Even the most thoughtfully assembled draft fails the moment it doesn't sound like me. Context buys you the right things to say, but voice is what carries them, and voice matters because people don't respond well to email that sounds like a machine. Once the recipient has the thought "this isn't actually him," every word after that gets read through the wrong frame, replies dry up, and the time savings get eaten by the relationship cost. Voice has to be load-bearing, not garnish.

SAME VOICE UNDERNEATH, DIFFERENT TAILORING ON TOP

The voice system runs in a few layers. The foundation is my own voice DNA, the cadence and vocabulary and rhythm the system has learned from years of my own writing, which every draft starts from. The second layer is the more interesting one. None of us actually write the same way to everyone we email. The cadence I use with a long-time direct report is different from the cadence I use with a board member, which is different again from how I write to a vendor or a peer CEO. Topic, role, familiarity, the last thing we discussed, all of it shapes the register. So the system keeps a separate voice profile for each person I email regularly, built from real exemplars of how I've actually written to them, tiered automatically based on history. Thin when we don't have much, basic when we have some, rich when there are dozens or hundreds of past messages and the model has fully calibrated to that relationship. Same Evan voice underneath, different tailoring on top. There's also a contrastive layer in there, examples of what AI tends to write that I would never write, so the model learns what to avoid as much as what to imitate.
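
The layering can be sketched as prompt assembly. The thin/basic/rich tiers come from the paragraph above; the exemplar-count thresholds and the prompt layout are assumptions.

```python
# Sketch of per-person voice layering: base voice DNA, per-relationship
# tailoring when there's enough history, and contrastive anti-examples.
def voice_tier(num_exemplars: int) -> str:
    if num_exemplars >= 30:
        return "rich"   # dozens-to-hundreds of past messages
    if num_exemplars >= 5:
        return "basic"
    return "thin"

def build_voice_prompt(voice_dna, person_exemplars, contrastive_examples):
    """Layer the voice system into one prompt, richest history last."""
    tier = voice_tier(len(person_exemplars))
    parts = [voice_dna]
    if tier != "thin":
        parts.append("How I actually write to this person:\n" +
                     "\n---\n".join(person_exemplars[-10:]))
    parts.append("Never write like these AI-sounding examples:\n" +
                 "\n---\n".join(contrastive_examples))
    return f"[voice tier: {tier}]\n\n" + "\n\n".join(parts)
```

The tiering matters because a per-person profile built from two emails would be noise; it only becomes load-bearing once there are real exemplars behind it.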

The last layer of the voice system is deterministic and unromantic. Strip the em-dashes, the semicolons, the forbidden vocabulary, the AI-y openers like "Thanks so much for reaching out," and the trailing -ing clauses, all the tells the model still produces sometimes. And when I do edit a draft, the diff between what the system wrote and what I sent gets converted into a rule and injected into future prompts. These are real rules, not "this word is wrong." Lead with the number, not the narrative, when emailing board members. Cut the second paragraph when the recipient is a peer CEO, and match their last cadence when the relationship is rich. The feedback compounds, and drafts get more like me every week, automatically.
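
The deterministic layer is the easiest to sketch, since it's just string surgery. The tells below are the ones named in this post; the lists are examples, not the complete set, and the exact substitutions are assumptions.

```python
# Sketch of the deterministic strip pass: remove the tells the model
# still produces sometimes. Rules shown are illustrative examples.
import re

FORBIDDEN_OPENERS = ("Thanks so much for reaching out",)

def strip_tells(draft: str) -> str:
    draft = draft.replace("\u2014", ", ")   # em-dashes
    draft = draft.replace(";", ",")         # semicolons
    for opener in FORBIDDEN_OPENERS:
        if draft.startswith(opener):
            draft = draft[len(opener):].lstrip(" ,.!\n")
    # Trailing -ing clause, e.g. "..., ensuring alignment." -> cut it.
    draft = re.sub(r",\s+\w+ing\b[^.]*\.", ".", draft)
    return draft
```

Because this layer is deterministic, it never needs to be trusted the way the model does: the same draft in always produces the same draft out.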

A QUEUE SORTED BY CONFIDENCE

What happens to the draft

When a draft does come out the other end, the same model call self-assesses its confidence. Nothing auto-sends on my account (that's intentional), but the score sorts the queue for me. High-confidence drafts I scan and ship in seconds. Low-confidence ones get a real read.
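
That sorting is trivial to sketch, assuming each draft carries its self-assessed score; the field names and the 0.8 floor are assumptions.

```python
# Confidence-sorted review queue. Nothing auto-sends; the score only
# decides what the inbox owner reviews first versus reads carefully.
def review_queue(drafts, floor=0.8):
    """Return (quick-scan pile, real-read pile), each sorted best-first."""
    ordered = sorted(drafts, key=lambda d: d["confidence"], reverse=True)
    quick = [d for d in ordered if d["confidence"] >= floor]
    slow = [d for d in ordered if d["confidence"] < floor]
    return quick, slow
```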

The point of all of this is not speed, although speed is real. The point is that every email I see now arrives with more context already loaded around it than I could ever assemble by hand: who this person is, what we've discussed, what I'm waiting on, what positions I've taken in this week's meetings. The system uses that context to make a sequence of judgments before any email reaches me: whether it needs a reply at all, what voice belongs to this specific person, whether the draft carries what I'm actually thinking right now. Every edit I make tightens the next one, and the system gets better every week.

For most of my career, the inbox was a reactive surface I tried to keep up with. The system I run now is closer to a decision system that drafts the replies for me when drafting is the right answer. I still hit send on every one of them, but I'm personally writing fewer and fewer from scratch and making fewer and fewer edits to the ones I do touch, and the ones that go out sound like me to that specific person.

Next post is the same idea applied to meeting prep, which is the higher-leverage side of the same coin.

Next in AI × CEO: How AI is reshaping my calendar

-Evan