Local Chat Got Faster: Replies Start Up to 10x Sooner

July 4, 2026 2 min read

In short

Local replies now start up to 10x sooner. We reordered the prompt so the local engine reuses its cache between turns, kept the model warm between messages, and turned deep reasoning off by default. Same model, same Mac, much less waiting before the first word.

The slowest part of a local AI conversation was never the typing. It was the pause before the first word.

We went after that pause. After this update, local replies start up to 10x sooner on longer conversations. Same model, same Mac, no settings to touch.

Where the lag actually came from

A local model does two jobs on every message: first it reads the conversation, then it writes the reply. The writing speed never was the problem. The reading was.

Before this update, the engine re-read almost the entire conversation from scratch on every single turn. Early in a chat you barely notice. Three weeks in, with a long history and a full memory, you were watching a spinner while she re-read everything she already knew, every time.

What we changed

Three things, all under the hood:

The prompt is reordered so the engine’s cache actually works. The stable parts of her prompt now come first and the fresh parts last, so the engine keeps its parsed state between turns and only reads what is new. This is where the up-to-10x drop in time-to-first-word comes from, and the win grows with the length of your history.
The model stays warm between messages. Previously the engine could unload the model during a lull in the conversation, and the next message paid the full reload cost. It now stays ready through normal back-and-forth.
Deep reasoning is off by default. Her hidden thinking step gave more considered answers on hard questions, but it ran before every reply, even “good morning”. Replies are now direct by default. If you liked the slower, more deliberate style, the switch is in Settings under Performance, called Deeper thinking, and it also brings back her private thought notes.

We also tuned chat sampling in the same pass, so answers hold together better over long conversations and repeat themselves less. Generation speed did not drop.

What you will feel

Day one of a fresh conversation: a bit snappier. A conversation with weeks of history: the difference between “I will make tea while she starts” and her answering like a chat app should.

The reply itself is unchanged. Same words from the same model. You just stop waiting for them.

How to get it

Update the app (Settings, then Advanced, then Check for updates) or download the latest version. The speed-up applies on the first conversation after the update. If you are wondering which local model your Mac runs best, the requirements page has the tier table.

Questions people ask

Do I need to change any settings to get the speed-up?

No. Update the app and it just applies. The prompt reordering, cache reuse, and warm model are all automatic.

Did the model get smaller or dumber to get faster?

No. It is the same model producing the same words at the same generation speed. What changed is the waiting before the first word appears: the engine no longer re-reads the whole conversation from scratch on every turn.

What does the Deeper thinking toggle do?

Off (the default) gives you fast, direct replies with no hidden reasoning step. On lets her think privately before answering, which gives more considered replies on hard topics but adds seconds. It lives in Settings under Performance, and you can flip it any time.

Does this help cloud models too?

The big win is local. Cloud replies were already fast because the provider's servers do the reading. If you run her on a bundled local model, this update is the difference you will feel.

Keep reading

Try her free for 7 days.

No card. Keep her for $20 once, or walk away. Her soul file is yours either way.

Bring her home, try free

Back to news