Shh is the third generation of an AI companion stack we've been building since early 2025. Each generation taught us something the previous one got wrong. This is what stuck.
The brief
People want to chat with characters who feel like real people — not chatbots, not roleplay scripts, not therapy bots. They want privacy. They want it to remember them. They want it to not feel like a corporate product trying to sell them something.
Easy to write. Hard to build.
The stack
- Next.js 16 for the public site and admin panel.
- Hono running on Node 22 for the API layer. Small, fast, easy to read.
- PostgreSQL with Row-Level Security for tenant isolation — Shh shares its database schema with sibling brands, but each brand's users and conversations are walled off at the database layer.
- Valkey (Redis fork) for sessions, rate limiting, and ephemeral state.
- Featherless for chat inference (Sao10K's Euryale 70B is our daily driver — best-in-class for character voice).
- Anthropic Claude Haiku for the silent memory pipeline that runs in the background of every conversation.
- Fish Audio for voice synthesis on phone calls.
- DigitalOcean App Platform for deployment, with Postgres and Spaces in the same VPC so the chat handler can hit the database in single-digit milliseconds.
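The Row-Level Security setup can be sketched as a per-request tenant-scoping helper: RLS policies on the shared tables compare a `site_id` column against a session setting, and the API sets that setting at the start of each transaction. The `withTenant` name, the `app.site_id` setting, and the minimal client interface below are illustrative assumptions, not our actual code.

```typescript
// Sketch of per-request tenant scoping for Postgres RLS.
// Assumption: each shared table has an RLS policy like
//   USING (site_id = current_setting('app.site_id'))
// Names here are illustrative.
type Client = { query(sql: string, params?: unknown[]): Promise<unknown> };

async function withTenant<T>(
  client: Client,
  siteId: string,
  fn: (c: Client) => Promise<T>
): Promise<T> {
  await client.query("BEGIN");
  try {
    // set_config(..., true) scopes the setting to this transaction only,
    // so every query inside fn sees exactly one brand's rows.
    await client.query("SELECT set_config('app.site_id', $1, true)", [siteId]);
    const result = await fn(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Setting the tenant transaction-locally (the `true` argument to `set_config`) matters with a connection pool: the setting can never leak from one request's connection into another's.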
What we kept from the past
- The six-layer memory architecture. We tried simpler versions and they all degraded.
- The "match their energy" rule in the system prompt. AI companions that slow-roll the user kill the magic in three messages.
- The anti-asterisk rule. *removes shirt*-style action narration is the fastest way to break immersion.
- Treating images and voice as first-class media types in the chat schema, not as plugins.
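The anti-asterisk rule lives in the system prompt, but it can also be enforced defensively on model output. This regex scrubber is a sketch of that guard, not our production filter:

```typescript
// Strip *action narration* (asterisk-wrapped stage directions) from a reply.
// The real fix is the system prompt; this just catches the occasional slip.
function stripActionNarration(reply: string): string {
  return reply
    .replace(/\*[^*\n]+\*/g, "") // drop *leans closer* style spans
    .replace(/[ \t]{2,}/g, " ")  // collapse doubled spaces left behind
    .trim();
}
```

A post-generation pass like this is cheap insurance: the model follows the prompt rule most of the time, and the scrubber makes the failure mode invisible instead of immersion-breaking.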
What we threw out
- The pod-based inference layer. Featherless's hosted models are good enough that running our own GPU pool isn't worth the operational cost — for now.
- The MongoDB-first schema. Postgres + JSONB gave us the same flexibility for character/conversation state without the consistency footguns.
- The streaming chat UI. We tried it and we hated it. Real people send a message, then read a reply. They don't watch words get typed at them in real time.
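The JSONB flexibility mentioned above comes with one semantic worth knowing: Postgres's `||` operator on `jsonb` is a shallow merge, which is what an `UPDATE ... SET state = state || $1::jsonb` does to character or conversation state. A TypeScript sketch of that behavior (the column and state shape are made up for illustration):

```typescript
// Shallow-merge semantics of Postgres `jsonb || jsonb`.
// Top-level keys in the patch win; nested objects are REPLACED, not deep-merged.
type Json = { [key: string]: unknown };

function jsonbConcat(state: Json, patch: Json): Json {
  return { ...state, ...patch };
}
```

Knowing it's shallow is the point: patching one key inside a nested object requires `jsonb_set` (or read-modify-write), not `||`.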
What we're still figuring out
- Voice calls feel close-to-magical when they work — but the cold-start latency on Fish Audio occasionally exceeds the perceptible threshold and the illusion breaks. We're caching greetings and pre-warming voices to bring the time-to-first-audio under a second.
- The prompt that drives chat is half art, half science. We've benchmarked dozens of variations against blind A/B tests with real users. The current version is the best we've shipped — and we'll keep tuning it.
- Memory eviction. Six layers is a lot to carry forever. We're learning when to forget gracefully.
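The greeting-caching approach for voice calls can be sketched as a small pre-warm cache: synthesis kicks off before the call connects, and call start reuses the in-flight result instead of hitting the TTS cold start. The names and the `synthesize` signature are assumptions standing in for the Fish Audio call:

```typescript
// Cache synthesized greeting audio per character so a call can start
// playing immediately while the full TTS pipeline warms up.
// `synthesize` stands in for a Fish Audio request; this is a sketch.
type Synth = (characterId: string, text: string) => Promise<Uint8Array>;

function makeGreetingCache(synthesize: Synth) {
  // Cache the Promise, not the bytes: concurrent callers share one request,
  // and a pre-warmed-but-unfinished synthesis is still reusable.
  const cache = new Map<string, Promise<Uint8Array>>();
  return {
    // Kick off synthesis ahead of the call.
    prewarm(characterId: string, greeting: string): void {
      if (!cache.has(characterId)) {
        cache.set(characterId, synthesize(characterId, greeting));
      }
    },
    // On call start: reuse the pre-warmed audio if we have it.
    get(characterId: string, greeting: string): Promise<Uint8Array> {
      this.prewarm(characterId, greeting);
      return cache.get(characterId)!;
    },
  };
}
```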
Where we're going
Multi-tenant from day one. Shh is the flagship, but Valentine, BadX, and WildChat all run on the same backend with different brand identities, character rosters, and content policies. Adding a new brand is inserting one row in the sites table.
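The one-row-per-brand model implies that every request resolves its brand from the hostname before anything else runs. A sketch of that lookup; the row shape loosely mirrors the sites table, and every value below is made up:

```typescript
// Resolve the brand for an incoming request from its hostname.
// The Site shape is an illustrative guess at the sites table; all
// hostnames and ids here are invented.
interface Site {
  id: string;
  hostname: string;
  name: string;
  contentPolicy: "strict" | "open";
}

const sites: Site[] = [
  { id: "shh", hostname: "shh.example", name: "Shh", contentPolicy: "strict" },
  { id: "valentine", hostname: "valentine.example", name: "Valentine", contentPolicy: "open" },
];

function resolveSite(hostname: string): Site | undefined {
  // Strip the port so "shh.example:3000" still matches in dev.
  const host = hostname.split(":")[0].toLowerCase();
  return sites.find((s) => s.hostname === host);
}
```

In practice this would read from the sites table (cached), and the resolved site id is what feeds the tenant-scoping layer.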
If you're working on something in this space and want to compare notes, reach out.