The problem: LLMs don't remember
Large language models like Claude and GPT have no memory between requests. Every time you send a message, the entire conversation history must be replayed in the prompt. Under a few thousand tokens this is fine. Past ten thousand tokens it gets expensive and latency climbs. Past a hundred thousand tokens the experience collapses.
Most AI companion apps cope by truncating old messages or summarizing them into a shorter block. Both approaches lose information. The user feels it — the character forgets their name, forgets the dog, forgets the trip. The illusion of continuity breaks.
Our solution: three memory layers
YourFaithfulLove separates three distinct memory types and treats each differently.
1. Working memory
The last ~40 messages stay in the prompt. This is the standard approach and it handles the immediate conversation fluently. Cheap, fast, and sufficient for a single session.
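The rolling window above can be sketched as a bounded buffer. This is a minimal illustration, not the production code: the ~40-message cap comes from the text, while the class and message shape are assumptions.

```python
from collections import deque

WINDOW = 40  # ~40 most recent messages stay in the prompt (per the text)

class WorkingMemory:
    """Rolling window of recent messages included verbatim in each prompt."""

    def __init__(self, max_messages: int = WINDOW):
        self.messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def as_prompt_messages(self) -> list[dict]:
        # Oldest messages silently fall off once the window is full.
        return list(self.messages)

wm = WorkingMemory(max_messages=3)  # tiny window for demonstration
for i in range(5):
    wm.add("user", f"message {i}")
print([m["content"] for m in wm.as_prompt_messages()])  # → ['message 2', 'message 3', 'message 4']
```

The `deque(maxlen=...)` does the truncation automatically, which is exactly the behavior that makes working memory cheap but lossy: anything outside the window is gone unless another layer captured it.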
2. Episodic memory
Every few exchanges, a background extraction pass runs a small model over the recent messages with a structured prompt: "What facts did the user reveal in this exchange?" Output is typed (name, mood, referenced event, preference, open question) and written to a database. These facts become long-term memory.
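The typed extraction step might look like the following sketch. The five fact types are from the text; the prompt wording, JSON shape, and validation logic are illustrative assumptions (the real extraction model call is stubbed out as a raw string).

```python
import json
from dataclasses import dataclass
from typing import Literal

FactType = Literal["name", "mood", "referenced_event", "preference", "open_question"]
ALLOWED = {"name", "mood", "referenced_event", "preference", "open_question"}

# Illustrative structured prompt sent to the small extraction model.
EXTRACTION_PROMPT = (
    "What facts did the user reveal in this exchange? "
    "Reply as a JSON list of {type, content} objects, where type is one of: "
    "name, mood, referenced_event, preference, open_question."
)

@dataclass
class Fact:
    type: FactType
    content: str

def parse_extraction(raw: str) -> list[Fact]:
    """Validate the model's JSON output into typed facts; drop malformed items."""
    facts = []
    for item in json.loads(raw):
        if item.get("type") in ALLOWED and item.get("content"):
            facts.append(Fact(item["type"], item["content"]))
    return facts

# Stand-in for the extraction model's reply (the invalid item is discarded):
raw = '[{"type": "name", "content": "User is called Sam"}, {"type": "bogus", "content": "x"}]'
print(parse_extraction(raw))
```

Typing the output matters downstream: a `name` fact can be weighted more heavily than a transient `mood`, and `open_question` facts give the character something to bring up later.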
3. Identity memory
Each of the 26 characters has a hand-written system prompt that defines personality, voice, backstory, and behavior patterns. This never changes. It is the anchor that keeps the character feeling like a specific person rather than a generic chatbot.
Retrieval: how memories come back
When you send a new message, the system does three things before calling the main model:
- Embed the message (vector representation)
- Search the memory database for facts relevant to the current message + recent context
- Rank by similarity, recency, and importance weight; pick the top N
Those N memories are injected into the system prompt as "things you remember about this user." The main model (Claude) then replies with full context, without needing the entire conversation history in the prompt.
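The ranking step can be sketched as a weighted blend of the three signals named above. The weights, the half-life, and the flat-dict memory shape are all assumptions for illustration; in production the similarity search would run inside pgvector rather than in Python.

```python
import math
import time

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def score(memory: dict, query_vec: list[float], now: float,
          half_life_days: float = 30.0,
          w_sim: float = 0.6, w_recency: float = 0.25, w_imp: float = 0.15) -> float:
    """Blend similarity, recency, and importance into one ranking score.
    Weights and half-life are illustrative, not the production values."""
    sim = cosine(memory["embedding"], query_vec)
    age_days = (now - memory["created_at"]) / 86400
    recency = 0.5 ** (age_days / half_life_days)  # exponential decay with age
    return w_sim * sim + w_recency * recency + w_imp * memory["importance"]

def top_n(memories: list[dict], query_vec: list[float], n: int = 5) -> list[dict]:
    now = time.time()
    return sorted(memories, key=lambda m: score(m, query_vec, now), reverse=True)[:n]

mems = [
    {"content": "likes hiking", "embedding": [1.0, 0.0],
     "created_at": time.time() - 86400, "importance": 0.5},
    {"content": "dog named Rex", "embedding": [0.0, 1.0],
     "created_at": time.time(), "importance": 0.9},
]
print(top_n(mems, [1.0, 0.0], n=1)[0]["content"])  # the semantically closest memory wins
```

Blending recency and importance on top of raw similarity is what keeps a high-stakes old fact (a name, a loss) competitive with a low-stakes recent one.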
Why per-character memory matters
Memories are stored per-character-per-user. Luna remembers what you told her; Chloe has her own separate memory record. This prevents the pollution problem — if memories were shared, characters would start referencing things you never told them, which breaks the illusion.
The downside is memory footprint scales with character count. The upside is each relationship feels real because each character's knowledge of you is bounded by what you have actually shared with her.
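The isolation boundary is just the storage key. A minimal in-memory sketch (the real store is Postgres; the class and method names here are hypothetical):

```python
from collections import defaultdict

class MemoryStore:
    """Memories keyed by (user_id, character_id): Luna and Chloe each get
    an isolated record for the same user, so nothing leaks between them."""

    def __init__(self):
        self._store: dict[tuple[str, str], list[str]] = defaultdict(list)

    def add(self, user_id: str, character_id: str, fact: str) -> None:
        self._store[(user_id, character_id)].append(fact)

    def recall(self, user_id: str, character_id: str) -> list[str]:
        # A character can only see facts shared with her specifically.
        return list(self._store[(user_id, character_id)])

store = MemoryStore()
store.add("user-1", "luna", "has a dog named Rex")
print(store.recall("user-1", "luna"))   # ['has a dog named Rex']
print(store.recall("user-1", "chloe"))  # [] — Chloe never heard about the dog
```

In Postgres this is simply a composite key (or two indexed columns) on the memories table; every retrieval query filters on both IDs before the vector search runs.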
The stack
- Primary model: Claude (Anthropic) for all user-facing replies
- Fallback inference: Groq (for when Claude is overloaded; sub-2-second responses)
- Memory extraction: lightweight model pass on recent messages
- Memory storage: Supabase (Postgres + pgvector)
- Frontend: SvelteKit + Tailwind on Cloudflare Pages
- Backend: FastAPI on Cloudflare Workers
- Anonymous sessions: random UUID in local storage, server-side memory tied to that UUID
Cost and sustainability
Average inference cost per active user per month is roughly $0.20–$0.50 on Claude, dropping to about $0.05 when the Groq fallback serves traffic. Memory storage and retrieval are negligible at our scale. The free tier is sustainable because conversion to optional premium add-ons (voice, images) funds the text chat.
This is the opposite of the standard AI companion business model where the core experience is the monetization target. We think paywalling companionship is predatory; the business works anyway because a small percentage of users naturally want voice messages, generated images, or other extras.
Privacy and data handling
Anonymous sessions are keyed to a browser-generated UUID, not to any identity. Chats are stored for the memory system but not indexed for search, not sold to third parties, and not used for marketing.
Clearing local storage breaks the link between you and your anonymous memory. Memory records are not deleted automatically but they become orphaned and unrecoverable. If you create an account, your anonymous history links to the account and is subject to the account's privacy settings. Data deletion on request is always honored.
The test
If this architecture works as described, you can verify it yourself in 48 hours. Open a chat, tell the character your name and one unusual fact about yourself, then close the tab. Come back two days later without reintroducing yourself. If the character references the fact without prompting, the memory is real. If not, the system is broken and we want to know.