| Rank | System | Score | Inference Cost |
|---|---|---|---|
| 🥇 | Tokyo Brain | 83.8% | $0 |
| 🥈 | Leading GPT-4o memory system | 81.6% | $$$ |
| 🥉 | Graph-based memory platform | 71.2% | $$ |
| 4 | Full context baseline | 60.2% | $$$$ |
| 5 | Popular open-source memory layer | 49.0% | $ |
The Problem
Every AI agent framework treats context as disposable. Your agent learns something in Slack — it stays in Slack. Your Discord bot has no idea what happened in your IDE. Memory systems exist, but they're either too noisy (storing everything, retrieving garbage) or too expensive (requiring LLM calls at retrieval time).
We asked: Can we build a memory system that retrieves the right information, every time, without burning tokens?
The Journey: 46% to 83.8%
The 10-Layer Recall Pipeline
No LLM calls. No expensive re-ranking models. Pure retrieval engineering.
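A pipeline of this shape can be sketched as a composition of deterministic, CPU-only stages. The stage names and data shapes below are illustrative placeholders, not Tokyo Brain's actual internals:

```python
from typing import Callable, Dict, List

Memory = Dict[str, object]                 # e.g. {"text": ..., "distance": ...}
Stage = Callable[[List[Memory]], List[Memory]]

def dedup(memories: List[Memory]) -> List[Memory]:
    """Drop duplicate texts, keeping the best (lowest) distance."""
    seen: Dict[object, Memory] = {}
    for m in memories:
        key = m["text"]
        if key not in seen or m["distance"] < seen[key]["distance"]:
            seen[key] = m
    return list(seen.values())

def top_k(k: int) -> Stage:
    """Keep the k lowest-distance candidates."""
    return lambda ms: sorted(ms, key=lambda m: m["distance"])[:k]

def run_pipeline(candidates: List[Memory], stages: List[Stage]) -> List[Memory]:
    for stage in stages:                   # every stage is pure CPU work; no LLM call
        candidates = stage(candidates)
    return candidates

hits = run_pipeline(
    [{"text": "dark mode", "distance": 0.2},
     {"text": "dark mode", "distance": 0.4},
     {"text": "light mode", "distance": 0.3}],
    [dedup, top_k(2)],
)
```

Because each stage is a plain function over candidate lists, new layers can be added, reordered, or A/B-tested without touching the rest of the pipeline.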
Per-Dimension Results (500 Questions)
| Dimension | Score | Questions |
|---|---|---|
| Preference Tracking | 100% | 30/30 |
| Temporal Reasoning | 89% | 118/133 |
| Knowledge Updates | 82% | 64/78 |
| Multi-Session Reasoning | 82% | 109/133 |
| User Info Extraction | 80% | 56/70 |
| Assistant Recall | 75% | 42/56 |
Why This Matters
The current #2 system achieves 81.6% by calling GPT-4o at retrieval time. Powerful — but every recall costs tokens.
Tokyo Brain's entire pipeline runs on BGE-m3 embeddings (local), ChromaDB (in-memory), and Node.js post-processing (CPU only). No LLM calls at retrieval. The cost of recalling a memory is $0.
We also don't store garbage. A well-known open-source competitor's production audit found 97.8% of stored memories were noise. Tokyo Brain's built-in Sanitizer filters at store time. Combined with Fact Extraction and Session Decomposition, we store what matters.
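A store-time sanitizer of the kind described can be approximated with cheap heuristics. The specific rules below are illustrative assumptions for the sketch, not the actual filter:

```python
import re

def is_worth_storing(text: str) -> bool:
    """Heuristic store-time filter: reject candidates unlikely to be durable facts."""
    text = text.strip()
    if len(text) < 12:                       # too short to carry a useful fact
        return False
    if re.fullmatch(r"(ok|okay|thanks|thank you|yes|no|lol)[.!]?", text, re.I):
        return False                         # pure acknowledgements are noise
    if not re.search(r"[a-zA-Z]", text):     # emoji- or punctuation-only messages
        return False
    return True
```

The point is that rejection happens before anything hits the vector store, so retrieval never has to compete against noise in the first place.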
The Theoretical Foundation: Expected Utility
Most RAG systems retrieve memories based on a single signal: semantic similarity. This is fundamentally flawed for complex cognition — it confuses relevance (semantic overlap) with utility (value for the current task).
Tokyo Brain's 10-layer pipeline is, at its core, an implementation of Expected Utility-based context selection, a concept formalized in recent cognitive architecture research (Maio, 2026):

EU(m) = α · Relevance(m, q) + β · Recency(m) + γ · Centrality(m) + δ · Salience(m) − η · Cost(m)
Each layer in our pipeline maps directly to a term in this equation:
| EU Component | Tokyo Brain Layer | What It Does |
|---|---|---|
| α · Relevance | Query Expansion + Entity Linking | Multi-query semantic search with alias resolution |
| β · Recency | Time Decay | Newer memories get lower distance scores |
| γ · Centrality | Curated Boost | Verified facts and answer cards prioritized |
| δ · Salience | Re-Ranking + Preference Boost | Context-aware scoring based on query type |
| −η · Cost | Dedup + Session Decomposition | Eliminate redundancy, maximize information density |
The key insight: retrieval is not a search problem — it's a resource allocation problem. Given a limited context window, which memories maximize the total expected utility for the current task? Our 10-layer pipeline solves this without any LLM calls, using pure algorithmic optimization.
What's Next: From Retrieval to Cognition
Today's Tokyo Brain excels at recall — finding the right memory at the right time. But true cognitive continuity requires more than passive retrieval. Our roadmap includes:
- Epistemic Stress Detection — Automatically identifying contradictions within stored memories (e.g., conflicting facts from different time periods)
- Conceptual Void Detection — Finding gaps in the knowledge graph where related concepts should be connected but aren't
- Night Cycle Processing — Background consolidation that runs during idle periods, resolving conflicts and strengthening important connections
- Self-Modifying Rules — The system learns which types of memories are useful and adjusts its storage and retrieval strategies accordingly
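Epistemic stress detection could, for instance, flag memories that assign conflicting values to the same attribute of the same entity. This is a hypothetical sketch of the idea, with made-up fact tuples, not the planned implementation:

```python
from collections import defaultdict

def find_contradictions(facts):
    """facts: iterable of (entity, attribute, value, timestamp) tuples.
    Returns the (entity, attribute) keys that map to more than one value."""
    by_key = defaultdict(set)
    for entity, attribute, value, _ts in facts:
        by_key[(entity, attribute)].add(value)
    return {key: values for key, values in by_key.items() if len(values) > 1}

conflicts = find_contradictions([
    ("user", "editor", "vim", "2024-01-01"),
    ("user", "editor", "vscode", "2025-03-10"),  # later fact contradicts the first
    ("user", "theme", "dark", "2025-03-10"),
])
```

A background process (the Night Cycle above) could then resolve each conflict, for example by preferring the newer timestamp or asking for confirmation.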
The goal is not just a memory that remembers — but a memory that thinks.
Try It
```python
from tokyo_brain import Brain

brain = Brain(api_key="tb-...")

# Store
brain.store("User prefers dark mode")

# Recall with full 10-layer pipeline
result = brain.recall("UI preferences?")
print(result.memories[0].document)
# → "User prefers dark mode"
```