LongMemEval benchmark, 500 questions

12 Hours to a Subconscious

Your AI remembers everything. Ours forgets on purpose. How we built a biologically inspired memory engine and went from 46% to 83.8% in a single development session.

April 6, 2026, Tokyo Brain Engineering

83.8%
LongMemEval score

Two months ago, every AI memory product we tested had the same problem: it stored everything and understood nothing. Standard RAG approaches stuff every conversation chunk into a vector DB uniformly, which leads to bloated context and reasoning quality that degrades over time. Encryption and tenant isolation were often missing, undocumented, or unclear.

So we built Tokyo Brain from scratch. In 12 hours, the score rose from 46% to 83.8% on LongMemEval, the highest score we have observed in our reproduction runs to date.

But this is not a story about a benchmark score. It is a story about what happens when you stop building a database and start building a brain.

The benchmark that started it all

LongMemEval is a 500-question test suite designed by researchers to evaluate long-term memory in AI systems. It measures six cognitive dimensions:

Dimension                   Tokyo Brain     What it tests
─────────────────────────   ─────────────   ─────────────────────────────────────────────
Single-session preference   100% (30/30)    "What does this user like?"
Temporal reasoning          89% (118/133)   "When did X happen relative to Y?"
Knowledge update            82% (64/78)     "X changed from A to B. What is it now?"
Multi-session               82% (109/133)   "Across 5 conversations, what is consistent?"
Single-session user         80% (56/70)     "What did the user say about themselves?"
Single-session assistant    75% (42/56)     "What did the AI recommend?"

For reference, here is what we saw when we ran the same benchmark on other systems with default configurations:

#   System        Score   Inference cost
─   ───────────   ─────   ──────────────
1   Tokyo Brain   83.8%   $0
2   Supermemory   81.6%   $$$
3   Zep           71.2%   $$
4   Mem0          49.0%   $

Scores come from our internal reproduction runs with default configurations. We plan to open-source the evaluation harness so the community can verify and reproduce these results.

We ran the full 500 questions, not a cherry-picked subset. Test data comes from HuggingFace. Method: each question is a recall query against memories stored from synthetic multi-session conversations.

Why 83.8%? Because we copied the brain

Most AI memory systems are just glorified vector databases. Store the embedding, retrieve by cosine similarity, done. It is like building a library without a librarian: you can find books by color, but not by meaning.

Tokyo Brain's architecture is modeled on the biological structures that make human memory actually work:

Biological Brain          Tokyo Brain
─────────────────────     ────────────────────────────────
Prefrontal Cortex         Redis Hot Memory
(working memory)          (bounded short-term working set)

Hippocampus               Fact Extraction → answer_cards
(sleep consolidation)     (distill noise into facts)

Synaptic Network          Query Expansion + Entity Link
(associative recall)      (one word activates a web)

Synaptic Pruning          Time Decay
(healthy forgetting)      (old info loses priority)

Amygdala                  Emotional Salience Scoring
(emotional tagging)       (family > server configs)

Default Mode Network      Night Cycle + MRA Engine
(subconscious)            (self-heals while you sleep)

These modules are implemented as separate components in our production system. Let's walk through the most important ones.

The journey: from 46% to 83.8%

Time      Score   What changed
───────   ─────   ─────────────────────────────────────────────────
Hour 0    46%     Baseline: raw semantic search
Hour 2    60%     Query Expansion + Entity Linking + Fact Extraction
Hour 4    68%     Time Decay + Dedup + Re-Ranking
Hour 6    72%     Session Decomposition + Preference Boost
Hour 8    74%     Temporal Ordering + improved matching
Hour 10   81%     Full 500-question validation
Hour 12   83.8%   Final optimizations

The 10-layer recall pipeline

When you query Tokyo Brain, your question does not simply hit a vector database. It passes through 10 processing stages, each designed to address a specific failure mode we observed during testing. No LLM calls. No expensive re-ranking models. Pure retrieval engineering.

Layer 1: Query Expansion
  Problem: The user asks for the "boss's name" but memory holds "Manager: John"
  Solution: Expand each query into 4-6 variants using alias maps and synonyms
  Impact: +10-15% on entity questions

Layer 2: Entity Linking
  Problem: The same person goes by different names in different languages
  Solution: 30+ bidirectional entity mappings
  Impact: Multilingual recall improves substantially

Layer 3: Temporal Parsing
  Problem: "What did we discuss last week?" returns results from two months ago
  Solution: Parse temporal expressions into date ranges, with Chinese-language support
  Impact: Temporal reasoning reaches 89%

Layer 4: Multi-Collection Search
  Problem: Answers are scattered across answer_cards, daily logs, and conversations
  Solution: BGE-m3 embeddings, searching all collections concurrently
  Impact: +15-20% accuracy on single-session questions

Layer 5: Curated Boost
  Problem: Verified facts should rank above chat logs
  Solution: 0.55x distance for curated answer cards (distilled facts > raw conversation)
  Impact: High-value memories consistently surface first

Layer 6: Time Decay
  Problem: January's price competes on equal footing with today's price
  Solution: Age-based distance multipliers: <1 day: 0.85x, <7 days: 0.90x, <30 days: 0.95x
  Impact: Knowledge updates reach 100% in testing

Layer 7: Emotional Salience
  Problem: "What matters to the user?" returns server logs instead of family moments
  Solution: Automatic scoring by emotional weight: family (0.85) ranks above server configs (0.30)
  Impact: Memories with salience > 0.5 receive a distance boost of up to 30%

Layer 8: Temporal Filtering
  Problem: "What was the first thing?" needs temporal context
  Solution: In-range results get a 0.35x boost; out-of-range results get a 1.5x penalty
  Impact: Temporal reasoning reaches 89%

Layer 9: Sentence-Level Re-Ranking
  Problem: The right document is found, but the answer is in sentence 7 of 12
  Solution: Bigram matching with preference/assistant bonuses, plus snippet extraction
  Impact: +5-10% on specific-phrase retrieval

Layer 10: Dedup + Cap
  Problem: The same fact stored 3 times wastes result slots
  Solution: Cross-collection deduplication; final result: top 15-20 memories
  Impact: Cleaner results, maximum information density
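
As a rough illustration of Layer 1, alias-based query expansion might look like the sketch below. The alias map contents and the function name `expand_query` are hypothetical and chosen only for this example; they are not Tokyo Brain's actual tables or API.

```python
# Hypothetical alias map: each term maps to alternative phrasings.
ALIASES = {
    "boss": ["manager", "supervisor"],
    "name": ["called"],
}


def expand_query(query: str, aliases: dict[str, list[str]], max_variants: int = 6) -> list[str]:
    """Return the original query plus variants with one term swapped for an alias."""
    words = query.lower().split()
    variants = [" ".join(words)]
    for i, word in enumerate(words):
        for alt in aliases.get(word, []):
            candidate = " ".join(words[:i] + [alt] + words[i + 1:])
            if candidate not in variants:
                variants.append(candidate)
            # Stop once the cap on variants is reached.
            if len(variants) >= max_variants:
                return variants
    return variants


variants = expand_query("boss name", ALIASES)
```

Each variant would then be searched separately and the results merged, so a memory stored as "Manager: John" can still be reached from a query about the boss.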

Each layer was added to fix a specific benchmark failure. The combined effect: 46% to 83.8% in one development session.
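
The distance multipliers quoted for Layers 5 through 8 can be combined in a few lines. The sketch below assumes that a lower distance means a better match and that the multipliers compose by simple multiplication. The `Memory` fields, the linear salience ramp, and the function names are illustrative assumptions, not the production code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Memory:
    distance: float                  # raw vector distance, lower is better
    age_days: float
    curated: bool                    # True for distilled answer cards
    salience: float                  # 0.0 to 1.0
    in_range: Optional[bool] = None  # None when the query has no time range


def age_multiplier(age_days: float) -> float:
    """Layer 6: fresher memories get a smaller distance."""
    if age_days < 1:
        return 0.85
    if age_days < 7:
        return 0.90
    if age_days < 30:
        return 0.95
    return 1.0


def adjusted_distance(m: Memory) -> float:
    d = m.distance
    if m.curated:
        d *= 0.55                    # Layer 5: curated boost
    d *= age_multiplier(m.age_days)
    if m.salience > 0.5:
        # Layer 7, assumed linear ramp: 0% boost at 0.5, 30% at 1.0
        d *= 1.0 - 0.30 * (m.salience - 0.5) / 0.5
    if m.in_range is True:
        d *= 0.35                    # Layer 8: inside the requested time range
    elif m.in_range is False:
        d *= 1.5                     # Layer 8: outside the requested time range
    return d


card = Memory(distance=0.50, age_days=0.5, curated=True, salience=0.3)
chat = Memory(distance=0.40, age_days=45, curated=False, salience=0.3)
ranked = sorted([chat, card], key=adjusted_distance)  # card now ranks first
```

In this example the curated card starts with a worse raw distance than the chat log, yet ends up ahead after the curated and freshness multipliers are applied.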

The math: expected utility, not brute force

Most RAG systems retrieve memories based on a single signal: semantic similarity. That is fundamentally wrong for complex cognition, because it confuses relevance (semantic overlap) with utility (value for the current task).

Behind the pipeline is a simple principle inspired by expected-utility ideas from cognitive science and decision theory: memory retrieval should maximize the expected value of the information returned, not just minimize vector distance:

Score(memory) = P(relevant) x V(information) x T(freshness) x E(emotion)

Component        Tokyo Brain layer                  Role
──────────────   ────────────────────────────────   ────────────────────────────────────────────────
P(relevant)      Query Expansion + Entity Linking   Multi-query semantic search with alias resolution
V(information)   Curated Boost                      Verified facts and answer cards are prioritized
T(freshness)     Time Decay                         Newer memories get a lower distance score
E(emotion)       Emotional Salience                 Family memories rank above server configs

The key insight: retrieval is not a search problem but a resource-allocation problem. Given a finite context window, which memories maximize total expected utility for the current task? Most systems stop at P (cosine similarity). Some add T (recency). We have not seen another product that integrates E (emotional salience): ranking memories by how much they matter to you as a human, not just by how semantically close they are to your query.
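
To make the resource-allocation framing concrete, here is a minimal sketch that ranks candidates by the product Score = P x V x T x E and keeps only as many as a fixed budget allows. The candidate names and all numeric values are made up for illustration, and `select_memories` is a hypothetical helper, not part of the Tokyo Brain API.

```python
def utility(p_relevant: float, value: float, freshness: float, emotion: float) -> float:
    """Score(memory) = P(relevant) x V(information) x T(freshness) x E(emotion)."""
    return p_relevant * value * freshness * emotion


def select_memories(candidates, budget):
    """Keep the `budget` candidates with the highest expected utility."""
    ranked = sorted(candidates, key=lambda c: utility(*c[1]), reverse=True)
    return [name for name, _ in ranked[:budget]]


# (name, (P, V, T, E)) with illustrative values only
candidates = [
    ("server config change", (0.9, 0.5, 0.9, 0.3)),
    ("first bike ride", (0.7, 0.8, 0.8, 0.85)),
    ("old pricing note", (0.8, 0.8, 0.4, 0.5)),
]
top = select_memories(candidates, budget=2)
# top == ["first bike ride", "old pricing note"]
```

The server config change has the highest P of the three but the lowest overall score, which is the difference between ranking by similarity alone and ranking by utility.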

The subconscious: Night Cycle + MRA Engine

This is where Tokyo Brain diverges from every other product on the market.

Every AI memory system is passive. You ask, it retrieves. You don't ask, it sits idle. Like a library without a librarian, where books are never reorganized unless someone walks in.

The human brain does not work this way. Your Default Mode Network (DMN) activates when you are idle: during sleep, daydreaming, or a shower. It consolidates memories, resolves contradictions, and occasionally produces "eureka" moments.

We built the digital version.

Night Cycle v2 (runs daily at 3:00 AM UTC)

A Python script scans the entire knowledge base for issues.

MRA Curiosity Engine (runs after Night Cycle)

When Night Cycle finds issues, the MRA engine does not just flag them. It debates and resolves them with a three-persona tribunal:

MRA Three-Persona Tribunal
  Analyst: "What are the factual claims in each entry?"
    Produces a structured comparison table
  Synthesizer: "How can these be merged into a single truth?"
    Proposes a unified card
  Skeptic: "What is wrong with this merge?"
    Assigns a confidence score (0-100)
  Verdict: >= 85 confidence: auto-apply | 50-84: flag for human review | < 50: skip, ask a human

In early staging runs, the MRA engine successfully auto-merged duplicate cards, flagged ambiguous cases for human review, and, notably, the Skeptic persona correctly identified a hallucination in a proposed merge, preventing bad data from being written.
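
The tribunal's verdict rule is a plain threshold function over the Skeptic's confidence score. A minimal sketch follows; the function name and the returned labels are assumptions made for this example.

```python
def tribunal_verdict(confidence: int) -> str:
    """Map the Skeptic's 0-100 confidence score to an action."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence >= 85:
        return "auto_apply"      # apply the merge automatically
    if confidence >= 50:
        return "human_review"    # flag for a human to review
    return "skip_and_ask"        # do not merge, ask a human


decisions = [tribunal_verdict(score) for score in (92, 70, 30)]
```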

The anxiety reflex: Entropy Monitor

Night Cycle runs on a cron schedule, a digital alarm clock. But the human brain does not wait for an alarm. It notices when something is wrong in real time.

Entropy Monitor gives Tokyo Brain this ability. It tracks every memory store operation in a 20-minute sliding window. When it detects multiple stores in the same topic cluster (>= 4 within the window), it raises an alert:

{
  "status": "ELEVATED",
  "topic": "brain|pricing|tokyo|update|version",
  "count": 5,
  "message": "Pricing strategy is changing rapidly. Consider consolidating."
}

This is not a cron job. It is a real-time nervous system. The brain becomes "anxious" when knowledge becomes unstable, much like biological epistemic stress.
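
A sliding-window counter like the one described can be sketched with a deque. The class below is an illustration under stated assumptions: topic labels are already assigned by some upstream clustering step, timestamps arrive in non-decreasing order, and the class and method names are hypothetical.

```python
from collections import deque
from typing import Optional


class EntropyMonitor:
    def __init__(self, window_seconds: float = 20 * 60, threshold: int = 4):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events: deque = deque()  # (timestamp, topic) pairs, oldest first

    def record(self, topic: str, timestamp: float) -> Optional[dict]:
        """Register one store operation; return an alert if the topic is churning."""
        self._events.append((timestamp, topic))
        # Drop events that have fallen out of the sliding window.
        while self._events and timestamp - self._events[0][0] > self.window_seconds:
            self._events.popleft()
        count = sum(1 for _, t in self._events if t == topic)
        if count >= self.threshold:
            return {"status": "ELEVATED", "topic": topic, "count": count}
        return None


monitor = EntropyMonitor()
# Four stores on the same topic, one minute apart (timestamps in seconds).
alerts = [monitor.record("pricing", t) for t in (0, 60, 120, 180)]
```

Here the first three stores return `None` and the fourth returns an `ELEVATED` alert, because four stores on one topic landed inside the 20-minute window.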

The emotional cortex

The final piece: not all memories should be treated equally.

When a memory is stored, Tokyo Brain automatically computes an Emotional Salience Score (0.0 - 1.0):

"Oscar rode a bike for the first time.
 The whole family celebrated.
 Mom cried."                                → salience: 0.85

"Caddy upgraded from 2.10 to 2.11.2.
 Reverse proxy restarted on port 443."      → salience: 0.30

"Decided Tokyo Brain's business model:
 free software + paid memory.
 This is our North Star strategy."          → salience: 0.75

During recall, memories with salience > 0.5 receive a distance boost of up to 30%. Your child's first bike ride will always rank above a server config change.

Scoring uses pattern-based heuristics (mentions of family, milestones, strategic decisions). No LLM is needed, so scoring adds no model-call latency to store operations.
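
A pattern-based scorer of this kind can be very small. The sketch below is only an illustration: the regular expressions, the weights, and the baseline are hypothetical values picked so that the three examples above come out as shown, and the real heuristics are not described in this post.

```python
import re

# Hypothetical pattern groups and weights, chosen to mirror the examples above.
SALIENCE_PATTERNS = [
    (re.compile(r"\b(family|mom|dad|son|daughter|first time|celebrated)\b", re.I), 0.85),
    (re.compile(r"\b(decided|strategy|business model|north star)\b", re.I), 0.75),
]
BASELINE = 0.30  # assumed score when no pattern matches


def salience(text: str) -> float:
    """Return the highest weight among matching pattern groups, else the baseline."""
    score = BASELINE
    for pattern, weight in SALIENCE_PATTERNS:
        if pattern.search(text):
            score = max(score, weight)
    return score


family = salience("Oscar rode a bike for the first time. The whole family celebrated. Mom cried.")
ops = salience("Caddy upgraded from 2.10 to 2.11.2. Reverse proxy restarted on port 443.")
strategy = salience("Decided Tokyo Brain's business model: free software + paid memory.")
```

Because this is a handful of regex searches per store, it runs without any model call, which matches the trade-off described above: fast and predictable, but blind to emotional contexts that no pattern covers.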

The cryptographic cortex

Every memory change is cryptographically signed and logged. This creates a tamper-evident audit trail that no one, including us, can alter after the fact.

That means: if an AI agent makes a decision based on a memory from six months ago, you can prove the memory has not been tampered with since. Ready for enterprise audits.
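
This post does not say which scheme is used. One common way to make a log tamper-evident is a hash chain, where each entry commits to the hash of the previous one. The sketch below shows that general technique with SHA-256 from the standard library. The function names and entry layout are assumptions, and signing each digest with a private key would be an additional step that is omitted here.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev_hash, "hash": entry_hash(prev_hash, payload)})


def verify(log: list) -> bool:
    """Recompute every hash; any edited payload or broken link fails the check."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"op": "store", "memory_id": 1})
append_entry(log, {"op": "update", "memory_id": 1})
intact = verify(log)
```

Changing any earlier payload changes its hash, which breaks every later link, so an auditor who holds only the latest hash can detect a rewrite of history.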

The Safety Triangle

Three hardcoded safety mechanisms that no confidence score can override:

1. Guardian (The Mortal Soul Premise)
  "Absolute truth and infinite computation must forever serve, and never override, the preservation of human emotional bonds and dignity."
  The MRA's 4th persona, with unconditional veto power over any knowledge change that would make the system colder.
2. Compassion Override
  When recording events about family members, harsh labels are automatically softened. "Lied" becomes "may not have shared the full picture."
  The system does not hide the truth. It chooses to present it with empathy.
3. Co-pilot Constraint
  Three domains are permanently locked from automatic changes: identity, permissions, and finances.
  AI suggests. Humans decide. Always.

Multimodal memory

Tokyo Brain does not only store text. It accepts unified sensory payloads (text, audio features, and visual context) in a single memory:

{
  "sensory_inputs": {
    "text_transcript": "I'm fine, I'll handle it.",
    "audio_features": { "speaker_id": "Chia", "tone": "exhausted" },
    "visual_features": { "scene_context": "messy_living_room", "facial_expression": "fatigued" }
  }
}

The system synthesizes a multimodal narrative for embedding: [Speaker: Chia] [Tone: exhausted] [Visual: messy_living_room] Spoken: "I'm fine". This enables recall by emotion, scene, or speaker, not just by keyword.
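
Flattening such a payload into a single string for embedding could look like the following sketch. The function name `build_narrative` is hypothetical, the field names are taken from the JSON example above, and which fields the real system includes is an assumption.

```python
def build_narrative(payload: dict) -> str:
    """Flatten a sensory payload into one tagged string suitable for embedding."""
    inputs = payload.get("sensory_inputs", {})
    audio = inputs.get("audio_features", {})
    visual = inputs.get("visual_features", {})
    parts = []
    if "speaker_id" in audio:
        parts.append(f"[Speaker: {audio['speaker_id']}]")
    if "tone" in audio:
        parts.append(f"[Tone: {audio['tone']}]")
    if "scene_context" in visual:
        parts.append(f"[Visual: {visual['scene_context']}]")
    if "text_transcript" in inputs:
        parts.append(f'Spoken: "{inputs["text_transcript"]}"')
    return " ".join(parts)


payload = {
    "sensory_inputs": {
        "text_transcript": "I'm fine, I'll handle it.",
        "audio_features": {"speaker_id": "Chia", "tone": "exhausted"},
        "visual_features": {"scene_context": "messy_living_room", "facial_expression": "fatigued"},
    }
}
narrative = build_narrative(payload)
```

Because the tone and scene tags end up in the embedded text, a query such as "when did Chia sound exhausted?" can match this memory even though the spoken words say the opposite.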

Framework Ecosystem

Drop-in adapters for four major AI agent frameworks. Change just two lines:

# LangChain
from tokyo_brain.langchain import TokyoBrainMemory

# CrewAI
from tokyo_brain.crewai import TokyoBrainCrewMemory

# AutoGen
from tokyo_brain.autogen import TokyoBrainAutoGenMemory

# LlamaIndex
from tokyo_brain.llamaindex import TokyoBrainRetriever

Your existing agent code stays the same. You only swap the memory backend.

What we don't do (and why that matters)

Honest gaps

We believe in transparent engineering, so here is what Tokyo Brain does not yet have:

  1. No multimodal memory: text only. Images, audio, and video are on the roadmap.
  2. No cross-user knowledge sharing: each tenant is fully isolated. Federation is planned.
  3. Limited emotion detection: pattern-based, not LLM-based. It works well on known patterns and misses novel emotional contexts.
  4. Small user base: we are in alpha. The system works and the benchmark supports that, but we need more real-world validation.
  5. Recall latency: ~5 seconds under concurrent load (CPU-bound embedding on a single EC2 instance, no GPU). We optimize for processing depth over raw speed.

Architecture summary

Store Path:
  Input → Sanitizer → Emotional Salience → Fact Extraction
       → BGE-m3 Embedding → ChromaDB → Entropy Monitor

Recall Path:
  Query → Expansion → Entity Link → Temporal Parse
       → Multi-Collection Search → Curated Boost → Time Decay
       → Emotional Boost → Temporal Filter → Re-rank → Dedup

Background:
  3:00 AM — Night Cycle v2 (scan for issues)
  3:10 AM — MRA Engine (three-persona debate + auto-resolve)
  Real-time — Entropy Monitor (knowledge stability tracking)

Try it

pip install tokyo-brain

from tokyo_brain import TokyoBrain

brain = TokyoBrain(api_key="your-key")

# Store a memory
brain.store("Oscar rode his bike for the first time today")

# Recall with full 10-layer pipeline
results = brain.recall("What happened with Oscar recently?")
# → Returns Oscar's bike ride (salience: 0.85), not your server logs

Three lines to give your AI a hippocampus, an amygdala, and a subconscious.

Using LangChain? Change two lines:

# Before (goldfish memory):
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory()

# After (10-layer brain with subconscious):
from tokyo_brain.langchain import TokyoBrainMemory
memory = TokyoBrainMemory(api_key="tb-...")
# That's it. Your chain code stays exactly the same.

It also works as a Retriever for RAG chains and as a ChatMessageHistory for persistent sessions.

PyPI: tokyo-brain 0.1.0

Ready to give your AI a memory?

We are in Alpha. Keys are open to the first 100 developers.

Free tier available. No credit card required.
