The Amnesia Problem
Every time you close a chat with an AI coding agent, it forgets everything. agentmemory fixes that.
Imagine This Scenario
You just spent an hour with an AI coding agent setting up JWT authentication for your app. You worked through the middleware, wrote tests, chose a library. Everything works.
Next day, you open a new session and ask: “Add rate limiting to the API.”
The agent says: “Sure! First, let me look at your project structure...”
It has no idea you already set up auth. No idea you chose jose over jsonwebtoken for Edge compatibility. No idea your tests cover token validation. You have to explain everything again.
Built-in memory (like CLAUDE.md files) caps at 200 lines and goes stale. At 240 observations, you would burn 22,000+ tokens every session — that is like re-reading a short novel before every conversation.
What agentmemory Does
agentmemory is a persistent memory system for AI coding agents. It runs silently in the background and remembers what your agent does across sessions.
The agent writes code, runs tests, fixes bugs. agentmemory silently captures every tool use.
Raw tool calls become structured facts and concepts — stored in searchable indexes.
The agent already knows your auth setup. No re-explaining. It just starts working.
Why It Matters: The Numbers
95.2% Recall Accuracy
When you search for “database performance optimization,” it finds the N+1 query fix you did last week. Simple keyword matching cannot do that.
92% Fewer Tokens
~1,900 tokens per session instead of 22,000+. That is roughly $10/year instead of “impossible because it exceeds the context window.”
Works With Every Agent
Claude Code, Cursor, Gemini CLI, Codex CLI, Cline, Windsurf — one memory server, all agents share the same memories.
Zero External Dependencies
No Postgres, no Redis, no cloud service. It uses SQLite + a built-in engine called iii. Everything runs locally.
One Command, Full Memory
# Start the memory server
npx @agentmemory/agentmemory
# In another terminal, see it work
npx @agentmemory/agentmemory demo
# Open the real-time viewer
open http://localhost:3113
The first command starts the memory server in the background. It listens for your AI agent to do things.
The demo command seeds 3 realistic sessions and runs searches against them. You can see it find “N+1 query fix” when you search “database performance optimization.”
The viewer shows memories building live — like watching a brain form connections in real time.
This is not a database you manually query. It is a passive observer that silently captures what your AI agent does, compresses it into searchable knowledge, and injects the right context when the next session starts. You never have to “save” anything — it just happens.
Check Your Understanding
What is the main advantage of agentmemory over built-in agent memory (like CLAUDE.md files)?
Meet the Cast
agentmemory has six key players. Knowing who they are helps you tell your AI agent where to put things.
Six Components, One Mission
Think of agentmemory as a modular system where each component has a clear role. Here are the six characters you need to know:
iii Engine
The runtime that powers everything. One binary stands in for what would normally require Express.js, Redis, and pm2, with SQLite embedded for storage.
Hooks (12 of them)
Scripts that fire automatically when your AI agent does something — starts a session, uses a tool, ends a conversation.
Functions (50+)
The business logic: observe, compress, search, consolidate, forget. Every capability is registered as an iii function.
Search Indexes
Three search engines working together: BM25 (keyword), Vector (meaning), and Graph (relationships).
MCP Server
51 tools your AI agent can call directly — like “search my memories” or “save this insight.”
Real-Time Viewer
A web dashboard at port 3113 where you can watch memories being created live.
A Day in the Life: The Cast Talks
Watch how the six components collaborate when your AI agent edits a file:
How They Connect
[Architecture diagram: Your AI Agent → agentmemory Server → Storage & Observability]
The Secret Sauce: iii Engine
The iii engine is what makes agentmemory work without any external dependencies. It provides three building blocks:
- Functions — Registered with sdk.registerFunction("mem::observe", handler). Every piece of business logic is a function — observe, search, compress, forget. REST endpoints, MCP tools, and internal calls all route through functions.
- Triggers — HTTP triggers map REST endpoints to functions. POST /agentmemory/observe triggers mem::observe. Cron triggers run consolidation on a schedule. Event triggers fire on state changes.
- State — SQLite-backed key-value storage with logical scopes. mem:obs:sessionId stores observations, mem:summaries stores session summaries. 49 scopes total.
// Every capability is an iii function
import { registerWorker } from "iii-sdk";

const sdk = registerWorker("ws://localhost:49134");
await sdk.connect();

sdk.registerFunction("mem::observe", async (data) => {
  // privacy filter, dedup, compress, index
  return { stored: true };
});
Import the iii SDK — the toolkit for connecting to the engine.
Connect to the engine running on port 49134. This is the control channel.
Register a function named “mem::observe” — this is what captures what your agent does.
The handler receives data, processes it (privacy filter, dedup, compress), and returns a result.
Where Everything Lives
The index.ts file registers 50+ functions in sequence. Each registerXxxFunction(sdk) call adds capabilities to the engine. This pattern — every feature is a function registered with the engine — is what makes the system so extensible.
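The registration pattern itself is easy to model. Here is a minimal, self-contained sketch of the idea — the FunctionRegistry class and the registerObserveFunctions/registerSearchFunctions names are illustrative, not the real iii SDK, which speaks WebSocket to the engine:

```typescript
// Sketch of the "every feature is a registered function" pattern.
type Handler = (data: unknown) => Promise<unknown>;

class FunctionRegistry {
  private fns = new Map<string, Handler>();

  registerFunction(name: string, handler: Handler): void {
    this.fns.set(name, handler);
  }

  async call(name: string, data: unknown): Promise<unknown> {
    const fn = this.fns.get(name);
    if (!fn) throw new Error(`Unknown function: ${name}`);
    return fn(data);
  }
}

// Each feature module adds its capabilities to the shared registry,
// mirroring the registerXxxFunction(sdk) calls in index.ts.
function registerObserveFunctions(reg: FunctionRegistry): void {
  reg.registerFunction("mem::observe", async (_data) => ({ stored: true }));
}

function registerSearchFunctions(reg: FunctionRegistry): void {
  reg.registerFunction("mem::search", async (_query) => ({ results: [] }));
}

const registry = new FunctionRegistry();
registerObserveFunctions(registry);
registerSearchFunctions(registry);
```

Because REST, MCP, and internal callers all dispatch through the same name-to-handler map, adding a feature never means adding a new routing layer — just one more registration call.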
Check Your Understanding
What does the iii engine replace in a traditional tech stack?
The Memory Pipeline
From a raw tool call to a searchable memory — five stages, each one critical.
Five Stages of Memory Creation
Think of it like a factory pipeline. Raw tool calls enter one end, and structured, searchable memories come out the other. Let us trace the journey:
1. Capture — Hook fires, sends to REST API
2. Privacy Filter — Strip secrets and API keys
3. Dedup — Skip repeated operations
4. Compress — Extract structure from raw data
5. Index — Add to BM25 + Vector + Graph
Watch It Flow
Click through each step to see how data moves through the pipeline:
Stage 2: The Privacy Filter
Before anything gets stored, a privacy filter scrubs sensitive data. This is your first line of defense — your API keys never reach the memory store.
// Abridged: the full list has 14 patterns
const SECRET_PATTERNS = [
  /Bearer\s+[A-Za-z0-9._\-+/=]{20,}/gi,  // Bearer tokens
  /sk-ant-[A-Za-z0-9\-_]{20,}/g,         // Anthropic API keys
  /gh[pus]_[A-Za-z0-9]{36,}/g,           // GitHub tokens
  /AKIA[0-9A-Z]{16}/g,                   // AWS access key IDs
  /eyJ[A-Za-z0-9_-]{10,}\.../g,          // JWTs
];

// PRIVATE_TAG_RE (defined elsewhere) matches <private>…</private> blocks
export function stripPrivateData(input: string) {
  let result = input.replace(PRIVATE_TAG_RE, "[REDACTED]");
  for (const pattern of SECRET_PATTERNS) {
    result = result.replace(pattern, "[REDACTED_SECRET]");
  }
  return result;
}
A list of 14 regex patterns that match common secret formats: Bearer tokens, Anthropic keys, GitHub tokens, AWS keys, JWTs...
First, strip anything the user wrapped in <private> tags — an explicit opt-out.
Then scan for every known secret pattern. If it looks like an API key, it gets replaced with [REDACTED_SECRET].
The cleaned text is what actually gets stored. Your secrets never touch the memory store.
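To see the filter end to end, here is a self-contained sketch you can run. The PRIVATE_TAG_RE definition is an assumption (the real one lives elsewhere in the codebase), and only two of the secret patterns are included:

```typescript
// Assumed: matches <private>...</private> blocks the user opted out of
const PRIVATE_TAG_RE = /<private>[\s\S]*?<\/private>/gi;

// Abridged pattern list for the sketch (the real list has 14 entries)
const SECRET_PATTERNS: RegExp[] = [
  /Bearer\s+[A-Za-z0-9._\-+/=]{20,}/gi, // Bearer tokens
  /AKIA[0-9A-Z]{16}/g,                  // AWS access key IDs
];

function stripPrivateData(input: string): string {
  // Explicit opt-out first: anything in <private> tags is redacted wholesale
  let result = input.replace(PRIVATE_TAG_RE, "[REDACTED]");
  // Then scrub anything that looks like a known secret format
  for (const pattern of SECRET_PATTERNS) {
    result = result.replace(pattern, "[REDACTED_SECRET]");
  }
  return result;
}

const cleaned = stripPrivateData(
  "curl -H 'Authorization: Bearer abcdefghijklmnopqrstuvwxyz123456' " +
  "<private>my home address</private>"
);
```

After the call, `cleaned` contains [REDACTED_SECRET] where the token was and [REDACTED] where the private tag was — neither original value survives.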
Stage 3: Deduplication
AI agents often repeat the same action multiple times — running the same test, reading the same file. The content-addressable dedup system prevents storing the same observation twice:
import { createHash } from "node:crypto";

function computeHash(sessionId, toolName, toolInput) {
  const input = typeof toolInput === "string"
    ? toolInput.slice(0, 500)
    : JSON.stringify(toolInput).slice(0, 500);
  const raw = `${sessionId}:${toolName}:${input}`;
  return createHash("sha256").update(raw).digest("hex");
}
Take the first 500 characters of the tool input — enough to be unique without being wasteful.
Combine session ID + tool name + truncated input into one string.
Hash it with SHA-256. If we have seen this exact hash in the last 5 minutes, skip it. Same action, same result — no duplicates.
Dedup entries expire after 5 minutes. This means if the agent runs the same test twice with a 10-minute gap, both get stored — they might have different results. The TTL is short enough to catch rapid retries, but long enough to let legitimate repeated actions through.
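The whole dedup check fits in a few lines. A self-contained sketch — the in-memory seen-map and the exact TTL bookkeeping are assumptions about the implementation, but the 5-minute window and hash recipe follow the description above:

```typescript
import { createHash } from "node:crypto";

const DEDUP_TTL_MS = 5 * 60 * 1000; // 5-minute window
const seen = new Map<string, number>(); // hash -> timestamp last stored

function computeHash(sessionId: string, toolName: string, toolInput: unknown): string {
  // Truncate to 500 chars: unique enough without hashing huge payloads
  const input = typeof toolInput === "string"
    ? toolInput.slice(0, 500)
    : JSON.stringify(toolInput).slice(0, 500);
  return createHash("sha256")
    .update(`${sessionId}:${toolName}:${input}`)
    .digest("hex");
}

function shouldStore(
  sessionId: string,
  toolName: string,
  toolInput: unknown,
  now = Date.now(),
): boolean {
  const hash = computeHash(sessionId, toolName, toolInput);
  const last = seen.get(hash);
  if (last !== undefined && now - last < DEDUP_TTL_MS) {
    return false; // rapid repeat of the same action: skip it
  }
  seen.set(hash, now);
  return true;
}
```

Running the same `npm test` twice within a minute stores one observation; running it again after the window expires stores a fresh one, since the result may have changed.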
Stage 4: Compression (Zero LLM)
By default, agentmemory uses synthetic compression — pure heuristics that extract structure from tool calls without spending a single token on an LLM. It is like having a smart filing clerk who knows that grep means “searching” and bash means “running a command.”
function inferType(toolName: string): ObservationType {
  // Normalize camelCase: WebFetch -> web_fetch, Grep -> grep
  const n = toolName
    .replace(/([a-z])([A-Z])/g, "$1_$2")
    .toLowerCase();
  const hasWord = (w: string) => n.includes(w);

  if (["grep", "search", "glob"].some(hasWord)) return "search";
  if (["edit", "update", "patch"].some(hasWord)) return "file_edit";
  if (["bash", "shell", "exec"].some(hasWord)) return "command_run";
  // …same pattern continues for writes, reads, and web fetches
}
Take the tool name and normalize it — WebFetch becomes web_fetch, Grep becomes grep.
If the tool name contains “grep,” “search,” or “glob,” classify this as a search operation.
If it contains “edit,” “update,” or “patch,” it is a file edit. Same logic for commands, writes, reads, and web fetches.
No LLM call needed. The tool name itself tells us what kind of action happened.
- file_edit — Agent modified a file (Edit, Write, Update tools)
- search — Agent searched for something (Grep, Search, Glob)
- command_run — Agent ran a terminal command (Bash, Shell, Exec)
- web_fetch — Agent fetched data from the internet (WebFetch, HTTP)
- file_read — Agent read a file (Read, View tools)
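Here is the classifier as a complete, runnable sketch. The "unknown" fallback and the exact keyword lists for the last two branches are assumptions filled in from the table above:

```typescript
type ObservationType =
  | "file_edit" | "search" | "command_run"
  | "web_fetch" | "file_read" | "unknown";

function inferType(toolName: string): ObservationType {
  // Normalize camelCase: WebFetch -> web_fetch, Grep -> grep
  const n = toolName.replace(/([a-z])([A-Z])/g, "$1_$2").toLowerCase();
  const hasWord = (w: string) => n.includes(w);

  if (["grep", "search", "glob"].some(hasWord)) return "search";
  if (["edit", "update", "patch", "write"].some(hasWord)) return "file_edit";
  if (["bash", "shell", "exec"].some(hasWord)) return "command_run";
  if (["fetch", "http"].some(hasWord)) return "web_fetch";
  if (["read", "view"].some(hasWord)) return "file_read";
  return "unknown"; // fallback for unrecognized tools (assumed)
}
```

The ordering matters only when a tool name matches multiple keywords; for the standard agent tool names the branches are disjoint.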
Spot the Issue
Here is a simplified version of the hook that captures tool usage. There is a subtle timing issue — can you find it?
async function main() {
const response = await fetch(`${REST_URL}/observe`, {
method: "POST",
signal: AbortSignal.timeout(3000),
});
}
Check Your Understanding
How does the deduplication system know if an observation is a repeat?
How It Finds What You Need
Three search engines combine forces to find the right memory — even when your query uses completely different words.
The Search Problem
Last week you fixed an N+1 query problem in your codebase. Today you search for “database performance optimization.”
A simple keyword search would never find “N+1 query fix” — the words do not match. But agentmemory finds it. How?
The answer: three search engines running simultaneously, each looking at the query from a different angle, then fused together into one ranked result.
Three Engines, One Query
BM25 (Keyword)
Always on. Matches words in your query to words in stored memories. Uses stemming and synonym expansion. Fast, reliable, but literal.
Vector (Meaning)
When enabled. Converts text into numerical embeddings and finds memories with similar meaning. “Database performance” matches “N+1 query fix” because the concepts are related.
Graph (Relationships)
When enabled. Extracts entities (people, files, concepts) and their relationships. Traverses the knowledge graph to find memories connected through shared entities.
The Magic: Reciprocal Rank Fusion
Each engine returns a ranked list. But how do you merge three different rankings into one? The answer is Reciprocal Rank Fusion (RRF).
const RRF_K = 60;
// Each engine produces a ranked list
const bm25Results = this.bm25.search(query, limit * 2);
const vectorResults = this.vector.search(queryEmbedding, limit * 2);
// Merge: score = weight * 1/(K + rank)
combined = w_bm25 * (1/(60 + bm25Rank))
+ w_vector * (1/(60 + vectorRank))
+ w_graph * (1/(60 + graphRank));
K=60 is the magic number — large enough that rank #1 is not infinitely better than rank #2, but small enough that top ranks still matter.
Ask each engine for results. BM25 searches keywords, Vector searches meaning.
For each result, compute a combined score: its weight times the inverse of (60 plus its rank in that list).
A memory ranked #1 in both keyword AND meaning scores much higher than one ranked #5 in just one list. That is how “database optimization” finds “N+1 query fix”.
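RRF is short enough to implement in one function. A self-contained sketch using the constants above (K = 60; the example weights match the BM25 and Vector defaults):

```typescript
const RRF_K = 60;

// Merge ranked id lists from multiple engines with Reciprocal Rank Fusion.
// Each engine contributes weight * 1/(K + rank) for every id it returns.
function rrfMerge(lists: { weight: number; ids: string[] }[]): string[] {
  const scores = new Map<string, number>();
  for (const { weight, ids } of lists) {
    ids.forEach((id, i) => {
      const rank = i + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight * (1 / (RRF_K + rank)));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// A memory ranked #1 by both engines beats one ranked #1 by only one engine.
const merged = rrfMerge([
  { weight: 0.4, ids: ["n_plus_one_fix", "index_tuning"] }, // BM25 list
  { weight: 0.6, ids: ["n_plus_one_fix", "cache_layer"] },  // Vector list
]);
```

Here "n_plus_one_fix" wins because it collects score from both lists (0.4/61 + 0.6/61), while each competitor only collects from one.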
What Happens When an Engine Is Missing?
Not everyone has an embedding provider configured. Not everyone enables the knowledge graph. The system adapts automatically:
If you do not have a vector search engine, the system does not just set that weight to zero and lose search quality. It redistributes the weight to the engines that are available. No vector search? BM25 gets the full weight. No graph? Vector + BM25 split the budget. The system always uses the best available signals.
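One plausible way to implement that redistribution is to renormalize the configured weights over whichever engines are actually available; the exact scheme inside agentmemory may differ, but the effect described above is the same:

```typescript
type EngineWeights = { bm25: number; vector: number; graph: number };

// Default configured weights (from the settings below)
const DEFAULTS: EngineWeights = { bm25: 0.4, vector: 0.6, graph: 0.3 };

// Renormalize so the available engines' weights sum to 1.
function effectiveWeights(
  available: (keyof EngineWeights)[],
  configured: EngineWeights = DEFAULTS,
): Partial<EngineWeights> {
  const total = available.reduce((sum, engine) => sum + configured[engine], 0);
  const out: Partial<EngineWeights> = {};
  for (const engine of available) {
    out[engine] = configured[engine] / total;
  }
  return out;
}
```

With only BM25 available, it receives the full weight; with BM25 and Vector, they split the budget in a 0.4 : 0.6 ratio.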
The default weights are: BM25 = 0.4, Vector = 0.6, Graph = 0.3. But these are configurable:
- BM25_WEIGHT=0.4 — Keyword search weight (always active)
- VECTOR_WEIGHT=0.6 — Semantic search weight (when an embedding provider is configured)
- GRAPH_WEIGHT=0.3 — Knowledge graph weight (when graph extraction is enabled)
- TOKEN_BUDGET=2000 — Maximum tokens of context to inject into the session
The Complete Search Journey
When your AI agent starts a new session and asks agentmemory for context, here is what happens behind the scenes:
1. The agent starts a new conversation. The hook loads the project profile — top concepts, frequently edited files, known patterns.
2. BM25 scans for keyword matches. The vector index finds semantically similar memories. The graph traverses relationships. All three run in parallel.
3. Three ranked lists are combined into one. Session diversification limits each session to max 3 results — no single session dominates.
4. The top results are trimmed to fit within 2,000 tokens. The agent gets the most relevant context without overflowing its context window.
5. The compressed context is written to stdout, which the AI agent treats as additional conversation context. The agent now “knows” your project history.
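The budget-trimming step can be sketched as a greedy fill: keep results in rank order until the next one would overflow the budget. The chars-divided-by-4 token estimate is a crude assumption, not agentmemory's actual counter:

```typescript
const TOKEN_BUDGET = 2000;

// Crude token estimate: roughly 4 characters per token (assumption)
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily keep the highest-ranked memories that fit in the budget.
function trimToBudget(ranked: string[], budget = TOKEN_BUDGET): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const memory of ranked) {
    const cost = estimateTokens(memory);
    if (used + cost > budget) break; // stop at the first overflow
    kept.push(memory);
    used += cost;
  }
  return kept;
}
```

Because results arrive already ranked by fused relevance, cutting from the tail always drops the least relevant context first.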
The Embedding Options
Vector search needs an embedding model. agentmemory supports six providers, and auto-detects which one to use:
Local (Recommended)
Uses all-MiniLM-L6-v2. Runs on your machine. Free. Offline. +8 percentage points recall over BM25-only.
Gemini
Free tier, 100+ languages, supports 3072 dimensions. Good for multilingual codebases.
OpenAI
Highest quality embeddings at $0.02 per million tokens. Best if accuracy matters more than cost.
Check Your Understanding
You search for “how do tokens refresh” but the stored memory is titled “Implemented JWT rotation middleware in auth.ts.” Which search engine finds this?
How Memories Age
Like the human brain, agentmemory has a consolidation process, a decay curve, and an auto-forget mechanism.
Why Your Brain Forgets Things (On Purpose)
Your brain does not store everything forever — that would be overwhelming. Instead, it has a system:
- During the day, you form short-term memories — what you had for lunch, what your coworker said.
- While you sleep, your brain consolidates those memories — extracting the important parts and filing them into long-term storage.
- Memories you rarely access fade over time (the Ebbinghaus forgetting curve).
- Memories you use frequently get strengthened — they become easier to recall.
agentmemory implements this exact pattern in code. Four tiers, automatic consolidation, exponential decay, and importance-based eviction.
The 4-Tier Memory Model
- Working: The “what am I doing right now” layer. Raw observations from tool use, stored in real time. Gets 30% of the token budget when injecting context. Entries that score low get demoted to archival.
- Episodic: “What happened” — compressed session summaries. When the Stop hook fires, the entire session gets summarized into a structured narrative. Stored in the mem:summaries KV scope.
- Semantic: “What I know” — extracted facts and patterns. When 5+ session summaries exist, the consolidation pipeline feeds them to the LLM to extract generalized facts with confidence scores.
- Procedural: “How to do it” — workflows and decision patterns. When patterns recur across sessions (same type, frequency >= 2), the LLM synthesizes step-by-step procedures with trigger conditions.
The Consolidation Pipeline
Consolidation is the “sleep” phase — it happens periodically, not on every observation. The pipeline in src/functions/consolidation-pipeline.ts runs through each tier:
1. Check thresholds — 5+ summaries for semantic, 2+ recurring patterns for procedural
2. Extract facts — LLM processes summaries, outputs <fact confidence="0.8"> XML entries
3. Deduplicate — Case-insensitive match against existing semantic memories
4. Apply decay — Existing memories lose strength: strength *= 0.9^periods
// Exponential decay applied to memories
strength *= 0.9 ** number_of_decay_periods;
// Decay period: configurable, default 30 days
// Floor at 0.1 — never fully zero
if (strength < 0.1) strength = 0.1;
Every 30 days (by default), each memory loses 10% of its strength. After 30 days: 90%. After 60 days: 81%. After a year (12 decay periods): about 28%.
But memories that get accessed get reinforced — their strength goes back up. The more you use a memory, the longer it survives.
The floor at 0.1 means no memory fully disappears through decay alone. It takes active eviction to remove it.
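The curve is easy to check by hand with a tiny helper that mirrors the decay-plus-floor logic described above:

```typescript
// Exponential decay with a floor: strength *= 0.9 per period, never below 0.1
function decayedStrength(initial: number, periods: number): number {
  return Math.max(0.1, initial * 0.9 ** periods);
}

// One 30-day period: 0.9. Twelve periods (~a year): about 0.28.
const afterOneYear = decayedStrength(1, 12);
```

Even an untouched memory bottoms out at 0.1 strength; only the eviction sweep can actually remove it.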
The Retention Score
Not all memories are created equal. A retention score determines each memory's fate:
score = salience * exp(-lambda * days)
+ sigma * reinforcementBoost
// Salience weights by memory type:
architecture = 0.9 // structural decisions
bug = 0.7 // bugs and fixes
pattern = 0.8 // recurring patterns
preference = 0.85 // user preferences
// Tier classification:
hot >= 0.7 // always keep
warm >= 0.4 // keep but watch
cold >= 0.15 // candidate for eviction
evict < 0.15 // remove on next sweep
The score has two parts: how important the memory inherently is (salience), and how recently it was accessed (reinforcement).
Architecture decisions start with a high base score (0.9) because they are fundamental. Bugs start lower (0.7) because they get fixed.
Each time a memory is accessed, it gets a reinforcement boost. Frequently accessed memories stay “hot” even as they age.
The four tiers determine what to keep and what to evict. “Hot” memories always survive. “Evict” memories get cleaned up.
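Here is how the score and tier classification could look in code. The salience table and tier cutoffs follow the values above; lambda, sigma, and the log-shaped reinforcement boost are assumptions for the sketch:

```typescript
type Tier = "hot" | "warm" | "cold" | "evict";

// Salience weights by memory type (from the table above)
const SALIENCE: Record<string, number> = {
  architecture: 0.9,
  bug: 0.7,
  pattern: 0.8,
  preference: 0.85,
};

const LAMBDA = 0.01; // decay rate per day (assumed)
const SIGMA = 0.1;   // reinforcement weight (assumed)

function retentionScore(
  type: string,
  daysSinceAccess: number,
  accessCount: number,
): number {
  const salience = SALIENCE[type] ?? 0.5; // default salience is an assumption
  const reinforcementBoost = Math.log1p(accessCount); // assumed boost shape
  return salience * Math.exp(-LAMBDA * daysSinceAccess) + SIGMA * reinforcementBoost;
}

function classify(score: number): Tier {
  if (score >= 0.7) return "hot";
  if (score >= 0.4) return "warm";
  if (score >= 0.15) return "cold";
  return "evict";
}
```

A fresh architecture decision classifies as hot, while an unaccessed bug fix from a year ago falls below the 0.15 cutoff and becomes an eviction candidate.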
Three Ways to Forget
Auto-forget runs every 60 minutes with three strategies:
- Expiry: Memories with a forgetAfter date past the current time are deleted. Like a self-destruct timer — useful for temporary observations that should not persist.
- Supersession: When two memories share concepts and have a Jaccard similarity above 0.9, the older one gets marked isLatest=false. The newer version wins. Like Wikipedia — the article gets updated, not duplicated.
- Age-based cleanup: Observations older than 180 days with importance <= 2 are deleted. These are the “I searched for a file” type observations that have no lasting value.
Without forgetting, your memory store grows unbounded and search quality degrades — more results means more noise. Smart forgetting keeps the signal-to-noise ratio high. The system actively maintains the quality of your memory, not just its quantity.
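The supersession check relies on Jaccard similarity over each memory's concept set — a short sketch:

```typescript
// Jaccard similarity: |intersection| / |union| of two concept sets
function jaccard(a: Set<string>, b: Set<string>): number {
  const intersection = [...a].filter((x) => b.has(x)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : intersection / union;
}

// Above 0.9, the older memory gets marked isLatest = false (superseded)
const SUPERSEDE_THRESHOLD = 0.9;

function supersedes(newer: Set<string>, older: Set<string>): boolean {
  return jaccard(newer, older) > SUPERSEDE_THRESHOLD;
}
```

Two memories about the same JWT middleware share nearly all their concepts (similarity near 1.0) and collapse into one, while memories that merely overlap on a concept or two stay separate.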
The Complete Lifecycle
Putting it all together — from capture to eventual forgetting:
1. Capture — PostToolUse hook
2. Working — Raw observation stored
3. Episodic — Session summarized
4. Semantic — Facts extracted
5. Procedural — Patterns synthesized
6. Reinforce — Access = strength up
7. Decay — No access = fade
8. Evict — Score < 0.15 = remove
Final Check
Your agent remembers that “user prefers jose over jsonwebtoken for JWT.” Six months pass and the memory is never accessed. What happens?
You understand how agentmemory captures, compresses, indexes, searches, consolidates, and forgets. These patterns — privacy filtering, content-addressable dedup, zero-LLM compression, hybrid search with RRF fusion, tiered consolidation, and importance-based eviction — appear in production systems everywhere. Knowing them helps you debug, configure, and extend agentmemory. More importantly, you can now recognize these patterns in other tools and ask for them by name.