01

The Amnesia Problem

Every time you close a chat with an AI coding agent, it forgets everything. agentmemory fixes that.

Imagine This Scenario

You just spent an hour with an AI coding agent setting up JWT authentication for your app. You worked through the middleware, wrote tests, chose a library. Everything works.

Next day, you open a new session and ask: “Add rate limiting to the API.”

The agent says: “Sure! First, let me look at your project structure...”

It has no idea you already set up auth. No idea you chose jose over jsonwebtoken for Edge compatibility. No idea your tests cover token validation. You have to explain everything again.

⚠️
This Is Not a Small Problem

Built-in memory (like CLAUDE.md files) caps at 200 lines and goes stale. At 240 observations, you would burn 22,000+ tokens every session — that is like re-reading a short novel before every conversation.

What agentmemory Does

agentmemory is a persistent memory system for AI coding agents. It runs silently in the background and remembers what your agent does across sessions.

1
Session 1: You set up JWT auth

The agent writes code, runs tests, fixes bugs. agentmemory silently captures every tool use.

2
Session ends: Observations get compressed

Raw tool calls become structured facts and concepts — stored in searchable indexes.

3
Session 2: You ask for rate limiting

The agent already knows your auth setup. No re-explaining. It just starts working.

Why It Matters: The Numbers

🎯

95.2% Recall Accuracy

When you search for “database performance optimization,” it finds the N+1 query fix you did last week. Simple keyword matching cannot do that.

📉

92% Fewer Tokens

~1,900 tokens per session instead of 22,000+. That is roughly $10/year instead of “impossible because it exceeds the context window.”

🔌

Works With Every Agent

Claude Code, Cursor, Gemini CLI, Codex CLI, Cline, Windsurf — one memory server, all agents share the same memories.

💾

Zero External Dependencies

No Postgres, no Redis, no cloud service. It uses SQLite + a built-in engine called iii. Everything runs locally.

One Command, Full Memory

TERMINAL

# Start the memory server
npx @agentmemory/agentmemory

# In another terminal, see it work
npx @agentmemory/agentmemory demo

# Open the real-time viewer
open http://localhost:3113
            
PLAIN ENGLISH

The first command starts the memory server in the background. It listens for your AI agent to do things.

The demo command seeds 3 realistic sessions and runs searches against them. You can see it find “N+1 query fix” when you search “database performance optimization.”

The viewer shows memories building live — like watching a brain form connections in real time.

💡
The Key Insight

This is not a database you manually query. It is a passive observer that silently captures what your AI agent does, compresses it into searchable knowledge, and injects the right context when the next session starts. You never have to “save” anything — it just happens.

Check Your Understanding

What is the main advantage of agentmemory over built-in agent memory (like CLAUDE.md files)?

02

Meet the Cast

agentmemory has six key players. Knowing who they are helps you tell your AI agent where to put things.

Six Components, One Mission

Think of agentmemory as a modular system where each component has a clear role. Here are the six characters you need to know:

⚙️

iii Engine

The runtime that powers everything. It replaces what would otherwise require Express.js, SQLite, Redis, and pm2 — all in one binary.

🪝

Hooks (12 of them)

Scripts that fire automatically when your AI agent does something — starts a session, uses a tool, ends a conversation.

🧠

Functions (50+)

The business logic: observe, compress, search, consolidate, forget. Every capability is registered as an iii function.

🔍

Search Indexes

Three search engines working together: BM25 (keyword), Vector (meaning), and Graph (relationships).

🔌

MCP Server

51 tools your AI agent can call directly — like “search my memories” or “save this insight.”

👁️

Real-Time Viewer

A web dashboard at port 3113 where you can watch memories being created live.


How They Connect

🤖 Agent (Claude Code, Cursor, ...)
        │  hook events + MCP tool calls
        ▼
agentmemory Server
  🪝 Hooks (12) → ⚙️ iii Engine + Functions (50+) → 🔍 Search Indexes (BM25 + Vector + Graph)
  🔌 MCP Server (51 tools)
        │
        ▼
Storage & Observability
  🗄️ State KV (SQLite)    👁️ Viewer (port 3113)

The Secret Sauce: iii Engine

The iii engine is what makes agentmemory work without any external dependencies. It provides three building blocks:

Functions

Registered with sdk.registerFunction("mem::observe", handler). Every piece of business logic is a function — observe, search, compress, forget. REST endpoints, MCP tools, and internal calls all route through functions.

📡
Triggers

HTTP triggers map REST endpoints to functions. POST /agentmemory/observe triggers mem::observe. Cron triggers run consolidation on a schedule. Event triggers fire on state changes.

💾
State (KV)

SQLite-backed key-value storage with logical scopes. mem:obs:sessionId stores observations, mem:summaries stores session summaries. 49 scopes total.

TYPESCRIPT

// Every capability is an iii function
import { registerWorker } from "iii-sdk";

const sdk = registerWorker("ws://localhost:49134");
await sdk.connect();

sdk.registerFunction("mem::observe",
  async (data) => {
    // privacy filter, dedup, compress, index
    return { stored: true };
  }
);
            
PLAIN ENGLISH

Import the iii SDK — the toolkit for connecting to the engine.

Connect to the engine running on port 49134. This is the control channel.

Register a function named “mem::observe” — this is what captures what your agent does.

The handler receives data, processes it (privacy filter, dedup, compress), and returns a result.
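
Triggers can be pictured the same way. Here is a minimal sketch, assuming a hypothetical registerTrigger call on the same SDK (the real iii API may differ):

TYPESCRIPT — illustrative sketch

// Hypothetical API: map an HTTP endpoint and a cron schedule to functions.
sdk.registerTrigger({
  type: "http",
  method: "POST",
  path: "/agentmemory/observe",
  function: "mem::observe",
});

sdk.registerTrigger({
  type: "cron",
  schedule: "0 */6 * * *", // illustrative schedule; function name assumed
  function: "mem::consolidate",
});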

Where Everything Lives

src/                All source code
├── hooks/          12 lifecycle scripts (session-start, post-tool-use, stop...)
├── functions/      50+ business logic modules (observe, search, compress, graph...)
├── mcp/            MCP server (51 tools, 6 resources, 3 prompts)
├── state/          KV store, search indexes (BM25, vector, hybrid)
├── index.ts        Main entry point — registers all functions and triggers
├── config.ts       Environment variable loading and defaults
├── plugin/         Claude Code / Codex CLI plugin (hooks + skills + MCP)
└── integrations/   Agent-specific adapters (Hermes, OpenClaw, pi)
📌
Good to Know

The index.ts file registers 50+ functions in sequence. Each registerXxxFunction(sdk) call adds capabilities to the engine. This pattern — every feature is a function registered with the engine — is what makes the system so extensible.
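
A sketch of that startup sequence (the register function names here are invented for illustration, not the actual module exports):

TYPESCRIPT — illustrative sketch

// index.ts, abridged: each feature area registers its functions
// with the engine at startup (names illustrative)
registerObserveFunctions(sdk);
registerSearchFunctions(sdk);
registerCompressFunctions(sdk);
registerConsolidationFunctions(sdk);
// ...and so on, 50+ functions in total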

Check Your Understanding

What does the iii engine replace in a traditional tech stack?

03

The Memory Pipeline

From a raw tool call to a searchable memory — five stages, each one critical.

Five Stages of Memory Creation

Think of it like a factory pipeline. Raw tool calls enter one end, and structured, searchable memories come out the other. Let us trace the journey:

1

Capture
Hook fires, sends to REST API

2

Privacy Filter
Strip secrets and API keys

3

Dedup
Skip repeated operations

4

Compress
Extract structure from raw data

5

Index
Add to BM25 + Vector + Graph

Watch It Flow

Data moves through the pipeline in order:

🪝 Hook → ⚙️ Engine → 🔍 Indexes → 👁️ Viewer

Stage 2: The Privacy Filter

Before anything gets stored, a privacy filter scrubs sensitive data. This is your first line of defense — your API keys never reach the memory store.

TYPESCRIPT — src/functions/privacy.ts

// Text the user explicitly wrapped in <private>...</private> tags
const PRIVATE_TAG_RE = /<private>[\s\S]*?<\/private>/gi;

const SECRET_PATTERNS = [
  /Bearer\s+[A-Za-z0-9._\-+/=]{20,}/gi,        // Bearer tokens
  /sk-ant-[A-Za-z0-9\-_]{20,}/g,               // Anthropic keys
  /gh[pus]_[A-Za-z0-9]{36,}/g,                 // GitHub tokens
  /AKIA[0-9A-Z]{16}/g,                         // AWS access keys
  /eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g, // JWTs
];

export function stripPrivateData(input: string): string {
  // Explicit opt-out first: redact anything inside <private> tags
  let result = input.replace(PRIVATE_TAG_RE, "[REDACTED]");
  // Then scrub every known secret pattern
  for (const pattern of SECRET_PATTERNS) {
    result = result.replace(pattern, "[REDACTED_SECRET]");
  }
  return result;
}
            
PLAIN ENGLISH

A list of 14 regex patterns that match common secret formats: Bearer tokens, Anthropic keys, GitHub tokens, AWS keys, JWTs (five of the fourteen are shown above).

First, strip anything the user wrapped in <private> tags — an explicit opt-out.

Then scan for every known secret pattern. If it looks like an API key, it gets replaced with [REDACTED_SECRET].

The cleaned text is what actually gets stored. Your secrets never touch the memory store.
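
For instance, given the GitHub token pattern above, a captured observation would be scrubbed like this:

TYPESCRIPT — usage sketch

const observation =
  "set GITHUB_TOKEN=ghp_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa and retried the push";

stripPrivateData(observation);
// -> "set GITHUB_TOKEN=[REDACTED_SECRET] and retried the push"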

Stage 3: Deduplication

AI agents often repeat the same action multiple times — running the same test, reading the same file. The content-addressable dedup system prevents storing the same observation twice:

TYPESCRIPT — src/functions/dedup.ts

import { createHash } from "node:crypto";

function computeHash(sessionId: string, toolName: string, toolInput: unknown): string {
  // First 500 chars are enough to be unique without being wasteful
  const input = typeof toolInput === "string"
    ? toolInput.slice(0, 500)
    : JSON.stringify(toolInput).slice(0, 500);
  // Content address: session + tool + truncated input
  const raw = `${sessionId}:${toolName}:${input}`;
  return createHash("sha256").update(raw).digest("hex");
}
            
PLAIN ENGLISH

Take the first 500 characters of the tool input — enough to be unique without being wasteful.

Combine session ID + tool name + truncated input into one string.

Hash it with SHA-256. If we have seen this exact hash in the last 5 minutes, skip it. Same action, same result — no duplicates.

💡
The 5-Minute Window

Dedup entries expire after 5 minutes. This means if the agent runs the same test twice with a 10-minute gap, both get stored — they might have different results. The TTL is short enough to catch rapid retries, but long enough to let legitimate repeated actions through.
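
A minimal sketch of such a TTL window, using an in-process Map for illustration (where the real system stores its dedup entries is not shown here):

TYPESCRIPT — illustrative sketch

const DEDUP_TTL_MS = 5 * 60 * 1000; // the 5-minute window

// hash -> timestamp of when we last saw it
const seen = new Map<string, number>();

function isDuplicate(hash: string, now = Date.now()): boolean {
  const last = seen.get(hash);
  seen.set(hash, now); // refresh the entry either way
  return last !== undefined && now - last < DEDUP_TTL_MS;
}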

Stage 4: Compression (Zero LLM)

By default, agentmemory uses synthetic compression — pure heuristics that extract structure from tool calls without spending a single token on an LLM. It is like having a smart filing clerk who knows that grep means “searching” and bash means “running a command.”

TYPESCRIPT — src/functions/compress-synthetic.ts

type ObservationType =
  | "search" | "file_edit" | "command_run"
  | "file_read" | "web_fetch" | "other";

function inferType(toolName: string): ObservationType {
  // Normalize the name: WebFetch -> web_fetch, Grep -> grep
  const n = toolName.replace(/([a-z])([A-Z])/g, "$1_$2").toLowerCase();
  const hasWord = (w: string) => n.includes(w);
  if (["grep", "search", "glob"].some(hasWord)) return "search";
  if (["edit", "update", "patch"].some(hasWord)) return "file_edit";
  if (["bash", "shell", "exec"].some(hasWord)) return "command_run";
  return "other"; // fallback for unrecognized tools
}
            
PLAIN ENGLISH

Take the tool name and normalize it — WebFetch becomes web_fetch, Grep becomes grep.

If the tool name contains “grep,” “search,” or “glob,” classify this as a search operation.

If it contains “edit,” “update,” or “patch,” it is a file edit. Same logic for commands, writes, reads, and web fetches.

No LLM call needed. The tool name itself tells us what kind of action happened.

file_edit Agent modified a file (Edit, Write, Update tools)
search Agent searched for something (Grep, Search, Glob)
command_run Agent ran a terminal command (Bash, Shell, Exec)
web_fetch Agent fetched data from the internet (WebFetch, HTTP)
file_read Agent read a file (Read, View tools)

Spot the Issue

Here is a simplified version of the hook that captures tool usage. There is a subtle timing issue — can you find it?

TYPESCRIPT

async function main() {
  const response = await fetch(`${REST_URL}/observe`, {
    method: "POST",
    signal: AbortSignal.timeout(3000),
  });
}

Check Your Understanding

How does the deduplication system know if an observation is a repeat?

04

How It Finds What You Need

Three search engines combine forces to find the right memory — even when your query uses completely different words.

The Search Problem

Last week you fixed an N+1 query problem in your codebase. Today you search for “database performance optimization.”

A simple keyword search would never find “N+1 query fix” — the words do not match. But agentmemory finds it. How?

The answer: three search engines running simultaneously, each looking at the query from a different angle, then fused together into one ranked result.

Three Engines, One Query

📝

BM25 (Keyword)

Always on. Matches words in your query to words in stored memories. Uses stemming and synonym expansion. Fast, reliable, but literal.

🧬

Vector (Meaning)

When enabled. Converts text into numerical embeddings and finds memories with similar meaning. “Database performance” matches “N+1 query fix” because the concepts are related.

🕸️

Graph (Relationships)

When enabled. Extracts entities (people, files, concepts) and their relationships. Traverses the knowledge graph to find memories connected through shared entities.

The Magic: Reciprocal Rank Fusion

Each engine returns a ranked list. But how do you merge three different rankings into one? The answer is Reciprocal Rank Fusion (RRF).

TYPESCRIPT — src/state/hybrid-search.ts

const RRF_K = 60;

// Each engine produces a ranked list
const bm25Results = this.bm25.search(query, limit * 2);
const vectorResults = this.vector.search(queryEmbedding, limit * 2);

// Merge: score = weight * 1/(K + rank)
combined = w_bm25   * (1 / (RRF_K + bm25Rank))
         + w_vector * (1 / (RRF_K + vectorRank))
         + w_graph  * (1 / (RRF_K + graphRank));
            
PLAIN ENGLISH

K=60 is the magic number — large enough that rank #1 is not infinitely better than rank #2, but small enough that top ranks still matter.

Ask each engine for results. BM25 searches keywords, Vector searches meaning.

For each result, compute a combined score: its weight times the inverse of (60 plus its rank in that list).

A memory ranked #1 in both keyword AND meaning scores much higher than one ranked #5 in just one list. That is how “database optimization” finds “N+1 query fix”.
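
A quick worked example with the default weights (graph disabled): a memory ranked #1 by both BM25 and vector clearly outscores one ranked #5 in a single list.

TYPESCRIPT — worked example

// Ranked #1 in both the BM25 and vector lists:
0.4 * (1 / (60 + 1)) + 0.6 * (1 / (60 + 1)); // ≈ 0.0164

// Ranked #5 in the BM25 list only:
0.4 * (1 / (60 + 5));                        // ≈ 0.0062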

What Happens When an Engine Is Missing?

Not everyone has an embedding provider configured. Not everyone enables the knowledge graph. The system adapts automatically:

💡
Dynamic Weight Renormalization

If you do not have a vector search engine, the system does not just set that weight to zero and lose search quality. It redistributes the weight to the engines that are available. No vector search? BM25 gets the full weight. No graph? Vector + BM25 split the budget. The system always uses the best available signals.
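
A sketch of that renormalization, assuming the weights are simply rescaled over whichever engines are active (function and parameter names are illustrative):

TYPESCRIPT — illustrative sketch

function renormalize(
  weights: Record<string, number>, // e.g. { bm25: 0.4, vector: 0.6, graph: 0.3 }
  active: Set<string>,             // engines actually available right now
): Record<string, number> {
  const total = [...active].reduce((sum, e) => sum + (weights[e] ?? 0), 0);
  const out: Record<string, number> = {};
  for (const engine of active) {
    out[engine] = (weights[engine] ?? 0) / total; // share of the available budget
  }
  return out;
}

// No vector, no graph: BM25 takes the full weight
renormalize({ bm25: 0.4, vector: 0.6, graph: 0.3 }, new Set(["bm25"]));
// -> { bm25: 1 }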

The default weights are: BM25 = 0.4, Vector = 0.6, Graph = 0.3. But these are configurable:

BM25_WEIGHT=0.4 Keyword search weight (always active)
VECTOR_WEIGHT=0.6 Semantic search weight (when embedding provider configured)
GRAPH_WEIGHT=0.3 Knowledge graph weight (when graph extraction enabled)
TOKEN_BUDGET=2000 Maximum tokens of context to inject into the session

The Complete Search Journey

When your AI agent starts a new session and asks agentmemory for context, here is what happens behind the scenes:

1
SessionStart hook fires

The agent starts a new conversation. The hook loads the project profile — top concepts, frequently edited files, known patterns.

2
Triple-stream search executes

BM25 scans for keyword matches. Vector index finds semantically similar memories. Graph traverses relationships. All three run in parallel.

3
RRF fusion merges results

Three ranked lists are combined into one. Session diversification caps the results at three per session, so no single session dominates.

4
Token budget enforcement

The top results are trimmed to fit within 2,000 tokens (see the sketch after this list). The agent gets the most relevant context without overflowing its context window.

5
Context injection

The compressed context is written to stdout, which the AI agent treats as additional conversation context. The agent now “knows” your project history.
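
Step 4 can be pictured as a greedy trim over the ranked results. A sketch, assuming a rough four-characters-per-token estimate rather than the project's actual tokenizer:

TYPESCRIPT — illustrative sketch

const TOKEN_BUDGET = 2000;

// Crude heuristic: ~4 characters per token (an assumption for this sketch)
const estimateTokens = (text: string) => Math.ceil(text.length / 4);

function trimToBudget(results: string[], budget = TOKEN_BUDGET): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const memory of results) {    // results arrive ranked best-first
    const cost = estimateTokens(memory);
    if (used + cost > budget) break; // stop before overflowing the budget
    kept.push(memory);
    used += cost;
  }
  return kept;
}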

The Embedding Options

Vector search needs an embedding model. agentmemory supports six providers and auto-detects which one to use; three of them are shown here:

🏠

Local (Recommended)

Uses all-MiniLM-L6-v2. Runs on your machine. Free. Offline. +8 percentage points recall over BM25-only.

🌐

Gemini

Free tier, 100+ languages, supports 3072 dimensions. Good for multilingual codebases.

🏆

OpenAI

Highest quality embeddings at $0.02 per million tokens. Best if accuracy matters more than cost.

Check Your Understanding

Scenario

You search for “how do tokens refresh” but the stored memory is titled “Implemented JWT rotation middleware in auth.ts.” Which search engine finds this?

05

How Memories Age

Like the human brain, agentmemory has a consolidation process, a decay curve, and an auto-forget mechanism.

Why Your Brain Forgets Things (On Purpose)

Your brain does not store everything forever — that would be overwhelming. Instead, it has a system:

  • During the day, you form short-term memories — what you had for lunch, what your coworker said.
  • While you sleep, your brain consolidates those memories — extracting the important parts and filing them into long-term storage.
  • Memories you rarely access fade over time (the Ebbinghaus forgetting curve).
  • Memories you use frequently get strengthened — they become easier to recall.

agentmemory implements this exact pattern in code. Four tiers, automatic consolidation, exponential decay, and importance-based eviction.

The 4-Tier Memory Model

1
Working Memory

The “what am I doing right now” layer. Raw observations from tool use, stored in real time. Gets 30% of the token budget when injecting context. Entries that score low get demoted to archival.

2
Episodic Memory

“What happened” — compressed session summaries. When the Stop hook fires, the entire session gets summarized into a structured narrative. Stored in the mem:summaries KV scope.

3
Semantic Memory

“What I know” — extracted facts and patterns. When 5+ session summaries exist, the consolidation pipeline feeds them to the LLM to extract generalized facts with confidence scores.

4
Procedural Memory

“How to do it” — workflows and decision patterns. When patterns recur across sessions (same type, frequency >= 2), the LLM synthesizes step-by-step procedures with trigger conditions.

The Consolidation Pipeline

Consolidation is the “sleep” phase — it happens periodically, not on every observation. The pipeline in src/functions/consolidation-pipeline.ts runs through each tier:

A

Check thresholds
5+ summaries for semantic, 2+ recurring patterns for procedural

B

Extract facts
LLM processes summaries, outputs <fact confidence="0.8"> XML entries

C

Deduplicate
Case-insensitive match against existing semantic memories

D

Apply decay
Existing memories lose strength: strength *= 0.9^periods

TYPESCRIPT — Consolidation Decay

// Exponential decay applied to memories
strength *= 0.9 ** numberOfDecayPeriods;

// Decay period: configurable, default 30 days
// Floor at 0.1 — never fully zero
if (strength < 0.1) strength = 0.1;
            
PLAIN ENGLISH

Every 30 days (by default), each memory loses 10% of its strength. After 30 days: 90%. After 60 days: 81%. After a year (about twelve periods): roughly 28%.

But memories that get accessed get reinforced — their strength goes back up. The more you use a memory, the longer it survives.

The floor at 0.1 means no memory fully disappears through decay alone. It takes active eviction to remove it.

The Retention Score

Not all memories are created equal. A retention score determines each memory's fate:

THE FORMULA

score = salience * exp(-lambda * days)
      + sigma * reinforcementBoost

// Salience weights by memory type:
architecture = 0.9  // structural decisions
bug          = 0.7  // bugs and fixes
pattern      = 0.8  // recurring patterns
preference   = 0.85 // user preferences

// Tier classification:
hot    >= 0.7   // always keep
warm   >= 0.4   // keep but watch
cold   >= 0.15  // candidate for eviction
evict  < 0.15   // remove on next sweep
            
PLAIN ENGLISH

The score has two parts: how important the memory inherently is (salience), and how recently it was accessed (reinforcement).

Architecture decisions start with a high base score (0.9) because they are fundamental. Bugs start lower (0.7) because they get fixed.

Each time a memory is accessed, it gets a reinforcement boost. Frequently accessed memories stay “hot” even as they age.

The four tiers determine what to keep and what to evict. “Hot” memories always survive. “Evict” memories get cleaned up.
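
Putting the formula and tiers together in code. The lambda and sigma values below are invented for illustration; the source shows only the shape of the formula:

TYPESCRIPT — illustrative sketch

// Salience by memory type (from the formula above)
const SALIENCE: Record<string, number> = {
  architecture: 0.9,
  pattern: 0.8,
  preference: 0.85,
  bug: 0.7,
};

const LAMBDA = 0.01; // decay rate per day (assumed value)
const SIGMA = 0.2;   // reinforcement weight (assumed value)

function retentionScore(
  type: string,
  ageDays: number,
  reinforcementBoost: number,
): number {
  const salience = SALIENCE[type] ?? 0.5; // default for unlisted types (assumption)
  return salience * Math.exp(-LAMBDA * ageDays) + SIGMA * reinforcementBoost;
}

function tier(score: number): "hot" | "warm" | "cold" | "evict" {
  if (score >= 0.7) return "hot";   // always keep
  if (score >= 0.4) return "warm";  // keep but watch
  if (score >= 0.15) return "cold"; // candidate for eviction
  return "evict";                   // remove on next sweep
}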

Three Ways to Forget

Auto-forget runs every 60 minutes with three strategies:

TTL Expiry

Memories with a forgetAfter date past the current time are deleted. Like a self-destruct timer — useful for temporary observations that should not persist.

⚔️
Contradiction Detection

When two memories share concepts and have a Jaccard similarity above 0.9, the older one gets marked isLatest=false (see the sketch after this list). The newer version wins. Like Wikipedia — the article gets updated, not duplicated.

🗑️
Low-Value Pruning

Observations older than 180 days with importance <= 2 are deleted. These are the “I searched for a file” type observations that have no lasting value.
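
The contradiction check rests on Jaccard similarity over each memory's concept set. A minimal sketch:

TYPESCRIPT — illustrative sketch

// Jaccard similarity: |A ∩ B| / |A ∪ B|
function jaccard(a: Set<string>, b: Set<string>): number {
  const intersection = [...a].filter((x) => b.has(x)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : intersection / union;
}

jaccard(
  new Set(["jwt", "auth", "jose", "middleware"]),
  new Set(["jwt", "auth", "jose", "middleware", "rotation"]),
); // 4/5 = 0.8, below the 0.9 threshold, so both memories are kept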

📌
Forgetting Is a Feature

Without forgetting, your memory store grows unbounded and search quality degrades — more results means more noise. Smart forgetting keeps the signal-to-noise ratio high. The system actively maintains the quality of your memory, not just its quantity.

The Complete Lifecycle

Putting it all together — from capture to eventual forgetting:

1

Capture
PostToolUse hook

2

Working
Raw observation stored

3

Episodic
Session summarized

4

Semantic
Facts extracted

5

Procedural
Patterns synthesized

6

Reinforce
Access = strength up

7

Decay
No access = fade

8

Evict
Score < 0.15 = remove

Final Check

Scenario

Your agent remembers that “user prefers jose over jsonwebtoken for JWT.” Six months pass and the memory is never accessed. What happens?

🎓
What You Now Know

You understand how agentmemory captures, compresses, indexes, searches, consolidates, and forgets. These patterns — privacy filtering, content-addressable dedup, zero-LLM compression, hybrid search with RRF fusion, tiered consolidation, and importance-based eviction — appear in production systems everywhere. Knowing them helps you debug, configure, and extend agentmemory. More importantly, you can now recognize these patterns in other tools and ask for them by name.