Memory v2 Enhancement Guide: Associative Traversal, Salience Weighting, and Access-Based Forgetting
Architectural guide for extending OpenClaw's Memory v2 with entity co-occurrence traversal, salience-weighted retention, and access-based decay to improve retrieval precision in long-running agent deployments.
🔍 Symptoms
Current Memory v2 Retrieval Limitations
Agents running for extended periods (days to weeks) exhibit degraded contextual coherence when using existing retrieval mechanisms. The following symptoms manifest in production deployments:
Symptom 1: Shallow Lexical Retrieval
When querying for conceptually related information across time, the agent retrieves only surface-level matches:
$ openclaw memory recall "app performance improvements"
---
RETRIEVED FACTS (3):
- W(s=0.3) @config: Updated heartbeat interval from 5m to 30m.
- W(s=0.3) @config: Increased worker pool size to 4.
- W(s=0.3) @api: Added rate limiting middleware.
EXPECTED: Connection to Week 2 debugging session about slow database queries
ACTUAL: Generic config changes only
The agent cannot traverse the implicit chain: “performance” → “slow endpoint” → “database query” → “Sarah’s expertise.”
Symptom 2: Equal Weighting of Disparate Memories
All stored facts compete equally for context budget regardless of significance:
$ openclaw memory recall "any recent updates"
---
RETRIEVED (k=10, context budget: 4KB):
1. W(s=0.3) @config: Updated heartbeat interval from 5m to 30m.
2. W(s=0.3) @config: Increased worker pool size to 4.
3. W(s=0.3) @config: Set log level to INFO.
4. W(s=0.3) @config: Disabled telemetry opt-in.
5. B(s=0.3) @Sarah @project: Sarah announced she's leaving next month.
6. B(s=0.3) @user @identity: User prefers morning standups.
...
CRITICAL GAP: No salience differentiation. Sarah's departure competes equally with log level changes.
Symptom 3: Unbounded Index Growth Without Decay
After 30+ days of continuous operation:
$ sqlite3 ~/.openclaw/memory.db "SELECT COUNT(*) FROM facts;"
487
$ sqlite3 ~/.openclaw/memory.db "SELECT COUNT(*) FROM facts WHERE last_accessed > datetime('now', '-7 days');"
12
Only 2.5% of facts (12 of 487) were accessed in the past week, yet all 487 compete in retrieval scoring. The reflect job must process an ever-growing set with no prioritization signal.
Symptom 4: Hub Node Pollution (Reference from CLS-M Benchmark)
Entities appearing across many facts absorb retrieval activation:
$ sqlite3 -header ~/.openclaw/memory.db "SELECT '@' || e.name AS entity, COUNT(*) AS cnt FROM fact_entities fe JOIN entities e ON fe.entity_id = e.id GROUP BY e.name ORDER BY cnt DESC LIMIT 5;"
entity|cnt
@Peter|203
@config|112
@api|89
@system|78
@heartbeat|57
Direct entity traversal through @Peter (203 facts) dilutes the signal for specific, relevant connections.
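You can gauge how strongly the Phase 1 weight, 1 / log(fact_count + 1), would damp these hubs by plugging in the counts above. This is a minimal Python sketch that uses the natural log. The @PostgreSQL entity with 2 facts is hypothetical and is included only as a point of comparison.

```python
import math

# Fact counts from the query above; @PostgreSQL (2 facts) is a hypothetical rare entity.
fact_counts = {
    "@Peter": 203,
    "@config": 112,
    "@api": 89,
    "@system": 78,
    "@heartbeat": 57,
    "@PostgreSQL": 2,
}


def idf_weight(fact_count: int) -> float:
    """Inverse-frequency weight: shrinks as an entity appears in more facts."""
    return 1.0 / math.log(fact_count + 1)


weights = {name: idf_weight(n) for name, n in fact_counts.items()}
for name, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{name:<12} facts={fact_counts[name]:>3}  weight={weight:.3f}")
```

Under these assumptions, @Peter gets a weight of about 0.19 and the two-fact entity gets about 0.91. A connection through the rare entity therefore counts nearly five times as much as one through the hub.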
🧠 Root Cause
Architectural Gaps in Current Memory v2 Design
The current retrieval system lacks three critical mechanisms that are essential for maintaining precision in long-running deployments:
Gap 1: Single-Hop Entity Retrieval
The existing entity-aware retrieval model returns facts directly tagged with the query entity but does not recursively traverse co-occurring entities:
-- Current query (single-hop)
SELECT f.content, f.salience
FROM facts f
JOIN fact_entities fe ON f.id = fe.fact_id
JOIN entities e ON fe.entity_id = e.id
WHERE e.name = 'performance';
-- Returns only: facts explicitly tagged @performance
-- Misses: facts about @database that co-occur with @performance across the corpus
This is architecturally correct for exact entity lookup (“tell me about X”) but insufficient for exploratory queries where the agent discovers implicit connections.
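An in-memory SQLite sketch makes the gap concrete. It assumes a minimal facts / entities / fact_entities schema that uses the column names from the query above, plus three invented facts. The single-hop query returns only the fact tagged performance. One co-occurrence hop through database also surfaces Sarah's fact.

```python
import sqlite3

SCHEMA = """
CREATE TABLE facts (id INTEGER PRIMARY KEY, content TEXT NOT NULL);
CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE fact_entities (fact_id INTEGER, entity_id INTEGER);
"""

# Direct lookup: facts tagged with the query entity (same shape as the query above)
SINGLE_HOP = """
SELECT f.id, f.content FROM facts f
JOIN fact_entities fe ON f.id = fe.fact_id
JOIN entities e ON fe.entity_id = e.id
WHERE e.name = ?
ORDER BY f.id
"""

# One co-occurrence hop: facts tagged with an entity that shares a fact with the seed
TWO_HOP = """
WITH seed AS (SELECT id FROM entities WHERE name = ?),
direct AS (SELECT fact_id FROM fact_entities WHERE entity_id IN (SELECT id FROM seed)),
neighbours AS (
    SELECT DISTINCT fe.entity_id AS id FROM fact_entities fe
    WHERE fe.fact_id IN (SELECT fact_id FROM direct)
      AND fe.entity_id NOT IN (SELECT id FROM seed)
)
SELECT DISTINCT f.id, f.content FROM facts f
JOIN fact_entities fe ON f.id = fe.fact_id
WHERE fe.entity_id IN (SELECT id FROM neighbours)
  AND f.id NOT IN (SELECT fact_id FROM direct)
ORDER BY f.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO entities VALUES (?, ?)",
                 [(1, "performance"), (2, "database"), (3, "Sarah"), (4, "config")])
conn.executemany("INSERT INTO facts VALUES (?, ?)", [
    (1, "p95 latency regression traced to slow database queries"),
    (2, "Sarah rewrote the database indexing strategy"),
    (3, "Updated heartbeat interval from 5m to 30m."),
])
conn.executemany("INSERT INTO fact_entities VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2), (2, 3), (3, 4)])

direct = [row[1] for row in conn.execute(SINGLE_HOP, ("performance",))]
associated = [row[1] for row in conn.execute(TWO_HOP, ("performance",))]
print("single-hop:       ", direct)
print("via co-occurrence:", associated)
```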
Gap 2: Absence of Salience Tracking at Retain Time
The Letta control loop’s fundamental insight is that the agent that has the experience must decide what to retain. However, without a salience parameter on retain calls, this decision is binary (keep/discard) rather than graduated:
# Current (binary)
openclaw memory retain "Sarah is leaving the company next month"
# Missing salience metadata that would distinguish:
#   A config file tweak (s=0.2)
#   A critical team change (s=0.95)
Without salience, the reflect job cannot distinguish signal from noise. It must infer importance from recency or access frequency, which are poor proxies for actual significance.
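The following plain-Python sketch shows why a graduated score matters. It uses three facts in the spirit of Symptom 2. The Fact dataclass, timestamps, and salience values are illustrative assumptions; the 0.5 default mirrors the column default proposed in Phase 1. When the facts are ranked by recency (the proxy available without salience), Sarah's departure comes last. When they are ranked by salience, it comes first.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Fact:
    content: str
    created_at: datetime
    salience: float = 0.5  # mirrors the default proposed for the facts.salience column

    def __post_init__(self) -> None:
        # A graduated decision still needs bounds: reject values outside [0, 1]
        if not 0.0 <= self.salience <= 1.0:
            raise ValueError(f"salience must be in [0, 1], got {self.salience}")


now = datetime(2025, 1, 15, 12, 0)
facts = [
    Fact("Sarah is leaving the company next month", now - timedelta(days=3), salience=0.95),
    Fact("Set log level to INFO", now - timedelta(hours=2), salience=0.2),
    Fact("Updated heartbeat interval from 5m to 30m.", now - timedelta(hours=1), salience=0.2),
]

# Recency as the importance proxy: newest first
by_recency = sorted(facts, key=lambda f: f.created_at, reverse=True)
# Salience as the importance signal: most significant first
by_salience = sorted(facts, key=lambda f: f.salience, reverse=True)

print("recency :", [f.content for f in by_recency])
print("salience:", [f.content for f in by_salience])
```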
Gap 3: No Access-Based Decay Mechanism
The current design treats all historical facts as equally retrievable regardless of engagement patterns:
-- No temporal or access-based scoring
SELECT content FROM facts
ORDER BY created_at DESC -- Only recency, not relevance
LIMIT 10;
This creates three cascading problems:
- Precision degradation: As the index grows, the ratio of relevant-to-irrelevant facts decreases
- Reflect job inefficiency: The reflection processor must evaluate an ever-larger corpus with no prioritization
- Hub noise amplification: High-degree entities (appearing in 100+ facts) dominate traversal without decay
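These problems originate in the scoring rule. The sketch below compares pure age decay (the "wrong" variant in Pitfall 2) with the access-based decay used in Phase 5. It scores two hypothetical facts with equal salience: one created 90 days ago and retrieved about weekly, and one created yesterday and never retrieved. The function names are illustrative and are not part of OpenClaw.

```python
import math
from typing import Optional


def age_decay_score(salience: float, age_days: float) -> float:
    """Pure age decay: halves every 30 days since the fact was created."""
    return salience * 0.5 ** (age_days / 30)


def access_decay_score(salience: float, days_since_access: Optional[float],
                       access_count: int) -> float:
    """Access-based decay: halves every 7 days since the last retrieval,
    with a logarithmic boost for frequently retrieved facts."""
    decay = 0.25 if days_since_access is None else 0.5 ** (days_since_access / 7)
    boost = 1.0 + 0.1 * math.log1p(access_count)
    return salience * decay * boost


# Hypothetical facts with equal salience:
#   old: created 90 days ago, retrieved 13 times, last retrieval 3 days ago
#   new: created 1 day ago, never retrieved
old_age, new_age = age_decay_score(0.5, 90), age_decay_score(0.5, 1)
old_access, new_access = access_decay_score(0.5, 3, 13), access_decay_score(0.5, None, 0)
print(f"age-based:    old={old_age:.3f}  new={new_age:.3f}")
print(f"access-based: old={old_access:.3f}  new={new_access:.3f}")
```

Ranking by age alone puts the untouched new fact first (about 0.49 against 0.06). The access-based score reverses the order (about 0.47 against 0.125).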
Root Cause Analysis from CLS-M Prototype
The CLS-M prototype (132 nodes, 802 edges) validated these gaps empirically:
- Recall was acceptable (65%) but precision was poor (35%)—meaning 65% of retrieved content was noise
- Hub nodes destroyed precision: the heartbeat node had 57 edges, absorbing activation that should have gone to specific nodes
- Time-based decay failed: a fact from 3 months ago that is accessed weekly should remain prominent; age alone is not a relevance signal
The fix is not to build a separate knowledge graph but to extend the existing SQLite index with:
- Entity co-occurrence tracking via inverse document frequency (IDF) weighting
- Salience as a first-class parameter on retain operations
- Access-based decay that resets on retrieval (not pure age-based decay)
🛠️ Step-by-Step Fix
Phase 1: Schema Extensions for SQLite Index
Add salience and access tracking columns to the existing schema:
-- Migration: add_salience_and_access_tracking.sql
-- 1. Add salience column (0.0 to 1.0, default 0.5)
ALTER TABLE facts ADD COLUMN salience REAL DEFAULT 0.5;
-- 2. Add access tracking columns
ALTER TABLE facts ADD COLUMN last_accessed_at DATETIME DEFAULT NULL;
ALTER TABLE facts ADD COLUMN access_count INTEGER DEFAULT 0;
-- 3. Create index for access-based queries
CREATE INDEX idx_facts_last_accessed ON facts(last_accessed_at);
CREATE INDEX idx_facts_salience ON facts(salience);
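-- 4. Precompute entity frequencies for IDF weighting.
--    LN() is the natural log and needs SQLite 3.35+ with math functions enabled
--    (SQLite's LOG() is base 10). Entities with no facts get NULL, because
--    SQLite returns NULL for division by zero.
--    entity_stats is a snapshot: rebuild it (or maintain it incrementally) as facts are added.
CREATE TABLE entity_stats AS
SELECT
e.id,
e.name,
COUNT(fe.fact_id) AS fact_count,
1.0 / LN(COUNT(fe.fact_id) + 1) AS idf_weight
FROM entities e
LEFT JOIN fact_entities fe ON e.id = fe.entity_id
GROUP BY e.id;
CREATE UNIQUE INDEX idx_entity_stats_id ON entity_stats(id);
CREATE INDEX idx_entity_stats_fact_count ON entity_stats(fact_count);
Phase 2: Entity Co-Occurrence Table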
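Before you run the migration below, you can preview the weight it stores in plain Python. The sketch uses six invented facts and the Phase 1 weight 1 / log(fact_count + 1) with the natural log. It illustrates the weighting only and is not the migration itself. Two pairs share the same raw co-occurrence count, yet the pair involving the hub entity Peter ends up with the lower weight.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical fact -> entity tags; "Peter" is a hub that appears in most facts.
fact_entities = {
    1: {"Peter", "config"},
    2: {"Peter", "config"},
    3: {"Peter", "api"},
    4: {"Peter", "database"},
    5: {"database", "Sarah"},
    6: {"database", "Sarah"},
}

# How many facts mention each entity, and the inverse-frequency weight derived from it
fact_count = Counter(e for ents in fact_entities.values() for e in ents)
idf = {e: 1.0 / math.log(n + 1) for e, n in fact_count.items()}

# Unordered entity pairs: how many facts mention both
pair_count = Counter(
    pair for ents in fact_entities.values() for pair in combinations(sorted(ents), 2)
)

# Weighted co-occurrence: raw count damped by both entities' weights
cooccur_weight = {(a, b): n * idf[a] * idf[b] for (a, b), n in pair_count.items()}

for pair, weight in sorted(cooccur_weight.items(), key=lambda kv: -kv[1]):
    print(pair, pair_count[pair], round(weight, 3))
```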
Build the co-occurrence matrix from the existing fact index:
-- Migration: build_entity_cooccurrence.sql
-- 1. Create co-occurrence table
CREATE TABLE entity_cooccurrence (
entity_id_1 INTEGER NOT NULL,
entity_id_2 INTEGER NOT NULL,
cooccur_count INTEGER DEFAULT 1,
cooccur_weight REAL DEFAULT 0.0,
PRIMARY KEY (entity_id_1, entity_id_2),
FOREIGN KEY (entity_id_1) REFERENCES entities(id),
FOREIGN KEY (entity_id_2) REFERENCES entities(id)
);
-- 2. Populate from existing fact_entities (facts with 2+ entities).
--    Both directions are stored so a lookup on entity_id_1 sees every neighbour.
INSERT INTO entity_cooccurrence (entity_id_1, entity_id_2, cooccur_count)
SELECT
fe1.entity_id,
fe2.entity_id,
COUNT(DISTINCT fe1.fact_id)
FROM fact_entities fe1
JOIN fact_entities fe2 ON fe1.fact_id = fe2.fact_id
WHERE fe1.entity_id <> fe2.entity_id -- Skip self-pairs; keep both (a, b) and (b, a)
GROUP BY fe1.entity_id, fe2.entity_id;
-- 3. Compute weighted co-occurrence using the IDF weights from Phase 1
UPDATE entity_cooccurrence SET cooccur_weight =
CAST(cooccur_count AS REAL)
* (SELECT idf_weight FROM entity_stats WHERE entity_stats.id = entity_cooccurrence.entity_id_1)
* (SELECT idf_weight FROM entity_stats WHERE entity_stats.id = entity_cooccurrence.entity_id_2);
-- 4. Create index for fast co-occurrence lookups
CREATE INDEX idx_cooccur_lookup ON entity_cooccurrence(entity_id_1, cooccur_weight DESC);
Phase 3: CLI Command Updates
Extend the retain command with salience parameter:
# Before
openclaw memory retain "Sarah is leaving the company next month"
# After (with salience)
openclaw memory retain "Sarah is leaving the company next month" \
--type B \
--entity Sarah \
--entity project \
--salience 0.95
Extend the recall command with a salience filter and associative traversal:
# Before
openclaw memory recall "performance improvements"
# After (with enhanced options)
openclaw memory recall "performance improvements" \
--k 10 \
--min-salience 0.3 \
--associative-depth 2 \
--activation-decay 0.5
Phase 4: Associative Traversal Algorithm
Implement depth-limited traversal with activation decay:
def associative_traverse(seed_entities: list[str], depth: int = 2, decay: float = 0.5) -> dict[str, float]:
    """
    Traverse entity co-occurrence graph with depth limiting and activation decay.
    `query(sql, *params)` stands for the memory index's SQL helper (assumed to
    return rows as tuples); it is not defined here.
    Returns:
        dict: {entity_name: accumulated_activation_score}
    """
    activation: dict[str, float] = {}
    visited: set[str] = set()
    # Initialize seed entities with full activation
    for entity_name in seed_entities:
        activation[entity_name] = 1.0
        visited.add(entity_name)
    current_entities = list(seed_entities)
    current_activation = 1.0
    for _ in range(depth):
        next_activation = current_activation * decay
        discovered: dict[str, float] = {}
        for entity_name in current_entities:
            # Query co-occurring entities with IDF weighting
            cooccurring = query("""
                SELECT e.name, c.cooccur_weight, es.idf_weight
                FROM entity_cooccurrence c
                JOIN entities e ON c.entity_id_2 = e.id
                JOIN entity_stats es ON e.id = es.id
                WHERE c.entity_id_1 = (SELECT id FROM entities WHERE name = ?)
                ORDER BY c.cooccur_weight * es.idf_weight DESC
                LIMIT 10
            """, entity_name)
            for coentity_name, cooccur_weight, idf_weight in cooccurring:
                # Visited entities are filtered here instead of in SQL
                if coentity_name in visited:
                    continue
                contribution = next_activation * cooccur_weight * idf_weight
                # Sum contributions when several frontier entities reach the same neighbour
                discovered[coentity_name] = discovered.get(coentity_name, 0.0) + contribution
        # Mark this hop's entities visited only after the whole hop is processed
        for coentity_name, score in discovered.items():
            activation[coentity_name] = score
            visited.add(coentity_name)
        if not discovered:
            break  # Nothing new to expand
        current_entities = list(discovered)
        current_activation = next_activation
    return activation
Phase 5: Access-Based Decay Implementation
Apply exponential (half-life) access decay to the retrieval score:
import math
from datetime import datetime
from typing import Optional


def compute_retrieval_score(fact: dict, query_entities: list[str],
                            now: Optional[datetime] = None) -> float:
    """
    Compute composite retrieval score including salience and access-based decay.
    Components:
    - Base match score (lexical/semantic/associative)
    - Salience weight (from retain call)
    - Access decay (exponential, 7-day half-life, reset on retrieval)
    """
    if now is None:
        now = datetime.utcnow()  # Naive UTC, matching SQLite's datetime('now')
    # compute_base_match_score is the existing lexical/semantic/associative matcher (not shown)
    base_score = compute_base_match_score(fact, query_entities)
    salience_score = fact.get('salience', 0.5)
    # Access-based decay: halves every 7 days since the last retrieval
    last_accessed = fact.get('last_accessed_at')
    if isinstance(last_accessed, str) and last_accessed:
        # SQLite returns DATETIME columns as text, e.g. '2025-01-15 10:30:00'
        last_accessed = datetime.fromisoformat(last_accessed)
    if last_accessed:
        days_since_access = (now - last_accessed).total_seconds() / 86400.0
        access_decay = 0.5 ** (days_since_access / 7.0)
    else:
        access_decay = 0.25  # Never-accessed facts start quieter
    # Boost for frequent access (logarithmic to prevent hub dominance)
    access_count = fact.get('access_count', 0)
    access_boost = 1.0 + (0.1 * math.log1p(access_count))
    composite_score = (
        base_score * 0.4 +
        salience_score * 0.35 +
        access_decay * access_boost * 0.25
    )
    return composite_score


def on_fact_retrieved(fact_id: int) -> None:
    """Update access tracking when a fact is retrieved (`execute` is the index's SQL helper)."""
    execute("""
        UPDATE facts
        SET last_accessed_at = datetime('now'),  -- UTC timestamp
            access_count = access_count + 1
        WHERE id = ?
    """, (fact_id,))
Phase 6: Reflect Loop Integration
Update the reflect job to prioritize recently accessed and high-salience facts:
# In reflect job processor
def reflect_on_memories(agent_id: str, core_memory_max_tokens: int = 2048) -> None:
    # Query recently accessed or high-salience facts, ranked by salience x access boost.
    # SQLite has no LOG1P(); LN(1 + n) computes the same value (needs SQLite math functions).
    recent_facts = query("""
        SELECT f.*,
               COALESCE(f.salience, 0.5) *
               (1.0 + 0.1 * LN(1 + COALESCE(f.access_count, 0))) AS priority_score
        FROM facts f
        WHERE f.agent_id = ?
          AND (
              f.last_accessed_at > datetime('now', '-30 days')
              OR f.salience > 0.8
          )
        ORDER BY priority_score DESC, f.last_accessed_at DESC
        LIMIT 100
    """, agent_id)
    # Existing reflect logic operates on the priority-filtered set
    consolidated = consolidate_memories(recent_facts)
    update_core_memory(consolidated, max_tokens=core_memory_max_tokens)
🧪 Verification
Verification Test Suite
Execute the following commands to validate each enhancement:
Test 1: Schema Migration
$ sqlite3 ~/.openclaw/memory.db ".schema facts"
--- Expected output ---
CREATE TABLE facts (
...
salience REAL DEFAULT 0.5,
last_accessed_at DATETIME,
access_count INTEGER DEFAULT 0
);
$ sqlite3 ~/.openclaw/memory.db "SELECT COUNT(*) FROM entity_cooccurrence;"
--- Expected output ---
0 before the Phase 2 population step; a positive count afterwards (> 100 on a well-populated index)
Test 2: Salience-Aware Retain and Recall
# Retain with salience
$ openclaw memory retain "Sarah is leaving the company next month" \
--type B \
--entity Sarah \
--entity project \
--salience 0.95
--- Expected output ---
✓ Retained: B(s=0.95) @Sarah @project: Sarah is leaving...
# Verify in database
$ sqlite3 ~/.openclaw/memory.db \
"SELECT content, salience FROM facts WHERE content LIKE '%Sarah%';"
--- Expected output ---
Sarah is leaving the company next month|0.95
Test 3: Access Tracking
# Query a fact (simulated)
$ openclaw memory recall "heartbeat configuration"
# Verify access tracking updated
$ sqlite3 ~/.openclaw/memory.db \
"SELECT content, last_accessed_at, access_count FROM facts ORDER BY access_count DESC LIMIT 3;"
--- Expected output ---
Updated heartbeat interval from 5m to 30m.|2025-01-15 10:30:00|5
Increased worker pool size to 4.|2025-01-15 09:15:00|3
Rate limiting middleware added.|2025-01-14 14:22:00|1
Test 4: Associative Traversal Query
# Query with associative depth
$ openclaw memory recall "app performance" \
--associative-depth 2 \
--min-salience 0.3
--- Expected output ---
RETRIEVED (associative, depth=2):
Direct matches:
- W(s=0.3) @config: Updated heartbeat interval from 5m to 30m.
2-hop connections:
- B(s=0.95) @Sarah @project: Sarah is leaving... (via @database → @slow-endpoint)
- W(s=0.3) @api: Rate limiting middleware added. (via @slow-endpoint)
# Verify traversal path in debug mode
$ openclaw memory recall "app performance" --associative-depth 2 --debug
--- Expected output ---
Traversal: performance → {database, slow-endpoint, api}
→ database → {Sarah, PostgreSQL, indexing}
→ Final activation: {Sarah: 0.42, indexing: 0.31, ...}
Test 5: Composite Scoring Validation
$ python3 -c "
import datetime
import math
from openclaw.memory.scoring import compute_retrieval_score
now = datetime.datetime.utcnow()
test_fact = {
    'content': 'Sarah is leaving next month',
    'salience': 0.95,
    'last_accessed_at': now - datetime.timedelta(days=2),
    'access_count': 5
}
score = compute_retrieval_score(test_fact, query_entities=['personnel'], now=now)
print(f'Composite score: {score:.3f}')
print(f' - Salience contribution: {0.95 * 0.35:.4f}')
print(f' - Access decay (2 days, 5 accesses): {0.5 ** (2/7) * (1 + 0.1 * math.log1p(5)) * 0.25:.4f}')
"
--- Expected output (if the base match score for 'personnel' is 0) ---
Composite score: 0.574
 - Salience contribution: 0.3325
 - Access decay (2 days, 5 accesses): 0.2418
Test 6: Reflect Job Prioritization
# Run reflect with debug output
$ openclaw memory reflect --agent-id test-agent --debug
--- Expected output ---
Processing 47 facts (filtered from 487 total by priority)
Top priority facts:
1. B(s=0.95) @Sarah @project: Sarah is leaving... (priority: 1.23)
2. B(s=0.9) @user @identity: User prefers morning standups... (priority: 1.19)
3. W(s=0.8) @Peter @deadline: Q1 deadline is March 15... (priority: 1.08)
Core memory updated: 1,847 tokens (was 2,103)
⚠️ Common Pitfalls
Implementation Traps and Environment-Specific Considerations
Pitfall 1: Hub Node Dominance Without IDF Weighting
Symptom: Associative traversal returns nearly identical results regardless of query—high-degree entities (Peter, config, system) dominate all paths.
Cause: Raw co-occurrence counts without inverse entity frequency weighting.
Fix: Ensure the Phase 1 weight, entity_stats.idf_weight = 1 / log(fact_count + 1), is applied in all co-occurrence queries:
-- Wrong (hub dominance)
SELECT e.name FROM entities e
JOIN fact_entities fe ON e.id = fe.entity_id
WHERE fe.fact_id IN (
SELECT fact_id FROM fact_entities WHERE entity_id = ?
)
GROUP BY e.id, e.name
ORDER BY COUNT(*) DESC
-- Correct (IDF-weighted)
SELECT e.name FROM entities e
JOIN entity_stats es ON e.id = es.id
JOIN fact_entities fe ON e.id = fe.entity_id
WHERE fe.fact_id IN (
SELECT fact_id FROM fact_entities WHERE entity_id = ?
)
GROUP BY e.id, e.name, es.idf_weight
ORDER BY es.idf_weight * COUNT(*) DESC
Pitfall 2: Confusing Time-Based and Access-Based Decay
Symptom: Old but frequently-accessed facts receive low scores; fresh but never-accessed facts receive high scores.
Cause: Scoring facts by age since creation instead of by time since last access, and ignoring how often they are retrieved.
Rule: Access-based decay (reset on retrieval) outperforms time-based decay. A 3-month-old fact accessed weekly should outrank a 1-day-old fact never accessed:
from math import log1p

# Wrong: Pure age decay (halves every 30 days since creation)
score = salience * (0.5 ** (age_in_days / 30))
# Correct: Access-based decay with boost
access_decay = 0.5 ** (days_since_last_access / 7)  # Halves every 7 days since last retrieval
access_boost = 1.0 + (0.1 * log1p(access_count))  # Logarithmic, prevents hub dominance
score = salience * access_decay * access_boost
Pitfall 3: Associative Depth Too Deep
Symptom: Retrieval latency exceeds 500ms; output contains seemingly random facts.
Cause: Depth > 3 without activation cutoff floods the traversal.
Fix: Implement both depth limit AND minimum activation threshold:
MAX_DEPTH = 3
MIN_ACTIVATION = 0.05
INITIAL_ACTIVATION = 1.0
DECAY_PER_HOP = 0.5
# Traversal stops when:
# - Depth limit reached, OR
# - No entities exceed MIN_ACTIVATION threshold
Pitfall 4: Salience Estimation Failure at Retain Time
Symptom: All facts receive similar salience scores (0.4-0.6); differentiation is lost.
Cause: LLM estimation is too conservative; defaults to middle values.
Fix: Implement prompt-based salience estimation with explicit anchors:
SYSTEM_PROMPT = """
Estimate salience (0.0-1.0) for this memory:
- 0.9-1.0: Identity-defining, relationship-changing, career-affecting
- 0.7-0.9: Important project decisions, team changes, deadlines
- 0.4-0.7: Routine work, configurations, bug fixes
- 0.1-0.4: Minor preferences, temp states, easily reconstructed
Memory: {fact_content}
Respond ONLY with a number between 0.0 and 1.0.
"""
Always allow human override via the --salience CLI flag or direct file editing.
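Even with anchors, the model's reply needs defensive parsing before it reaches the database. The sketch below assumes the reply is free text that should contain a single number. A human override (such as the value of the --salience flag) always wins. Out-of-range values are clamped, and an unparseable reply falls back to the 0.5 column default instead of a guess. resolve_salience is a hypothetical helper, not an existing OpenClaw function.

```python
import re
from typing import Optional

DEFAULT_SALIENCE = 0.5  # same as the proposed facts.salience column default
_NUMBER = re.compile(r"-?\d*\.?\d+")


def resolve_salience(llm_reply: str, override: Optional[float] = None) -> float:
    """Pick the salience to store for a new fact (hypothetical helper).

    A human override always wins; otherwise the first number in the
    model's reply is used, clamped to [0, 1].
    """
    if override is not None:
        if not 0.0 <= override <= 1.0:
            raise ValueError(f"salience override must be in [0, 1], got {override}")
        return override
    match = _NUMBER.search(llm_reply)
    if match is None:
        # Unparseable reply: fall back to the default instead of guessing
        return DEFAULT_SALIENCE
    return min(1.0, max(0.0, float(match.group())))
```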
Pitfall 5: Docker/Container Environment Permissions
Symptom: sqlite3: unable to open database file when running in Docker.
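Cause: The SQLite database is mounted at the wrong path or with incorrect permissions. SQLite also needs write access to the directory that holds memory.db, because it creates its journal files alongside the database.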
Fix: Ensure volume mount preserves directory structure:
# Wrong
docker run -v /host/memory:/container/memory image
# Correct (bind mount the parent directory)
docker run -v /host/.openclaw:/root/.openclaw image
# Verify permissions
docker exec container ls -la /root/.openclaw/memory.db
# Should show: -rw-r--r-- 1 root root ...
Pitfall 6: Raspberry Pi 5 Resource Constraints
Symptom: Associative traversal causes memory pressure on ARM device.
Cause: Python dictionaries for activation tracking + recursive queries exceed available RAM.
Fix: Limit traversal scope and use cursor-based iteration:
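# Limits for constrained devices
MAX_ACTIVATION_ENTITIES = 50  # Cap on frontier size per hop
MIN_ACTIVATION = 0.05  # Same cutoff as Pitfall 3


# Use a generator so each hop is streamed instead of held in one large dict
def associative_traverse_stream(seed, depth, decay):
    """Yield (entity, activation) pairs hop by hop.

    `fetch_cooccurring(entity, limit)` stands for a helper that iterates a
    cursor over the top co-occurring entity names; it is not defined here.
    """
    frontier = {seed: 1.0}
    visited = {seed}
    for _ in range(depth):
        next_frontier = {}
        for entity, activation in frontier.items():
            if activation < MIN_ACTIVATION:
                continue
            for coentity in fetch_cooccurring(entity, limit=5):
                if coentity not in visited:
                    next_frontier[coentity] = next_frontier.get(coentity, 0.0) + activation * decay
        # Keep only the strongest entities so memory stays bounded
        strongest = sorted(next_frontier.items(), key=lambda kv: kv[1], reverse=True)
        frontier = dict(strongest[:MAX_ACTIVATION_ENTITIES])
        visited.update(frontier)
        yield from frontier.items()
        if not frontier:
            break
🔗 Related Errors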
Contextually Connected Issues and Historical Reference
Related Design Documents
- Workspace Memory v2 Research Doc — The baseline architecture this guide extends. Key sections: "Entity-Aware Retrieval," "Incremental Indexing," "Reflect Loop"
- Hindsight × Letta Integration — Typed facts with confidence-bearing opinions provide the substrate for salience weighting
- CLS-M Prototype Analysis — Empirical validation (132 nodes, 802 edges, F1=44%) demonstrating precision challenges with naive spreading activation
Common Error Codes in Memory Systems
| Error Code | Description | Related To |
|---|---|---|
| E2BIG | Context assembled exceeds token budget; reflect job cannot compress | Salience weighting, access decay |
| ENOENTITY | Entity lookup returns empty but semantic search finds results | Entity extraction gap, FTS fallback |
| EDUPFACTS | Near-duplicate facts accumulated without consolidation | Reflect loop limitations |
| EHUBNODES | Retrieval dominated by high-frequency entities (Peter, system, config) | IDF weighting absence |
| ECOLDSTART | New deployment has insufficient fact density for associative traversal | Entity co-occurrence density threshold |
| EDECAYTOOFAST | Time-based decay erases useful old memories prematurely | Access-based vs. time-based decay |
Historical Context from CLS-M
The CLS-M prototype identified failure modes that informed these recommendations:
- F1=44% on 45-query benchmark — Precision (35%) was the bottleneck, not recall (65%)
- Hub noise kill: the heartbeat node, with 57 edges, absorbed 15% of total activation on every query
- Delegation failure: Sub-agent memory extraction failed consistently; the experiencing agent must own retention
- Spread too thin: Activation across 800+ edges diluted signal below useful thresholds
These findings validate the incremental approach: start with FTS5, add embeddings, then entity co-occurrence only after sufficient index density is reached.
OpenClaw Version Compatibility
| Version | Required Features | Migration Path |
|---|---|---|
| v0.11.x | Basic fact storage, FTS5 | Apply Phase 1-2 migrations |
| v0.12.0 | Entity extraction, salience fields | Apply Phase 1-6 incrementally |
| v0.13.0 (planned) | Associative traversal, access tracking | Full implementation |