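Memory v2 Enhancement Guide: Associative Traversal, Salience Weighting, and Access-Based Forgetting
An architecture guide to extending OpenClaw's Memory v2 with entity co-occurrence traversal, salience-weighted retention, and access-based decay, in order to improve retrieval precision in long-running agent deployments.
🔍 Symptoms
Current Memory v2 retrieval limitations
Agents that run for long periods (days to weeks) show degraded context coherence under the existing retrieval mechanism. The following symptoms appear in production:
Symptom 1: Shallow lexical retrieval
When querying for conceptually related information across time, the agent retrieves only surface matches: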
$ openclaw memory recall "app performance improvements"
---
RETRIEVED FACTS (3):
- W(s=0.3) @config: Updated heartbeat interval from 5m to 30m.
- W(s=0.3) @config: Increased worker pool size to 4.
- W(s=0.3) @api: Added rate limiting middleware.
EXPECTED: Connection to Week 2 debugging session about slow database queries
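ACTUAL: Generic config changes only
The agent cannot traverse the implicit chain: "performance" → "slow endpoint" → "database query" → "Sarah's expertise".
Symptom 2: All memories weighted equally
All stored facts compete equally for the context budget, regardless of importance: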
$ openclaw memory recall "any recent updates"
---
RETRIEVED (k=10, context budget: 4KB):
1. W(s=0.3) @config: Updated heartbeat interval from 5m to 30m.
2. W(s=0.3) @config: Increased worker pool size to 4.
3. W(s=0.3) @config: Set log level to INFO.
4. W(s=0.3) @config: Disabled telemetry opt-in.
5. B(s=0.3) @Sarah @project: Sarah announced she's leaving next month.
6. B(s=0.3) @user @identity: User prefers morning standups.
...
CRITICAL GAP: No salience differentiation. Sarah's departure competes equally with log level changes.
Symptom 3: Unbounded index growth without decay
After 30+ days of continuous operation:
$ sqlite3 ~/.openclaw/memory.db "SELECT COUNT(*) FROM facts;"
487
$ sqlite3 ~/.openclaw/memory.db "SELECT COUNT(*) FROM facts WHERE last_accessed > datetime('now', '-7 days');"
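12
Only about 2.5% of facts (12 of 487) were accessed in the past week, yet all 487 compete in retrieval scoring. The reflect job must process an ever-growing set with no prioritization signal.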
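The ratio above can be tracked over time with a small helper. This is a minimal sketch, assuming a facts table with a last_accessed timestamp column as in the query above; the function name and the in-memory sample data are hypothetical.

```python
import sqlite3

def recently_accessed_ratio(conn: sqlite3.Connection, days: int = 7) -> float:
    """Return the fraction of facts accessed within the last `days` days."""
    total = conn.execute("SELECT COUNT(*) FROM facts").fetchone()[0]
    if total == 0:
        return 0.0
    recent = conn.execute(
        "SELECT COUNT(*) FROM facts WHERE last_accessed > datetime('now', ?)",
        (f"-{days} days",),
    ).fetchone()[0]
    return recent / total

# Hypothetical sample: 487 facts, 12 of them touched an hour ago.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (id INTEGER PRIMARY KEY, last_accessed TEXT)")
conn.executemany(
    "INSERT INTO facts (last_accessed) VALUES (datetime('now', ?))",
    [("-1 hours",)] * 12 + [("-60 days",)] * 475,
)
ratio = recently_accessed_ratio(conn)
print(f"{ratio:.1%} of facts accessed in the last 7 days")
```

A falling ratio on a growing index is one possible signal that decay or pruning is overdue.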
Symptom 4: Hub-node pollution (see the CLS-M benchmark)
Entities that appear in many facts absorb retrieval activation:
$ sqlite3 ~/.openclaw/memory.db "SELECT entity, COUNT(*) as cnt FROM fact_entities GROUP BY entity ORDER BY cnt DESC LIMIT 5;"
entity|cnt
@Peter|203
@config|112
@api|89
@system|78
@heartbeat|57
Traversing entities directly through @Peter (203 facts) dilutes the signal from specific, relevant connections.
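To see how inverse-frequency weighting counteracts hub entities, the sketch below applies a weight of 1 / log10(count + 1) to the counts from the query output above. The base-10 logarithm and the idf_weight helper name are assumptions made for illustration.

```python
import math

# Entity fact counts taken from the query output above.
fact_counts = {"@Peter": 203, "@config": 112, "@api": 89, "@system": 78, "@heartbeat": 57}

def idf_weight(fact_count: int) -> float:
    """Inverse-frequency weight: rarer entities get a larger weight."""
    return 1.0 / math.log10(fact_count + 1)

weights = {name: idf_weight(cnt) for name, cnt in fact_counts.items()}

# Print from the most down-weighted entity to the least.
for name, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{name:<11} facts={fact_counts[name]:>3}  idf_weight={weight:.3f}")
```

The weight shrinks only logarithmically, so a 203-fact hub is damped without being removed from traversal altogether.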
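🧠 Root cause analysis
Architectural gaps in the current Memory v2 design
The current retrieval system lacks three mechanisms that are critical for maintaining precision in long-running deployments:
Gap 1: Single-hop entity retrieval
The existing entity-aware retrieval model returns facts tagged directly with the query entity, but it does not recursively traverse co-occurring entities: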
-- Current query (single-hop)
SELECT f.content, f.salience
FROM facts f
JOIN fact_entities fe ON f.id = fe.fact_id
JOIN entities e ON fe.entity_id = e.id
WHERE e.name = 'performance';
-- Returns only: facts explicitly tagged @performance
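-- Misses: facts about @database that co-occur with @performance across the corpus
This is architecturally correct for exact entity queries ("tell me about X") but insufficient for exploratory queries, where the agent needs to discover implicit connections.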
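The gap between the two retrieval modes can be seen on a toy index. The sketch below is illustrative only: it assumes the facts / fact_entities / entities layout used in the query above, and the sample rows and the single_hop and two_hop helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE facts (id INTEGER PRIMARY KEY, content TEXT);
CREATE TABLE fact_entities (fact_id INTEGER, entity_id INTEGER);
INSERT INTO entities (id, name) VALUES (1, 'performance'), (2, 'database'), (3, 'Sarah');
INSERT INTO facts (id, content) VALUES
  (1, 'Slow endpoint hurts app performance; root cause is a database query.'),
  (2, 'Sarah knows how to tune database indexes.');
INSERT INTO fact_entities VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")

def single_hop(entity: str) -> set[str]:
    """Facts tagged directly with `entity`."""
    rows = conn.execute("""
        SELECT f.content FROM facts f
        JOIN fact_entities fe ON fe.fact_id = f.id
        JOIN entities e ON e.id = fe.entity_id
        WHERE e.name = ?""", (entity,))
    return {r[0] for r in rows}

def two_hop(entity: str) -> set[str]:
    """Facts tagged with `entity` or with any entity that co-occurs with it."""
    rows = conn.execute("""
        SELECT DISTINCT f.content FROM facts f
        JOIN fact_entities fe ON fe.fact_id = f.id
        WHERE fe.entity_id IN (
            SELECT fe2.entity_id
            FROM fact_entities fe1
            JOIN fact_entities fe2 ON fe2.fact_id = fe1.fact_id
            JOIN entities e ON e.id = fe1.entity_id
            WHERE e.name = ?)""", (entity,))
    return {r[0] for r in rows}

print(len(single_hop("performance")), len(two_hop("performance")))
```

Only the two-hop variant reaches the fact about Sarah, because it is linked to "performance" solely through the shared "database" entity.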
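Gap 2: No salience tracking at retain time
The core insight of the Letta control loop is that the agent that has the experience must decide what to retain. However, because the retain call has no salience parameter, that decision is binary (keep or discard) rather than graded:
# Current (binary)
openclaw memory retain "Sarah is leaving the company next month"
# Missing salience metadata that would distinguish:
#   a config file tweak (s=0.2)
#   a critical team change (s=0.95)
Without salience, the reflect job cannot separate signal from noise. It has to use access recency or access frequency as a proxy for importance, and both are poor proxies for how much a fact actually matters.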
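As a rough illustration of what a graded salience value makes possible, the sketch below fills a fixed context budget with the most salient facts first. The Fact dataclass, the pack_by_salience helper, the byte-based budget, and the sample salience values are all assumptions for this example, not part of the existing OpenClaw API.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    content: str
    salience: float  # 0.0 (trivial) to 1.0 (critical)

def pack_by_salience(facts: list[Fact], budget_bytes: int) -> list[Fact]:
    """Greedily fill the context budget with the most salient facts first."""
    selected, used = [], 0
    for fact in sorted(facts, key=lambda f: f.salience, reverse=True):
        size = len(fact.content.encode("utf-8"))
        if used + size <= budget_bytes:
            selected.append(fact)
            used += size
    return selected

facts = [
    Fact("Set log level to INFO.", 0.2),
    Fact("Sarah announced she's leaving next month.", 0.95),
    Fact("Updated heartbeat interval from 5m to 30m.", 0.2),
]
chosen = pack_by_salience(facts, budget_bytes=64)
print([f.content for f in chosen])
```

With equal salience on every fact, the same function degenerates into insertion order, which is the behavior described in Symptom 2.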
Gap 3: No access-based decay mechanism
The current design treats all historical facts as equally retrievable, regardless of engagement patterns:
-- No temporal or access-based scoring
SELECT content FROM facts
ORDER BY created_at DESC -- Only recency, not relevance
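LIMIT 10;
This causes three cascading problems:
- Precision degradation: as the index grows, the ratio of relevant to irrelevant facts falls
- Reflect job inefficiency: the reflect processor must evaluate an ever-larger corpus with no prioritization
- Hub noise amplification: high-degree entities that appear in 100+ facts dominate traversal, with no decay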
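Root cause evidence from the CLS-M prototype
The CLS-M prototype (132 nodes, 802 edges) validated these gaps empirically:
- Recall was acceptable (65%) but precision was poor (35%), which means 65% of retrieved content was noise
- Hub nodes destroyed precision: the heartbeat node had 57 edges and absorbed activation that should have flowed to specific nodes
- Time-based decay failed: a fact that is 3 months old but accessed weekly should stay prominent; age alone is not a relevance signal
The fix is not to build a separate knowledge graph but to extend the existing SQLite index with:
- Entity co-occurrence tracking with inverse document frequency (IDF) weighting
- Salience as a first-class parameter of the retain operation
- Access-based decay (reset on retrieval, rather than purely age-based decay)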
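The difference between age-based and access-based decay can be checked numerically. The following sketch compares the two on a pair of made-up facts; the constants (30-day and 7-day half-lives, a 0.25 floor for never-accessed facts, a 0.1 logarithmic boost) mirror values used elsewhere in this guide, while the function names and the sample facts are hypothetical.

```python
import math

def age_decay_score(salience: float, age_days: float, half_life: float = 30.0) -> float:
    """Pure time-based decay: older facts always score lower."""
    return salience * 0.5 ** (age_days / half_life)

def access_decay_score(salience: float, days_since_access, access_count: int,
                       half_life: float = 7.0, never_accessed: float = 0.25) -> float:
    """Access-based decay: the clock resets whenever the fact is retrieved."""
    if days_since_access is None:
        decay = never_accessed
    else:
        decay = 0.5 ** (days_since_access / half_life)
    boost = 1.0 + 0.1 * math.log1p(access_count)
    return salience * decay * boost

# Fact A: 90 days old, retrieved weekly (12 times), last retrieved 3 days ago.
# Fact B: 1 day old, never retrieved. Both have salience 0.5.
old_age = age_decay_score(0.5, age_days=90)
new_age = age_decay_score(0.5, age_days=1)
old_access = access_decay_score(0.5, days_since_access=3, access_count=12)
new_access = access_decay_score(0.5, days_since_access=None, access_count=0)

print(f"age-based:    old={old_age:.3f}  new={new_age:.3f}")
print(f"access-based: old={old_access:.3f}  new={new_access:.3f}")
```

Under age-based decay the old but actively used fact ranks below the untouched new one; under access-based decay the order flips.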
🛠️ Step-by-step fix
Phase 1: Schema extension for the SQLite index
Add salience and access-tracking columns to the existing schema:
-- Migration: add_salience_and_access_tracking.sql
-- 1. Add salience column (0.0 to 1.0, default 0.5)
ALTER TABLE facts ADD COLUMN salience REAL DEFAULT 0.5;
-- 2. Add access tracking columns
ALTER TABLE facts ADD COLUMN last_accessed_at DATETIME DEFAULT NULL;
ALTER TABLE facts ADD COLUMN access_count INTEGER DEFAULT 0;
-- 3. Create index for access-based queries
CREATE INDEX idx_facts_last_accessed ON facts(last_accessed_at);
CREATE INDEX idx_facts_salience ON facts(salience);
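-- 4. Precompute entity frequencies for IDF weighting
-- Note: LOG() is base-10 in SQLite and requires a build with the math functions enabled (3.35+).
-- The inner JOIN skips entities with no facts, which would otherwise divide by LOG(1) = 0.
CREATE TABLE entity_stats AS
SELECT
  e.id,
  e.name,
  COUNT(fe.fact_id) AS fact_count,
  1.0 / LOG(COUNT(fe.fact_id) + 1) AS idf_weight
FROM entities e
JOIN fact_entities fe ON e.id = fe.entity_id
GROUP BY e.id;
CREATE INDEX idx_entity_stats_fact_count ON entity_stats(fact_count);
Phase 2: Entity co-occurrence table
Build the co-occurrence matrix from the existing fact index: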
-- Migration: build_entity_cooccurrence.sql
-- 1. Create co-occurrence table
CREATE TABLE entity_cooccurrence (
entity_id_1 INTEGER NOT NULL,
entity_id_2 INTEGER NOT NULL,
cooccur_count INTEGER DEFAULT 1,
cooccur_weight REAL DEFAULT 0.0,
PRIMARY KEY (entity_id_1, entity_id_2),
FOREIGN KEY (entity_id_1) REFERENCES entities(id),
FOREIGN KEY (entity_id_2) REFERENCES entities(id)
);
-- 2. Populate from existing fact_entities (facts with 2+ entities)
INSERT INTO entity_cooccurrence (entity_id_1, entity_id_2, cooccur_count)
SELECT
fe1.entity_id,
fe2.entity_id,
COUNT(DISTINCT fe1.fact_id)
FROM fact_entities fe1
JOIN fact_entities fe2 ON fe1.fact_id = fe2.fact_id
WHERE fe1.entity_id < fe2.entity_id -- Avoid duplicates
GROUP BY fe1.entity_id, fe2.entity_id;
-- 3. Compute weighted co-occurrence using IDF
UPDATE entity_cooccurrence SET cooccur_weight =
  CAST(cooccur_count AS REAL) *
  (SELECT idf_weight FROM entity_stats WHERE entity_stats.id = entity_cooccurrence.entity_id_1) *
  (SELECT idf_weight FROM entity_stats WHERE entity_stats.id = entity_cooccurrence.entity_id_2);
-- 4. Create index for fast co-occurrence lookups
CREATE INDEX idx_cooccur_lookup ON entity_cooccurrence(entity_id_1, cooccur_weight DESC);
Phase 3: CLI command updates
Extend the retain command with a salience parameter:
# Before
openclaw memory retain "Sarah is leaving the company next month"
# After (with salience)
openclaw memory retain "Sarah is leaving the company next month" \
--type B \
--entity Sarah \
--entity project \
--salience 0.95
Extend the recall command with salience filtering and associative traversal:
# Before
openclaw memory recall "performance improvements"
# After (with enhanced options)
openclaw memory recall "performance improvements" \
--k 10 \
--min-salience 0.3 \
--associative-depth 2 \
--activation-decay 0.5
Phase 4: Associative traversal algorithm
Implement traversal with a depth limit and activation decay:
def associative_traverse(seed_entities: list[str], depth: int = 2, decay: float = 0.5) -> dict:
    """
    Traverse entity co-occurrence graph with depth limiting and activation decay.
    Returns:
        dict: {entity_name: accumulated_activation_score}
    """
    activation = {}
    visited = set()
    # Initialize seed entities with full activation
    for entity_name in seed_entities:
        activation[entity_name] = 1.0
        visited.add(entity_name)
    current_entities = seed_entities
    current_activation = 1.0
    for hop in range(depth):
        next_entities = []
        next_activation = current_activation * decay
        for entity_name in current_entities:
            # Query co-occurring entities with IDF weighting.
            # Pairs are stored once (entity_id_1 < entity_id_2), so match the
            # current entity on either side and return the entity on the other side.
            cooccurring = query("""
                SELECT e.name, c.cooccur_weight, es.idf_weight
                FROM entity_cooccurrence c
                JOIN entities seed ON seed.id IN (c.entity_id_1, c.entity_id_2)
                JOIN entities e ON e.id IN (c.entity_id_1, c.entity_id_2)
                    AND e.id != seed.id
                JOIN entity_stats es ON e.id = es.id
                WHERE seed.name = ?
                ORDER BY c.cooccur_weight * es.idf_weight DESC
                LIMIT 10
            """, entity_name)
            # Already-visited entities are filtered here rather than in SQL
            for coentity_name, cooccur_weight, idf_weight in cooccurring:
                if coentity_name not in visited:
                    contribution = next_activation * cooccur_weight * idf_weight
                    activation[coentity_name] = activation.get(coentity_name, 0) + contribution
                    next_entities.append(coentity_name)
                    visited.add(coentity_name)
        current_entities = next_entities
        current_activation = next_activation
    return activation
Phase 5: Access-based decay implementation
Apply exponential decay with a 7-day half-life to retrieval scores:
import math
from datetime import datetime
def compute_retrieval_score(fact: dict, query_entities: list[str],
                            now: datetime = None) -> float:
    """
    Compute composite retrieval score including salience and access-based decay.
    Components:
    - Base match score (lexical/semantic/associative)
    - Salience weight (from retain call)
    - Access decay (exponential, reset on retrieval)
    """
    if now is None:
        now = datetime.utcnow()
    base_score = compute_base_match_score(fact, query_entities)
    salience_score = fact.get('salience', 0.5)
    # Access-based decay (exponential, halves every 7 days)
    last_accessed = fact.get('last_accessed_at')
    if last_accessed:
        days_since_access = (now - last_accessed).days
        access_decay = 0.5 ** (days_since_access / 7.0)
    else:
        access_decay = 0.25  # Never-accessed facts start quieter
    # Boost for frequent access (logarithmic to prevent hub dominance)
    access_count = fact.get('access_count', 0)
    access_boost = 1.0 + (0.1 * math.log1p(access_count))
    composite_score = (
        base_score * 0.4 +
        salience_score * 0.35 +
        access_decay * access_boost * 0.25
    )
    return composite_score
def on_fact_retrieved(fact_id: int) -> None:
    """Update access tracking when a fact is retrieved."""
    execute("""
        UPDATE facts
        SET last_accessed_at = ?,
            access_count = access_count + 1
        WHERE id = ?
    """, (datetime.utcnow(), fact_id))
Phase 6: Reflect loop integration
Update the reflect job to prioritize recently accessed facts:
# In reflect job processor
def reflect_on_memories(agent_id: str, core_memory_max_tokens: int = 2048) -> None:
    # Query recently-accessed facts weighted by salience.
    # SQLite has no LOG1P; LN(1 + x) is the equivalent and needs the math functions enabled.
    recent_facts = query("""
        SELECT f.*,
               COALESCE(f.salience, 0.5) *
               (1.0 + 0.1 * LN(1 + COALESCE(f.access_count, 0))) AS priority_score
        FROM facts f
        WHERE f.agent_id = ?
          AND (
            f.last_accessed_at > datetime('now', '-30 days')
            OR f.salience > 0.8
          )
        ORDER BY priority_score DESC, f.last_accessed_at DESC
        LIMIT 100
    """, agent_id)
    # Existing reflect logic operates on priority-filtered set
    consolidated = consolidate_memories(recent_facts)
    update_core_memory(consolidated, max_tokens=core_memory_max_tokens)
🧪 Verification
Verification test suite
Run the following commands to verify each enhancement:
Test 1: Schema migration
$ sqlite3 ~/.openclaw/memory.db ".schema facts"
--- Expected output ---
CREATE TABLE facts (
...
salience REAL DEFAULT 0.5,
last_accessed_at DATETIME,
access_count INTEGER DEFAULT 0
);
$ sqlite3 ~/.openclaw/memory.db "SELECT COUNT(*) FROM entity_cooccurrence;"
--- Expected output ---
0 (before population) or > 100 (after population on a populated index)
Test 2: Salience-aware retain and recall
# Retain with salience
$ openclaw memory retain "Sarah is leaving the company next month" \
--type B \
--entity Sarah \
--entity project \
--salience 0.95
--- Expected output ---
✓ Retained: B(s=0.95) @Sarah @project: Sarah is leaving...
# Verify in database
$ sqlite3 ~/.openclaw/memory.db \
"SELECT content, salience FROM facts WHERE content LIKE '%Sarah%';"
--- Expected output ---
Sarah is leaving the company next month|0.95
Test 3: Access tracking
# Query a fact (simulated)
$ openclaw memory recall "heartbeat configuration"
# Verify access tracking updated
$ sqlite3 ~/.openclaw/memory.db \
"SELECT content, last_accessed_at, access_count FROM facts ORDER BY access_count DESC LIMIT 3;"
--- Expected output ---
Updated heartbeat interval from 5m to 30m.|2025-01-15 10:30:00|5
Increased worker pool size to 4.|2025-01-15 09:15:00|3
Added rate limiting middleware.|2025-01-14 14:22:00|1
Test 4: Associative traversal query
# Query with associative depth
$ openclaw memory recall "app performance" \
--associative-depth 2 \
--min-salience 0.3
--- Expected output ---
RETRIEVED (associative, depth=2):
Direct matches:
- W(s=0.3) @config: Updated heartbeat interval from 5m to 30m.
2-hop connections:
- B(s=0.95) @Sarah @project: Sarah is leaving... (via @database → @slow-endpoint)
- W(s=0.3) @api: Added rate limiting middleware. (via @slow-endpoint)
# Verify traversal path in debug mode
$ openclaw memory recall "app performance" --associative-depth 2 --debug
--- Expected output ---
Traversal: performance → {database, slow-endpoint, api}
→ database → {Sarah, PostgreSQL, indexing}
→ Final activation: {Sarah: 0.42, indexing: 0.31, ...}
Test 5: Composite score validation
$ python3 -c "
from openclaw.memory.scoring import compute_retrieval_score
import datetime
import math
test_fact = {
    'content': 'Sarah is leaving next month',
    'salience': 0.95,
    'last_accessed_at': datetime.datetime.utcnow() - datetime.timedelta(days=2),
    'access_count': 5
}
score = compute_retrieval_score(test_fact, query_entities=['personnel'])
# Same boost formula as compute_retrieval_score: 1 + 0.1 * log1p(access_count)
boost = 1.0 + 0.1 * math.log1p(5)
# The composite value below assumes a base match score of 0.0 for this fact
print(f'Composite score: {score:.2f}')
print(f' - Salience contribution: {0.95 * 0.35:.2f}')
print(f' - Access decay (2 days): {0.5 ** (2/7) * boost * 0.25:.2f}')
"
--- Expected output ---
Composite score: 0.57
 - Salience contribution: 0.33
 - Access decay (2 days): 0.24
Test 6: Reflect job prioritization
# Run reflect with debug output
$ openclaw memory reflect --agent-id test-agent --debug
--- Expected output ---
Processing 47 facts (filtered from 487 total by priority)
Top priority facts:
1. B(s=0.95) @Sarah @project: Sarah is leaving... (priority: 1.23)
2. B(s=0.9) @user @identity: User prefers morning standups... (priority: 1.19)
3. W(s=0.8) @Peter @deadline: Q1 deadline is March 15... (priority: 1.08)
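Core memory updated: 1,847 tokens (was 2,103)
⚠️ Common pitfalls
Implementation pitfalls and environment-specific notes
Pitfall 1: Hub nodes dominate without IDF weighting
**Symptom:** Associative traversal returns nearly identical results regardless of the query; high-degree entities (Peter, config, system) dominate every path.
**Cause:** Raw co-occurrence counts are used without inverse entity frequency weighting.
**Fix:** Make sure every co-occurrence query applies entity_stats.idf_weight = 1 / log(fact_count + 1):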
-- Wrong (hub dominance)
SELECT e.name FROM entities e
JOIN fact_entities fe ON e.id = fe.entity_id
WHERE fe.fact_id IN (
  SELECT fact_id FROM fact_entities WHERE entity_id = ?
)
GROUP BY e.id
ORDER BY COUNT(*) DESC;
-- Correct (IDF-weighted)
SELECT e.name FROM entities e
JOIN entity_stats es ON e.id = es.id
JOIN fact_entities fe ON e.id = fe.entity_id
WHERE fe.fact_id IN (
  SELECT fact_id FROM fact_entities WHERE entity_id = ?
)
GROUP BY e.id
ORDER BY es.idf_weight * COUNT(*) DESC;
Pitfall 2: Confusing time-based and access-based decay
**Symptom:** Old but frequently accessed facts score low; fresh but never-accessed facts score high.
**Cause:** Scoring uses the fact's age (created_at) alone instead of access-based decay with a boost.
**Rule:** Prefer access-based decay (reset on retrieval) over time-based decay. A 3-month-old fact that is accessed weekly should outrank a 1-day-old fact that has never been accessed:
# Wrong: Pure age decay
score = salience * (0.5 ** (age_in_days / 30))
# Correct: Access-based decay with boost
access_decay = 0.5 ** (days_since_last_access / 7) # Halves every 7 days
access_boost = 1.0 + (0.1 * log1p(access_count)) # Logarithmic, prevents hub dominance
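score = salience * access_decay * access_boost
Pitfall 3: Excessive associative depth
**Symptom:** Retrieval latency exceeds 500 ms; output includes seemingly random facts.
**Cause:** Depth > 3 with no activation cutoff floods the traversal.
**Fix:** Enforce a depth limit and a minimum activation threshold: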
MAX_DEPTH = 3
MIN_ACTIVATION = 0.05
INITIAL_ACTIVATION = 1.0
DECAY_PER_HOP = 0.5
# Traversal stops when:
# - Depth limit reached, OR
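# - No entities exceed MIN_ACTIVATION threshold
Pitfall 4: Salience estimation collapses at retain time
**Symptom:** All facts receive similar salience scores (0.4-0.6); differentiation is lost.
**Cause:** LLM estimates are too conservative and default to mid-range values.
**Fix:** Use prompt-based salience estimation with explicit anchors: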
SYSTEM_PROMPT = """
Estimate salience (0.0-1.0) for this memory:
- 0.9-1.0: Identity-defining, relationship-changing, career-affecting
- 0.7-0.9: Important project decisions, team changes, deadlines
- 0.4-0.7: Routine work, configurations, bug fixes
- 0.1-0.4: Minor preferences, temp states, easily reconstructed
Memory: {fact_content}
Respond ONLY with a number between 0.0 and 1.0.
"""
Always allow manual override via the --salience CLI flag or by editing the file directly.
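Pitfall 5: Docker/container environment permissions
**Symptom:** sqlite3: unable to open database file appears when running in Docker.
**Cause:** The SQLite database is mounted into the volume with the wrong permissions or path.
**Fix:** Make sure the volume mount preserves the directory structure: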
# Wrong
docker run -v /host/memory:/container/memory image
# Correct (bind mount the parent directory)
docker run -v /host/.openclaw:/root/.openclaw image
# Verify permissions
docker exec container ls -la /root/.openclaw/memory.db
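# Should show: -rw-r--r-- 1 root root ...
Pitfall 6: Raspberry Pi 5 resource limits
**Symptom:** Associative traversal causes memory pressure on ARM devices.
**Cause:** The Python dict used for activation tracking, combined with recursive queries, exceeds available RAM.
**Fix:** Bound the traversal and iterate with a cursor:
# Limit activation dict size
MAX_ACTIVATION_ENTITIES = 50
MIN_ACTIVATION = 0.05  # Same threshold as in Pitfall 3
# Use generator for memory efficiency
def associative_traverse_stream(seed, depth, decay):
    frontier = {seed: 1.0}
    visited = {seed}
    for _ in range(depth):
        next_frontier = {}
        for entity, activation in frontier.items():
            if activation < MIN_ACTIVATION:
                continue
            # fetch_cooccurring is assumed to return co-occurring entity names
            for coentity in fetch_cooccurring(entity, limit=5):
                if coentity not in visited:
                    next_frontier[coentity] = next_frontier.get(coentity, 0) + \
                        activation * decay
        # Mark the hop as visited afterwards so activation can accumulate within it
        visited.update(next_frontier)
        # Keep only the strongest entities so the frontier stays bounded
        frontier = dict(sorted(next_frontier.items(), key=lambda kv: kv[1],
                               reverse=True)[:MAX_ACTIVATION_ENTITIES])
        yield from frontier.items()
🔗 Related errors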
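Context-related issues and historical references
Related design documents
- Workspace Memory v2 Research Doc: the baseline architecture this guide extends. Key sections: "Entity-Aware Retrieval," "Incremental Indexing," "Reflect Loop"
- Hindsight × Letta Integration: typed facts with confidence provide the foundation for salience weighting
- CLS-M Prototype Analysis: empirical validation (132 nodes, 802 edges, F1=44%) showing that naive spreading activation has precision problems
Common error codes in the memory system
| Error code | Description | Related issue |
|---|---|---|
| E2BIG | Assembled context exceeds the token budget; the reflect job cannot compress it | Salience weighting, access decay |
| ENOENTITY | Entity query returns nothing but semantic search finds results | Entity extraction gaps, FTS fallback |
| EDUPFACTS | Near-duplicate facts accumulate without consolidation | Reflect loop limitations |
| EHUBNODES | Retrieval is dominated by high-frequency entities (Peter, system, config) | Missing IDF weighting |
| ECOLDSTART | A new deployment lacks the fact density needed for associative traversal | Entity co-occurrence density threshold |
| EDECAYTOOFAST | Time-based decay erases useful old memories too early | Access-based vs. time-based decay |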
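Historical context from CLS-M
The failure modes identified by the CLS-M prototype informed these recommendations:
- F1=44% on a 45-query benchmark: precision (35%) was the bottleneck, not recall (65%)
- Hub noise kills precision: the heartbeat node had 57 edges and absorbed 15% of total activation on every query
- Delegation failed: sub-agent memory extraction failed consistently; the agent that has the experience must own retention
- Spread too thin: activation across 800+ edges diluted the signal below the useful threshold
These findings support an incremental approach: start with FTS5, add embeddings, and add entity co-occurrence only once the index is dense enough.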
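The "dense enough" condition could be enforced with a simple gate before traversal. The sketch below is one possible shape for such a check; the edges-per-entity metric, the threshold value, and both function names are hypothetical and would need tuning against a real index.

```python
import sqlite3

# Hypothetical threshold: average co-occurrence edges per entity.
MIN_EDGES_PER_ENTITY = 2.0

def cooccurrence_density(conn: sqlite3.Connection) -> float:
    """Average number of co-occurrence edges per entity (0.0 for an empty index)."""
    entities = conn.execute("SELECT COUNT(*) FROM entities").fetchone()[0]
    edges = conn.execute("SELECT COUNT(*) FROM entity_cooccurrence").fetchone()[0]
    return edges / entities if entities else 0.0

def choose_associative_depth(conn: sqlite3.Connection, requested_depth: int,
                             min_density: float = MIN_EDGES_PER_ENTITY) -> int:
    """Fall back to depth 0 (no traversal) until the index is dense enough."""
    if cooccurrence_density(conn) < min_density:
        return 0
    return requested_depth

# Tiny sample index: three entities, one co-occurrence edge.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE entity_cooccurrence (entity_id_1 INTEGER, entity_id_2 INTEGER);
INSERT INTO entities (name) VALUES ('a'), ('b'), ('c');
INSERT INTO entity_cooccurrence VALUES (1, 2);
""")
print(choose_associative_depth(conn, requested_depth=2))
```

On a sparse index the gate returns depth 0, so recall falls back to direct matching instead of spreading activation over too few edges.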
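OpenClaw version compatibility
| Version | Required features | Migration path |
|---|---|---|
| v0.11.x | Basic fact storage, FTS5 | Apply Phase 1-2 migrations |
| v0.12.0 | Entity extraction, salience field | Apply Phases 1-6 incrementally |
| v0.13.0 (planned) | Associative traversal, access tracking | Full implementation |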