April 22, 2026 • Version: 2026.4.9

resolveFallbackRetryPrompt Discards Original User Prompt on Model Fallback Retry

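When a model call fails and triggers a fallback retry, the resolveFallbackRetryPrompt() function replaces the entire original user prompt with a generic message, so the agent loses its task context.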

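🔍 Symptoms

Observable Behavior

When a model call fails or times out and OpenClaw triggers a fallback retry, the agent receives only a generic message instead of the original task instructions. This shows up in the following ways:

1. Session log evidence

The session log shows a user message with no sender metadata that has replaced the original task: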

// Session log excerpt (session ec19b29b)
Line 25: { customType: "model-snapshot", provider: "xiaomi", model: "mimo-v2-pro" }
Line 26: { role: "user", content: "Continue where you left off. The previous model attempt failed or timed out.", sender: null }
Line 27: { role: "assistant", model: "mimo-v2-pro" }  // fallback retry response

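2. Context loss in subagent scenarios

When a subagent is spawned with a specific task instruction (for example, [Subagent Task]: RECORD時系列整列), the original task is discarded entirely: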

// Original prompt sent to subagent
"[Subagent Task]: RECORD時系列整列"

// Prompt received after fallback retry
"Continue where you left off. The previous model attempt failed or timed out."

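The agent must infer the task from session history alone, which can lead to:

Incorrect action choices based on incomplete context
Failure to complete the intended task
Unexpected output

3. Diagnostic indicator

The issue can be identified by inspecting the behavior of resolveFallbackRetryPrompt():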

// Current implementation (dist/agent-command-*.js)
function resolveFallbackRetryPrompt(params) {
    if (!params.isFallbackRetry) return params.body;
    if (!params.sessionHasHistory) return params.body;
    return "Continue where you left off. The previous model attempt failed or timed out.";
}
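The behavior is easy to reproduce in isolation. The sketch below copies the function as listed above and calls it with a hypothetical parameter object; the field values are illustrative and not taken from a real session.

```javascript
// Copy of the current implementation listed above.
function resolveFallbackRetryPrompt(params) {
    if (!params.isFallbackRetry) return params.body;
    if (!params.sessionHasHistory) return params.body;
    return "Continue where you left off. The previous model attempt failed or timed out.";
}

// Hypothetical input for illustration only.
const body = "[Subagent Task]: RECORD時系列整列";

const firstAttempt = resolveFallbackRetryPrompt({
    body, isFallbackRetry: false, sessionHasHistory: true
});
const retry = resolveFallbackRetryPrompt({
    body, isFallbackRetry: true, sessionHasHistory: true
});

console.log(firstAttempt === body); // true: the first attempt passes the task through
console.log(retry.includes(body));  // false: the retry prompt no longer contains the task
```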

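🧠 Root Cause Analysis

Technical Analysis

The root cause lies in the conditional logic of resolveFallbackRetryPrompt(), which uses an early-return replacement strategy instead of a concatenation strategy.

Failure sequence:

The agent sends `params.body`, containing the original user prompt, to the model
The model call fails or times out (for example, xiaomi/mimo-v2-pro)
OpenClaw sets `params.isFallbackRetry = true`
OpenClaw finds that `params.sessionHasHistory` is `true` (the session contains earlier messages)

The function takes the replacement path: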

return "Continue where you left off. The previous model attempt failed or timed out.";

The original `params.body` is discarded entirely

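Architectural inconsistency:

The function conflates two separate requirements:

Requirement A: notify the model that a retry occurred
Requirement B: preserve the original task instructions

The current implementation satisfies Requirement A while completely violating Requirement B.

Code path analysis: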

// In runAgentAttempt():
const effectivePrompt = resolveFallbackRetryPrompt({
    body: params.body,
    isFallbackRetry: params.isFallbackRetry,
    sessionHasHistory: params.sessionHasHistory
});

When both isFallbackRetry and sessionHasHistory are true, the function returns a fixed string instead of combining it with params.body.

Impact chain:

Original Prompt (params.body)
    ↓
Model Call Fails
    ↓
isFallbackRetry = true
    ↓
sessionHasHistory = true
    ↓
resolveFallbackRetryPrompt() returns FIXED_STRING
    ↓
effectivePrompt = FIXED_STRING  ← Original task LOST
    ↓
Fallback model receives no task context

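🛠️ Step-by-Step Fix

Resolution Strategy

Modify resolveFallbackRetryPrompt() so that it prepends the retry notice to the original prompt instead of replacing it.

Code Change

File: dist/agent-command-*.js (varies by version)

Before: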

function resolveFallbackRetryPrompt(params) {
    if (!params.isFallbackRetry) return params.body;
    if (!params.sessionHasHistory) return params.body;
    return "Continue where you left off. The previous model attempt failed or timed out.";
}

After:

function resolveFallbackRetryPrompt(params) {
    if (!params.isFallbackRetry) return params.body;
    if (!params.sessionHasHistory) return params.body;
    return "[System: Previous model attempt failed or timed out. Continuing from where you left off.]\n\n" + params.body;
}

Alternative Implementation (More Verbose)

For environments that want a clearer separation:

function resolveFallbackRetryPrompt(params) {
    if (!params.isFallbackRetry) return params.body;
    if (!params.sessionHasHistory) return params.body;
    
    const retryNotice = "[System: Previous model attempt failed or timed out. Continuing from where you left off.]\n\n";
    const originalPrompt = params.body;
    
    return retryNotice + originalPrompt;
}
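Before deploying, it can help to exercise each branch of the patched function. The sketch below is a standalone check and not part of OpenClaw; the case table and the `NOTICE` constant are illustrative names.

```javascript
const NOTICE = "[System: Previous model attempt failed or timed out. Continuing from where you left off.]\n\n";

// Patched implementation, matching the listing above.
function resolveFallbackRetryPrompt(params) {
    if (!params.isFallbackRetry) return params.body;
    if (!params.sessionHasHistory) return params.body;
    return NOTICE + params.body;
}

const body = "[Test Task]: Identify the color of the sky";

// One case per branch: the two early returns and the concatenation path.
const cases = [
    { name: "first attempt", isFallbackRetry: false, sessionHasHistory: true, expected: body },
    { name: "retry, no history", isFallbackRetry: true, sessionHasHistory: false, expected: body },
    { name: "retry, with history", isFallbackRetry: true, sessionHasHistory: true, expected: NOTICE + body },
];

const failures = cases.filter(c =>
    resolveFallbackRetryPrompt({
        body,
        isFallbackRetry: c.isFallbackRetry,
        sessionHasHistory: c.sessionHasHistory
    }) !== c.expected
);

console.log(failures.length === 0 ? "all branches OK" : failures.map(c => c.name));
```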

Deployment Steps

Locate the affected file:

find /path/to/openclaw -name "agent-command-*.js" -type f

Back up the original file:

for f in /path/to/agent-command-*.js; do cp "$f" "$f.bak"; done

Apply the fix with sed:

sed -i 's/return "Continue where you left off. The previous model attempt failed or timed out.";/return "[System: Previous model attempt failed or timed out. Continuing from where you left off.]\\n\\n" + params.body;/g' /path/to/agent-command-*.js

Verify the change:

grep -A3 "function resolveFallbackRetryPrompt" /path/to/agent-command-*.js

Restart the OpenClaw service:
```
sudo systemctl restart openclaw
```
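The verification step above can also be done programmatically, which may be easier to read than grep output on a bundled file. The sketch below classifies a bundle's source text by looking for the old and new message strings. `detectPatchState` is a hypothetical helper, and it inspects only a string, so reading the file (for example with `fs.readFileSync`) is left to the caller.

```javascript
const OLD_TEXT = "Continue where you left off. The previous model attempt failed or timed out.";
const NEW_TEXT = "[System: Previous model attempt failed or timed out. Continuing from where you left off.]";

// Returns "patched", "unpatched", "mixed", or "unknown".
function detectPatchState(source) {
    const hasOld = source.includes(OLD_TEXT);
    const hasNew = source.includes(NEW_TEXT);
    if (hasNew && !hasOld) return "patched";
    if (hasOld && !hasNew) return "unpatched";
    if (hasOld && hasNew) return "mixed";
    return "unknown";
}

// Sample inputs mirroring the return statements before and after the sed edit.
const before = 'return "Continue where you left off. The previous model attempt failed or timed out.";';
const after = 'return "[System: Previous model attempt failed or timed out. Continuing from where you left off.]\\n\\n" + params.body;';

console.log(detectPatchState(before)); // unpatched
console.log(detectPatchState(after));  // patched
```

A result of "mixed" would suggest that the bundle contains more than one copy of the function and only some were rewritten.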

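🧪 Verification

Verification Method

To confirm the fix, simulate a model failure and verify that the fallback retry prompt contains the original task.

Test Steps

1. Enable debug logging: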

export OPENCLAW_LOG_LEVEL=debug
export DEBUG=openclaw:agent:*

2. Trigger a fallback scenario:

Create a test agent configuration with a model that is meant to fail:

// test-fallback-prompt.json
{
  "agent": {
    "model": "intentionally-invalid-model-for-testing",
    "fallbackModel": "gpt-4o-mini",
    "prompt": "[Test Task]: Identify the color of the sky"
  }
}

3. Run the agent:

openclaw run --config test-fallback-prompt.json --session test-fallback-$(date +%s)

4. Inspect the session log:

openclaw session log --session-id <session-id> --format json | jq '.messages[] | select(.role == "user") | {content, sender, metadata}'

5. Verify the expected output:

Before the fix, the output shows:

{
  "content": "Continue where you left off. The previous model attempt failed or timed out.",
  "sender": null,
  "metadata": {}
}

After the fix, the output should show:

{
  "content": "[System: Previous model attempt failed or timed out. Continuing from where you left off.]\n\n[Test Task]: Identify the color of the sky",
  "sender": null,
  "metadata": {}
}

Automated Verification Script

#!/bin/bash
SESSION_ID=$(openclaw session list --limit 1 --format json | jq -r '.[0].id')
FALLBACK_USER_MSG=$(openclaw session log --session-id "$SESSION_ID" --format json | jq -r '.messages[] | select(.role == "user" and .sender == null) | .content')

if echo "$FALLBACK_USER_MSG" | grep -q "\[System: Previous model attempt"; then
    if echo "$FALLBACK_USER_MSG" | grep -q "\[Test Task\]"; then
        echo "✅ VERIFIED: Original prompt preserved in fallback retry"
        exit 0
    else
        echo "❌ FAILED: System message present but original prompt missing"
        exit 1
    fi
else
    echo "❌ FAILED: Generic message still being used (fix not applied)"
    exit 1
fi
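The same check can be expressed over parsed log data. The sketch below assumes the message shape shown in the jq output above (`role` and `content` fields) and takes the expected task text as a parameter instead of hard-coding `[Test Task]`. `checkFallbackMessages` is a hypothetical helper, not an OpenClaw API.

```javascript
const NOTICE_PREFIX = "[System: Previous model attempt";
const GENERIC = "Continue where you left off. The previous model attempt failed or timed out.";

// Returns "generic", "preserved", "task-missing", or "no-retry-found".
function checkFallbackMessages(messages, expectedTask) {
    const userMessages = messages.filter(
        m => m.role === "user" && typeof m.content === "string"
    );
    // The unpatched function emits exactly the generic string.
    if (userMessages.some(m => m.content === GENERIC)) return "generic";

    const retry = userMessages.find(m => m.content.startsWith(NOTICE_PREFIX));
    if (!retry) return "no-retry-found";
    return retry.content.includes(expectedTask) ? "preserved" : "task-missing";
}

// Sample data shaped like the expected post-fix log.
const sample = [
    { role: "user", content: "[Test Task]: Identify the color of the sky" },
    { role: "user", content: "[System: Previous model attempt failed or timed out. Continuing from where you left off.]\n\n[Test Task]: Identify the color of the sky" },
    { role: "assistant", content: "..." },
];

console.log(checkFallbackMessages(sample, "[Test Task]")); // preserved
```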

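⚠️ Common Pitfalls

Edge Cases and Environment-Specific Pitfalls

1. Empty prompt body (params.body is an empty string)

If params.body is an empty string, the fixed implementation produces a prompt that contains only the system notice: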

// Result when params.body = ""
"[System: Previous model attempt failed...]\n\n"
// ← Empty original task (may be valid if session history is sufficient)

Mitigation: check that params.body is non-empty before concatenating:

function resolveFallbackRetryPrompt(params) {
    if (!params.isFallbackRetry) return params.body;
    if (!params.sessionHasHistory) return params.body;
    
    const retryNotice = "[System: Previous model attempt failed or timed out. Continuing from where you left off.]\n\n";
    return params.body ? retryNotice + params.body : retryNotice.trim();
}

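2. Very long original prompts

A long prompt concatenated with the retry notice may exceed the model's context limit.

Monitor token usage when the original prompt approaches the model's limit
Consider truncating `params.body` when it exceeds a threshold (for example, 4000 tokens)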
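A minimal sketch of threshold-based truncation follows. It assumes a rough estimate of four characters per token, which is only a heuristic, since real tokenizers vary by model and language. `truncateBody`, its parameters, and the marker text are hypothetical.

```javascript
const TRUNCATION_MARKER = "\n[...truncated]";

// Truncates `body` to roughly `maxTokens` tokens using a characters-per-token estimate.
function truncateBody(body, maxTokens = 4000, charsPerToken = 4) {
    const maxChars = maxTokens * charsPerToken;
    // Split by code point so a surrogate pair is never cut in half.
    const chars = Array.from(body);
    if (chars.length <= maxChars) return body;
    return chars.slice(0, maxChars).join("") + TRUNCATION_MARKER;
}

const shortBody = "[Subagent Task]: RECORD時系列整列";
const longBody = "x".repeat(20000);

console.log(truncateBody(shortBody) === shortBody);               // true: short prompts pass through
console.log(truncateBody(longBody).endsWith(TRUNCATION_MARKER));  // true: long prompts are cut and marked
```

Cutting the tail of a prompt can itself remove instructions, so a threshold like this is best treated as a last resort and paired with the token monitoring mentioned above.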

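3. Non-ASCII characters in the original prompt

Japanese characters in the original prompt (such as RECORD時系列整列 in the reported issue) must be preserved intact: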

// Verify encoding is maintained
const testPrompt = "[Subagent Task]: RECORD時系列整列";
const fixed = "[System: ...]\n\n" + testPrompt;
console.log(fixed.includes("RECORD時系列整列")); // Must be true

4. Docker container caching

If OpenClaw runs in Docker, cached JavaScript files may persist:

# Rebuild container to ensure new code is deployed
docker-compose down
docker-compose build --no-cache openclaw
docker-compose up -d

5. Multiple consecutive fallback retries

If the model fails several times in a row, each retry may prepend another system notice:

// After 3 retries, prompt becomes:
"[System: ...]\n\n[System: ...]\n\n[System: ...]\n\n[Test Task]: ..."
// ← Duplicate notices accumulate

Mitigation: before prepending, check whether the original prompt already contains the retry notice:

function resolveFallbackRetryPrompt(params) {
    if (!params.isFallbackRetry) return params.body;
    if (!params.sessionHasHistory) return params.body;
    if (params.body.includes("[System: Previous model attempt")) return params.body;
    
    return "[System: Previous model attempt failed...]\n\n" + params.body;
}
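The empty-body guard from pitfall 1 and the duplicate-notice guard can be combined in one function. This is a sketch under the same assumptions as the listings above; it additionally treats a missing `body` as empty, which the original listings do not address.

```javascript
const RETRY_NOTICE = "[System: Previous model attempt failed or timed out. Continuing from where you left off.]";

function resolveFallbackRetryPrompt(params) {
    if (!params.isFallbackRetry) return params.body;
    if (!params.sessionHasHistory) return params.body;

    const body = params.body ?? "";
    // Already annotated by an earlier retry: do not stack another notice.
    if (body.startsWith(RETRY_NOTICE)) return body;
    // Empty body: send the notice alone, without trailing blank lines.
    return body ? RETRY_NOTICE + "\n\n" + body : RETRY_NOTICE;
}

// Feed the output back in three times, as repeated retries might.
let prompt = "[Test Task]: Identify the color of the sky";
for (let i = 0; i < 3; i++) {
    prompt = resolveFallbackRetryPrompt({
        body: prompt, isFallbackRetry: true, sessionHasHistory: true
    });
}

console.log(prompt.split(RETRY_NOTICE).length - 1); // 1: the notice appears once
```

This variant uses `startsWith` rather than `includes`, so a user prompt that merely quotes the notice text somewhere in its body is still annotated.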

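6. Version compatibility

The function signature or call site may change between versions:

Version	File pattern	Status
2026.4.9	`agent-command-8TL7BESJ.js`	Affected
2026.4.11	`agent-command-BUw17dbz.js`	Affected

Always verify the exact function location and parameter names in the deployed version.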

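🔗 Related Errors

Contextually Related Issues

The following errors and historical issues relate to fallback retry prompt behavior:

1. Model timeout errors

E_TIMEOUT: the model call exceeded the time limit
E_MODEL_UNAVAILABLE: the model endpoint is unreachable
E_RATE_LIMIT: the API rate limit was exceeded during a fallback attempt

These errors set the isFallbackRetry flag, which activates the problematic code path.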

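2. Context window errors

E_CONTEXT_LENGTH: the combined prompt and history exceed the model's context limit
May occur after the fix if the retry notice, original prompt, and history together exceed the limit

3. Historical issues

Issue ID	Description	Status
GH-XXXX	Initial report: subagent loses task context on retry	Pending
GH-YYYY	Session log shows a user message with null sender metadata	Related

4. Related configuration parameters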

// These parameters control the fallback behavior
interface FallbackConfig {
    enabled: boolean;           // Enable/disable fallback retry
    maxRetries: number;         // Maximum retry attempts
    retryDelay: number;         // Delay between retries (ms)
    retryModels: string[];      // List of fallback models to try
    preserveOriginalPrompt: boolean;  // NEW: Flag to preserve original prompt
}
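The source does not show how `preserveOriginalPrompt` would be consumed. One possible wiring, purely as an assumption, is to pass the config alongside the params and keep the current behavior when the flag is off:

```javascript
const GENERIC = "Continue where you left off. The previous model attempt failed or timed out.";
const NOTICE = "[System: Previous model attempt failed or timed out. Continuing from where you left off.]\n\n";

// Hypothetical two-argument variant; the function in the bundle takes only `params`.
function resolveFallbackRetryPrompt(params, config = {}) {
    if (!params.isFallbackRetry) return params.body;
    if (!params.sessionHasHistory) return params.body;
    return config.preserveOriginalPrompt ? NOTICE + params.body : GENERIC;
}

const params = {
    body: "[Test Task]: Identify the color of the sky",
    isFallbackRetry: true,
    sessionHasHistory: true
};

console.log(resolveFallbackRetryPrompt(params, { preserveOriginalPrompt: true }).endsWith(params.body)); // true
console.log(resolveFallbackRetryPrompt(params, { preserveOriginalPrompt: false }) === GENERIC);          // true
```

Gating the change behind a flag would let deployments opt in gradually, at the cost of keeping the lossy path alive as the default.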

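5. Similar patterns in the codebase

Other functions that may exhibit a similar replace-versus-concatenate problem:

`resolveSystemPrompt()` - may overwrite system instructions
`injectContextSummary()` - may replace context instead of appending it
`formatHistoryForModel()` - may truncate history instead of summarizing it

6. Monitoring recommendations

Add telemetry for fallback retry scenarios: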

// Suggested metrics to track
metrics.increment('openclaw.fallback.retry.count');
metrics.gauge('openclaw.fallback.prompt.length', effectivePrompt.length);
metrics.histogram('openclaw.fallback.prompt.original_length', params.body.length);
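The `metrics` object above is not defined in the source. For local experiments, a minimal in-memory stand-in with the same three method names might look like the sketch below. It is a hypothetical stub, not a real metrics client, and `createMetrics` and `snapshot` are names introduced here for illustration.

```javascript
// In-memory stand-in exposing increment, gauge, and histogram.
function createMetrics() {
    const counters = new Map();
    const gauges = new Map();
    const histograms = new Map();
    return {
        increment(name, by = 1) {
            counters.set(name, (counters.get(name) ?? 0) + by);
        },
        gauge(name, value) {
            gauges.set(name, value); // a gauge keeps only the latest value
        },
        histogram(name, value) {
            if (!histograms.has(name)) histograms.set(name, []);
            histograms.get(name).push(value); // a histogram keeps every sample
        },
        snapshot() {
            return {
                counters: Object.fromEntries(counters),
                gauges: Object.fromEntries(gauges),
                histograms: Object.fromEntries(histograms),
            };
        },
    };
}

const metrics = createMetrics();
const body = "[Test Task]: Identify the color of the sky";
const effectivePrompt = "[System: ...]\n\n" + body;

metrics.increment("openclaw.fallback.retry.count");
metrics.gauge("openclaw.fallback.prompt.length", effectivePrompt.length);
metrics.histogram("openclaw.fallback.prompt.original_length", body.length);

console.log(metrics.snapshot());
```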