April 29, 2026

Token Waste from Accumulated Thinking Blocks in Conversation History

Extended thinking blocks accumulate in conversation history across turns, wasting roughly 100,000-400,000 tokens per session and triggering API errors when sessions are replayed. A new config option, thinkingHistory, controls how these blocks are stripped from history.

🔍 Symptoms

Observable Token Bloat

Long-running sessions show progressively increasing token counts per API request. For a 20-turn conversation:

Turn 1 request:  ~8,000 tokens (0 thinking blocks)
Turn 5 request:  ~32,000 tokens (4 thinking blocks)
Turn 10 request: ~80,000 tokens (9 thinking blocks)
Turn 20 request: ~300,000 tokens (19 thinking blocks)
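
The block counts for turns 5, 10, and 20 follow from replaying history verbatim: the request for turn N carries the thinking blocks of the N-1 earlier replies. The following is a minimal sketch of that accumulation, assuming each assistant reply carries exactly one thinking block. The Message and ContentBlock shapes and both function names are hypothetical, since the real history types are not shown here.

```typescript
// Hypothetical message shapes; the real history types are not shown in this document.
type ContentBlock =
  | { type: "thinking"; text: string }
  | { type: "text"; text: string };

interface Message {
  role: "user" | "assistant";
  content: ContentBlock[];
}

// Count the thinking blocks that a request built from `history` would carry.
function countThinkingBlocks(history: Message[]): number {
  let count = 0;
  for (const message of history) {
    for (const block of message.content) {
      if (block.type === "thinking") count += 1;
    }
  }
  return count;
}

// Simulate a session where every assistant reply has one thinking block
// (an assumption) and the full history is replayed on every request.
// Returns the number of thinking blocks sent with each turn's request.
function thinkingBlocksPerRequest(turns: number): number[] {
  const history: Message[] = [];
  const perRequest: number[] = [];
  for (let turn = 1; turn <= turns; turn++) {
    history.push({ role: "user", content: [{ type: "text", text: `question ${turn}` }] });
    // The request for this turn is sent before its own reply exists.
    perRequest.push(countThinkingBlocks(history));
    history.push({
      role: "assistant",
      content: [
        { type: "thinking", text: `reasoning ${turn}` },
        { type: "text", text: `answer ${turn}` },
      ],
    });
  }
  return perRequest;
}
```

Under these assumptions, thinkingBlocksPerRequest(20) ends with 19, matching the turn 20 figure. Because every earlier block is resent on every later request, the total thinking content sent over a session grows roughly quadratically with the number of turns.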

API Errors on Conversation Replay

When replaying sessions that contained thinking blocks, intermittent failures appear:

Error: Message content invalid - thinking block corruption detected
    at validateMessageContent (src/agents/message-validator.ts:142)
    at parseAssistantMessage (src/agents/message-parser.ts:89)

Error: Conversation truncated - redundant thinking block sequence
    at truncateConversation (src/agents/history-manager.ts:203)

Error: API rejected request - content length exceeded
    at sendToGateway (src/network/gateway-client.ts:456)
    Status: 413 Payload Too Large

Memory Pressure on Extended Sessions

Sessions with 50+ turns may hit memory limits due to accumulated thinking block data:

WARN: Session memory at 847MB / 1024MB limit
WARN: Conversation history contains 47 thinking blocks (~420MB)

Diagnostic Command Output

$ openclaw analyze-session --session-id abc123

Session Analysis: abc123
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Turns: 24
Thinking blocks: 23
Total thinking tokens: ~312,400
Effective context tokens: ~45,000
Token waste ratio: 87%

Recommendation: Enable thinkingHistory: "last-only"
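
The 87% figure is consistent with dividing thinking tokens by the sum of thinking and effective context tokens: 312,400 / (312,400 + 45,000) ≈ 0.874. The sketch below reproduces that arithmetic and illustrates one plausible reading of "last-only", namely keeping thinking blocks only in the most recent assistant message. Both the ratio formula and the stripping semantics are assumptions, and the types and function names are hypothetical; the actual openclaw implementation is not shown here.

```typescript
// Hypothetical message shapes; the real history types are not shown in this document.
type ContentBlock =
  | { type: "thinking"; text: string }
  | { type: "text"; text: string };

interface Message {
  role: "user" | "assistant";
  content: ContentBlock[];
}

// Assumed definition: share of thinking tokens in the combined total.
function wasteRatio(thinkingTokens: number, effectiveTokens: number): number {
  const total = thinkingTokens + effectiveTokens;
  return total === 0 ? 0 : thinkingTokens / total;
}

// Assumed "last-only" semantics: drop thinking blocks from every assistant
// message except the most recent one. The input array is not mutated.
function stripThinkingLastOnly(history: Message[]): Message[] {
  let lastAssistant = -1;
  for (let i = 0; i < history.length; i++) {
    if (history[i].role === "assistant") lastAssistant = i;
  }
  return history.map((message, index) => {
    if (message.role !== "assistant" || index === lastAssistant) return message;
    return {
      ...message,
      content: message.content.filter((block) => block.type !== "thinking"),
    };
  });
}

// Values taken from the analysis output above.
const ratioPercent = Math.round(wasteRatio(312400, 45000) * 100);
console.log(`Token waste ratio: ${ratioPercent}%`);
```

A real implementation would also need to decide what to do with an assistant message whose content becomes empty after stripping; this sketch leaves such messages in place.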

🧠 Root Cause

Architecture Overview

When extended thinking is enabled, the Claude API returns assistant messages containing <thinking> blocks that are stored verbatim in conversation history: