Token Waste from Accumulated Thinking Blocks in Conversation History
Extended thinking blocks accumulate across conversation turns, wasting roughly 100K-400K tokens per session and triggering API errors on replay. A new config option, thinkingHistory, controls how these blocks are stripped from history.
Symptoms
Observable Token Bloat
Long-running sessions show progressively increasing token counts per API request. For a 20-turn conversation:
Turn 1 request: ~8,000 tokens (0 thinking blocks)
Turn 5 request: ~32,000 tokens (4 thinking blocks)
Turn 10 request: ~80,000 tokens (9 thinking blocks)
Turn 20 request: ~300,000 tokens (19 thinking blocks)
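To check whether a given request follows this pattern, the history can be tallied before it is sent. The following is a minimal sketch, not the project's actual code: the `Message` and `ContentBlock` shapes, the `summarizeHistory` helper, and the four-characters-per-token heuristic are all assumptions made for illustration.

```typescript
// Hypothetical message shape; the real history format may differ.
type ContentBlock =
  | { type: "thinking"; thinking: string }
  | { type: "text"; text: string };

interface Message {
  role: "user" | "assistant";
  content: ContentBlock[];
}

interface HistoryStats {
  thinkingBlocks: number;
  thinkingTokens: number;
  otherTokens: number;
}

// Rough heuristic (assumption): about 4 characters per token.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

// Counts thinking blocks and estimates how many tokens they contribute
// compared with the rest of the history.
function summarizeHistory(history: Message[]): HistoryStats {
  const stats: HistoryStats = { thinkingBlocks: 0, thinkingTokens: 0, otherTokens: 0 };
  for (const message of history) {
    for (const block of message.content) {
      if (block.type === "thinking") {
        stats.thinkingBlocks += 1;
        stats.thinkingTokens += estimateTokens(block.thinking);
      } else {
        stats.otherTokens += estimateTokens(block.text);
      }
    }
  }
  return stats;
}
```

Logging these stats once per turn should show the thinking-block count rising by one per assistant reply, with thinking tokens growing much faster than the rest of the history.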
API Errors on Conversation Replay
When replaying sessions that contained thinking blocks, intermittent failures appear:
Error: Message content invalid - thinking block corruption detected
at validateMessageContent (src/agents/message-validator.ts:142)
at parseAssistantMessage (src/agents/message-parser.ts:89)
Error: Conversation truncated - redundant thinking block sequence
at truncateConversation (src/agents/history-manager.ts:203)
Error: API rejected request - content length exceeded
at sendToGateway (src/network/gateway-client.ts:456)
Status: 413 Payload Too Large
Memory Pressure on Extended Sessions
Sessions with 50+ turns may hit memory limits due to accumulated thinking block data:
WARN: Session memory at 847MB / 1024MB limit
WARN: Conversation history contains 47 thinking blocks (~420MB)
Diagnostic Command Output
$ openclaw analyze-session --session-id abc123
Session Analysis: abc123
ββββββββββββββββββββββββββββββββββββββββββββββββββ
Turns: 24
Thinking blocks: 23
Total thinking tokens: ~312,400
Effective context tokens: ~45,000
Token waste ratio: 87%
Recommendation: Enable thinkingHistory: "last-only"
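The 87% figure matches thinking tokens divided by total tokens: 312,400 / (312,400 + 45,000) ≈ 0.874. The sketch below reproduces that arithmetic and shows one plausible reading of "last-only", in which thinking blocks are kept only on the most recent assistant message. The message shape and both function names are hypothetical, and the real semantics of the option may differ.

```typescript
// Hypothetical message shape; the real history format may differ.
type ContentBlock =
  | { type: "thinking"; thinking: string }
  | { type: "text"; text: string };

interface Message {
  role: "user" | "assistant";
  content: ContentBlock[];
}

// Share of request tokens spent on thinking blocks (0 when there are no tokens).
function wasteRatio(thinkingTokens: number, effectiveTokens: number): number {
  const total = thinkingTokens + effectiveTokens;
  return total === 0 ? 0 : thinkingTokens / total;
}

// Assumed meaning of "last-only": drop thinking blocks from every assistant
// message except the most recent one. Returns a new array and leaves the
// input untouched.
function stripThinkingLastOnly(history: Message[]): Message[] {
  let lastAssistant = -1;
  history.forEach((message, index) => {
    if (message.role === "assistant") lastAssistant = index;
  });

  return history.map((message, index) => {
    if (message.role !== "assistant" || index === lastAssistant) return message;
    // Note: a message that held only thinking blocks ends up with empty content.
    return {
      ...message,
      content: message.content.filter((block) => block.type !== "thinking"),
    };
  });
}
```

Under this reading, the number of thinking blocks sent per request stays at one or fewer no matter how long the session runs.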
Root Cause
Architecture Overview
When extended thinking is enabled, the Claude API returns assistant messages containing <thinking> blocks that are stored verbatim in conversation history: