April 28, 2026 • Version: 2026.4.24 - 2026.5.2

Agents Stop Responding Mid-Work After Tool Use (v2026.4.24+)

A regression causes agent execution to halt silently after tool calls. Agents complete a tool invocation but produce no subsequent output, appearing frozen until the user prompts for progress.

🔍 Symptoms

Primary Manifestation:

Agents execute one or more tool calls successfully but then produce no further output. The agent appears frozen with no error message, no completion message, and no exception thrown. The session remains interactive—if the user asks “are you still working?” the agent may respond.

CLI Observation Example:

```shell
$ claw run "Analyze the /var/log directory and summarize errors"

ℹ️ Initializing agent with model: minimax/minimax-m2.7

🔧 [Tool Call] filesystem.read_directory → /var/log
🔧 [Tool Call] filesystem.read_file → /var/log/syslog (first 100 lines)
[Session appears frozen - no output for 60+ seconds]
[No tool execution, no streaming, no completion message]
```

Secondary Manifestations:

Agent resumes only after explicit user prompt (“continue” or “what’s the status?”)
No TaskTimeoutError or MaxIterationsExceededError is thrown
Process does not exit—agent remains in memory, responsive to IPC but not autonomous
Streaming output stops mid-stream with no partial JSON left behind

Version Regression Window:

Working: v2026.04-23 (last known good)
Affected: v2026.4.24 → v2026.5.2 (current)

🧠 Root Cause

Architectural Analysis:

The regression originates from a change in the agent loop’s async execution model introduced in v2026.4.24. The issue manifests as a race condition between the tool execution promise and the agent’s continuation logic.

Technical Failure Sequence:

Tool Execution Completes: A tool (e.g., filesystem.read_file) resolves successfully and returns structured data to the agent’s tool layer.
Promise Chain Interruption: The agent loop’s continuation logic relies on Promise.race() between:
- The tool result resolution
- A hidden abort signal check (added in v2026.4.24 as part of streaming stability improvements)
Signal Misclassification: In certain async timing conditions—especially with models that return tool calls with minimal inter-token spacing—the abort signal check samples executionState.isAborted before the tool result is processed, causing the loop to exit early with a silent break instead of continuing.
Silent Exit Path: The loop exits without emitting a terminal event, leaving the agent in a “ready but stalled” state. No error is thrown because the abort signal is treated as a legitimate user-initiated stop.

Code Path Affected (hypothesized from regression diff):

```
AgentLoop.execute()
  → async while loop
    → toolCall.execute()
      → Promise.race([toolResult, abortCheck])   ← Race condition here
        → abortCheck wins
          → break (silent)
```
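The failure sequence above can be reproduced in isolation. The following is a minimal sketch, not OpenClaw's actual code: `runTool`, `eagerAbortCheck`, `abortOnly`, `buggyStep`, and `guardedStep` are hypothetical stand-ins. It shows that a `Promise.race()` branch which settles immediately will always beat a tool result that is still pending, and contrasts it with an abort branch that only settles when an abort is really requested.

```javascript
// Hypothetical stand-ins for the real tool call and abort checker.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runTool() {
  await sleep(20); // the tool needs a moment before its result settles
  return { type: "tool_result", data: "syslog contents" };
}

// Buggy shape: the check settles right away, so it beats any pending tool.
async function eagerAbortCheck() {
  return { type: "abort_check" };
}

async function buggyStep() {
  const outcome = await Promise.race([runTool(), eagerAbortCheck()]);
  return outcome.type === "abort_check" ? "silent break" : "continue";
}

// Safer shape: this branch stays pending unless an abort is actually requested.
function abortOnly(signal) {
  return new Promise((resolve) => {
    if (signal.aborted) {
      resolve({ type: "abort_check" });
      return;
    }
    signal.addEventListener("abort", () => resolve({ type: "abort_check" }), {
      once: true,
    });
  });
}

async function guardedStep(signal) {
  const outcome = await Promise.race([runTool(), abortOnly(signal)]);
  return outcome.type === "abort_check" ? "aborted" : "continue";
}
```

In this sketch `buggyStep()` takes the silent-break path even though nobody asked to abort, while `guardedStep()` continues unless the supplied `AbortSignal` has fired.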

Contributing Factors:

Model Response Timing: Models with very fast token generation (like minimax m2.7) exacerbate the race window because the abort signal sampling occurs before the tool result promise settles.
Streaming Buffer Threshold: Changes to streaming buffer flushing logic in v2026.4.24 created a scenario where partial tool results are held in buffer, causing the abort check to see an “empty pending state.”
No Timeout Recovery: The timeout handler (which would typically detect hung agents) was also adjusted, inadvertently removing the fallback wake-up mechanism.
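To illustrate what a fallback wake-up mechanism of the kind mentioned above might look like, here is a small idle watchdog. It is an assumption-laden sketch: `StallWatchdog`, `touch()`, and `stop()` are hypothetical names, not part of OpenClaw. The idea is that every sign of progress re-arms a timer, and if the loop goes quiet without a terminal event, the callback fires.

```javascript
// Hypothetical watchdog: calls onStall if touch() is not called for idleMs.
class StallWatchdog {
  constructor(idleMs, onStall) {
    this.idleMs = idleMs;
    this.onStall = onStall;
    this.timer = null;
  }

  // Call on every sign of progress (tool result, streamed token, and so on).
  touch() {
    clearTimeout(this.timer);
    this.timer = setTimeout(() => this.onStall(), this.idleMs);
  }

  // Call when the loop emits a genuine terminal event.
  stop() {
    clearTimeout(this.timer);
    this.timer = null;
  }
}
```

A loop that exits through the silent path never calls `stop()`, so the watchdog would fire after `idleMs` and could surface the stall as an error instead of leaving the agent idle.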

🛠️ Step-by-Step Fix

Option 1: Configuration Override (Immediate - No Code Change)

Add the following to your OpenClaw configuration file (~/.openclaw/config.yaml or project claw.config.yaml):

```yaml
# Before (default behavior causing hang)
agent:
  loop_timeout_ms: 30000
  max_iterations: 50

# After (workaround: disable abort signal race condition)
agent:
  loop_timeout_ms: 60000
  max_iterations: 100
  experimental:
    disable_async_race_guard: true
```

Or via environment variable:

```bash
export OPENCLAW_AGENT_DISABLE_ASYNC_RACE_GUARD=1
export OPENCLAW_AGENT_LOOP_TIMEOUT_MS=60000
```

Option 2: Downgrade to Last Known Good Version

```bash
# Uninstall current version
npm uninstall -g @openclaw/cli

# Install v2026.04-23
npm install -g @openclaw/cli@2026.04-23

# Verify version
claw --version
# Expected: 2026.04-23
```

Option 3: Patch for Self-Hosted Installations

If you have access to the OpenClaw source (e.g., a fork or local development setup), locate the file that handles the agent loop. The patch replaces the immediate abort check inside Promise.race() with a short setTimeout delay, so the tool result has a chance to settle before the abort state is consulted. The tool promise is created once and reused, so a late abort check never re-runs the tool:

```javascript
// File: packages/agent-loop/src/executor.ts
// Approximate location: execute() method, within while loop

// BEFORE (buggy):
const result = await Promise.race([
  toolCall.execute(),
  this.abortSignalChecker()
]);

// AFTER (patched):
// Start the tool exactly once and keep a handle to its promise.
const toolPromise = toolCall.execute();
const ABORT_CHECK = { type: 'abort_check' };

const result = await Promise.race([
  toolPromise,
  // Give the tool 50 ms to settle before the abort state is checked.
  new Promise(resolve => setTimeout(() => resolve(ABORT_CHECK), 50))
]).then(async outcome => {
  if (outcome !== ABORT_CHECK) return outcome;
  // Only abort if an abort was really requested; otherwise wait for the tool.
  const aborted = await this.abortSignalChecker();
  return aborted ? ABORT_CHECK : toolPromise;
});
```

Option 4: Use Alternative Routing Chain

If using a custom provider configuration, switch to a routing chain that buffers responses differently:

```yaml
# ~/.openclaw/config.yaml
providers:
  - name: minimax-buffered
    model: minimax/minimax-m2.7
    config:
      response_buffer_ms: 200   # Increases buffer flush delay
      streaming_threshold: 10   # Higher token count before flush
```

🧪 Verification

Test 1: Basic Tool Execution (Should Complete Without Hanging)

```bash
claw run "List the contents of /tmp using the filesystem tool"
```

Expected output within 30-60 seconds:

```
ℹ️ Initializing agent…
🔧 [Tool Call] filesystem.read_directory → /tmp
✅ Agent completed successfully
```

Test 2: Multi-Tool Chain (Previously Would Freeze After First Tool)

```bash
claw run "Find all .log files in /var/log, read the first 50 lines of each, then summarize the most common error patterns"
```

Expected: All tool calls execute sequentially without freeze.

Test 3: Verify Version

```bash
claw --version
# Should be 2026.04-23 (downgrade) or 2026.5.2 with config override

claw config show | grep -A5 "agent:"
# Should show: disable_async_race_guard: true
```

Test 4: Regression Check (Should Not Freeze)

```bash
claw run "Execute three sequential echo commands through the shell tool: echo 'first', echo 'second', echo 'third'"
```

Expected: All three execute. No hanging after first.

Negative Test (Confirming Bug Fixed):

```bash
timeout 45 claw run "Count to 10 using shell tool with sleep 1 between each"
```

Expected: Completes within 45 seconds. Exit code 0.
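Because the bug shows up as silence and not as a non-zero exit, a wall-clock `timeout` alone cannot distinguish a slow run from a stalled one. The sketch below is one way to automate that distinction with Node's `child_process.spawn`: it kills the child and reports a stall if no output arrives for `idleMs`. The helper name `runWithStallDetection` and the idle threshold are illustrative assumptions, not part of the OpenClaw tooling.

```javascript
const { spawn } = require("node:child_process");

// Runs a command and flags a stall when it prints nothing for idleMs.
function runWithStallDetection(command, args, idleMs) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ["ignore", "pipe", "pipe"] });
    let stalled = false;
    let output = "";
    let timer;

    // Re-arm the idle timer; if it ever fires, treat the run as stalled.
    const arm = () => {
      clearTimeout(timer);
      timer = setTimeout(() => {
        stalled = true;
        child.kill();
      }, idleMs);
    };

    child.stdout.on("data", (chunk) => {
      output += chunk;
      arm();
    });
    child.stderr.on("data", () => arm());
    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on("close", (code) => {
      clearTimeout(timer);
      resolve({ code, stalled, output });
    });

    arm();
  });
}
```

For example, `runWithStallDetection("claw", ["run", "List the contents of /tmp using the filesystem tool"], 60000)` would resolve with `stalled: true` if the agent goes quiet for a full minute, which matches the 60+ second freeze described under Symptoms.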

⚠️ Common Pitfalls

Assuming Timeout is the Issue: Setting loop_timeout_ms: 300000 won’t help, because the timeout only fires while the loop is still running. In this bug the loop has already exited silently, leaving a “ready” agent that never times out.
Confusing with Rate Limiting: The freeze may resemble rate limit throttling, but rate limits typically produce error messages or explicit wait indicators. This bug produces zero output—silent stall.
macOS M1/M2 Specific: Some users on Apple Silicon report that the issue occurs more frequently when using Rosetta-translated npm binaries. Ensure you’re using the arm64 build of Node.js.
Docker Container Isolation: If running inside Docker, the abort signal timing may differ due to container-level process scheduling. Consider adding --cpus=2 to give the agent loop more consistent timing.
Environment Variable Precedence: OpenClaw config file settings may be overridden by environment variables. Always check claw config show to see the actual resolved values, not just the config file contents.
Confusing with Model Provider Outage: The symptoms closely resemble a provider outage. Before diagnosing as this bug, confirm the model works in a simple non-tool task: claw run "Say hello in one sentence". If that works but tool tasks hang, it’s this bug. If even simple tasks fail, it’s a provider issue.
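The environment variable precedence pitfall above is easier to reason about with a concrete model. The sketch below assumes that environment variables win over file values and that the override flag lives under `agent.experimental`. Both are assumptions for illustration, and `resolveAgentConfig` is a hypothetical helper; `claw config show` remains the authoritative source for resolved values.

```javascript
// Hypothetical resolver: environment variables take precedence over the file.
function resolveAgentConfig(fileConfig, env) {
  const agent = fileConfig.agent ?? {};
  const resolved = {
    loop_timeout_ms: agent.loop_timeout_ms ?? 30000,
    disable_async_race_guard:
      agent.experimental?.disable_async_race_guard ?? false,
  };

  // Only accept a purely numeric timeout override.
  const timeout = env.OPENCLAW_AGENT_LOOP_TIMEOUT_MS;
  if (timeout !== undefined && /^\d+$/.test(timeout)) {
    resolved.loop_timeout_ms = Number(timeout);
  }

  // Any set value overrides the file; only "1" or "true" enables the flag.
  const guard = env.OPENCLAW_AGENT_DISABLE_ASYNC_RACE_GUARD;
  if (guard !== undefined) {
    resolved.disable_async_race_guard =
      guard === "1" || guard.toLowerCase() === "true";
  }

  return resolved;
}
```

Under these assumptions, a stale `OPENCLAW_AGENT_DISABLE_ASYNC_RACE_GUARD=0` left in a shell profile would silently undo the workaround even though the config file sets it to `true`.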

Error Code / Issue	Description	Relationship
`AGENT_LOOP_EXIT_0`	Agent loop exits with code 0 but no completion message	This is the silent exit variant of the race condition
`Promise.race` timeout in async executor	Internal async executor reports race timeout	Direct technical predecessor—same code path
`STREAM_FLUSH_INCOMPLETE`	Streaming buffer flushed with partial token	Streaming changes in v2026.4.24 introduced the race condition trigger
`TOOL_CALL_NO_RESPONSE`	Tool returns data but agent loop doesn’t see it	Downstream symptom of the abort signal winning the race
`v2026.4.24-streaming-stability`	Commit that introduced streaming buffer threshold changes	The exact commit that introduced the regression
`MaxIterationsExceededError` not triggered	Agent appears hung but iteration count doesn’t reach limit	The abort signal exits the loop before iteration counting logic runs