Agents Stop Responding Mid-Work After Tool Use (v2026.4.24+)
Regression causing agent execution to halt silently after tool calls. Agents complete a tool invocation but produce no subsequent output, appearing frozen until user prompts for progress.
🔍 Symptoms
Primary Manifestation:
Agents execute one or more tool calls successfully but then produce no further output. The agent appears frozen with no error message, no completion message, and no exception thrown. The session remains interactive—if the user asks “are you still working?” the agent may respond.
CLI Observation Example:
shell $ claw run “Analyze the /var/log directory and summarize errors”
ℹ️ Initializing agent with model: minimax/minimax-m2.7
🔧 [Tool Call] filesystem.read_directory → /var/log 🔧 [Tool Call] filesystem.read_file → /var/log/syslog (first 100 lines) [Session appears frozen - no output for 60+ seconds] [No tool execution, no streaming, no completion message]
Secondary Manifestations:
- Agent resumes only after explicit user prompt (“continue” or “what’s the status?”)
- No
TaskTimeoutErrororMaxIterationsExceededErroris thrown - Process does not exit—agent remains in memory, responsive to IPC but not autonomous
- Streaming output stops mid-stream with no partial JSON left behind
Version Regression Window:
Working: v2026.04-23 (last known good) Affected: v2026.4.24 → v2026.5.2 (current)
🧠 Root Cause
Architectural Analysis:
The regression originates from a change in the agent loop’s async execution model introduced in v2026.4.24. The issue manifests as a race condition between the tool execution promise and the agent’s continuation logic.
Technical Failure Sequence:
Tool Execution Completes: A tool (e.g.,
filesystem.read_file) resolves successfully and returns structured data to the agent’s tool layer.Promise Chain Interruption: The agent loop’s continuation logic relies on
Promise.race()between:- The tool result resolution
- A hidden abort signal check (added in v2026.4.24 as part of streaming stability improvements)
Signal Misclassification: In certain async timing conditions—especially with models that return tool calls with minimal inter-token spacing—the abort signal check samples
executionState.isAbortedbefore the tool result is processed, causing the loop to exit early with a silentbreakinstead of continuing.Silent Exit Path: The loop exits without emitting a terminal event, leaving the agent in a “ready but stalled” state. No error is thrown because the abort signal is treated as a legitimate user-initiated stop.
Code Path Affected (hypothesized from regression diff):
AgentLoop.execute() → async while loop → toolCall.execute() → Promise.race([toolResult, abortCheck]) ← Race condition here → abortCheck wins → break (silent)
Contributing Factors:
- Model Response Timing: Models with very fast token generation (like minimax m2.7) exacerbate the race window because the abort signal sampling occurs before the tool result promise settles.
- Streaming Buffer Threshold: Changes to streaming buffer flushing logic in v2026.4.24 created a scenario where partial tool results are held in buffer, causing the abort check to see an “empty pending state.”
- No Timeout Recovery: The timeout handler (which would typically detect hung agents) was also adjusted, inadvertently removing the fallback wake-up mechanism.
🛠️ Step-by-Step Fix
Option 1: Configuration Override (Immediate - No Code Change)
Add the following to your OpenClaw configuration file (~/.openclaw/config.yaml or project claw.config.yaml):
yaml
Before (default behavior causing hang)
agent: loop_timeout_ms: 30000 max_iterations: 50
After (workaround: disable abort signal race condition)
agent: loop_timeout_ms: 60000 max_iterations: 100 experimental: disable_async_race_guard: true
Or via environment variable:
bash export OPENCLAW_AGENT_DISABLE_ASYNC_RACE_GUARD=1 export OPENCLAW_AGENT_LOOP_TIMEOUT_MS=60000
Option 2: Downgrade to Last Known Good Version
bash
Uninstall current version
npm uninstall -g @openclaw/cli
Install v2026.04-23
npm install -g @openclaw/cli@2026.04-23
Verify version
claw –version
Expected: 2026.04-23
Option 3: Patch for Self-Hosted Installations
If you have access to the OpenClaw source (e.g., fork or local development setup), locate the file handling the agent loop. The patch would involve wrapping the Promise.race() in a setTimeout to ensure tool results settle before abort check:
javascript // File: packages/agent-loop/src/executor.ts // Approximate location: execute() method, within while loop
// BEFORE (buggy): const result = await Promise.race([ toolCall.execute(), this.abortSignalChecker() ]);
// AFTER (patched): const result = await Promise.race([ toolCall.execute(), new Promise(resolve => setTimeout(() => resolve({ type: ‘abort_check’ }), 50)) ]).then(outcome => { if (outcome?.type === ‘abort_check’) { // Only abort if no tool result is pending return this.abortSignalChecker().then(signal => signal ? { type: ‘abort_check’ } : toolCall.execute() ); } return outcome; });
Option 4: Use Alternative Routing Chain
If using a custom provider configuration, switch to a routing chain that buffers responses differently:
yaml
~/.openclaw/config.yaml
providers:
- name: minimax-buffered model: minimax/minimax-m2.7 config: response_buffer_ms: 200 # Increases buffer flush delay streaming_threshold: 10 # Higher token count before flush
🧪 Verification
Test 1: Basic Tool Execution (Should Complete Without Hanging)
bash claw run “List the contents of /tmp using the filesystem tool”
Expected output within 30-60 seconds:
ℹ️ Initializing agent… 🔧 [Tool Call] filesystem.read_directory → /tmp ✅ Agent completed successfully
Test 2: Multi-Tool Chain (Previously Would Freeze After First Tool)
bash claw run “Find all .log files in /var/log, read the first 50 lines of each, then summarize the most common error patterns”
Expected: All tool calls execute sequentially without freeze.
Test 3: Verify Version
bash claw –version
Should be 2026.04-23 (downgrade) or 2026.5.2 with config override
claw config show | grep -A5 “agent:”
Should show: disable_async_race_guard: true
Test 4: Regression Check (Should Not Freeze)
bash claw run “Execute three sequential echo commands through the shell tool: echo ‘first’, echo ‘second’, echo ’third’”
Expected: All three execute. No hanging after first.
Negative Test (Confirming Bug Fixed):
bash timeout 45 claw run “Count to 10 using shell tool with sleep 1 between each”
Expected: Completes within 45 seconds. Exit code 0.
⚠️ Common Pitfalls
Assuming Timeout is the Issue: Setting
loop_timeout_ms: 300000won’t help if the race condition occurs before the timeout loop even starts. The timeout only triggers after the loop stalls, but the loop silently exits, leaving a “ready” agent that doesn’t timeout.Confusing with Rate Limiting: The freeze may resemble rate limit throttling, but rate limits typically produce error messages or explicit wait indicators. This bug produces zero output—silent stall.
macOS m1/m2 Specific: Some users on Apple Silicon report the issue occurs more frequently when using Rosetta-translated npm binaries. Ensure you’re using the arm64 build of Node.js.
Docker Container Isolation: If running inside Docker, the abort signal timing may differ due to container-level process scheduling. Consider adding
--cpus=2to give the agent loop more consistent timing.Environment Variable Precedence: OpenClaw config file settings may be overridden by environment variables. Always check
claw config showto see the actual resolved values, not just the config file contents.Confusing with Model Provider Outage: The symptoms closely resemble a provider outage. Before diagnosing as this bug, confirm the model works in a simple non-tool task:
claw run "Say hello in one sentence". If that works but tool tasks hang, it’s this bug. If even simple tasks fail, it’s a provider issue.
🔗 Related Errors
| Error Code / Issue | Description | Relationship |
|---|---|---|
AGENT_LOOP_EXIT_0 | Agent loop exits with code 0 but no completion message | This is the silent exit variant of the race condition |
Promise.race timeout in async executor | Internal async executor reports race timeout | Direct technical predecessor—same code path |
STREAM_FLUSH_INCOMPLETE | Streaming buffer flushed with partial token | Streaming changes in v2026.4.24 introduced the race condition trigger |
TOOL_CALL_NO_RESPONSE | Tool returns data but agent loop doesn’t see it | Downstream symptom of the abort signal winning the race |
v2026.4.24-streaming-stability | Commit that introduced streaming buffer threshold changes | The exact commit that introduced the regression |
MaxIterationsExceededError not triggered | Agent appears hung but iteration count doesn’t reach limit | The abort signal exits the loop before iteration counting logic runs |