May 05, 2026 • Version: 2026.5.7

Heartbeat Death Loop: pendingFinalDelivery Stuck on Agent Main Session

When a heartbeat run returns any non-empty text to a session with origin.to set to the pseudo-target 'heartbeat', the main session enters a permanent pendingFinalDelivery:true state that blocks all future heartbeat executions indefinitely.

🔍 Symptoms

Primary Manifestation

The OpenClaw gateway exhibits complete heartbeat silence despite clean startup logs. The scheduler fires on schedule, but no actual heartbeat runs execute. This can persist for days.

Diagnostic Command Output

Inspect the main session state directly from the sessions registry:

```bash
python3 -c "
import json

sessions_path = '/home/openclaw/.openclaw/agents/<agent-id>/sessions/sessions.json'
with open(sessions_path) as f:
    sessions = json.load(f)

main_session = sessions.get('agent:<agent-id>:main', {})
critical_fields = [
    'pendingFinalDelivery',
    'pendingFinalDeliveryAttemptCount',
    'pendingFinalDeliveryLastError',
    'updatedAt',
    'origin',
]
for field in critical_fields:
    print(f'{field}: {main_session.get(field)}')
"
```

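Expected stuck-state output (values shown in JSON notation; the Python one-liner itself prints True and None rather than true and null):

```json
pendingFinalDelivery: true
pendingFinalDeliveryAttemptCount: 64
pendingFinalDeliveryLastError: null
updatedAt: 1746702780000
origin: {"label":"heartbeat","from":"heartbeat","to":"heartbeat"}
```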

Note: pendingFinalDeliveryAttemptCount increments each heartbeat interval. A count > 1 with pendingFinalDeliveryLastError: null is the definitive signature.
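The signature described in the note can be expressed as a small predicate. This is a minimal sketch, assuming the session entry has already been loaded from sessions.json as a dict; the function name `has_stuck_signature` is hypothetical and not part of OpenClaw.

```python
from typing import Any, Mapping


def has_stuck_signature(entry: Mapping[str, Any]) -> bool:
    """Return True when a session entry matches the stuck-state signature:
    pending flag set, more than one attempt, and no captured error."""
    attempts = entry.get("pendingFinalDeliveryAttemptCount") or 0
    return (
        entry.get("pendingFinalDelivery") is True
        and attempts > 1
        and entry.get("pendingFinalDeliveryLastError") is None
    )


# Sample entry mirroring the stuck-state output shown above.
sample = {
    "pendingFinalDelivery": True,
    "pendingFinalDeliveryAttemptCount": 64,
    "pendingFinalDeliveryLastError": None,
    "updatedAt": 1746702780000,
    "origin": {"label": "heartbeat", "from": "heartbeat", "to": "heartbeat"},
}
print(has_stuck_signature(sample))  # True
```

An entry whose pendingFinalDeliveryLastError holds a string does not match, since a captured error means the failure is at least visible.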

Gateway Log Behavior

Boot-time log: appears clean

[gateway] heartbeat: started intervalMs: 3600000

Subsequent ticks: no run executes, no errors logged

[gateway] <silence for 64+ hours>

CLI Diagnostic Commands Return No Alerts

```bash
# These commands surface nothing about the stuck state
openclaw cron list
openclaw doctor
openclaw system heartbeat last
# All return normal/empty output despite the heartbeat deadlock
```

Triggering Condition

The bug activates when:

  1. An agent’s heartbeat fires with no configured delivery target (defaults to "none")
  2. The heartbeat run produces output; even the bare HEARTBEAT_OK token qualifies
  3. The session origin is set to origin.to = "heartbeat" (auto-reply pseudo-target)

After the first heartbeat run with output, the session accumulates the heartbeat origin, and subsequent heartbeats enter the retry loop.

🧠 Root Cause

Architectural Overview

The heartbeat system involves three distinct layers:

  1. Heartbeat Runner (heartbeat-runner-DpQCcYf2.js): Schedules and executes heartbeat ticks
  2. Agent Runner Runtime (agent-runner.runtime-DQsCsHUA.js): Produces heartbeat output and writes session state
  3. Dispatch System (dispatch-8E8vi2HV.js): Routes output to delivery channels

Bug A: Pending-Delivery Flag Set on Effective-No-Output

In agent-runner.runtime-DQsCsHUA.js (lines 4093-4095):

```javascript
// Current implementation: any non-empty pendingText sets the pending flag
if (pendingText) {
  session.pendingFinalDelivery = true;
  session.pendingFinalDeliveryText = pendingText;
  session.pendingFinalDeliveryCreatedAt = now;
}
```

For heartbeat sessions, when the agent returns HEARTBEAT_OK, the pendingText is populated with this token. The runner has no special handling for heartbeat content: it treats HEARTBEAT_OK as a legitimate output requiring delivery confirmation.

The stripHeartbeatToken function in heartbeat-Dynyl6hI.js (lines 52-87) runs after the pending-delivery state is written to the session, not before. Therefore, even stripped-to-empty output still triggers the pending queue.

Bug B: Silent Retry Against Pseudo-Target

In dispatch-8E8vi2HV.js (lines 227-246), the success handler clearPendingFinalDeliveryAfterSuccess only clears the flag on success. There is no corresponding failure handler that captures the error into pendingFinalDeliveryLastError.

When delivery.to === "heartbeat":

```javascript
// dispatch attempts delivery to the "heartbeat" pseudo-target
// No channel adapter resolves this target
// dispatch returns silently without error capture
// pendingFinalDelivery stays true, updatedAt gets bumped to now
```

The retry path:

  1. Heartbeat fires → pendingFinalDelivery is true
  2. Dispatch attempts delivery → silent failure
  3. updatedAt = now on every attempt
  4. The 30-second skip-window condition matches (updatedAt is now, not old), so the run is skipped
  5. Next heartbeat interval fires → same sequence repeats

The Compounding Effect

The skip window in runHeartbeatOnce (lines 866-870):

```javascript
if (
  recentSessionEntry?.pendingFinalDelivery === true &&
  recentSessionEntry?.updatedAt &&
  startedAt - recentSessionEntry.updatedAt < 3e4
) {
  return SKIP_REQUESTS_IN_FLIGHT;
}
```

This logic is correct in isolation: it prevents overlapping heartbeat runs. However, combined with dispatch failures that bump updatedAt = now on each silent failure, the condition evaluates as now - now < 30000 (always true), causing perpetual skips.
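The interaction can be replayed in a few lines. The following is a rough Python model of the quoted condition, not the gateway's code; it assumes, as described above, that each silent dispatch failure sets updatedAt to the tick time before the skip check runs. The helper names `should_skip` and `simulate_ticks` are hypothetical.

```python
SKIP_WINDOW_MS = 30_000  # the 3e4 constant in the quoted check


def should_skip(entry: dict, started_at_ms: int) -> bool:
    """Python model of the quoted skip-window condition."""
    updated_at = entry.get("updatedAt")
    return (
        entry.get("pendingFinalDelivery") is True
        and bool(updated_at)
        and started_at_ms - updated_at < SKIP_WINDOW_MS
    )


def simulate_ticks(entry: dict, first_tick_ms: int, interval_ms: int, ticks: int) -> list:
    """Replay heartbeat ticks, assuming every silent dispatch failure
    sets updatedAt to the current tick time."""
    outcomes = []
    for i in range(ticks):
        now = first_tick_ms + i * interval_ms
        entry["updatedAt"] = now  # assumed side effect of the failed dispatch
        outcomes.append(should_skip(entry, now))
    return outcomes


stuck = {"pendingFinalDelivery": True, "updatedAt": 0}
print(simulate_ticks(stuck, 1_746_702_780_000, 3_600_000, 3))  # [True, True, True]

# Without the bump, an hour-old updatedAt falls outside the window.
stale = {"pendingFinalDelivery": True, "updatedAt": 1_746_702_780_000}
print(should_skip(stale, 1_746_702_780_000 + 3_600_000))  # False
```

The second call shows why the check is harmless on its own: once updatedAt stops moving, the 30-second window expires and the next tick runs.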

Why Fresh Sessions Don’t Deadlock

When a new session is created (no persisted state), origin is null and lastTo is null. The dispatch path has no delivery.to to route against, so it treats the pending delivery as “nothing to deliver.” The pendingFinalDelivery: true flag may remain set cosmetically, but updatedAt is not bumped, so the retry loop never starts.

🛠️ Step-by-Step Fix

Option 1: Workaround (Immediate, No Code Change)

Use when: You cannot apply the code fix immediately. The workaround itself requires a brief gateway stop and restart.

Steps:

  1. Stop the gateway gracefully:

```bash
sudo systemctl stop openclaw-gateway
# or
openclaw gateway stop
```

  2. Locate the agent's main session entry and associated files:

```bash
AGENT_ID="<agent-id>"
SESSION_DIR="/home/openclaw/.openclaw/agents/${AGENT_ID}/sessions"
MAIN_SESSION_KEY="agent:${AGENT_ID}:main"

# List the session directory
ls -la "${SESSION_DIR}/"
# Expected: sessions.json, .jsonl, .trajectory.jsonl
```

  3. Remove the main session entry:

```bash
# Back up before modification
cp "${SESSION_DIR}/sessions.json" "${SESSION_DIR}/sessions.json.bak.$(date +%s)"

# Use Python to remove the main session entry
python3 -c "
import json

agent_id = '<agent-id>'
session_path = f'/home/openclaw/.openclaw/agents/{agent_id}/sessions/sessions.json'

with open(session_path, 'r') as f:
    sessions = json.load(f)

main_key = f'agent:{agent_id}:main'
if main_key in sessions:
    print(f'Removing session: {main_key}')
    del sessions[main_key]
    with open(session_path, 'w') as f:
        json.dump(sessions, f, indent=2)
    print('Session removed successfully')
else:
    print(f'Session {main_key} not found')
"
```

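  4. Remove associated session files:

```bash
# Identify and remove .jsonl and .trajectory.jsonl files for the main session.
# Caution: *.jsonl matches every session's files in this directory (including
# *.trajectory.jsonl); narrow the pattern if other sessions must be kept.
cd "/home/openclaw/.openclaw/agents/<agent-id>/sessions" \
  && rm -v *.jsonl 2>/dev/null || true
```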

  5. Restart the gateway:

```bash
sudo systemctl start openclaw-gateway

# Verify startup
sudo journalctl -u openclaw-gateway -f --lines=50
```

After workaround: The gateway creates a fresh main session on the next heartbeat tick. The new session has origin: null, breaking the dispatch retry loop.
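For repeated use, the backup and removal steps can be folded into one function. This is a sketch under stated assumptions: the gateway is already stopped, sessions.json is a single JSON object keyed by session key, and the function name `remove_main_session` is hypothetical. It writes through a temporary file so an interrupted run does not leave a truncated registry.

```python
import json
import shutil
import time
from pathlib import Path


def remove_main_session(sessions_path, agent_id: str) -> bool:
    """Back up sessions.json, then delete the agent's main session entry.
    Returns True if an entry was removed, False if it was not present."""
    path = Path(sessions_path)
    sessions = json.loads(path.read_text())
    key = f"agent:{agent_id}:main"
    if key not in sessions:
        return False

    # Timestamped backup next to the original, as in the manual steps.
    backup = path.with_name(f"{path.name}.bak.{int(time.time())}")
    shutil.copy2(path, backup)

    del sessions[key]

    # Write to a temporary file first, then swap it into place.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(sessions, indent=2))
    tmp.replace(path)
    return True


# Example call (path and agent ID are placeholders to fill in):
# remove_main_session("/path/to/sessions/sessions.json", "my-agent")
```

The associated .jsonl files still need to be removed separately.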


Option 2: Permanent Fix (Code Changes)

Fix A: Gate Pending-Delivery Write on Effectively Empty Heartbeat Content

File: agent-runner.runtime-DQsCsHUA.js
Location: Around lines 4093-4095

Before:

```javascript
if (pendingText) {
  session.pendingFinalDelivery = true;
  session.pendingFinalDeliveryText = pendingText;
  session.pendingFinalDeliveryCreatedAt = now;
}
```

After:

```javascript
// For heartbeat sessions, check whether the stripped output is effectively empty
const isHeartbeat = session?.origin?.to === 'heartbeat';
const strippedContent = isHeartbeat ? stripHeartbeatToken(pendingText).text : pendingText;
const isEffectivelyEmpty = !strippedContent || strippedContent.trim() === '';

if (pendingText && !isEffectivelyEmpty) {
  session.pendingFinalDelivery = true;
  session.pendingFinalDeliveryText = pendingText;
  session.pendingFinalDeliveryCreatedAt = now;
}
```
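The intended behavior of the gate can be checked against a few inputs with a small Python model. This is only an illustration of the logic, not the gateway's code: `strip_heartbeat_token` below is a hypothetical stand-in that simply removes the literal HEARTBEAT_OK token, and the real stripHeartbeatToken may behave differently.

```python
HEARTBEAT_TOKEN = "HEARTBEAT_OK"


def strip_heartbeat_token(text: str) -> dict:
    """Hypothetical stand-in for stripHeartbeatToken: removes the bare token."""
    return {"text": text.replace(HEARTBEAT_TOKEN, "").strip()}


def write_pending_state(session: dict, pending_text, now_ms: int) -> dict:
    """Set the pending-delivery fields only when there is real content to deliver."""
    text = pending_text or ""
    is_heartbeat = (session.get("origin") or {}).get("to") == "heartbeat"
    stripped = strip_heartbeat_token(text)["text"] if is_heartbeat else text
    if stripped.strip():
        session["pendingFinalDelivery"] = True
        session["pendingFinalDeliveryText"] = pending_text
        session["pendingFinalDeliveryCreatedAt"] = now_ms
    return session


heartbeat = {"origin": {"to": "heartbeat"}}
print(write_pending_state(heartbeat, "HEARTBEAT_OK", 1))
# {'origin': {'to': 'heartbeat'}}  (no pending flag written)
```

Heartbeat output that carries text beyond the token still sets the flag, so genuine alerts remain queued for delivery.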

Fix B-1: Capture Dispatch Failures into pendingFinalDeliveryLastError

File: dispatch-8E8vi2HV.js
Location: After line 246 (after clearPendingFinalDeliveryAfterSuccess)

Add new function:

```javascript
function recordPendingFinalDeliveryFailure(session, errorMessage) {
  session.pendingFinalDeliveryLastError = errorMessage || 'Unknown delivery failure';
  session.pendingFinalDeliveryLastErrorAt = Date.now();
  saveSession(session);
}
```

Call in dispatch failure path:

```javascript
// In the delivery failure handler
if (session.pendingFinalDelivery) {
  recordPendingFinalDeliveryFailure(
    session,
    `Delivery failed: ${error?.message || 'No adapter resolved for target: ' + delivery?.to}`
  );
}
```

Fix B-2: Treat Pseudo-Target Heartbeat as Immediate Success

File: dispatch-8E8vi2HV.js
Location: Before attempting delivery (around line 227)

Add check:

```javascript
// If the delivery target is the heartbeat pseudo-channel and no adapter resolves,
// treat as immediate success: the heartbeat is acknowledged by reaching the target
if (delivery.to === 'heartbeat') {
  clearPendingFinalDeliveryAfterSuccess(session);
  log.debug('Heartbeat delivery acknowledged (pseudo-target)');
  return { deliverySucceeded: true };
}
```

Fix C: Harden openclaw doctor

File: doctor.js or diagnostics module
Add check for:

```javascript
// Warn when pendingFinalDelivery is stuck with no error captured
const ONE_HOUR_MS = 60 * 60 * 1000;
if (
  session.pendingFinalDelivery === true &&
  session.pendingFinalDeliveryLastError === null &&
  session.pendingFinalDeliveryCreatedAt &&
  Date.now() - session.pendingFinalDeliveryCreatedAt > ONE_HOUR_MS
) {
  warnings.push({
    severity: 'HIGH',
    code: 'STUCK_PENDING_DELIVERY',
    message: `Session ${sessionKey} has pendingFinalDelivery stuck for >1h with no error captured`,
    sessionKey
  });
}
```
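Until such a check ships, the same rule can be applied offline to a loaded sessions.json. The sketch below is a Python approximation of the proposed check, assuming the registry is a dict of session entries keyed by session key; `find_stuck_sessions` is a hypothetical name, and a missing pendingFinalDeliveryLastError field is treated the same as null.

```python
ONE_HOUR_MS = 60 * 60 * 1000


def find_stuck_sessions(sessions: dict, now_ms: int, threshold_ms: int = ONE_HOUR_MS) -> list:
    """Return one warning per session whose pending delivery is older than
    the threshold and has no captured error."""
    warnings = []
    for key, entry in sessions.items():
        created_at = entry.get("pendingFinalDeliveryCreatedAt")
        if (
            entry.get("pendingFinalDelivery") is True
            and entry.get("pendingFinalDeliveryLastError") is None
            and created_at
            and now_ms - created_at > threshold_ms
        ):
            warnings.append(
                {"severity": "HIGH", "code": "STUCK_PENDING_DELIVERY", "sessionKey": key}
            )
    return warnings


registry = {
    "agent:orchestrator:main": {
        "pendingFinalDelivery": True,
        "pendingFinalDeliveryLastError": None,
        "pendingFinalDeliveryCreatedAt": 1_746_702_780_000,
    },
    "agent:reasoner:main": {"pendingFinalDelivery": False},
}
two_hours_later = 1_746_702_780_000 + 2 * ONE_HOUR_MS
print(find_stuck_sessions(registry, two_hours_later))
# [{'severity': 'HIGH', 'code': 'STUCK_PENDING_DELIVERY', 'sessionKey': 'agent:orchestrator:main'}]
```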

🧪 Verification

Immediate Verification (After Workaround)

```bash
# 1. Confirm gateway is running
openclaw gateway status
# Expected: "Gateway running" or similar

# 2. Check for new main session creation
sleep 5
python3 -c "
import json
with open('/home/openclaw/.openclaw/agents/<agent-id>/sessions/sessions.json') as f:
    sessions = json.load(f)
main = sessions.get('agent:<agent-id>:main', {})
print('Session exists:', bool(main))
print('Origin:', main.get('origin'))
print('pendingFinalDelivery:', main.get('pendingFinalDelivery'))
"
# Expected: origin is null, pendingFinalDelivery is false or null

# 3. Verify heartbeat fires within 1 minute
openclaw system heartbeat last
# Expected: Recent heartbeat with HEARTBEAT_OK, no pending flags
```

Post-Fix Verification (After Code Changes)

```bash
# 1. Build and restart with fixes
npm run build
sudo systemctl restart openclaw-gateway

# 2. Monitor for 2+ heartbeat intervals (test with a short interval first)
# Set heartbeat to 2 minutes for testing:
openclaw config set heartbeat.every "2m"

# 3. Check session state after multiple heartbeat cycles
# (a quoted heredoc keeps the double quotes inside the Python f-strings intact)
python3 - <<'PY'
import json

with open('/home/openclaw/.openclaw/agents/<agent-id>/sessions/sessions.json') as f:
    sessions = json.load(f)

main = sessions.get('agent:<agent-id>:main', {})
print('=== Main Session State ===')
print(f'pendingFinalDelivery: {main.get("pendingFinalDelivery")}')
print(f'pendingFinalDeliveryAttemptCount: {main.get("pendingFinalDeliveryAttemptCount", 0)}')
print(f'pendingFinalDeliveryLastError: {main.get("pendingFinalDeliveryLastError")}')
print(f'updatedAt: {main.get("updatedAt")}')
# origin may be null in a fresh session, so fall back to an empty dict
print(f'origin.to: {(main.get("origin") or {}).get("to")}')
print()
print('SUCCESS: No stuck pendingFinalDelivery' if not main.get('pendingFinalDelivery') else 'WARNING: Still stuck')
PY

# Expected after fix: pendingFinalDelivery is false OR
# (if true) pendingFinalDeliveryLastError contains an error string
```

Stress Test: Force Heartbeat Output

```bash
# Create a temporary agent with a heartbeat that outputs text,
# then trigger the heartbeat manually

# Option A: Via CLI
openclaw system event --mode now --text "force heartbeat" \
  --url ws://127.0.0.1:18789 \
  --token "$OPENCLAW_GATEWAY_TOKEN"

# Option B: Wait for the scheduled heartbeat

# Check state immediately after
sleep 2
python3 - <<'PY'
import json
with open('/home/openclaw/.openclaw/agents/<agent-id>/sessions/sessions.json') as f:
    main = json.load(f).get('agent:<agent-id>:main', {})
print(f'pendingFinalDelivery: {main.get("pendingFinalDelivery")}')
print(f'pendingFinalDeliveryAttemptCount: {main.get("pendingFinalDeliveryAttemptCount", 0)}')
print(f'pendingFinalDeliveryLastError: {main.get("pendingFinalDeliveryLastError")}')
PY

# Expected: If Fix B-2 is applied, pendingFinalDelivery should clear immediately
# If only Fix A is applied, pendingFinalDelivery should not be set on heartbeat output
```

Doctor Command Verification

```bash
# After applying Fix C
openclaw doctor

# Expected output should include a warning if any session has:
#   pendingFinalDelivery: true AND now - pendingFinalDeliveryCreatedAt > 1h
#   AND pendingFinalDeliveryLastError === null
```

⚠️ Common Pitfalls

Pitfall 1: Partial Session Cleanup

Problem: Removing only pendingFinalDelivery* fields without removing the session entry.

Why it fails:

```bash
# This does NOT fix the issue
python3 -c "
import json
with open('sessions.json') as f:
    sessions = json.load(f)
main = sessions['agent:<agent-id>:main']

# Only clearing flags: origin.to is still 'heartbeat'
main.pop('pendingFinalDelivery', None)
main.pop('pendingFinalDeliveryText', None)
main.pop('pendingFinalDeliveryAttemptCount', None)
main.pop('pendingFinalDeliveryCreatedAt', None)

# origin.to is still 'heartbeat', so dispatch will re-trigger immediately
"
```

Correct approach: Delete the entire session entry, not just the pending flags.


Pitfall 2: Not Stopping the Gateway Before Session Cleanup

Problem: Modifying sessions.json while the gateway is running.

Why it fails: The gateway maintains an in-memory copy of sessions. File-system changes are overwritten on the next session save.

Correct approach:

```bash
# Always stop the gateway first
sudo systemctl stop openclaw-gateway

# Then modify sessions.json

# Then restart
sudo systemctl start openclaw-gateway
```


Pitfall 3: Misidentifying the Affected Session

Problem: Looking for the wrong session key.

Details: The main session key format is agent:<agent-id>:main. If you have multiple agents or a non-standard installation, the path may differ.

Verification:

```bash
# List all session keys
python3 -c "
import json
with open('/home/openclaw/.openclaw/agents/<agent-id>/sessions/sessions.json') as f:
    sessions = json.load(f)
for key in sessions.keys():
    print(key)
"

# Look for patterns like:
#   agent:orchestrator:main
#   agent:reasoner:main
#   agent:your-agent:main
```
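When the registry holds many keys, a pattern match can pick out the main sessions automatically. This is a small sketch assuming the agent:<agent-id>:main format described above and agent IDs that contain no colon; the helper name `main_session_agents` is hypothetical.

```python
import re

# Matches keys of the form agent:<agent-id>:main and captures the agent ID.
MAIN_KEY = re.compile(r"^agent:(?P<agent_id>[^:]+):main$")


def main_session_agents(keys) -> list:
    """Return the agent IDs of all keys that look like main-session keys."""
    return [m.group("agent_id") for k in keys if (m := MAIN_KEY.match(k))]


keys = ["agent:orchestrator:main", "agent:reasoner:main", "agent:reasoner:scratch"]
print(main_session_agents(keys))  # ['orchestrator', 'reasoner']
```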


Pitfall 4: Test Environment vs Production State

Problem: Testing fix in a fresh session (which doesn’t deadlock) and concluding the fix works for existing stuck sessions.

Details: New sessions have origin: null, so they don’t trigger the dispatch retry loop regardless of fix status. The fix validation must occur on existing sessions that have accumulated heartbeat origin.

Correct validation approach:

  1. Apply fixes to gateway
  2. Stop gateway
  3. Manually inject the stuck state into a fresh session: set `origin.to = "heartbeat"` and `pendingFinalDelivery = true`
  4. Restart gateway and observe behavior
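The injection in step 3 can be scripted so the test is repeatable. The sketch below works on an already-loaded registry dict and returns a patched copy; it sets only the two fields named in step 3, and whether other fields (such as updatedAt) are also needed to reproduce the loop is not confirmed here. The function name `inject_stuck_state` is hypothetical. Run it only while the gateway is stopped, then write the result back to sessions.json.

```python
import copy


def inject_stuck_state(sessions: dict, agent_id: str) -> dict:
    """Return a copy of the registry whose main session has
    origin.to = "heartbeat" and pendingFinalDelivery = True."""
    patched = copy.deepcopy(sessions)
    entry = patched.setdefault(f"agent:{agent_id}:main", {})
    origin = entry.get("origin") or {}  # a fresh session has origin: null
    origin["to"] = "heartbeat"
    entry["origin"] = origin
    entry["pendingFinalDelivery"] = True
    return patched


fresh = {"agent:demo:main": {"origin": None}}
print(inject_stuck_state(fresh, "demo"))
# {'agent:demo:main': {'origin': {'to': 'heartbeat'}, 'pendingFinalDelivery': True}}
```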

Pitfall 5: Docker/Container Environment Path Differences

Problem: Assuming paths based on non-container documentation.

Details:

```bash
# In a Docker container, paths may be:
#   - Environment variable based: $OPENCLAW_HOME
#   - Default: /app/.openclaw

# Find the correct path (<container> is a placeholder for your container name)
docker exec <container> find / -name "sessions.json" 2>/dev/null | grep openclaw

# Or inspect the environment
docker exec <container> env | grep -i claw
```


Pitfall 6: Heartbeat Interval Too Long for Testing

Problem: Setting a 60-minute heartbeat interval and not waiting long enough to verify the fix.

Details: After applying fixes, wait at least 2 full heartbeat intervals to confirm no new pending state accumulates.

Test interval configuration:

```bash
# Use a short interval for testing
openclaw config set heartbeat.every "2m"

# After verification, restore the production interval
openclaw config set heartbeat.every "60m"
```

  • #59710: Heartbeat silently stops after ~20h. Same underlying cause: session state corruption preventing heartbeat execution. This issue's diagnostic mechanism identifies the root cause that #59710 only observed symptomatically.
  • #78187: Heartbeat polling silently stops after SIGUSR1 gateway restart. Same symptom family: heartbeat scheduler running but no actual runs executing. Likely shares session state persistence issues.
  • #74257: HEARTBEAT_OK/internal text leak. Inverse symptom of the same path: heartbeat output leaking to delivery channels when it should be suppressed. Confirms heartbeat token handling is inconsistent.
  • #78532 (CLOSED 2026-05-07): deliverySucceeded=true when no adapter invoked. Sibling issue: same telemetry-vs-state mismatch family. Addressed the success side of dispatch; this issue addresses the failure side.
  • #55882: Agent can drop promised outputs after task switching. Broader pending-deliverables queue durability issue. The heartbeat deadlock is a specific case of the general pending-delivery state machine bug.
  • #65498 (CLOSED): Main-session user task can lose final reply after heartbeat or exec-completion interrupt. Related fix area: session lifecycle management during concurrent heartbeat and user task execution.
  • `heartbeat: { every: "60m" }` with no `target`: The default `target: "none"` configuration combined with heartbeat creates the deadlock condition when heartbeat output is non-empty.
  • `pendingFinalDelivery: true` + `origin.to: "heartbeat"`: The dangerous combination: pending flag set against a pseudo-target that no adapter resolves.
  • `pendingFinalDeliveryAttemptCount` climbing without `pendingFinalDeliveryLastError`: Diagnostic signature for silent dispatch failures across the codebase.

Error Code Reference

| Code | Description | Connection |
| --- | --- | --- |
| STUCK_PENDING_DELIVERY | pendingFinalDelivery stuck >1h with null error | This issue’s proposed doctor check |
| HEARTBEAT_DEATH_LOOP | Session blocks heartbeat indefinitely | Primary symptom |
| DISPATCH_NO_ADAPTER | No channel adapter resolves target | Root cause (Bug B) |
| DELIVERY_SILENT_FAILURE | Delivery fails without error capture | Root cause (Bug B-1) |

Evidence & Sources

This troubleshooting guide was automatically synthesized by the FixClaw Intelligence Pipeline from community discussions.