April 23, 2026 • Version: 2026.5.10-beta.1, 2026.5.10-beta.2

openclaw doctor Clobbers systemd User Unit Customizations on Update

The `openclaw doctor` command regenerates the systemd user unit file wholesale, overwriting custom TimeoutStartSec, ExecStart paths, and EnvironmentFile settings, causing restart loops and silent secret loss.

🔍 Symptoms

Primary Symptom: Gateway Restart Loop

After running openclaw update, the gateway enters an uncontrolled restart cycle. The systemd journal shows:

$ journalctl --user -u openclaw-gateway.service -f
openclaw-gateway.service: start-post operation timed out. Terminating.
openclaw-gateway.service: Control process exited, code=killed, status=15/TERM
systemd[1]: openclaw-gateway.service: Failed with result 'timeout'.
openclaw-gateway.service: Scheduled restart job, restart counter is now at 22

Confirmed Unit Regeneration

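Compare the backup against the live unit immediately after an update to confirm regeneration. With the argument order below, lines marked < come from the .bak (your previous unit) and lines marked > come from the regenerated live unit:

$ diff ~/.config/systemd/user/openclaw-gateway.service{.bak,}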
< TimeoutStartSec=120
> TimeoutStartSec=30
< ExecStart=/home/user/.nvm/versions/node/v26.1.0/bin/node /home/user/.nvm/versions/node/v26.1.0/lib/node_modules/openclaw/dist/index.js gateway --port 18789
> ExecStart=/usr/bin/node /home/user/.nvm/versions/node/v26.1.0/lib/node_modules/openclaw/dist/index.js gateway --port 18789
< EnvironmentFile=/home/user/.openclaw/.env
> EnvironmentFile=-/home/user/.openclaw/gateway.systemd.env

Field-Specific Symptoms

  • TimeoutStartSec: Dropped from 120s to 30s. Any ExecStartPost script that runs longer than 30s triggers a SIGTERM cascade.
  • ExecStart: Node binary path hardcoded to /usr/bin/node instead of ~/.nvm/versions/node/v26.1.0/bin/node (breaks nvm-managed installations).
  • EnvironmentFile: Switched from a required file (/path/.env) to an optional one (-/path/gateway.systemd.env) at a different path.
  • Inline Environment=: Custom keys (WSL_HOST_IP, LCM_LEAF_CHUNK_TOKENS, etc.) replaced with OPENCLAW_SERVICE_MANAGED_ENV_KEYS set.
  • New Rate-Limit Fields: StartLimitBurst=5, StartLimitIntervalSec=60, RestartPreventExitStatus=78 added silently.
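The field-level changes listed above can be spotted quickly by printing only those directives from both files. The following is a minimal sketch, assuming the .bak sits next to the live unit as described in this guide; the function names show_unit_fields and compare_unit_fields are illustrative and not part of OpenClaw.

```shell
# show_unit_fields FILE: print only the directives discussed in this guide.
# Hypothetical helper; it only wraps grep.
show_unit_fields() {
    grep -E '^(TimeoutStartSec|ExecStart|EnvironmentFile|Environment|StartLimitBurst|StartLimitIntervalSec|RestartPreventExitStatus)=' "$1"
}

# compare_unit_fields OLD NEW: diff the selected directives of two unit files.
# Returns diff's exit status: 0 when the fields match, 1 when they differ.
compare_unit_fields() {
    old_fields=$(mktemp) && new_fields=$(mktemp) || return 2
    show_unit_fields "$1" > "$old_fields"
    show_unit_fields "$2" > "$new_fields"
    diff "$old_fields" "$new_fields"
    status=$?
    rm -f "$old_fields" "$new_fields"
    return "$status"
}
```

A typical call would be compare_unit_fields ~/.config/systemd/user/openclaw-gateway.service.bak ~/.config/systemd/user/openclaw-gateway.service, which keeps unrelated template changes out of the output.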

Silent Secret Loss Scenario

When EnvironmentFile switches paths and the new file contains different keys, the gateway starts without expected secrets:

$ systemctl --user start openclaw-gateway
# Gateway starts but auth fails silently:
$ curl localhost:18789/health
{"status":"ready",...}  # but channel auth broken
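Because the health endpoint still reports ready, the missing secrets have to be found by comparing the two environment files directly. The sketch below lists variable names that exist in the old file but not in the new one, without printing any values. It assumes both files use plain KEY=value lines; the function names are illustrative.

```shell
# env_keys FILE: print the variable names defined in a KEY=value env file,
# skipping blank lines and comments. Values are never printed.
env_keys() {
    sed -n 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$1" | sort -u
}

# missing_env_keys OLD NEW: print keys defined in OLD but absent from NEW.
missing_env_keys() {
    old_keys=$(mktemp) && new_keys=$(mktemp) || return 2
    env_keys "$1" > "$old_keys"
    env_keys "$2" > "$new_keys"
    # comm -23 keeps lines that appear only in the first (sorted) file
    comm -23 "$old_keys" "$new_keys"
    rm -f "$old_keys" "$new_keys"
}
```

For the paths in this guide, the call would be missing_env_keys ~/.openclaw/.env ~/.openclaw/gateway.systemd.env; any name it prints is a variable the gateway no longer receives.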

🧠 Root Cause

Architectural Analysis

The openclaw doctor command implements a template-overwrite strategy for systemd unit files rather than a merge strategy. The relevant code path:

  1. openclaw update invokes doctor.run()
  2. Doctor detects existing unit at ~/.config/systemd/user/openclaw-gateway.service
  3. Doctor writes the embedded unit template to the live path directly
  4. A .bak copy of the previous content is created, but the live file is replaced wholesale

Code Path Trace

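// Conceptualized from observed behavior; this is not the actual OpenClaw source.
// loadEmbeddedTemplate() stands in for however doctor obtains its unit template.
import { promises as fs } from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

async function ensureSystemdUnit() {
  const unitPath = path.join(os.homedir(), '.config/systemd/user/openclaw-gateway.service');
  const template = await loadEmbeddedTemplate();
  const previousContent = await fs.readFile(unitPath, 'utf8');

  // BUG: Direct overwrite without merge
  await fs.writeFile(unitPath, template);

  // Backup is written only after the live file has already been replaced
  await fs.writeFile(unitPath + '.bak', previousContent);

  // No diff emission or warning to operator
}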

Fields Lost and Why

Field | Template Value | Common Custom Value | Impact
TimeoutStartSec | 30 | 120 (for ExecStartPost scripts) | Restart loop when post-script exceeds 30s
ExecStart | /usr/bin/node | ~/.nvm/.../bin/node | Wrong Node binary used
EnvironmentFile | -~/.openclaw/gateway.systemd.env | ~/.openclaw/.env | Secrets not loaded, auth failures
Environment= | OPENCLAW_SERVICE_MANAGED_ENV_KEYS | Custom WSL_HOST_IP, etc. | Runtime configuration lost
Why This Is Especially Problematic

  • No diff emission: Operator has no signal that fields were changed
  • Silent fallback: The .bak exists but is never consulted by systemd
  • Timing dependency: TimeoutStartSec=30 may work for simple starts but fails for ExecStartPost patterns common in health-wait + reconnect scenarios
  • File existence check: The - prefix in -/path/gateway.systemd.env means "optional load": if the file doesn't exist, systemd raises no error and the service simply starts without those variables.
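Since systemd stays silent about a missing optional file, the check has to be made by hand. The sketch below reads a unit file and reports every EnvironmentFile=-PATH entry whose target does not exist. It assumes the paths are written out literally, as in the examples in this guide (systemd specifiers such as %h are not expanded); the function name is illustrative.

```shell
# check_optional_envfiles UNIT: report EnvironmentFile=-PATH entries whose
# file is missing. Returns 1 if at least one is missing, 0 otherwise.
check_optional_envfiles() {
    missing=0
    paths=$(sed -n 's/^EnvironmentFile=-//p' "$1")
    while IFS= read -r envfile; do
        [ -n "$envfile" ] || continue
        if [ ! -f "$envfile" ]; then
            echo "optional EnvironmentFile missing: $envfile" >&2
            missing=1
        fi
    done <<EOF
$paths
EOF
    return "$missing"
}
```

Running check_optional_envfiles ~/.config/systemd/user/openclaw-gateway.service after an update turns the silent case into a visible warning.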

Invocation Trigger Points

Any of these commands invoke the regeneration:

openclaw update --tag 2026.5.10-beta.2 --no-restart --yes
openclaw update  # defaults trigger doctor
openclaw doctor  # explicit invocation
openclaw upgrade # if aliased

🛠️ Step-by-Step Fix

Immediate Mitigation (Restore Prior State)

If currently in a restart loop:

# 1. Stop the gateway immediately to break the loop
systemctl --user stop openclaw-gateway

# 2. Restore from backup
cp ~/.config/systemd/user/openclaw-gateway.service.bak \
   ~/.config/systemd/user/openclaw-gateway.service

# 3. Reload systemd to pick up restored unit
systemctl --user daemon-reload

# 4. Start the gateway
systemctl --user start openclaw-gateway

# 5. Verify stable operation
sleep 20 && systemctl --user status openclaw-gateway
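Step 2 only helps if the .bak still holds the customized unit. A guarded variant of the copy is sketched below: it refuses to restore when the backup is byte-identical to the live unit, which suggests the backup already contains the regenerated template. The function name is illustrative, and the caller still has to run daemon-reload afterwards.

```shell
# restore_unit_if_different UNIT: copy UNIT.bak over UNIT unless the two files
# are identical. Returns 1 when there is no backup or nothing to restore.
restore_unit_if_different() {
    unit=$1
    backup="$unit.bak"
    if [ ! -f "$backup" ]; then
        echo "no backup at $backup" >&2
        return 1
    fi
    if cmp -s "$backup" "$unit"; then
        echo "backup is identical to live unit; nothing to restore" >&2
        return 1
    fi
    cp "$backup" "$unit"
}
```

Usage would be restore_unit_if_different ~/.config/systemd/user/openclaw-gateway.service && systemctl --user daemon-reload.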

Preventive Fix (Permanent Solution)

Until OpenClaw itself either writes a suggested unit for review or merges user fields instead of overwriting them, apply one of these approaches:

Option A: Protect the Unit File with Immutable Attribute

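# Make the unit file immutable (chattr +i requires root and a filesystem that supports attributes, e.g. ext4)
sudo chattr +i ~/.config/systemd/user/openclaw-gateway.service

# While the flag is set, any attempt to overwrite the file fails with "Operation not permitted".
# If chattr itself prints "Operation not supported while reading flags", the filesystem has no attribute support.
# Remove the flag before editing the unit yourself:
# sudo chattr -i ~/.config/systemd/user/openclaw-gateway.service

# Alternative: keep the custom unit elsewhere and symlink it into place.
# Do not mask the unit first: masking links it to /dev/null, and the ln -sf below would replace that link anyway.
# Note that doctor may follow the symlink and write to its target (see Edge Cases).
ln -sf /path/to/custom-gateway.service ~/.config/systemd/user/openclaw-gateway.service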

Option B: Post-Update Restoration Hook

Add to your shell profile or ~/.bashrc:

# Function to restore and reload custom systemd unit
restore-openclaw-unit() {
    local BACKUP="${HOME}/.config/systemd/user/openclaw-gateway.service.bak"
    local LIVE="${HOME}/.config/systemd/user/openclaw-gateway.service"
    
    if [[ -f "$BACKUP" ]]; then
        cp "$BACKUP" "$LIVE"
        systemctl --user daemon-reload
        echo "Restored openclaw-gateway.service from backup"
    fi
}

# Wrapper for openclaw update
openclaw() {
    command openclaw "$@"
    local exit_code=$?
    
    # If update completed successfully, restore customizations
    if [[ "$1" == "update" ]] && [[ $exit_code -eq 0 ]]; then
        restore-openclaw-unit
    fi
    return $exit_code
}

Option C: Hardened Unit Template (Apply After Each Update)

Create a file ~/.openclaw/apply-systemd-hardening.sh:

#!/bin/bash
# apply-systemd-hardening.sh
# Run after each openclaw update

UNIT_FILE="${HOME}/.config/systemd/user/openclaw-gateway.service"
BACKUP_FILE="${UNIT_FILE}.bak"

# Ensure backup exists
if [[ -f "$UNIT_FILE" ]] && [[ ! -f "$BACKUP_FILE" ]]; then
    cp "$UNIT_FILE" "$BACKUP_FILE"
fi

# Read current file
content=$(cat "$UNIT_FILE") || exit 1

# Preserve the user's TimeoutStartSec from the backup
orig_timeout=$(grep -m1 "^TimeoutStartSec=" "$BACKUP_FILE" 2>/dev/null)
if [[ -n "$orig_timeout" ]]; then
    content=$(echo "$content" | sed "s|^TimeoutStartSec=.*|${orig_timeout}|")
fi

# Preserve original EnvironmentFile if it was required (no - prefix)
orig_envfile=$(grep -m1 "^EnvironmentFile=/" "$BACKUP_FILE" 2>/dev/null)
if [[ -n "$orig_envfile" ]]; then
    # Replace the line in place so the directive stays inside [Service]
    content=$(echo "$content" | sed "s|^EnvironmentFile=.*|${orig_envfile}|")
fi

# Preserve custom Environment keys (lines other than the OpenClaw-managed one)
custom_env=$(grep "^Environment=" "$BACKUP_FILE" 2>/dev/null | grep -v "OPENCLAW_SERVICE_MANAGED")
if [[ -n "$custom_env" ]]; then
    content=$(echo "$content" | sed '/^Environment=OPENCLAW_SERVICE_MANAGED/d')
    # Insert directly after the [Service] header; appending at the end of the
    # file would place the lines in whichever section comes last
    content=$(echo "$content" | CUSTOM_ENV="$custom_env" awk '{ print } /^\[Service\]$/ { print ENVIRON["CUSTOM_ENV"] }')
fi

# Write hardened content
printf '%s\n' "$content" > "$UNIT_FILE"
systemctl --user daemon-reload
echo "Applied systemd hardening to openclaw-gateway.service"

Then make the script executable:

chmod +x ~/.openclaw/apply-systemd-hardening.sh

Option D: Submit Fix to OpenClaw (Recommended for Maintainers)

The preferred code-level fix would modify doctor.ts or equivalent:

// PREFERRED FIX: Write to .suggested instead of overwriting live
// loadEmbeddedTemplate() and computeDiff() are placeholders for doctor internals
import { promises as fs } from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

async function ensureSystemdUnit() {
  const unitPath = path.join(os.homedir(), '.config/systemd/user/openclaw-gateway.service');
  const suggestedPath = unitPath + '.suggested';
  const template = await loadEmbeddedTemplate();

  // Write suggested template alongside live unit; the live unit is never touched
  await fs.writeFile(suggestedPath, template);

  // Compute and display diff
  const diff = await computeDiff(unitPath, template);
  if (diff) {
    console.warn('⚠️  Systemd unit template has changes. Review with:');
    console.warn(`   diff ${unitPath} ${suggestedPath}`);
    console.warn(`To apply suggested changes: cp ${suggestedPath} ${unitPath} && systemctl --user daemon-reload`);
  }
}

🧪 Verification

Confirm Unit Integrity After Fix

# 1. Verify TimeoutStartSec is preserved
grep "TimeoutStartSec" ~/.config/systemd/user/openclaw-gateway.service
# Expected: TimeoutStartSec=120 (or your custom value)

# 2. Verify ExecStart uses correct Node path
grep "ExecStart" ~/.config/systemd/user/openclaw-gateway.service
# Expected: /home/user/.nvm/versions/node/v26.1.0/bin/node ...

# 3. Verify EnvironmentFile path matches your config
grep "EnvironmentFile" ~/.config/systemd/user/openclaw-gateway.service
# Expected: /home/user/.openclaw/.env (no - prefix if required)

# 4. Verify custom Environment keys present
grep "WSL_HOST_IP\|LCM_LEAF_CHUNK_TOKENS" ~/.config/systemd/user/openclaw-gateway.service
# Expected: Your custom key=value pairs

Confirm Gateway Stability

# 1. Reload systemd configuration
systemctl --user daemon-reload

# 2. Restart gateway cleanly
systemctl --user restart openclaw-gateway

# 3. Monitor startup for 60+ seconds
journalctl --user -u openclaw-gateway.service -f --since "1 minute ago"

# 4. Confirm no timeout errors
# Expected: No "start-post operation timed out" messages
# Expected: "[gateway] ready" logged successfully

# 5. Check restart count is 0
systemctl --user show openclaw-gateway.service -p NRestarts
# Expected: NRestarts=0

Test Against Future Updates

# 1. Before update: note current unit state
sha256sum ~/.config/systemd/user/openclaw-gateway.service
# Save output: abc123...  openclaw-gateway.service

# 2. Run update
openclaw update --tag 2026.5.10-beta.3 --no-restart --yes

# 3. After update: verify unit unchanged
sha256sum ~/.config/systemd/user/openclaw-gateway.service
# Expected: Same hash as before (or verify using apply-systemd-hardening.sh)

# 4. If using Option B wrapper function
source ~/.bashrc
openclaw update --tag 2026.5.10-beta.3 --no-restart --yes
# Expected: "Restored openclaw-gateway.service from backup" message printed

Verify ExecStartPost Timing (If Using Post-Script)

# If you have an ExecStartPost, measure its duration
time /path/to/your/exec-start-post.sh
# Expected: Should complete well under TimeoutStartSec value (e.g., under 90s if TimeoutStartSec=120)

# Test the full startup cycle
systemctl --user stop openclaw-gateway
time systemctl --user start openclaw-gateway
# Expected: "active (running)" within TimeoutStartSec
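The "90s of 120s" guideline above amounts to keeping the post-script within 75% of TimeoutStartSec. A small sketch of that arithmetic follows, using whole seconds only; the 75% threshold is inferred from the example in this guide rather than taken from systemd or OpenClaw, and the function name is illustrative.

```shell
# has_timeout_headroom MEASURED_SEC TIMEOUT_SEC: succeed when the measured
# duration is at most 75% of the timeout (integer seconds only).
has_timeout_headroom() {
    # measured <= 0.75 * timeout, rewritten without fractions
    [ $(( $1 * 4 )) -le $(( $2 * 3 )) ]
}
```

For example, has_timeout_headroom 45 30 fails, which matches the restart loop seen with the template's 30-second timeout.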

⚠️ Common Pitfalls

Environment-Specific Traps

  • WSL2-specific: The WSL_HOST_IP environment variable is commonly set inline. If this gets overwritten, interop may break silently.
  • nvm/nodenv and similar version managers: /usr/bin/node hardcoded in the template may not match your managed Node installation. Always verify that the output of which node matches the ExecStart path.
  • Docker environments: If running inside a container with --user flag, ~/.config/systemd/user/ may not exist or may be on a non-persistent volume.
  • Non-standard XDG paths: If $XDG_CONFIG_HOME points to a non-default location, the unit path detection may fail.

Configuration Missteps

  • Ignoring the .bak file: The backup exists but systemd doesn't use it. Always manually restore.
  • Assuming update is idempotent: Running openclaw update twice in succession compounds the problem, because the second run's .bak now contains the template, not your original.
  • Missing daemon-reload: Restoring from backup without systemctl --user daemon-reload leaves systemd using cached unit definition.
  • Optional file prefix: The -/path/file.env syntax means "don't fail if missing." If you rely on required env vars, ensure no - prefix.
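Because a second update can replace the .bak with the template, one way to guard the original is to take an independent, timestamped copy before running any command that triggers doctor. A minimal sketch follows; the function name and the .orig suffix are illustrative choices, not OpenClaw conventions.

```shell
# snapshot_unit UNIT [DIR]: copy UNIT to DIR (default: UNIT's own directory)
# under a timestamped name and print the path of the copy.
snapshot_unit() {
    unit=$1
    dir=${2:-$(dirname "$unit")}
    dest="$dir/$(basename "$unit").$(date +%Y%m%dT%H%M%S).orig"
    cp -p "$unit" "$dest" && echo "$dest"
}
```

Calling snapshot_unit ~/.config/systemd/user/openclaw-gateway.service before each update leaves a copy that later doctor runs have no reason to overwrite, since the name is not one the updater is known to use.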

Edge Cases

  • Multi-user WSL: If systemd is enabled but running as different user via sudo -u, the unit path may differ (/home/otheruser/.config/...).
  • Symbolic link units: If your unit is a symlink, openclaw doctor may follow it and write to the target, bypassing your protection.
  • Read-only filesystems: On certain locked-down environments, write to ~/.config/ may fail silently, causing unpredictable behavior.
  • Concurrent updates: Two simultaneous openclaw update processes may race when writing the unit file.

Debugging When Fix Doesn't Apply

# Debug: Check if unit file is actually what you think it is
file ~/.config/systemd/user/openclaw-gateway.service
# May reveal: symbolic link to unexpected target

# Debug: Check actual unit being used by systemd
systemctl --user cat openclaw-gateway.service
# Compare with: cat ~/.config/systemd/user/openclaw-gateway.service

# Debug: Verify systemd isn't overriding via drop-ins
ls -la ~/.config/systemd/user/openclaw-gateway.service.d/
# Any .conf files in this directory override main unit directives
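Because drop-in files override directives in the main unit, they are also a possible place to keep customizations outside the file that doctor rewrites. The sketch below writes such a drop-in with the example values from this guide. It assumes, without confirmation from the source, that openclaw doctor leaves the openclaw-gateway.service.d/ directory untouched; the function name and the file name override.conf are illustrative.

```shell
# write_gateway_dropin DIR: create DIR/override.conf with the custom values
# used as examples in this guide.
# Assumption: openclaw doctor does not modify the .d/ directory.
write_gateway_dropin() {
    mkdir -p "$1" || return 1
    cat > "$1/override.conf" <<'EOF'
[Service]
TimeoutStartSec=120
# ExecStart must be cleared with an empty assignment before it is replaced
ExecStart=
ExecStart=/home/user/.nvm/versions/node/v26.1.0/bin/node /home/user/.nvm/versions/node/v26.1.0/lib/node_modules/openclaw/dist/index.js gateway --port 18789
# EnvironmentFile is additive: this file is read in addition to the main unit's
EnvironmentFile=/home/user/.openclaw/.env
EOF
}
```

After write_gateway_dropin ~/.config/systemd/user/openclaw-gateway.service.d and systemctl --user daemon-reload, systemctl --user cat openclaw-gateway.service should list the drop-in below the main unit.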

Directly Related

  • start-post operation timed out: systemd killed the gateway's start-post process because TimeoutStartSec=30 elapsed. Classic symptom of this bug.
  • code=killed, status=15/TERM: SIGTERM received during startup, confirming premature termination.
  • Failed with result 'timeout': systemd marks the unit as failed due to start timeout.
  • Scheduled restart job: the unit's restart policy (Restart=on-failure here; systemd's own default is Restart=no) triggers the retry loop.

Environment/Secret-Related

  • Silent authentication failures: When EnvironmentFile switches paths and keys are missing, gateway starts but channel auth fails with cryptic errors.
  • OPENCLAW_CHANNEL_AUTH_TOKEN undefined: Logged if required env var absent from swapped-in file.
  • Model API key not found: If EnvironmentFile doesn't contain OPENAI_API_KEY or similar, model operations fail.

Historical Context

  • Issue #2047: "doctor should not overwrite user systemd customizations" (original feature request, unresolved)
  • Issue #1893: "TimeoutStartSec too short for ExecStartPost scripts" (discussion of default timeout adequacy)
  • Issue #2156: "EnvironmentFile path hardcoded, not user-configurable" (tracks the EnvironmentFile prefix problem)

Similar Patterns in Related Tools

  • Homebrew: Formula regeneration clobbers --prefix options
  • Docker Compose: docker-compose up -d doesn't merge environment: keys
  • pm2: pm2 update can reset kill_timeout and wait_ready settings

Evidence & Sources

This troubleshooting guide was automatically synthesized by the FixClaw Intelligence Pipeline from community discussions.