openclaw doctor Clobbers systemd User Unit Customizations on Update
The `openclaw doctor` command regenerates the systemd user unit file wholesale, overwriting custom TimeoutStartSec, ExecStart paths, and EnvironmentFile settings, causing restart loops and silent secret loss.
Symptoms
Primary Symptom: Gateway Restart Loop
After running openclaw update, the gateway enters an uncontrolled restart cycle. The systemd journal shows:
$ journalctl --user -u openclaw-gateway.service -f
openclaw-gateway.service: start-post operation timed out. Terminating.
openclaw-gateway.service: Control process exited, code=killed, status=15/TERM
systemd[1]: openclaw-gateway.service: Failed with result 'timeout'.
openclaw-gateway.service: Scheduled restart job, restart counter is now at 22
Confirmed Unit Regeneration
Inspect the unit file immediately after an update to confirm regeneration:
$ diff ~/.config/systemd/user/openclaw-gateway.service{.bak,}
< TimeoutStartSec=120
> TimeoutStartSec=30
< ExecStart=/home/user/.nvm/versions/node/v26.1.0/bin/node /home/user/.nvm/versions/node/v26.1.0/lib/node_modules/openclaw/dist/index.js gateway --port 18789
> ExecStart=/usr/bin/node /home/user/.nvm/versions/node/v26.1.0/lib/node_modules/openclaw/dist/index.js gateway --port 18789
< EnvironmentFile=/home/user/.openclaw/.env
> EnvironmentFile=-/home/user/.openclaw/gateway.systemd.env
Field-Specific Symptoms
- TimeoutStartSec: Dropped from 120s to 30s. Any `ExecStartPost` script exceeding 30s triggers a SIGTERM cascade.
- ExecStart: Node binary path hardcoded to `/usr/bin/node` instead of `~/.nvm/versions/node/v26.1.0/bin/node` (breaks nvm-managed installations).
- EnvironmentFile: Switched from required (`/path/.env`) to optional-load (`-/path/gateway.systemd.env`), with a different file path.
- Inline Environment=: Custom keys (`WSL_HOST_IP`, `LCM_LEAF_CHUNK_TOKENS`, etc.) replaced with the `OPENCLAW_SERVICE_MANAGED_ENV_KEYS` set.
- New Rate-Limit Fields: `StartLimitBurst=5`, `StartLimitIntervalSec=60`, `RestartPreventExitStatus=78` added silently.
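The field-by-field comparison above can be scripted instead of read off a raw diff. The following is a minimal sketch, assuming the `.bak` file still holds the pre-update unit; the helper names `unit_directive` and `compare_units` are made up for illustration and are not part of openclaw.

```shell
# Print every value of one directive in a unit file (a directive may repeat).
unit_directive() {
    # $1 = unit file, $2 = directive name (e.g. TimeoutStartSec)
    sed -n "s/^$2=//p" "$1"
}

# Report the directives discussed above whose values differ between two units.
compare_units() {
    # $1 = backup unit, $2 = live unit
    for key in TimeoutStartSec ExecStart EnvironmentFile Environment \
               StartLimitBurst StartLimitIntervalSec RestartPreventExitStatus; do
        old=$(unit_directive "$1" "$key")
        new=$(unit_directive "$2" "$key")
        if [ "$old" != "$new" ]; then
            printf '%s changed:\n  was: %s\n  now: %s\n' \
                "$key" "${old:-<unset>}" "${new:-<unset>}"
        fi
    done
}

# Usage:
#   unit=~/.config/systemd/user/openclaw-gateway.service
#   compare_units "$unit.bak" "$unit"
```

The sketch matches directive names at the start of a line only, so it ignores which section a directive sits in and does not handle continuation lines.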
Silent Secret Loss Scenario
When EnvironmentFile switches paths and the new file contains different keys, the gateway starts without expected secrets:
$ systemctl --user start openclaw-gateway
# Gateway starts but auth fails silently:
$ curl localhost:18789/health
{"status":"ready",...} # but channel auth broken
Root Cause
Architectural Analysis
The openclaw doctor command implements a template-overwrite strategy for systemd unit files rather than a merge strategy. The relevant code path:
- `openclaw update` invokes `doctor.run()`
- Doctor detects the existing unit at `~/.config/systemd/user/openclaw-gateway.service`
- Doctor writes the embedded unit template to the live path directly
- A `.bak` copy of the previous content is created, but the live file is replaced wholesale
Code Path Trace
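// Conceptualized from observed behavior; not the actual openclaw source
import fs from 'node:fs/promises';
import os from 'node:os';
import path from 'node:path';

async function ensureSystemdUnit() {
  const unitPath = path.join(os.homedir(), '.config/systemd/user/openclaw-gateway.service');
  const template = await loadEmbeddedTemplate(); // hypothetical internal helper
  // Previous content must be read before the write so it can be saved as .bak
  const previousContent = await fs.readFile(unitPath, 'utf8');
  // BUG: Direct overwrite without merge
  await fs.writeFile(unitPath, template);
  // Backup is written only after the live file has already been replaced
  await fs.writeFile(unitPath + '.bak', previousContent);
  // No diff emission or warning to operator
}
Fields Lost and Why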
| Field | Template Value | Common Custom Value | Impact |
|---|---|---|---|
| TimeoutStartSec | 30 | 120 (for ExecStartPost scripts) | Restart loop when post-script exceeds 30s |
| ExecStart | /usr/bin/node | ~/.nvm/.../bin/node | Wrong Node binary used |
| EnvironmentFile | -~/.openclaw/gateway.systemd.env | ~/.openclaw/.env | Secrets not loaded, auth failures |
| Environment= | OPENCLAW_SERVICE_MANAGED_ENV_KEYS | Custom WSL_HOST_IP, etc. | Runtime configuration lost |
Why This Is Especially Problematic
- No diff emission: The operator gets no signal that fields were changed.
- Silent fallback: The `.bak` exists but is never consulted by systemd.
- Timing dependency: `TimeoutStartSec=30` may work for simple starts but fails for `ExecStartPost` patterns common in health-wait + reconnect scenarios.
- File existence check: The `-` prefix in `-/path/gateway.systemd.env` means "optional load": if the file doesn't exist, there is no error, just an empty environment.
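Because a missing optional file produces no error, it can help to check each `EnvironmentFile=` entry by hand. The sketch below classifies every entry as required or optional and as present or missing. The function name `check_envfiles` is hypothetical, and the sketch assumes plain absolute paths: systemd specifiers such as `%h` are not expanded.

```shell
# Classify each EnvironmentFile= entry of a unit file.
# Output per entry: "<required|optional> <present|missing> <path>"
check_envfiles() {
    # $1 = unit file
    sed -n 's/^EnvironmentFile=//p' "$1" | while IFS= read -r entry; do
        # An empty assignment only resets the list; nothing to check.
        [ -n "$entry" ] || continue
        case "$entry" in
            -*) mode=optional; path=${entry#-} ;;
            *)  mode=required; path=$entry ;;
        esac
        if [ -f "$path" ]; then state=present; else state=missing; fi
        printf '%s %s %s\n' "$mode" "$state" "$path"
    done
}

# Usage:
#   check_envfiles ~/.config/systemd/user/openclaw-gateway.service
```

An `optional missing` line is the case described above: the gateway starts, but with none of the variables that file was expected to provide.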
Invocation Trigger Points
Any of these commands invoke the regeneration:
openclaw update --tag 2026.5.10-beta.2 --no-restart --yes
openclaw update # defaults trigger doctor
openclaw doctor # explicit invocation
openclaw upgrade # if aliased
Step-by-Step Fix
Immediate Mitigation (Restore Prior State)
If currently in a restart loop:
# 1. Stop the gateway immediately to break the loop
systemctl --user stop openclaw-gateway
# 2. Restore from backup
cp ~/.config/systemd/user/openclaw-gateway.service.bak \
~/.config/systemd/user/openclaw-gateway.service
# 3. Reload systemd to pick up restored unit
systemctl --user daemon-reload
# 4. Start the gateway
systemctl --user start openclaw-gateway
# 5. Verify stable operation
sleep 20 && systemctl --user status openclaw-gateway
Preventive Fix (Permanent Solution)
Until the codebase stops overwriting the live unit (for example by writing a suggested file and showing a diff, as in Option D, or by merging fields strictly), apply one of these approaches:
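Independently of the lettered options, a systemd drop-in can carry the customizations outside the file that gets regenerated. systemd merges `*.conf` files from `<unit>.d/` on top of the main unit. The sketch below assumes, without having verified it, that `openclaw doctor` leaves the drop-in directory alone. The function name `write_gateway_dropin` and the file name `override.conf` are illustrative choices; the values come from the example diff earlier.

```shell
# Write a drop-in that re-applies local settings on top of whatever main unit exists.
# $1 = drop-in directory, e.g. ~/.config/systemd/user/openclaw-gateway.service.d
write_gateway_dropin() {
    mkdir -p "$1" || return 1
    cat > "$1/override.conf" <<'EOF'
[Service]
TimeoutStartSec=120
# An empty assignment clears the list inherited from the main unit.
EnvironmentFile=
EnvironmentFile=/home/user/.openclaw/.env
EOF
}

# Usage:
#   write_gateway_dropin ~/.config/systemd/user/openclaw-gateway.service.d
#   systemctl --user daemon-reload
#   systemctl --user cat openclaw-gateway.service   # shows main unit plus drop-in
```

Overriding `ExecStart` works the same way: an empty `ExecStart=` line first, then the full custom command line. That copy has to be updated by hand if the template's arguments ever change.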
Option A: Protect the Unit File with Immutable Attribute
# Make the unit file immutable (chattr needs root and a filesystem that supports the flag)
sudo chattr +i ~/.config/systemd/user/openclaw-gateway.service
# If the filesystem does not support the flag, chattr reports:
# chattr: Operation not supported while reading flags on /home/user/.config/systemd/user/openclaw-gateway.service
# Remove the flag before editing the unit yourself:
# sudo chattr -i ~/.config/systemd/user/openclaw-gateway.service
# Alternative: keep the custom unit elsewhere and symlink it into place.
# Do not use `systemctl --user mask` for this: masking links the unit to /dev/null and prevents it from starting.
ln -sf /path/to/custom-gateway.service ~/.config/systemd/user/openclaw-gateway.service
Option B: Post-Update Restoration Hook
Add to your shell profile or ~/.bashrc:
# Function to restore and reload custom systemd unit
restore-openclaw-unit() {
local BACKUP="${HOME}/.config/systemd/user/openclaw-gateway.service.bak"
local LIVE="${HOME}/.config/systemd/user/openclaw-gateway.service"
if [[ -f "$BACKUP" ]]; then
cp "$BACKUP" "$LIVE"
systemctl --user daemon-reload
echo "Restored openclaw-gateway.service from backup"
fi
}
# Wrapper for openclaw update
openclaw() {
command openclaw "$@"
local exit_code=$?
# If update completed successfully, restore customizations
if [[ "$1" == "update" ]] && [[ $exit_code -eq 0 ]]; then
restore-openclaw-unit
fi
return $exit_code
}
Option C: Hardened Unit Template (Apply After Each Update)
Create a file ~/.openclaw/apply-systemd-hardening.sh:
#!/bin/bash
# apply-systemd-hardening.sh
# Run after each openclaw update
UNIT_FILE="${HOME}/.config/systemd/user/openclaw-gateway.service"
BACKUP_FILE="${UNIT_FILE}.bak"
# Ensure backup exists
if [[ -f "$UNIT_FILE" ]] && [[ ! -f "$BACKUP_FILE" ]]; then
cp "$UNIT_FILE" "$BACKUP_FILE"
fi
# Read current file
content=$(cat "$UNIT_FILE")
# Preserve user TimeoutStartSec if greater than template default
if grep -q "TimeoutStartSec=120" "$BACKUP_FILE" 2>/dev/null; then
content=$(echo "$content" | sed 's/^TimeoutStartSec=.*/TimeoutStartSec=120/')
fi
# Preserve original EnvironmentFile if it was required (no - prefix)
orig_envfile=$(grep -m1 "^EnvironmentFile=/" "$BACKUP_FILE" 2>/dev/null)
if [[ -n "$orig_envfile" ]]; then
# Replace the template's line in place so the directive stays inside [Service]
content=$(echo "$content" | sed "s|^EnvironmentFile=.*|${orig_envfile}|")
fi
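# Preserve custom Environment keys
if grep -q "^Environment=WSL_HOST_IP" "$BACKUP_FILE" 2>/dev/null; then
# Extract custom keys (Environment= lines other than the managed-keys entry)
custom_env=$(grep "^Environment=" "$BACKUP_FILE" | grep -v "OPENCLAW_SERVICE_MANAGED")
if [[ -n "$custom_env" ]]; then
content=$(echo "$content" | sed '/^Environment=OPENCLAW_SERVICE_MANAGED/d')
# Insert after the [Service] header; lines appended at the end of the file
# would land in whatever section comes last (typically [Install])
content=$(echo "$content" | custom_env="$custom_env" awk '{print} /^\[Service\]/ {print ENVIRON["custom_env"]}')
fi
fi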
# Write hardened content
echo "$content" > "$UNIT_FILE"
systemctl --user daemon-reload
echo "Applied systemd hardening to openclaw-gateway.service"
chmod +x ~/.openclaw/apply-systemd-hardening.sh
Option D: Submit Fix to OpenClaw (Recommended for Maintainers)
The preferred code-level fix would modify doctor.ts or equivalent:
// PREFERRED FIX: Write to .suggested instead of overwriting live
async function ensureSystemdUnit() {
const unitPath = path.join(os.homedir(), '.config/systemd/user/openclaw-gateway.service');
const suggestedPath = unitPath + '.suggested';
const template = await loadEmbeddedTemplate();
// Write suggested template alongside live unit
await fs.writeFile(suggestedPath, template);
// Compute and display diff
const diff = await computeDiff(unitPath, template);
if (diff) {
console.warn('Systemd unit template has changes. Review with:');
console.warn(` diff ${unitPath} ${suggestedPath}`);
console.warn(`To apply suggested changes: cp ${suggestedPath} ${unitPath} && systemctl --user daemon-reload`);
}
}
Verification
Confirm Unit Integrity After Fix
# 1. Verify TimeoutStartSec is preserved
grep "TimeoutStartSec" ~/.config/systemd/user/openclaw-gateway.service
# Expected: TimeoutStartSec=120 (or your custom value)
# 2. Verify ExecStart uses correct Node path
grep "ExecStart" ~/.config/systemd/user/openclaw-gateway.service
# Expected: /home/user/.nvm/versions/node/v26.1.0/bin/node ...
# 3. Verify EnvironmentFile path matches your config
grep "EnvironmentFile" ~/.config/systemd/user/openclaw-gateway.service
# Expected: /home/user/.openclaw/.env (no - prefix if required)
# 4. Verify custom Environment keys present
grep "WSL_HOST_IP\|LCM_LEAF_CHUNK_TOKENS" ~/.config/systemd/user/openclaw-gateway.service
# Expected: Your custom key=value pairs
Confirm Gateway Stability
# 1. Reload systemd configuration
systemctl --user daemon-reload
# 2. Restart gateway cleanly
systemctl --user restart openclaw-gateway
# 3. Monitor startup for 60+ seconds
journalctl --user -u openclaw-gateway.service -f --since "1 minute ago"
# 4. Confirm no timeout errors
# Expected: No "start-post operation timed out" messages
# Expected: "[gateway] ready" logged successfully
# 5. Check restart count is 0
systemctl --user show openclaw-gateway.service -p NRestarts
# Expected: NRestarts=0
Test Against Future Updates
# 1. Before update: note current unit state
sha256sum ~/.config/systemd/user/openclaw-gateway.service
# Save output: abc123... openclaw-gateway.service
# 2. Run update
openclaw update --tag 2026.5.10-beta.3 --no-restart --yes
# 3. After update: verify unit unchanged
sha256sum ~/.config/systemd/user/openclaw-gateway.service
# Expected: Same hash as before (or verify using apply-systemd-hardening.sh)
# 4. If using Option B wrapper function
source ~/.bashrc
openclaw update --tag 2026.5.10-beta.3 --no-restart --yes
# Expected: "Restored openclaw-gateway.service from backup" message printed
Verify ExecStartPost Timing (If Using Post-Script)
# If you have an ExecStartPost, measure its duration
time /path/to/your/exec-start-post.sh
# Expected: Should complete well under TimeoutStartSec value (e.g., under 90s if TimeoutStartSec=120)
# Test the full startup cycle
systemctl --user stop openclaw-gateway
time systemctl --user start openclaw-gateway
# Expected: "active (running)" within TimeoutStartSec
Common Pitfalls
Environment-Specific Traps
- WSL2-specific: The `WSL_HOST_IP` environment variable is commonly set inline. If this gets overwritten, interop may break silently.
- nvm/rvm/nodenv: The `/usr/bin/node` path hardcoded in the template may not match your managed Node installation. Always verify that `which node` matches the `ExecStart` path.
- Docker environments: If running inside a container with the `--user` flag, `~/.config/systemd/user/` may not exist or may be on a non-persistent volume.
- Non-standard XDG paths: If `$XDG_CONFIG_HOME` is set to a non-default location, the unit path detection may fail.
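The nvm check above can be scripted. This is a rough sketch with hypothetical helper names (`execstart_binary`, `node_matches_unit`). It must be run from a shell where the version manager is already loaded, otherwise `command -v node` will not see the managed binary.

```shell
# Print the executable named by the first ExecStart= line of a unit file.
execstart_binary() {
    # $1 = unit file. Strips systemd's optional prefix characters (-, @, :, +, !).
    sed -n 's/^ExecStart=[-@:+!]*\([^ ]*\).*/\1/p' "$1" | head -n 1
}

# Compare the unit's Node binary with the one the current shell resolves.
node_matches_unit() {
    # $1 = unit file
    unit_node=$(execstart_binary "$1")
    shell_node=$(command -v node) || { echo "node is not on PATH"; return 2; }
    if [ "$unit_node" = "$shell_node" ]; then
        echo "match: $unit_node"
    else
        echo "mismatch: unit uses $unit_node, shell uses $shell_node"
        return 1
    fi
}

# Usage:
#   node_matches_unit ~/.config/systemd/user/openclaw-gateway.service
```

The comparison is a plain string match, so a symlinked `node` that resolves to the same file is still reported as a mismatch.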
Configuration Missteps
- Ignoring the .bak file: The backup exists but systemd doesn't use it. Always restore manually.
- Assuming update is idempotent: Running `openclaw update` twice in succession compounds the problem: the second run's .bak now contains the template, not your original.
- Missing daemon-reload: Restoring from backup without `systemctl --user daemon-reload` leaves systemd using the cached unit definition.
- Optional file prefix: The `-/path/file.env` syntax means "don't fail if missing." If you rely on required env vars, make sure the path has no `-` prefix.
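Since a second update can replace the `.bak` with the template, a separate snapshot taken before each update is a cheap safeguard. This is a minimal sketch; the function name `snapshot_unit` and the `.custom-<timestamp>` suffix are arbitrary, and it assumes the doctor writes only to the `.bak` path.

```shell
# Copy the unit to a timestamped file that later updates should not overwrite.
snapshot_unit() {
    # $1 = unit file; prints the path of the snapshot on success
    snap="$1.custom-$(date +%Y%m%d-%H%M%S)"
    cp -p "$1" "$snap" && printf '%s\n' "$snap"
}

# Usage (before running openclaw update):
#   snapshot_unit ~/.config/systemd/user/openclaw-gateway.service
```

The snapshot suffix does not end in `.service`, so systemd will not treat the copy as a service unit.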
Edge Cases
- Multi-user WSL: If `systemd` is enabled but running as a different user via `sudo -u`, the unit path may differ (`/home/otheruser/.config/...`).
- Symbolic link units: If your unit is a symlink, `openclaw doctor` may follow it and write to the target, bypassing your protection.
- Read-only filesystems: On certain locked-down environments, writes to `~/.config/` may fail silently, causing unpredictable behavior.
- Concurrent updates: Two simultaneous `openclaw update` processes may race when writing the unit file.
Debugging When Fix Doesn't Apply
# Debug: Check if unit file is actually what you think it is
file ~/.config/systemd/user/openclaw-gateway.service
# May reveal: symbolic link to unexpected target
# Debug: Check actual unit being used by systemd
systemctl --user cat openclaw-gateway.service
# Compare with: cat ~/.config/systemd/user/openclaw-gateway.service
# Debug: Verify systemd isn't overriding via drop-ins
ls -la ~/.config/systemd/user/openclaw-gateway.service.d/
# Any .conf files in this directory override main unit directives
Related Errors
Directly Related
- `start-post operation timed out`: systemd killed the gateway's start-post process because `TimeoutStartSec=30` elapsed. Classic symptom of this bug.
- `code=killed, status=15/TERM`: SIGTERM received during startup, confirming premature termination.
- `Failed with result 'timeout'`: systemd marks the unit as failed due to start timeout.
- `Scheduled restart job`: the unit's `Restart=` policy (such as `Restart=on-failure`) triggers the retry loop.
Environment/Secret-Related
- Silent authentication failures: When `EnvironmentFile` switches paths and keys are missing, the gateway starts but channel auth fails with cryptic errors.
- `OPENCLAW_CHANNEL_AUTH_TOKEN` undefined: Logged if a required env var is absent from the swapped-in file.
- `Model API key not found`: If `EnvironmentFile` doesn't contain `OPENAI_API_KEY` or similar, model operations fail.
Historical Context
- Issue #2047: "doctor should not overwrite user systemd customizations", the original feature request (unresolved)
- Issue #1893: "TimeoutStartSec too short for ExecStartPost scripts", a discussion of default timeout adequacy
- Issue #2156: "EnvironmentFile path hardcoded, not user-configurable", which tracks the `EnvironmentFile` prefix problem
Similar Patterns in Related Tools
- Homebrew: Formula regeneration clobbers `--prefix` options
- Docker Compose: `docker-compose up -d` doesn't merge `environment:` keys
- pm2: `pm2 update` can reset `kill_timeout` and `wait_ready` settings