Slack Provider Hardcodes autoReconnectEnabled: false, Defeating Native Socket-Mode Reconnect
The OpenClaw Slack provider hardcodes autoReconnectEnabled: false on the SocketModeReceiver, causing full receiver rebuilds on pong timeouts instead of lightweight WebSocket reconnects, resulting in disconnect storms and message delivery failures.
Symptoms
Observable Disconnect Behavior
When the Slack provider’s SocketModeReceiver encounters a pong timeout, the gateway fails to perform a lightweight WebSocket reconnect. Instead, it triggers a full receiver rebuild, producing the following observable symptoms:
- Slack messages arrive with unpredictable latency (30s to 90s delays on DM delivery)
- Message ordering becomes non-deterministic during high-load periods
- Repeated gateway log entries showing receiver lifecycle churn
- CPU load spikes correlating with disconnect bursts rather than LLM activity
Diagnostic Evidence
Check the running process for the hardcoded configuration:
# Identify the OpenClaw Slack provider file location
find /opt/homebrew/lib/node_modules/openclaw -name "provider-*.js" 2>/dev/null | head -5
# Verify the hardcoded autoReconnectEnabled: false
grep -n "autoReconnectEnabled" /opt/homebrew/lib/node_modules/openclaw/dist/extensions/slack/provider-*.js
# Expected output showing the problematic line:
# 1755: autoReconnectEnabled: false,
Runtime Symptom Log
[2026-05-05T14:32:01.234Z] WARN [SlackProvider] SocketModeReceiver disconnected
[2026-05-05T14:32:01.241Z] INFO [SlackProvider] Rebuilding SocketModeReceiver...
[2026-05-05T14:32:02.103Z] INFO [SlackProvider] SocketModeReceiver reconnected (rebuild)
[2026-05-05T14:32:33.891Z] WARN [SlackProvider] SocketModeReceiver disconnected
[2026-05-05T14:32:33.898Z] INFO [SlackProvider] Rebuilding SocketModeReceiver...
[2026-05-05T14:32:34.751Z] INFO [SlackProvider] SocketModeReceiver reconnected (rebuild)
[2026-05-05T14:33:05.442Z] WARN [SlackProvider] SocketModeReceiver disconnected
# Pattern continues every ~30-45 seconds
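To put a number on the cadence in the log above, the disconnect timestamps can be compared directly. The following is a minimal sketch: `disconnectIntervals` is a hypothetical helper, not part of OpenClaw, and it assumes every log line starts with an ISO-8601 timestamp in square brackets, as in the sample.

```javascript
// Hypothetical helper: seconds between consecutive disconnect events.
// Assumes each line starts with "[<ISO-8601 timestamp>]" as in the sample log.
function disconnectIntervals(logText) {
  const stamps = logText
    .split("\n")
    .filter((line) => line.includes("SocketModeReceiver disconnected"))
    .map((line) => Date.parse(line.slice(line.indexOf("[") + 1, line.indexOf("]"))))
    .filter((ms) => !Number.isNaN(ms));

  // Difference between each disconnect and the one before it, in seconds
  const gaps = [];
  for (let i = 1; i < stamps.length; i++) {
    gaps.push((stamps[i] - stamps[i - 1]) / 1000);
  }
  return gaps;
}

const sampleLog = [
  "[2026-05-05T14:32:01.234Z] WARN [SlackProvider] SocketModeReceiver disconnected",
  "[2026-05-05T14:32:01.241Z] INFO [SlackProvider] Rebuilding SocketModeReceiver...",
  "[2026-05-05T14:32:33.891Z] WARN [SlackProvider] SocketModeReceiver disconnected",
  "[2026-05-05T14:33:05.442Z] WARN [SlackProvider] SocketModeReceiver disconnected",
].join("\n");

console.log(disconnectIntervals(sampleLog)); // [ 32.657, 31.551 ]
```

Gaps that cluster tightly around the same value, independent of load, point at a timeout-driven cycle rather than random network failures.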
CPU Correlation Test
# Monitor both gateway CPU and disconnect frequency
# With autoReconnectEnabled: false, disconnects continue even at low CPU
# (top -pid is macOS syntax and takes a single PID, so keep only the first match)
top -pid "$(pgrep -f "openclaw" | head -1)" # CPU remains ~9%
# Yet Slack disconnects persist every 30-45s
Root Cause
Technical Failure Sequence
The issue stems from an architectural mismatch between the expected behavior of @slack/socket-mode and the hardcoded configuration in the OpenClaw Slack provider.
1. Pong Timeout Trigger
Slack Gateway → (no pong within timeout) → WebSocket connection considered dead
2. Incorrect Handling Path (Current Code)
When autoReconnectEnabled: false:
Slack SocketModeClient
↓
Detects pong timeout (no pong received within heartbeat window)
↓
Sets connection state to DISCONNECTED
↓
Does NOT attempt WebSocket-level reconnect (autoReconnectEnabled: false)
↓
OpenClaw SlackProvider detects DISCONNECTED state
↓
Triggers createSlackBoltApp() → full receiver rebuild
↓
New SocketModeReceiver instantiated, new WebSocket established
↓
Total latency: 800ms to 2s per disconnect event
3. Correct Handling Path (With autoReconnectEnabled: true)
When autoReconnectEnabled: true:
Slack SocketModeClient
↓
Detects pong timeout
↓
autoReconnectEnabled: true triggers internal WebSocket reconnect
↓
Same SocketModeReceiver instance handles new connection
↓
Total latency: 50ms to 200ms per disconnect event
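The two paths can be compared side by side with a small model. This is an illustrative sketch only: `handlePongTimeout` is a hypothetical function that mirrors the sequences described above, not code from @slack/socket-mode or OpenClaw.

```javascript
// Illustrative model of the two recovery paths described above.
// Not the @slack/socket-mode implementation; step names are descriptive only.
function handlePongTimeout({ autoReconnectEnabled }) {
  const steps = ["pong timeout detected"];
  if (autoReconnectEnabled) {
    // Native path: the client recovers on its own and the receiver survives
    steps.push("client reopens WebSocket", "same SocketModeReceiver reused");
  } else {
    // Rebuild path: the client gives up and the provider starts over
    steps.push(
      "client enters DISCONNECTED state",
      "provider calls createSlackBoltApp()",
      "new SocketModeReceiver and WebSocket created"
    );
  }
  return steps;
}

console.log(handlePongTimeout({ autoReconnectEnabled: false }).join(" -> "));
console.log(handlePongTimeout({ autoReconnectEnabled: true }).join(" -> "));
```

The rebuild path has one more hop, and that hop crosses from the SDK into the provider, which is where the extra latency and allocation come from.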
Code Location
The problematic code exists in the Slack provider factory:
// File: dist/extensions/slack/provider-[HASH].js (line ~1755)
// Source: src/extensions/slack/provider.ts (compiled target)
function createSlackBoltApp(params) {
  const receiver = params.slackMode === "socket"
    ? new params.interop.SocketModeReceiver({
        appToken: params.appToken ?? "",
        autoReconnectEnabled: false, // ← line 1755: the defect
        installerOptions: { clientOptions: params.clientOptions }
      })
    : new params.interop.HTTPReceiver({ /* ... */ });
}
Architectural Impact
The hardcoded false value violates the recommended configuration documented in @slack/socket-mode:
- Unsafe default: `autoReconnectEnabled: false` is the pessimistic setting intended for specific debugging scenarios, not production use
- Compounding failures: During gateway saturation (LLM hangs, session pile-ups), the heavier rebuild path increases queue depth, worsening the saturation condition
- Resource churn: Each full rebuild allocates new event listeners, socket handlers, and OAuth state, unlike the native reconnect which reuses existing resources
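The cost of the heavier path adds up quickly. The sketch below is a back-of-envelope estimate using only the figures quoted in this article (a disconnect every 30 to 45 seconds, 800ms to 2s per rebuild, 50ms to 200ms per native reconnect). It assumes disconnects arrive at a fixed interval, which is a simplification, and `unavailableSecondsPerHour` is a hypothetical helper.

```javascript
// Back-of-envelope estimate: seconds per hour spent recovering from disconnects.
// Assumes a fixed disconnect interval and a fixed recovery time per event.
function unavailableSecondsPerHour(disconnectIntervalSec, recoverySec) {
  const eventsPerHour = 3600 / disconnectIntervalSec;
  return eventsPerHour * recoverySec;
}

const scenarios = [
  { label: "full rebuild, worst case", intervalSec: 30, recoverySec: 2.0 },
  { label: "full rebuild, best case", intervalSec: 45, recoverySec: 0.8 },
  { label: "native reconnect, worst case", intervalSec: 30, recoverySec: 0.2 },
  { label: "native reconnect, best case", intervalSec: 45, recoverySec: 0.05 },
];

for (const s of scenarios) {
  const secs = unavailableSecondsPerHour(s.intervalSec, s.recoverySec);
  console.log(`${s.label}: ${secs.toFixed(1)} s/hour`);
}
```

Under these assumptions the rebuild path costs roughly 64 to 240 seconds of recovery time per hour, against roughly 4 to 24 seconds for native reconnect.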
Version-Specific Notes
The autoReconnectEnabled option was introduced in @slack/socket-mode@2.0.0 as part of the SocketModeReceiver API. OpenClaw versions bundling earlier versions of the Slack SDK may not expose this option, though the underlying reconnect behavior is still affected by the same root cause.
Step-by-Step Fix
Option 1: Simple Hardcoded Fix (Recommended for Immediate Relief)
Modify the provider file directly to enable native reconnection:
# Step 1: Locate the provider source file
SRC_FILE=$(find /opt/homebrew/lib/node_modules/openclaw/src -path "*/slack/provider.ts" 2>/dev/null | head -1)
# If source not available, modify the compiled file directly
DIST_FILE=$(find /opt/homebrew/lib/node_modules/openclaw/dist -path "*/slack/provider-*.js" 2>/dev/null | head -1)
# Step 2: Backup the original file
cp "$DIST_FILE" "${DIST_FILE}.bak"
# Step 3: Apply the fix using sed
# Change: autoReconnectEnabled: false,
# To: autoReconnectEnabled: true,
sed -i '' 's/autoReconnectEnabled: false,/autoReconnectEnabled: true,/' "$DIST_FILE"
# Step 4: Verify the change
grep -n "autoReconnectEnabled" "$DIST_FILE"
Before:
const receiver = params.slackMode === "socket"
  ? new params.interop.SocketModeReceiver({
      appToken: params.appToken ?? "",
      autoReconnectEnabled: false, // ← defeats native reconnect
      installerOptions: { clientOptions: params.clientOptions }
    })
  : new params.interop.HTTPReceiver({ /* ... */ });
After:
const receiver = params.slackMode === "socket"
  ? new params.interop.SocketModeReceiver({
      appToken: params.appToken ?? "",
      autoReconnectEnabled: true, // ← enables native reconnect
      installerOptions: { clientOptions: params.clientOptions }
    })
  : new params.interop.HTTPReceiver({ /* ... */ });
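Where sed is unavailable or a scripted patch is preferred, the same substitution can be done in Node.js. This is a minimal sketch operating on a string; `patchProviderSource` is a hypothetical helper, and reading and writing the actual provider file (with a backup) is left to the caller.

```javascript
// Hypothetical helper: flip the hardcoded flag in provider source text.
// Returns the patched text and how many occurrences were replaced, so the
// caller can detect "nothing to patch" (already fixed, or unexpected layout).
const FLAG_PATTERN = /autoReconnectEnabled:\s*false/g;

function patchProviderSource(source) {
  let replacements = 0;
  const patched = source.replace(FLAG_PATTERN, () => {
    replacements += 1;
    return "autoReconnectEnabled: true";
  });
  return { patched, replacements };
}

const before = 'appToken: params.appToken ?? "",\n  autoReconnectEnabled: false,\n';
const first = patchProviderSource(before);
const second = patchProviderSource(first.patched);

console.log(first.replacements, second.replacements); // 1 0
```

Running the patch twice is harmless: the second pass reports zero replacements, which doubles as a check that the fix is already in place.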
Option 2: Plumb as Configurable Option (Proper Long-Term Fix)
For OpenClaw installations where source is available, implement a configuration-driven approach:
Step 1: Define the configuration schema
// In src/extensions/slack/config.ts, add:
export interface SlackProviderConfig {
  // ... existing fields ...
  /**
   * Enable automatic WebSocket reconnection on pong timeout.
   * @default true
   */
  socketMode?: {
    autoReconnect?: boolean;
  };
}
Step 2: Update the provider factory
// In src/extensions/slack/provider.ts
function createSlackBoltApp(params: SlackParams) {
  // Determine autoReconnectEnabled: respect explicit config, default to true
  const autoReconnectEnabled = params.config?.socketMode?.autoReconnect ?? true;
  const receiver = params.slackMode === "socket"
    ? new params.interop.SocketModeReceiver({
        appToken: params.appToken ?? "",
        autoReconnectEnabled, // ← now configurable
        installerOptions: { clientOptions: params.clientOptions }
      })
    : new params.interop.HTTPReceiver({ /* ... */ });
}
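The choice of `??` over `||` in the factory matters, because an explicit `false` must survive. A minimal sketch of the defaulting logic in isolation, where `resolveAutoReconnect` is a hypothetical helper name and the config shape follows the interface proposed in Step 1:

```javascript
// Hypothetical helper isolating the defaulting rule used in the factory:
// a missing value falls back to true, an explicit false is preserved.
function resolveAutoReconnect(config) {
  return config?.socketMode?.autoReconnect ?? true;
}

console.log(resolveAutoReconnect(undefined)); // true
console.log(resolveAutoReconnect({ socketMode: {} })); // true
console.log(resolveAutoReconnect({ socketMode: { autoReconnect: false } })); // false
```

With `|| true` the last call would also return `true`, silently ignoring an operator who disabled auto-reconnect for debugging.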
Step 3: Document the new configuration
# In config.yaml or environment configuration:
slack:
  socketMode:
    autoReconnect: true # Default; set to false only for debugging
Node.js Hot-Reload Consideration
For production deployments, the SocketModeReceiver holds persistent state. After applying the fix, a restart is required:
# Step 1: Gracefully stop the OpenClaw gateway
# Option A: If running via systemd
sudo systemctl stop openclaw
# Option B: If running via PM2
pm2 stop openclaw
# Option C: If running directly
kill -SIGTERM $(pgrep -f "openclaw")
# Step 2: Verify all Slack connections are terminated
lsof -i :443 | grep -i slack | wc -l # Should return 0
# Step 3: Restart the gateway
sudo systemctl start openclaw
# or
pm2 start openclaw
# or
node /opt/homebrew/lib/node_modules/openclaw/dist/index.js &
Verification
Immediate Verification Steps
1. Confirm the Code Change
# Verify autoReconnectEnabled is set to true in the loaded provider
grep -n "autoReconnectEnabled" /opt/homebrew/lib/node_modules/openclaw/dist/extensions/slack/provider-*.js
# Expected output:
# 1755: autoReconnectEnabled: true,
2. Monitor WebSocket Connection Stability
# Start monitoring Slack WebSocket connections in real-time
# Method A: Check established connections
watch -n 2 'lsof -i :443 | grep -i slack | wc -l'
# Method B: Monitor gateway logs for disconnect pattern
tail -f /var/log/openclaw/gateway.log | grep -E "(SocketModeReceiver|disconnect|reconnect)"
# With fix applied, you should see:
# [2026-05-05T14:35:01.234Z] DEBUG [SlackProvider] WebSocket reconnecting (auto)...
# [2026-05-05T14:35:01.301Z] DEBUG [SlackProvider] WebSocket reconnected (auto)
#
# NOT the heavier rebuild pattern:
# [2026-05-05T14:35:01.234Z] WARN [SlackProvider] SocketModeReceiver disconnected
# [2026-05-05T14:35:01.241Z] INFO [SlackProvider] Rebuilding SocketModeReceiver...
3. Validate Reconnect Latency
# Measure time between disconnect detection and reconnection
# This script only watches the log; it does not simulate a pong timeout.
node -e "
const fs = require('fs');
const LOG = '/var/log/openclaw/gateway.log';
// Parse the leading [ISO-8601] timestamp of a log line into epoch milliseconds
const stamp = (line) => Date.parse(line.slice(line.indexOf('[') + 1, line.indexOf(']')));
console.log('Monitoring reconnection latency...');
fs.watch(LOG, () => {
  const lines = fs.readFileSync(LOG, 'utf8').split('\n').slice(-20);
  // Index of the most recent disconnect or pong-timeout line
  const d = lines.map(l => /disconnected|Pong timeout/.test(l)).lastIndexOf(true);
  if (d === -1) return;
  // First reconnect line logged after that disconnect
  const r = lines.slice(d + 1).find(l => l.includes('reconnected'));
  if (r === undefined) return;
  // Delta between the two log timestamps, not time since the script started
  console.log('Reconnection completed in', stamp(r) - stamp(lines[d]), 'ms');
  console.log('Expected: 50-200ms (native reconnect)');
  console.log('Before fix: 800-2000ms (full rebuild)');
  process.exit(0);
});
setTimeout(() => {
  console.error('Timeout waiting for reconnect');
  process.exit(1);
}, 60000);
"
4. Test Under Load
# Simulate transient pressure to verify fix holds under load
# Generate artificial load
for i in {1..10}; do
curl -s http://localhost:3000/health &
done
wait
# Check that Slack connections remain stable during load
lsof -i :443 | grep -i slack | wc -l
# Expected: Stable count (e.g., 2-4 connections maintained)
# Check for any receiver rebuild events under load
grep -c "Rebuilding SocketModeReceiver" /var/log/openclaw/gateway.log
# Expected: 0 or minimal (only on initial startup)
Expected Successful Output
# Gateway log with fix applied:
[2026-05-05T14:40:00.000Z] INFO [SlackProvider] SocketModeReceiver initialized with autoReconnectEnabled: true
[2026-05-05T14:40:05.123Z] DEBUG [SlackProvider] Pong timeout detected, initiating auto-reconnect...
[2026-05-05T14:40:05.234Z] DEBUG [SlackProvider] WebSocket reconnected (auto-reconnect)
[2026-05-05T14:40:35.456Z] DEBUG [SlackProvider] Pong timeout detected, initiating auto-reconnect...
[2026-05-05T14:40:35.567Z] DEBUG [SlackProvider] WebSocket reconnected (auto-reconnect)
# Note: Reconnection latency now consistently in 50-200ms range
# No receiver rebuild events logged
# Message delivery remains stable throughout
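The log check can also be automated. The sketch below counts rebuild events against auto-reconnect events in a block of log text. `summarizeReconnects` is a hypothetical helper, and the matched strings are the log messages shown in this article; adjust them if your gateway words its messages differently.

```javascript
// Hypothetical helper: count heavy rebuilds versus lightweight auto-reconnects.
// The substrings match the log messages shown in this article.
function summarizeReconnects(logText) {
  const lines = logText.split("\n");
  const count = (needle) => lines.filter((l) => l.includes(needle)).length;
  const rebuilds = count("Rebuilding SocketModeReceiver");
  const autoReconnects = count("WebSocket reconnected (auto");
  return { rebuilds, autoReconnects, fixActive: rebuilds === 0 && autoReconnects > 0 };
}

const fixedLog = [
  "[2026-05-05T14:40:05.123Z] DEBUG [SlackProvider] Pong timeout detected, initiating auto-reconnect...",
  "[2026-05-05T14:40:05.234Z] DEBUG [SlackProvider] WebSocket reconnected (auto-reconnect)",
  "[2026-05-05T14:40:35.567Z] DEBUG [SlackProvider] WebSocket reconnected (auto-reconnect)",
].join("\n");

console.log(summarizeReconnects(fixedLog));
// { rebuilds: 0, autoReconnects: 2, fixActive: true }
```

A result with `rebuilds` above zero after the restart suggests the patched file is not the one the running process loaded.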
Common Pitfalls
Environment-Specific Traps
- macOS Homebrew Installation: When OpenClaw is installed via Homebrew on M-series Macs, the package may be located in `/opt/homebrew/lib/node_modules/`. Ensure all `find` and `grep` commands target this path, not the default `/usr/local/lib/node_modules/`.
- Docker Container Isolation: If running OpenClaw in Docker, apply the fix in the image itself (rebuild or commit it) or mount the patched file as a volume. A patch applied with `docker exec` survives `docker restart`, but it is lost whenever the container is removed and recreated from the original image.
# Docker-specific fix application
# Run sed through a shell inside the container so the glob expands there, not on the host
docker exec openclaw-container sh -c "sed -i 's/autoReconnectEnabled: false/autoReconnectEnabled: true/' /app/dist/extensions/slack/provider-*.js"
docker restart openclaw-container
- Windows Node.js Paths: On Windows, path separators differ and sed is not available by default. Use PowerShell equivalents:
# Windows PowerShell equivalent
# Resolve the hashed file name first so exactly one file is read and rewritten
$file = Get-ChildItem "C:\path\to\openclaw\dist\extensions\slack\provider-*.js" | Select-Object -First 1
(Get-Content $file.FullName) -replace 'autoReconnectEnabled: false', 'autoReconnectEnabled: true' | Set-Content $file.FullName
Configuration Conflicts
- Duplicate Provider Instances: If running multiple OpenClaw processes (e.g., development and production on the same host), ensure the fix is applied to the correct instance. Check the PID and process arguments:
# Identify which provider file each process is using
ps aux | grep openclaw | grep -v grep
# Note the working directory or config path for each PID
- SDK Version Mismatch: The `autoReconnectEnabled` option requires `@slack/socket-mode@2.0.0` or later. Earlier versions do not support this parameter:
# Check installed Slack SDK version
grep '"version"' /opt/homebrew/lib/node_modules/openclaw/node_modules/@slack/socket-mode/package.json
# If version < 2.0.0, upgrade the dependency:
cd /opt/homebrew/lib/node_modules/openclaw
npm install @slack/socket-mode@latest
- Enterprise Firewall Proxies: Some enterprise proxies aggressively terminate long-lived WebSocket connections. If `autoReconnectEnabled: true` causes rapid reconnect loops, configure the underlying WebSocket keepalive:
// Add to provider configuration
const receiver = new params.interop.SocketModeReceiver({
  appToken: params.appToken ?? "",
  autoReconnectEnabled: true,
  // Additional keepalive tuning for proxy environments.
  // Confirm these option names against the installed SDK before relying on them;
  // @slack/socket-mode's SocketModeClient names its timeouts clientPingTimeout and serverPingTimeout.
  pingInterval: 10000, // Reduce from default 30s
  pongTimeout: 5000, // Faster detection of dead connections
  installerOptions: { clientOptions: params.clientOptions }
});
Operational Misconfigurations
- Ignoring Initial Build Log: The first reconnect after applying the fix may show a longer latency as the new configuration is loaded. Do not mistake this for a failed fix; subsequent reconnects will be faster.
- Log Level Too Quiet: If gateway logs are set to `ERROR` level, the auto-reconnect events may not appear. Temporarily lower the log level to `DEBUG` or `INFO` during verification:
# Check current log level
grep -r "logLevel" /opt/homebrew/lib/node_modules/openclaw/config.* 2>/dev/null
# Temporary debug enable (if supported)
export OPENCLAW_LOG_LEVEL=debug
- Restart vs. Reload Confusion: Node.js does not support hot-reloading of running code. A `kill -HUP` signal does not apply the fix. Always perform a full restart (`kill -TERM` followed by process restart).
Related Errors
Contextually Connected Issues
- `ERR_SOCKET_TIMEOUT` / WebSocket Close Code `1006`: Indicates abnormal WebSocket termination, often caused by pong timeout. The `autoReconnectEnabled: false` hardcoding prevents automatic recovery from this error class.
- `WebSocket connection closed unexpectedly`: Generic Slack SDK error emitted when the underlying WebSocket closes without a clean handshake. With `autoReconnectEnabled: true`, this error becomes transient rather than fatal.
- Receiver Rebuild Race Conditions: When multiple disconnects occur in rapid succession with `autoReconnectEnabled: false`, concurrent `createSlackBoltApp()` calls can produce duplicate listeners, inconsistent OAuth state, and message duplication or loss.
- Memory Leak from Listener Accumulation: Each full receiver rebuild registers new event listeners. Under frequent disconnects (~30s intervals), the accumulated listeners can cause gradual memory growth, eventually triggering OOM restarts.
- `EADDRINUSE` on Receiver Port: During receiver rebuild, if the old receiver's port binding has not been released before the new receiver attempts to bind, this error occurs. Native reconnect avoids rebinding entirely.
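The listener-accumulation effect is easy to reproduce in isolation with Node's built-in `EventEmitter`. This is a sketch under assumptions: `bus` stands in for any long-lived emitter that outlives each receiver, and `rebuildReceiver` is a hypothetical stand-in for a rebuild; whether OpenClaw's rebuild path cleans up its listeners is not shown here.

```javascript
const { EventEmitter } = require("node:events");

// Stand-in for a long-lived emitter that outlives each receiver (hypothetical).
const bus = new EventEmitter();

// Stand-in for a rebuild: registers a fresh handler, optionally removing
// the handlers left behind by earlier rebuilds first.
function rebuildReceiver({ cleanup }) {
  if (cleanup) bus.removeAllListeners("message");
  bus.on("message", () => {});
}

// Five rebuilds without cleanup leave five handlers attached
for (let i = 0; i < 5; i++) rebuildReceiver({ cleanup: false });
const leaked = bus.listenerCount("message");

// Rebuilds that clean up keep the count at one
for (let i = 0; i < 5; i++) rebuildReceiver({ cleanup: true });
const afterCleanup = bus.listenerCount("message");

console.log(leaked, afterCleanup); // 5 1
```

At one rebuild every 30 seconds, an uncleaned handler per rebuild means over a hundred stale listeners per hour, each of which also fires on every event.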
Historical Reference
The autoReconnectEnabled parameter was added to @slack/socket-mode in version 2.0.0 as part of addressing GitHub issue slackapi/bolt-js#888, which documented similar disconnect-storm behavior in downstream applications. OpenClaw’s hardcoded false effectively reintroduced the original defect.