Agents/openai-completions: always send `stream_options.include_usage` on streaming requests, so local and custom OpenAI-compatible backends report real context usage instead of showing 0%.
Troubleshooting guide for fixes introduced in OpenClaw v2026.4.19-beta.2.
Troubleshooting: Context Token Usage Not Tracking on Custom OpenAI-Compatible Backends
Symptoms
If you’re using a custom or non-standard OpenAI-compatible backend (such as llama-cpp, LM Studio, or other local inference servers), you may encounter the following symptoms after upgrading to certain versions:
- Token usage always displays as 0% or 0 tokens in streaming responses
- Context window tracking appears broken even when the model is clearly consuming tokens
- Inconsistent behavior between streaming and non-streaming requests (non-streaming requests show accurate usage while streaming shows 0%)
This issue affects only streaming responses from alternative backends; standard OpenAI API endpoints continue to report usage correctly.
Root Cause
The `buildOpenAICompletionsParams()` function in the OpenAI transport layer included the `stream_options: { include_usage: true }` field in request payloads only when `compat.supportsUsageInStreaming` evaluated to true.
For standard OpenAI endpoints, this compatibility flag resolved to true, allowing the gateway’s `resolveIncludeUsageForStreaming()` function to track usage properly. For custom backends like llama-cpp and LM Studio, however, the flag resolved to false, so the field was omitted entirely. Without the field in the request, the backend did not report usage in its streaming response, the gateway had no data to process, and context token usage always displayed as zero.
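The gating described above can be sketched as follows. This is a simplified illustration, not the actual OpenClaw source: the `Compat` and `Params` types and both function names are hypothetical, and only the `supportsUsageInStreaming` flag and the `stream_options` field are taken from the description.

```typescript
// Hypothetical, simplified types; the real transport layer carries more fields.
interface Compat {
  supportsUsageInStreaming: boolean;
}

interface Params {
  model: string;
  stream: boolean;
  stream_options?: { include_usage: boolean };
}

// Before the fix (sketch): the field was gated on the compat flag.
function buildParamsBefore(model: string, compat: Compat): Params {
  const params: Params = { model, stream: true };
  if (compat.supportsUsageInStreaming) {
    params.stream_options = { include_usage: true };
  }
  return params;
}

// After the fix (sketch): the field is sent on every streaming request.
function buildParamsAfter(model: string): Params {
  return { model, stream: true, stream_options: { include_usage: true } };
}

// A local backend whose compat flag resolves to false.
const localBackend: Compat = { supportsUsageInStreaming: false };
console.log(buildParamsBefore("your-model", localBackend)); // no stream_options key
console.log(buildParamsAfter("your-model")); // stream_options.include_usage is true
```

With the flag false, the "before" payload carries no `stream_options` key at all, which matches the symptom of streaming usage staying at zero.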
Step-by-Step Fix
Upgrading to v2026.4.19-beta.2 or later resolves this issue automatically. No configuration changes are required.
- Upgrade the OpenClaw package to v2026.4.19-beta.2 or later
- Restart your gateway service to load the updated transport layer
- Verify the fix by initiating a streaming request to your custom backend
Verification
To confirm the fix is working:
- Send a streaming completion request to your custom backend (llama-cpp, LM Studio, etc.)
- Monitor the response payload for the `usage` field appearing in streaming chunks
- Verify that context token tracking now displays the actual token count instead of 0%
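// Example verification: check that usage appears in the streaming response.
import OpenAI from 'openai';

// Assumed local endpoint; replace baseURL and apiKey with your backend's values.
const openai = new OpenAI({ baseURL: 'http://localhost:8080/v1', apiKey: 'not-needed' });

const response = await openai.chat.completions.create({
  model: 'your-model',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true,
  // Set explicitly here because this script calls the backend directly;
  // OpenClaw now sends this field on every streaming request.
  stream_options: { include_usage: true },
});

// usage is typically null on every chunk except the final one.
for await (const chunk of response) {
  if (chunk.usage) {
    console.log('Usage tracked:', chunk.usage);
  }
}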
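When reading the chunks from the loop above, it helps to know where usage normally shows up. In the OpenAI streaming format, content chunks carry a null `usage` and one extra final chunk carries the totals with an empty `choices` array; local backends may differ in detail. The following is a minimal sketch of collecting that value from already-parsed chunks. The `Usage` and `Chunk` shapes and the `extractUsage` name are illustrative assumptions, not part of OpenClaw.

```typescript
// Minimal shapes for the fields this sketch reads; real chunks carry more.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

interface Chunk {
  choices: unknown[];
  usage?: Usage | null;
}

// Returns the last non-null usage seen in the stream, or null if none arrived.
function extractUsage(chunks: Chunk[]): Usage | null {
  let usage: Usage | null = null;
  for (const chunk of chunks) {
    if (chunk.usage) usage = chunk.usage;
  }
  return usage;
}

// Example stream: a content chunk without usage, then a final usage-only chunk.
const sample: Chunk[] = [
  { choices: [{ delta: { content: "Hi" } }], usage: null },
  { choices: [], usage: { prompt_tokens: 9, completion_tokens: 1, total_tokens: 10 } },
];
console.log(extractUsage(sample)?.total_tokens); // 10
```

If `extractUsage` returns null for a complete stream, the backend did not send usage, which points at either an unsupported backend or a stripped `stream_options` field.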
Common Pitfalls
Proxy interference: Some API proxies or middleware may strip unknown fields from request payloads. If you use a custom proxy layer, verify that it forwards `stream_options` to the backend unchanged.
Backend-specific configuration: The field is now always included. Most backends that don’t support it will ignore it safely, but some older custom backends may behave unexpectedly when they encounter unknown fields. Test in a staging environment first.
Mixed environments: If you run multiple gateway instances on different versions, token tracking will be inconsistent across requests. Ensure all instances are upgraded to v2026.4.19-beta.2 or later for uniform behavior.
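For the proxy pitfall, one way to check forwarding is to capture the request body as it arrives on the backend side of the proxy (for example from a debug log) and inspect it. The helper below is a hypothetical sketch for that check; the function name is an assumption, and how you capture the body depends on your proxy.

```typescript
// Hypothetical helper: given the raw JSON body captured on the backend side of
// the proxy, report whether stream_options.include_usage survived forwarding.
function includeUsageSurvived(receivedBody: string): boolean {
  const parsed: unknown = JSON.parse(receivedBody);
  if (typeof parsed !== "object" || parsed === null) return false;
  const opts = (parsed as { stream_options?: unknown }).stream_options;
  if (typeof opts !== "object" || opts === null) return false;
  return (opts as { include_usage?: unknown }).include_usage === true;
}

// Body forwarded intact.
const intact = '{"model":"your-model","stream":true,"stream_options":{"include_usage":true}}';
// Body after a proxy stripped the field.
const stripped = '{"model":"your-model","stream":true}';

console.log(includeUsageSurvived(intact)); // true
console.log(includeUsageSurvived(stripped)); // false
```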
Related Errors
- Context window exhaustion not being detected: Because usage was always 0%, the system could not accurately warn users when they were approaching context limits during streaming requests.
- Usage reporting inconsistency: Comparing usage metrics between streaming and non-streaming requests was impossible since streaming always reported zero usage.
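To illustrate why a constant zero defeats context-limit warnings, here is a minimal sketch of a percentage calculation and a threshold check. The source does not describe how OpenClaw computes the displayed percentage, so the function names, the use of `total_tokens` as the input, and the 80% default threshold are all assumptions.

```typescript
// Sketch: share of the context window consumed, capped at 100.
function contextUsagePercent(totalTokens: number, contextWindow: number): number {
  if (contextWindow <= 0) {
    throw new RangeError("contextWindow must be positive");
  }
  return Math.min(100, (totalTokens / contextWindow) * 100);
}

// Sketch: true once usage reaches the warning threshold (assumed default 80%).
function isNearLimit(totalTokens: number, contextWindow: number, thresholdPercent = 80): boolean {
  return contextUsagePercent(totalTokens, contextWindow) >= thresholdPercent;
}

console.log(contextUsagePercent(4096, 8192)); // 50
console.log(isNearLimit(7000, 8192)); // true
// With usage stuck at 0, the check can never fire, however full the context is.
console.log(isNearLimit(0, 8192)); // false
```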
Fixed in Version: v2026.4.19-beta.2
Issue Reference: #68707
Changed Files: src/agents/openai-transport-stream.ts, src/agents/openai-transport-stream.test.ts