Feature Request: api.runtime.llm() (Plugin SDK Inference Method)
Implement api.runtime.llm() to enable plugins to make LLM calls through OpenClaw's inference stack, inheriting routing, retries, rate limiting, and key rotation.
Symptoms
Current State: Absence of Native LLM Inference in Plugin SDK
Plugins that need large language model capabilities currently show the following symptoms:
1. Direct Provider API Dependencies
Plugin code must contain explicit provider SDK initialization and API calls:
// Current workaround: the plugin must manage provider specifics itself
import OpenAI from 'openai';
class MyPlugin {
async process(input: string): Promise<string | null> {
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const response = await client.chat.completions.create({
model: 'gpt-4o-mini',
messages: [{ role: 'user', content: input }],
});
return response.choices[0].message.content;
}
}
2. Duplicate Inference Paths
When multiple plugins and the core runtime make LLM calls, each one implements its own:
- API key management
- Retry logic
- Rate limiting
- Timeout handling
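Rate limiting is the duty from this list that is easiest to get subtly wrong when every plugin reimplements it. A minimal sketch of a single token-bucket limiter that the host could own and share across the core runtime and all plugins, assuming a per-provider burst capacity and refill rate. `TokenBucket` and `SharedRateLimiter` are hypothetical names, not existing OpenClaw classes, and this version only reports whether a call may proceed rather than waiting:

```typescript
/**
 * Hypothetical token bucket: holds up to `capacity` tokens and refills
 * continuously at `refillPerSec`. Each request consumes one token.
 */
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly now: () => number;
  private tokens: number;
  private last: number;

  constructor(capacity: number, refillPerSec: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
    this.tokens = capacity;
    this.last = now();
  }

  /** Refills based on elapsed time, then tries to take one token. */
  tryTake(): boolean {
    const t = this.now();
    const elapsedSec = (t - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

/** One bucket per key (e.g. "llm:anthropic"), shared by every caller. */
class SharedRateLimiter {
  private readonly buckets = new Map<string, TokenBucket>();
  private readonly factory: () => TokenBucket;

  constructor(factory: () => TokenBucket) {
    this.factory = factory;
  }

  tryAcquire(key: string): boolean {
    let bucket = this.buckets.get(key);
    if (!bucket) {
      bucket = this.factory();
      this.buckets.set(key, bucket);
    }
    return bucket.tryTake();
  }
}
```

Because the limiter lives in one place, two plugins hitting the same provider draw from the same budget instead of each assuming it has the full quota.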
3. Missing Abstraction Layer
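Referencing api.runtime.llm() fails: TypeScript plugins get a compile-time error because the property does not exist on the runtime type, and JavaScript plugins fail at runtime: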
// These exist but llm() does not:
api.runtime.tts({ text: "...", voice: "alloy" });
api.runtime.stt({ audio: buffer });
// This does not exist:
const result = await api.runtime.llm({ prompt: "..." });
// TypeError: api.runtime.llm is not a function
4. Inconsistent Observability
LLM calls from plugins bypass the centralized logging and telemetry that core runtime calls receive.
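Routing plugin calls through the host lets it attribute every call to a specific plugin. A sketch of per-plugin usage accounting, assuming the host knows the calling plugin's ID and receives token counts in the shape of the `usage` field on `LLMResult` described under Root Cause. `UsageLedger` and `withTelemetry` are hypothetical helpers:

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

interface UsageRecord {
  calls: number;
  inputTokens: number;
  outputTokens: number;
  totalLatencyMs: number;
}

/** Hypothetical host-side ledger aggregating LLM usage per plugin. */
class UsageLedger {
  private readonly byPlugin = new Map<string, UsageRecord>();

  record(pluginId: string, usage: Usage, latencyMs: number): void {
    const rec = this.byPlugin.get(pluginId) ?? { calls: 0, inputTokens: 0, outputTokens: 0, totalLatencyMs: 0 };
    rec.calls += 1;
    rec.inputTokens += usage.inputTokens;
    rec.outputTokens += usage.outputTokens;
    rec.totalLatencyMs += latencyMs;
    this.byPlugin.set(pluginId, rec);
  }

  /** Returns a copy so callers cannot mutate the ledger. */
  snapshot(pluginId: string): UsageRecord | undefined {
    const rec = this.byPlugin.get(pluginId);
    return rec ? { ...rec } : undefined;
  }
}

/** Wraps an LLM call so usage and latency are recorded centrally. */
async function withTelemetry<T extends { text: string; usage?: Usage }>(
  ledger: UsageLedger,
  pluginId: string,
  call: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  const result = await call();
  // Calls without usage data still count, with zero tokens.
  ledger.record(pluginId, result.usage ?? { inputTokens: 0, outputTokens: 0 }, Date.now() - start);
  return result;
}
```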
Root Cause
Architectural Gap in Plugin SDK Design
The absence of api.runtime.llm() stems from incomplete parity in the runtime abstraction layer: audio inference is exposed to plugins, but text inference is not.
1. Asymmetric API Surface
The OpenClaw Plugin SDK exposes audio inference methods but lacks text inference equivalents:
| Method | Status | File Reference |
|---|---|---|
| api.runtime.tts() | Implemented | packages/plugin-sdk/src/runtime/tts.ts |
| api.runtime.stt() | Implemented | packages/plugin-sdk/src/runtime/stt.ts |
| api.runtime.llm() | Missing | N/A |
2. Dual Inference Path Problem
The current architecture forces plugins into a pattern that undermines platform consistency:
OpenClaw Gateway:
| Core Runtime | Plugin Environment |
|---|---|
| ✓ Model routing | ✗ Direct provider calls |
| ✓ Key rotation | ✗ Statically embedded keys |
| ✓ Rate limiting | ✗ No throttling |
| ✓ Retry logic | ✗ No backoff |
| ✓ Cost aggregation | ✗ Invisible to platform |
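The key rotation item in this comparison can be made concrete with a small round-robin rotator that skips keys placed in a cooldown after a failure (for example an HTTP 429 or 401). This is a sketch under assumed semantics; `RoundRobinKeyRotator` is a hypothetical name, and the `KeyRotator` used in Phase 5 may behave differently:

```typescript
/** Hypothetical round-robin key rotator with a per-key cooldown. */
class RoundRobinKeyRotator {
  private readonly keys: string[];
  private readonly cooldownMs: number;
  private readonly now: () => number;
  private readonly blockedUntil = new Map<string, number>();
  private next = 0;

  constructor(keys: string[], cooldownMs: number, now: () => number = () => Date.now()) {
    if (keys.length === 0) throw new Error('at least one key is required');
    this.keys = keys;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  /** Returns the next key not in cooldown; throws if every key is cooling down. */
  acquire(): string {
    for (let i = 0; i < this.keys.length; i++) {
      const index = (this.next + i) % this.keys.length;
      const key = this.keys[index];
      if ((this.blockedUntil.get(key) ?? 0) <= this.now()) {
        this.next = (index + 1) % this.keys.length;
        return key;
      }
    }
    throw new Error('CREDENTIAL_ROTATION_FAILED: all keys are cooling down');
  }

  /** Marks a key as failed so it is skipped until its cooldown expires. */
  reportFailure(key: string): void {
    this.blockedUntil.set(key, this.now() + this.cooldownMs);
  }
}
```

Keys never leave the host process in this design; a plugin only ever sees the result of the call made with them.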
3. Configuration Fragmentation
When plugins hardcode provider calls, model changes require plugin code modifications rather than config updates:
// Current: Model change requires code edit
const MODEL = 'anthropic/claude-3-5-sonnet-20241022';
// Desired: Model change via config
// openclaw.yaml: runtime.llm.defaultModel: "haiku"
const result = await api.runtime.llm({ prompt: "..." });
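Under the desired behavior, the runtime rather than the plugin decides which model a request uses. One plausible precedence, sketched with a hypothetical config shape that mirrors the `runtime.llm.defaultModel` key above: an explicit `model` option wins; otherwise (including the `'default'` placeholder the bridge sends when no model is given) the configured default applies, and aliases are expanded from a config map. `LLMRuntimeConfig`, its `aliases` field, and `resolveModel` are assumptions:

```typescript
/** Hypothetical shape of the runtime.llm section of openclaw.yaml. */
interface LLMRuntimeConfig {
  defaultModel: string;
  aliases: Record<string, string>; // e.g. { haiku: "anthropic/claude-3-haiku-20240307" }
}

/**
 * Resolves the model for one request:
 * explicit option > configured default; aliases expand to provider/model.
 */
function resolveModel(requested: string | undefined, config: LLMRuntimeConfig): string {
  const name = requested === undefined || requested === 'default' ? config.defaultModel : requested;
  // Own-property check avoids matching inherited keys such as "constructor".
  const expanded = Object.prototype.hasOwnProperty.call(config.aliases, name) ? config.aliases[name] : name;
  if (!expanded.includes('/')) {
    throw new Error(`MODEL_ALIAS_UNRESOLVED: "${name}" is neither an alias nor a provider/model string`);
  }
  return expanded;
}
```

Changing `defaultModel` or an alias target in config then redirects every plugin that did not pin a model, with no plugin code changes.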
4. Missing Interface Contract
The RuntimeContext interface in packages/plugin-sdk/src/runtime/context.ts lacks an LLM method signature:
// Current interface (incomplete)
interface RuntimeContext {
tts(options: TTSOptions): Promise<TTSResult>;
stt(options: STTOptions): Promise<STTResult>;
// llm method missing
}
// Required addition
interface LLMOptions {
prompt: string;
model?: string;
system?: string;
maxTokens?: number;
timeoutMs?: number;
}
interface LLMResult {
text: string;
usage?: {
inputTokens: number;
outputTokens: number;
};
model?: string;
}
Step-by-Step Fix
Implementation Plan for api.runtime.llm()
Phase 1: Interface Definition
File: packages/plugin-sdk/src/runtime/types.ts
Add the LLM type definitions:
export interface LLMOptions {
/** The primary input prompt for the LLM */
prompt: string;
/** Model alias or full provider/model string (e.g., "haiku", "anthropic/claude-3-5-sonnet-20241022") */
model?: string;
/** System prompt to prepend to the conversation */
system?: string;
/** Maximum tokens in the response */
maxTokens?: number;
/** Request timeout in milliseconds */
timeoutMs?: number;
/** Temperature for response randomness (0-2) */
temperature?: number;
/** Stop sequences to terminate generation */
stopSequences?: string[];
}
export interface LLMResult {
/** The generated text response */
text: string;
/** Token usage statistics */
usage?: {
inputTokens: number;
outputTokens: number;
totalTokens: number;
};
/** The actual model used (resolved from alias) */
model: string;
/** Latency in milliseconds */
latencyMs: number;
}
export interface LLMError {
code: 'TIMEOUT' | 'RATE_LIMITED' | 'MODEL_UNAVAILABLE' | 'INVALID_REQUEST' | 'AUTH_FAILED';
message: string;
retryable: boolean;
originalError?: Error;
}
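Because these options cross a process boundary, it can help to reject malformed requests in the plugin process before serializing them. A minimal sketch, assuming the 0-2 temperature range stated in the comment above (actual provider ranges differ; some accept only 0-1) and a hypothetical `validateLLMOptions` helper that returns a list of problems instead of throwing:

```typescript
// Mirrors the LLMOptions fields defined above (redeclared so the sketch is self-contained).
interface LLMOptions {
  prompt: string;
  model?: string;
  system?: string;
  maxTokens?: number;
  timeoutMs?: number;
  temperature?: number;
  stopSequences?: string[];
}

/** Hypothetical pre-flight check; returns human-readable problems (empty = valid). */
function validateLLMOptions(o: LLMOptions): string[] {
  const problems: string[] = [];
  if (typeof o.prompt !== 'string' || o.prompt.trim() === '') {
    problems.push('prompt must be a non-empty string');
  }
  if (o.maxTokens !== undefined && (!Number.isInteger(o.maxTokens) || o.maxTokens <= 0)) {
    problems.push('maxTokens must be a positive integer');
  }
  // Written as !(x > 0) so NaN is rejected too.
  if (o.timeoutMs !== undefined && !(o.timeoutMs > 0)) {
    problems.push('timeoutMs must be positive');
  }
  if (o.temperature !== undefined && !(o.temperature >= 0 && o.temperature <= 2)) {
    problems.push('temperature must be between 0 and 2');
  }
  if (o.stopSequences !== undefined && !o.stopSequences.every((s) => typeof s === 'string')) {
    problems.push('stopSequences must contain only strings');
  }
  return problems;
}
```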
Phase 2: Runtime Bridge Implementation
File: packages/plugin-sdk/src/runtime/llm-bridge.ts
Create the bridge module that communicates with the host process:
import { RuntimeBridge } from './bridge';
import { LLMOptions, LLMResult, LLMError } from './types';
export class LLMBridge {
private bridge: RuntimeBridge;
constructor(bridge: RuntimeBridge) {
this.bridge = bridge;
}
async call(options: LLMOptions): Promise<LLMResult> {
const startTime = Date.now();
const response = await this.bridge.send<{
success: boolean;
result?: LLMResult;
error?: LLMError;
}>('runtime:llm', {
prompt: options.prompt,
model: options.model ?? 'default',
system: options.system,
maxTokens: options.maxTokens ?? 1024,
timeoutMs: options.timeoutMs ?? 30000,
temperature: options.temperature,
stopSequences: options.stopSequences,
});
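if (!response.success || !response.result) {
// Fallback when the host reports failure without a structured error.
// The annotation keeps 'INVALID_REQUEST' typed as an LLMError code.
const error: LLMError = response.error ?? {
code: 'INVALID_REQUEST',
message: 'Unknown error from runtime',
retryable: false,
};
// LLMError is an interface and cannot be constructed with `new`;
// throw a real Error that carries the structured fields instead
throw Object.assign(new Error(error.message), error);
}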
return {
...response.result,
latencyMs: Date.now() - startTime,
};
}
/** Streaming support for real-time output */
async *stream(options: LLMOptions): AsyncGenerator<string> {
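const stream = await this.bridge.sendStream('runtime:llm:stream', {
prompt: options.prompt,
model: options.model ?? 'default',
system: options.system,
maxTokens: options.maxTokens ?? 1024,
timeoutMs: options.timeoutMs ?? 30000,
// Forward the same sampling options as call() so streaming behaves identically
temperature: options.temperature,
stopSequences: options.stopSequences,
});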
for await (const chunk of stream) {
if (chunk.text) {
yield chunk.text;
}
}
}
}
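The bridge above relies on a `RuntimeBridge.send()` that is not shown. One common way to build a request/response channel over a message-passing transport is to tag each request with an ID and keep a table of pending promises. The sketch below assumes a generic `postMessage`-style transport; the `Transport` type, `MessageBridge` class, and message shapes are hypothetical, not OpenClaw's wire format, and only `send` is implemented (no `sendStream`):

```typescript
/** Hypothetical wire messages between plugin and host. */
interface RequestMessage { id: number; type: string; payload: unknown }
interface ResponseMessage { id: number; body: unknown }

/** Minimal transport abstraction (e.g. a worker port or IPC channel). */
interface Transport {
  post(message: RequestMessage): void;
  onMessage(handler: (message: ResponseMessage) => void): void;
}

/** Correlates host responses with plugin requests by ID. */
class MessageBridge {
  private nextId = 1;
  private readonly pending = new Map<number, (body: unknown) => void>();
  private readonly transport: Transport;

  constructor(transport: Transport) {
    this.transport = transport;
    transport.onMessage((msg) => {
      const resolve = this.pending.get(msg.id);
      if (resolve) {
        this.pending.delete(msg.id);
        resolve(msg.body);
      }
    });
  }

  send<T>(type: string, payload: unknown): Promise<T> {
    const id = this.nextId++;
    return new Promise<T>((resolve) => {
      this.pending.set(id, (body) => resolve(body as T));
      this.transport.post({ id, type, payload });
    });
  }
}
```

Correlating by ID rather than by arrival order means concurrent `llm()` calls from one plugin can complete in any order. A production version would also reject pending promises on timeout or when the host disconnects, which this sketch omits.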
Phase 3: Extend RuntimeContext Interface
File: packages/plugin-sdk/src/runtime/context.ts
import { LLMOptions, LLMResult } from './types';
import { LLMBridge } from './llm-bridge';
export interface RuntimeContext {
// ... existing tts, stt methods ...
/**
* Execute a large language model inference request through OpenClaw's
* inference stack, inheriting routing, retries, rate limiting, and
* key rotation.
*
* @param options - LLM invocation options
* @returns Promise resolving to the LLM response
*
* @example
* const result = await api.runtime.llm({
* prompt: "Extract the company name from: Acme Corp founded in 2020",
* model: "haiku",
* maxTokens: 50,
* });
* console.log(result.text); // e.g. "Acme Corp" (model output may vary)
*/
llm(options: LLMOptions): Promise<LLMResult>;
}
Phase 4: Plugin SDK Public Export
File: packages/plugin-sdk/src/index.ts
Ensure the runtime context is properly initialized and exported:
export { createPluginContext } from './context';
export type { LLMOptions, LLMResult, LLMError } from './runtime/types';
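From a plugin author's point of view, usage would look roughly like the following: no provider SDK, no API key, and no retry loop in plugin code. The sketch stubs the runtime so it runs standalone; `RuntimeLike`, `classifySentiment`, and the trimmed-down `LLMResult` are hypothetical stand-ins for the SDK types:

```typescript
// Minimal stand-ins for the SDK types (assumed shapes, see Phase 1).
interface LLMResult { text: string; model: string; latencyMs: number }
interface RuntimeLike {
  llm(options: { prompt: string; model?: string; maxTokens?: number }): Promise<LLMResult>;
}

type Sentiment = 'positive' | 'negative' | 'neutral';

/** Hypothetical plugin helper built only on runtime.llm(). */
async function classifySentiment(runtime: RuntimeLike, review: string): Promise<Sentiment> {
  const result = await runtime.llm({
    prompt: `Classify the sentiment of this review as positive, negative, or neutral. Reply with one word.\n\n${review}`,
    model: 'haiku',
    maxTokens: 5,
  });
  const label = result.text.trim().toLowerCase();
  // Model output is not guaranteed to follow instructions; fall back to neutral.
  return label === 'positive' || label === 'negative' ? label : 'neutral';
}
```

In a real plugin, `runtime` would be `api.runtime`; the stub only makes the function testable without a host.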
Phase 5: Host Runtime Handler (Framework Side)
File: packages/core/src/runtime/handlers/llm-handler.ts
The host process handles the actual LLM invocation with full platform features:
import { ModelRouter } from '../routing/model-router';
import { RateLimiter } from '../middleware/rate-limiter';
import { KeyRotator } from '../auth/key-rotation';
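import { RetryPolicy } from '../retry/policy';
// Assumed locations for the provider client and the shared request/result types
import { ProviderClient } from '../providers/provider-client';
import type { LLMInvocation, LLMResult } from '../types';
export class LLMHandler {
constructor(
private router: ModelRouter,
private rateLimiter: RateLimiter,
private keyRotator: KeyRotator,
private retryPolicy: RetryPolicy,
private providerClient: ProviderClient,
) {}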
async handle(request: LLMInvocation): Promise<LLMResult> {
// Resolve model alias to provider/model
const resolved = this.router.resolve(request.model);
// Check rate limits
await this.rateLimiter.check(`llm:${resolved.provider}`, resolved.model);
// Execute with retry policy. Credentials are fetched on every attempt so a
// retry after an auth or quota failure can use a rotated key.
return this.retryPolicy.execute(async () => {
const credentials = await this.keyRotator.getCredentials(resolved.provider);
return this.providerClient.complete({
provider: resolved.provider,
model: resolved.model,
prompt: request.prompt,
system: request.system,
maxTokens: request.maxTokens,
timeoutMs: request.timeoutMs,
temperature: request.temperature,
stopSequences: request.stopSequences,
credentials,
});
});
}
}
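The handler assumes a `RetryPolicy.execute()`. A minimal sketch of exponential backoff with full jitter that retries only errors flagged as retryable (matching the `retryable` field on `LLMError`). `ExponentialRetryPolicy`, its options, and the `isRetryable` check are assumptions; the injectable `sleep` exists only to make the policy testable:

```typescript
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number; // delay cap before the first retry
  maxDelayMs: number;  // upper bound on any single delay
}

/** Treats errors carrying `retryable: true` (as LLMError does) as retryable. */
function isRetryable(err: unknown): boolean {
  return typeof err === 'object' && err !== null && (err as { retryable?: unknown }).retryable === true;
}

/** Hypothetical retry policy: exponential backoff with full jitter. */
class ExponentialRetryPolicy {
  private readonly opts: RetryOptions;
  private readonly sleep: (ms: number) => Promise<void>;

  constructor(
    opts: RetryOptions,
    sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((r) => setTimeout(r, ms)),
  ) {
    this.opts = opts;
    this.sleep = sleep;
  }

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    for (let attempt = 1; ; attempt++) {
      try {
        return await fn();
      } catch (err) {
        if (attempt >= this.opts.maxAttempts || !isRetryable(err)) throw err;
        // Full jitter: wait a uniform random time in [0, min(maxDelay, base * 2^(attempt-1))).
        const cap = Math.min(this.opts.maxDelayMs, this.opts.baseDelayMs * 2 ** (attempt - 1));
        await this.sleep(Math.random() * cap);
      }
    }
  }
}
```

Randomizing the whole delay spreads out retries from many plugins that hit the same rate limit at once, instead of having them retry in lockstep.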
Verification
Testing the api.runtime.llm() Implementation
Unit Tests
Test File: packages/plugin-sdk/src/runtime/__tests__/llm-bridge.test.ts
import { describe, it, expect, vi } from 'vitest';
import { LLMBridge } from '../llm-bridge';
describe('LLM bridge', () => {
it('should pass options correctly to bridge', async () => {
const mockBridge = {
send: vi.fn().mockResolvedValue({
success: true,
result: {
text: 'Test response',
model: 'gpt-4o-mini',
usage: { inputTokens: 10, outputTokens: 5, totalTokens: 15 },
},
}),
};
const bridge = new LLMBridge(mockBridge as any);
const result = await bridge.call({
prompt: 'What is 2+2?',
model: 'haiku',
maxTokens: 50,
});
expect(result.text).toBe('Test response');
expect(mockBridge.send).toHaveBeenCalledWith('runtime:llm', expect.objectContaining({
prompt: 'What is 2+2?',
model: 'haiku',
maxTokens: 50,
}));
});
it('should apply default values for optional parameters', async () => {
const mockBridge = {
send: vi.fn().mockResolvedValue({
success: true,
result: { text: 'ok', model: 'default' },
}),
};
const bridge = new LLMBridge(mockBridge as any);
await bridge.call({ prompt: 'test' });
expect(mockBridge.send).toHaveBeenCalledWith('runtime:llm', expect.objectContaining({
model: 'default',
maxTokens: 1024,
timeoutMs: 30000,
}));
});
});
Integration Tests
Test File: packages/plugin-sdk/integration/llm-integration.test.ts
import { describe, it, expect } from 'vitest';
import { createPluginContext } from '../src';
describe('Plugin SDK LLM Integration', () => {
it('should execute LLM call through runtime', async () => {
const context = createPluginContext({
pluginId: 'test-plugin',
manifest: { name: 'test', version: '1.0.0' },
});
const result = await context.runtime.llm({
prompt: 'Say "hello" in exactly one word',
model: 'haiku',
maxTokens: 5,
});
expect(result.text).toBeDefined();
expect(result.text.length).toBeGreaterThan(0);
expect(result.usage).toBeDefined();
expect(result.latencyMs).toBeGreaterThan(0);
});
it('should propagate errors from runtime', async () => {
const context = createPluginContext({
pluginId: 'test-plugin',
manifest: { name: 'test', version: '1.0.0' },
});
await expect(context.runtime.llm({
prompt: 'test', // valid prompt, so the failure comes from the unknown model
model: 'nonexistent-model',
})).rejects.toThrow();
});
});
CLI Verification Commands
After implementation, verify the feature is accessible:
# Check TypeScript compilation includes llm method
npx tsc --noEmit --project packages/plugin-sdk/tsconfig.json
# Verify type exports
grep -n "llm" packages/plugin-sdk/src/runtime/context.ts
# Run SDK unit and integration tests whose file paths contain "llm"
cd packages/plugin-sdk && npx vitest run llm
Expected Output:
✓ LLM bridge should pass options correctly
✓ LLM bridge should apply default values
✓ Plugin SDK LLM Integration should execute call
Common Pitfalls
Implementation Considerations and Edge Cases
1. Context Isolation Violations
Plugins must not be able to bypass rate limits or access credentials directly.
// INCORRECT: Exposing raw provider access
export class RuntimeContext {
llm(options: LLMOptions) {
// Must not expose:
return this.providerClient.complete({
apiKey: this.config.apiKey, // Should never be exposed
// ...other request fields
});
}
}
// CORRECT: Wrapper maintains the security boundary
export class RuntimeContext {
llm(options: LLMOptions) {
return this.bridge.send('runtime:llm', options);
}
}
2. Serialization Boundary Crossing
Data crossing the plugin/host boundary must be serializable. Avoid passing functions or complex objects.
// INCORRECT: Function in options
await api.runtime.llm({
prompt: '...',
onToken: (token) => console.log(token), // Cannot serialize!
});
// CORRECT: Use streaming for real-time output
for await (const chunk of api.runtime.llm.stream({ prompt: '...' })) {
console.log(chunk);
}
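`api.runtime.llm.stream(...)` only type-checks if `llm` is a callable object that also carries a `stream` method, whereas the Phase 3 interface declares `llm` as a plain method. One way to express and build that shape in TypeScript is a hybrid type with a call signature plus a method. The `LLMMethod` name and `makeLLMMethod` helper are hypothetical, and the types are trimmed to what the sketch needs:

```typescript
interface LLMOptions { prompt: string; model?: string }
interface LLMResult { text: string }

/** A function type that also exposes a `stream` method. */
interface LLMMethod {
  (options: LLMOptions): Promise<LLMResult>;
  stream(options: LLMOptions): AsyncGenerator<string>;
}

/** Builds an LLMMethod from a one-shot call and a streaming call. */
function makeLLMMethod(
  call: (o: LLMOptions) => Promise<LLMResult>,
  stream: (o: LLMOptions) => AsyncGenerator<string>,
): LLMMethod {
  // Object.assign attaches `stream` to a fresh function, producing the hybrid shape.
  return Object.assign((o: LLMOptions) => call(o), { stream });
}
```

With `llm: LLMMethod` on `RuntimeContext`, both `await api.runtime.llm({...})` and `for await (const chunk of api.runtime.llm.stream({...}))` are valid.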
3. Model Alias Resolution Timing
Model aliases must be resolved before the request reaches the provider, and any cached alias mappings must be invalidated when configuration changes.
# .env: environment variables for alias mapping
OPENCLAW_MODEL_HAIKU=anthropic/claude-3-haiku-20240307
OPENCLAW_MODEL_SONNET=anthropic/claude-3-5-sonnet-20241022
OPENCLAW_MODEL_GPT4=openai/gpt-4o
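A sketch of alias resolution from `OPENCLAW_MODEL_*` variables like the ones above, with a cache that is dropped whenever the configuration is reloaded. The variable prefix comes from the example; the `AliasResolver` class, its case-insensitive matching, and its `reload()` hook are hypothetical:

```typescript
/** Hypothetical resolver mapping aliases like "haiku" to provider/model strings. */
class AliasResolver {
  private cache: Map<string, string> | null = null;
  private env: Record<string, string | undefined>;

  constructor(env: Record<string, string | undefined>) {
    this.env = env;
  }

  /** Call on config/env reload so stale mappings are never served. */
  reload(env: Record<string, string | undefined>): void {
    this.env = env;
    this.cache = null;
  }

  resolve(model: string): string {
    if (model.includes('/')) return model; // already a provider/model string
    const target = this.table().get(model.toLowerCase());
    if (!target) throw new Error(`MODEL_ALIAS_UNRESOLVED: ${model}`);
    return target;
  }

  /** Builds the alias table lazily from OPENCLAW_MODEL_<ALIAS> variables. */
  private table(): Map<string, string> {
    if (this.cache !== null) return this.cache;
    const prefix = 'OPENCLAW_MODEL_';
    const table = new Map<string, string>();
    for (const [key, value] of Object.entries(this.env)) {
      if (key.startsWith(prefix) && value) table.set(key.slice(prefix.length).toLowerCase(), value);
    }
    this.cache = table;
    return table;
  }
}
```

In a Node host this could be constructed with `new AliasResolver(process.env)` and `reload()` wired to whatever mechanism reloads configuration.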
4. Streaming Support Complexity
Implementing stream() requires proper backpressure handling and error recovery.
// Streaming must handle plugin disconnection gracefully
async *stream(options: LLMOptions): AsyncGenerator<string> {
// Open the stream outside the try so its id is in scope during cleanup
const stream = await this.bridge.sendStream('runtime:llm:stream', options);
let completed = false;
try {
for await (const chunk of stream) {
if (chunk.text) {
yield chunk.text;
}
}
completed = true;
} finally {
// Runs on normal completion, on errors, and when the consumer breaks out
// early; abort the host-side stream unless it finished normally
if (!completed) {
this.bridge.abortStream(stream.id);
}
}
}
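The `finally` block matters because a consumer that `break`s out of `for await` triggers the generator's `return()`, which runs pending `finally` blocks but never reaches code placed after the loop. A self-contained demonstration of that behavior, using a hypothetical `abort` callback in place of `bridge.abortStream()` and a fake chunk source in place of the host stream:

```typescript
/** Wraps a chunk source; calls `abort` unless the source was fully consumed. */
async function* guardedStream(
  source: AsyncIterable<string>,
  abort: () => void,
): AsyncGenerator<string> {
  let completed = false;
  try {
    for await (const chunk of source) yield chunk;
    completed = true;
  } finally {
    // Runs on normal completion, on errors, and when the consumer breaks early.
    if (!completed) abort();
  }
}

/** Stand-in for the host stream. */
async function* fakeSource(): AsyncGenerator<string> {
  yield 'Hel';
  yield 'lo';
  yield '!';
}
```

Consuming every chunk leaves `abort` uncalled, while breaking after `'lo'` calls it exactly once.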
5. Timeout Configuration
Default timeouts must balance responsiveness with long-running inference.
| Scenario | Recommended timeoutMs |
|---|---|
| Simple extraction | 5,000ms |
| Classification | 8,000ms |
| Summarization | 15,000ms |
| Complex reasoning | 30,000ms+ |
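Whatever the default, the timeout has to be enforced somewhere. A minimal sketch of a host-side wrapper that rejects with a `TIMEOUT`-coded error (matching the `LLMError` codes in Phase 1) and signals cancellation through an `AbortController`. `withTimeout` and `TimeoutError` are hypothetical; whether an in-flight provider request actually stops depends on the provider client honoring the signal, and marking timeouts as retryable is an assumption:

```typescript
/** Error carrying the TIMEOUT code from Phase 1. */
class TimeoutError extends Error {
  readonly code = 'TIMEOUT';
  readonly retryable = true; // assumption: timeouts may succeed on retry
}

/** Runs `task` with an AbortSignal; aborts it and rejects if it exceeds `timeoutMs`. */
function withTimeout<T>(task: (signal: AbortSignal) => Promise<T>, timeoutMs: number): Promise<T> {
  const controller = new AbortController();
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      controller.abort(); // tell the task to stop work
      reject(new TimeoutError(`LLM request exceeded ${timeoutMs}ms`));
    }, timeoutMs);
    task(controller.signal).then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}
```

Usage would pass the per-request value, e.g. `withTimeout((signal) => client.complete({ ...req, signal }), req.timeoutMs)`, where `client.complete` accepting a `signal` is itself an assumption about the provider client.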
6. Backward Compatibility
When adding llm() to an existing interface, ensure the addition is non-breaking:
// Option A: Add an optional method to the existing interface
interface RuntimeContext {
tts: TTSMethod;
stt: STTMethod;
llm?: LLMMethod; // Optional: existing implementations are unaffected
}
// Option B: Use feature detection at the call site
const result = api.runtime.llm
? await api.runtime.llm(options)
: await fallbackToDirectCall(options);
Related Errors
Logically Connected Issues and Patterns
- PLUGIN_SDK_MISSING_METHOD (Runtime method not found in plugin context): Occurs when `api.runtime.llm()` is invoked but the method is not implemented in the host runtime.
- MODEL_ALIAS_UNRESOLVED (Model alias not found in configuration): Triggered when a plugin specifies a model like `"haiku"` but no alias mapping exists.
- INFERENCE_TIMEOUT (LLM request exceeded timeout threshold): The configured `timeoutMs` was insufficient for the model and prompt complexity.
- RATE_LIMIT_EXCEEDED (Rate limit reached for provider/model): Indicates that OpenClaw's rate limiter triggered; the plugin should implement exponential backoff.
- CREDENTIAL_ROTATION_FAILED (All API keys exhausted during rotation): Key rotation exhausted all available credentials; requires human intervention or additional keys.
- GitHub Issue #342: Add LLM support to plugin runtime. Original feature request tracking this implementation.
- GitHub Issue #189: Plugin SDK parity for audio vs text. Historical issue noting the asymmetry between `tts`/`stt` and the missing `llm`.
- Design Pattern: `api.runtime.*` abstraction. Consistent pattern across runtime methods; ensures plugins remain provider-agnostic.