April 28, 2026

Feature Request: api.runtime.llm() — Plugin SDK Inference Method

Implement api.runtime.llm() to enable plugins to make LLM calls through OpenClaw's inference stack, inheriting routing, retries, rate limiting, and key rotation.

🔍 Symptoms

Current State: Absence of Native LLM Inference in Plugin SDK

Plugins that need large language model capabilities currently show the following symptoms:

1. Direct Provider API Dependencies

Plugin code must contain explicit provider SDK initialization and API calls:

// Current workaround — plugin must manage provider specifics
import OpenAI from 'openai';

class MyPlugin {
  async process(input: string): Promise<string> {
    const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
    const response = await client.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: input }],
    });
    // content can be null (for example on a refusal), so fall back to an empty string
    return response.choices[0].message.content ?? '';
  }
}

2. Duplicate Inference Paths

When several plugins and the core runtime all make LLM calls, each one implements its own:

API key management
Retry logic
Rate limiting
Timeout handling
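
To make the duplication concrete, the following is a minimal sketch of the kind of timeout helper a plugin ends up writing for itself today. The name `withTimeout` is hypothetical and is not part of the OpenClaw SDK or any provider SDK.

```typescript
// Hypothetical helper, not part of the OpenClaw SDK: rejects if `work`
// does not settle within `timeoutMs` milliseconds.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
    work.then(
      (value) => {
        clearTimeout(timer); // settled in time, so cancel the pending rejection
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}
```

Note that this only stops waiting; it does not cancel the underlying provider request. Every plugin that copies a helper like this has to solve cancellation, retries, and key handling again on its own.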

3. Missing Abstraction Layer

Any reference to api.runtime.llm() fails: typed plugins do not compile, and untyped plugins throw at runtime:

// These exist but llm() does not:
api.runtime.tts({ text: "...", voice: "alloy" });
api.runtime.stt({ audio: buffer });

// This does not exist:
const result = await api.runtime.llm({ prompt: "..." });
// TypeError: api.runtime.llm is not a function

4. Inconsistent Observability

LLM calls from plugins bypass the centralized logging and telemetry that core runtime calls receive.

🧠 Root Cause

Architectural Gap in Plugin SDK Design

The absence of api.runtime.llm() stems from incomplete parity in the runtime abstraction layer.

1. Asymmetric API Surface

The OpenClaw Plugin SDK exposes audio inference methods but lacks text inference equivalents:

Method	Status	File Reference
`api.runtime.tts()`	Implemented	`packages/plugin-sdk/src/runtime/tts.ts`
`api.runtime.stt()`	Implemented	`packages/plugin-sdk/src/runtime/stt.ts`
`api.runtime.llm()`	Missing	N/A

2. Dual Inference Path Problem

The current architecture forces plugins into a pattern that undermines platform consistency:

┌───────────────────────────────────────────────────────┐
│                   OpenClaw Gateway                    │
├────────────────────────┬──────────────────────────────┤
│ Core Runtime           │ Plugin Environment           │
│ ────────────           │ ──────────────────           │
│ ✅ Model routing       │ ❌ Direct provider calls     │
│ ✅ Key rotation        │ ❌ Statically embedded keys  │
│ ✅ Rate limiting       │ ❌ No throttling             │
│ ✅ Retry logic         │ ❌ No backoff                │
│ ✅ Cost aggregation    │ ❌ Invisible to platform     │
└────────────────────────┴──────────────────────────────┘

3. Configuration Fragmentation

When plugins hardcode provider calls, model changes require plugin code modifications rather than config updates:

// Current: Model change requires code edit
const MODEL = 'anthropic/claude-3-5-sonnet-20241022';

// Desired: Model change via config
// openclaw.yaml: runtime.llm.defaultModel: "haiku"
const result = await api.runtime.llm({ prompt: "..." });

4. Missing Interface Contract

The RuntimeContext interface in packages/plugin-sdk/src/runtime/context.ts lacks an LLM method signature:

// Current interface (incomplete)
interface RuntimeContext {
  tts(options: TTSOptions): Promise<TTSResult>;
  stt(options: STTOptions): Promise<STTResult>;
  // llm method missing
}

// Required addition
interface LLMOptions {
  prompt: string;
  model?: string;
  system?: string;
  maxTokens?: number;
  timeoutMs?: number;
}

interface LLMResult {
  text: string;
  usage?: {
    inputTokens: number;
    outputTokens: number;
  };
  model?: string;
}

🛠️ Step-by-Step Fix

Implementation Plan for `api.runtime.llm()`

Phase 1: Interface Definition

File: packages/plugin-sdk/src/runtime/types.ts

Add the LLM type definitions:

export interface LLMOptions {
  /** The primary input prompt for the LLM */
  prompt: string;
  /** Model alias or full provider/model string (e.g., "haiku", "anthropic/claude-3-5-sonnet-20241022") */
  model?: string;
  /** System prompt to prepend to the conversation */
  system?: string;
  /** Maximum tokens in the response */
  maxTokens?: number;
  /** Request timeout in milliseconds */
  timeoutMs?: number;
  /** Temperature for response randomness (0-2) */
  temperature?: number;
  /** Stop sequences to terminate generation */
  stopSequences?: string[];
}

export interface LLMResult {
  /** The generated text response */
  text: string;
  /** Token usage statistics */
  usage?: {
    inputTokens: number;
    outputTokens: number;
    totalTokens: number;
  };
  /** The actual model used (resolved from alias) */
  model: string;
  /** Latency in milliseconds */
  latencyMs: number;
}

export interface LLMError {
  code: 'TIMEOUT' | 'RATE_LIMITED' | 'MODEL_UNAVAILABLE' | 'INVALID_REQUEST' | 'AUTH_FAILED';
  message: string;
  retryable: boolean;
  originalError?: Error;
}
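
Because LLMError is declared as an interface, plugin code cannot test for it with `instanceof`. One way to handle that, sketched here under the assumption that rejected values carry the fields listed above, is a small type guard. The name `isLLMError` is hypothetical.

```typescript
// Local copy of the LLMError shape above so the sketch compiles on its own.
type LLMErrorCode =
  | 'TIMEOUT'
  | 'RATE_LIMITED'
  | 'MODEL_UNAVAILABLE'
  | 'INVALID_REQUEST'
  | 'AUTH_FAILED';

interface LLMError {
  code: LLMErrorCode;
  message: string;
  retryable: boolean;
  originalError?: Error;
}

const LLM_ERROR_CODES: readonly string[] = [
  'TIMEOUT',
  'RATE_LIMITED',
  'MODEL_UNAVAILABLE',
  'INVALID_REQUEST',
  'AUTH_FAILED',
];

// Hypothetical type guard: narrows an unknown caught value to LLMError
// by checking the three required fields structurally.
function isLLMError(value: unknown): value is LLMError {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.code === 'string' &&
    LLM_ERROR_CODES.includes(candidate.code) &&
    typeof candidate.message === 'string' &&
    typeof candidate.retryable === 'boolean'
  );
}
```

A plugin could then write `catch (e) { if (isLLMError(e) && e.retryable) { ... } }` without depending on how the host constructs the error.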

Phase 2: Runtime Bridge Implementation

File: packages/plugin-sdk/src/runtime/llm-bridge.ts

Create the bridge module that communicates with the host process:

import { RuntimeBridge } from './bridge';
import { LLMOptions, LLMResult, LLMError } from './types';

export class LLMBridge {
  private bridge: RuntimeBridge;

  constructor(bridge: RuntimeBridge) {
    this.bridge = bridge;
  }

  async call(options: LLMOptions): Promise<LLMResult> {
    const startTime = Date.now();

    const response = await this.bridge.send<{
      success: boolean;
      result?: LLMResult;
      error?: LLMError;
    }>('runtime:llm', {
      prompt: options.prompt,
      model: options.model ?? 'default',
      system: options.system,
      maxTokens: options.maxTokens ?? 1024,
      timeoutMs: options.timeoutMs ?? 30000,
      temperature: options.temperature,
      stopSequences: options.stopSequences,
    });

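    if (!response.success || !response.result) {
      const error: LLMError = response.error ?? {
        code: 'INVALID_REQUEST',
        message: 'Unknown error from runtime',
        retryable: false,
      };
      // LLMError is an interface, so it cannot be constructed with `new`;
      // throw a real Error that carries the LLMError fields instead.
      throw Object.assign(new Error(error.message), error);
    }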

    return {
      ...response.result,
      latencyMs: Date.now() - startTime,
    };
  }

  /** Streaming support for real-time output */
  async *stream(options: LLMOptions): AsyncGenerator<string> {
    const stream = await this.bridge.sendStream('runtime:llm:stream', {
      prompt: options.prompt,
      model: options.model ?? 'default',
      system: options.system,
      maxTokens: options.maxTokens ?? 1024,
      timeoutMs: options.timeoutMs ?? 30000,
    });

    for await (const chunk of stream) {
      if (chunk.text) {
        yield chunk.text;
      }
    }
  }
}

Phase 3: Extend RuntimeContext Interface

File: packages/plugin-sdk/src/runtime/context.ts

import { LLMOptions, LLMResult } from './types';
import { LLMBridge } from './llm-bridge';

export interface RuntimeContext {
  // ... existing tts, stt methods ...

  /**
   * Execute a large language model inference request through OpenClaw's
   * inference stack, inheriting routing, retries, rate limiting, and
   * key rotation.
   *
   * @param options - LLM invocation options
   * @returns Promise resolving to the LLM response
   *
   * @example
   * const result = await api.runtime.llm({
   *   prompt: "Extract the company name from: Acme Corp founded in 2020",
   *   model: "haiku",
   *   maxTokens: 50,
   * });
   * console.log(result.text); // "Acme Corp"
   */
  llm(options: LLMOptions): Promise<LLMResult>;
}
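
Since plugins depend only on the llm() signature, plugin logic can be exercised against a stub runtime without touching a provider. The sketch below assumes the interface shown above; `RuntimeLike`, `extractCompanyName`, and `createStubRuntime` are hypothetical names used for illustration only.

```typescript
// Minimal local copies of the types above, so the sketch compiles on its own.
interface LLMOptions {
  prompt: string;
  model?: string;
  system?: string;
  maxTokens?: number;
  timeoutMs?: number;
}
interface LLMResult {
  text: string;
  model: string;
  latencyMs: number;
}
interface RuntimeLike {
  llm(options: LLMOptions): Promise<LLMResult>;
}

// Hypothetical plugin function: depends only on the llm() contract.
async function extractCompanyName(runtime: RuntimeLike, sentence: string): Promise<string> {
  const result = await runtime.llm({
    prompt: `Extract the company name from: ${sentence}`,
    model: 'haiku',
    maxTokens: 50,
  });
  return result.text.trim();
}

// Stub runtime for tests: records each request and returns a canned answer.
function createStubRuntime(cannedText: string): RuntimeLike & { calls: LLMOptions[] } {
  const calls: LLMOptions[] = [];
  return {
    calls,
    async llm(options: LLMOptions): Promise<LLMResult> {
      calls.push(options);
      return { text: cannedText, model: options.model ?? 'default', latencyMs: 0 };
    },
  };
}
```

The same plugin function would run unchanged against the real `api.runtime`, which is the main benefit of keeping the contract this narrow.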

Phase 4: Plugin SDK Public Export

File: packages/plugin-sdk/src/index.ts

Ensure the runtime context is properly initialized and exported:

export { createPluginContext } from './context';
export type { LLMOptions, LLMResult, LLMError } from './runtime/types';

Phase 5: Host Runtime Handler (Framework Side)

File: packages/core/src/runtime/handlers/llm-handler.ts

The host process handles the actual LLM invocation with full platform features:

import { ModelRouter } from '../routing/model-router';
import { RateLimiter } from '../middleware/rate-limiter';
import { KeyRotator } from '../auth/key-rotation';
import { RetryPolicy } from '../retry/policy';

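// ProviderClient, LLMInvocation, and LLMResult are assumed to be defined
// elsewhere in core; this document does not show where they live.
export class LLMHandler {
  constructor(
    private router: ModelRouter,
    private rateLimiter: RateLimiter,
    private keyRotator: KeyRotator,
    private retryPolicy: RetryPolicy,
    // Needed by handle(), which calls this.providerClient.complete()
    private providerClient: ProviderClient,
  ) {}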

  async handle(request: LLMInvocation): Promise<LLMResult> {
    // Resolve model alias to provider/model
    const resolved = this.router.resolve(request.model);

    // Check rate limits
    await this.rateLimiter.check(`llm:${resolved.provider}`, resolved.model);

    // Acquire rotated credentials
    const credentials = await this.keyRotator.getCredentials(resolved.provider);

    // Execute with retry policy
    return this.retryPolicy.execute(async () => {
      return this.providerClient.complete({
        provider: resolved.provider,
        model: resolved.model,
        prompt: request.prompt,
        system: request.system,
        maxTokens: request.maxTokens,
        timeoutMs: request.timeoutMs,
        credentials,
      });
    });
  }
}
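
The handler above delegates to `retryPolicy.execute()`, whose behavior this document does not specify. As an illustration only, a retry policy with capped exponential backoff might look like the sketch below. `RetryOptions` and `executeWithRetry` are hypothetical names, and `maxAttempts` is assumed to be at least 1.

```typescript
// Hypothetical options; the real RetryPolicy configuration is not described here.
interface RetryOptions {
  maxAttempts: number; // assumed >= 1
  baseDelayMs: number;
  maxDelayMs: number;
  isRetryable: (error: unknown) => boolean;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `operation`, retrying retryable failures with capped exponential backoff.
async function executeWithRetry<T>(
  operation: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      const isLastAttempt = attempt === options.maxAttempts - 1;
      if (isLastAttempt || !options.isRetryable(error)) break;
      // Delay doubles each attempt (base, 2x, 4x, ...) and is capped at maxDelayMs.
      const delay = Math.min(options.baseDelayMs * 2 ** attempt, options.maxDelayMs);
      await sleep(delay);
    }
  }
  throw lastError;
}
```

Keeping this logic on the host side means a plugin gets the same backoff behavior as core runtime calls without shipping its own copy.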

🧪 Verification

Testing the `api.runtime.llm()` Implementation

Unit Tests

Test File: packages/plugin-sdk/src/runtime/__tests__/llm-bridge.test.ts

import { describe, it, expect, vi } from 'vitest';
import { LLMBridge } from '../llm-bridge';

describe('LLM bridge', () => {
  it('should pass options correctly to bridge', async () => {
    const mockBridge = {
      send: vi.fn().mockResolvedValue({
        success: true,
        result: {
          text: 'Test response',
          model: 'gpt-4o-mini',
          usage: { inputTokens: 10, outputTokens: 5, totalTokens: 15 },
        },
      }),
    };

    const bridge = new LLMBridge(mockBridge as any);
    const result = await bridge.call({
      prompt: 'What is 2+2?',
      model: 'haiku',
      maxTokens: 50,
    });

    expect(result.text).toBe('Test response');
    expect(mockBridge.send).toHaveBeenCalledWith('runtime:llm', expect.objectContaining({
      prompt: 'What is 2+2?',
      model: 'haiku',
      maxTokens: 50,
    }));
  });

  it('should apply default values for optional parameters', async () => {
    const mockBridge = {
      send: vi.fn().mockResolvedValue({
        success: true,
        result: { text: 'ok', model: 'default' },
      }),
    };

    const bridge = new LLMBridge(mockBridge as any);
    await bridge.call({ prompt: 'test' });

    expect(mockBridge.send).toHaveBeenCalledWith('runtime:llm', expect.objectContaining({
      model: 'default',
      maxTokens: 1024,
      timeoutMs: 30000,
    }));
  });
});

Integration Tests

Test File: packages/plugin-sdk/integration/llm-integration.test.ts

import { describe, it, expect } from 'vitest';
import { createPluginContext } from '../src';

describe('Plugin SDK LLM Integration', () => {
  it('should execute LLM call through runtime', async () => {
    const context = createPluginContext({
      pluginId: 'test-plugin',
      manifest: { name: 'test', version: '1.0.0' },
    });

    const result = await context.runtime.llm({
      prompt: 'Say "hello" in exactly one word',
      model: 'haiku',
      maxTokens: 5,
    });

    expect(result.text).toBeDefined();
    expect(result.text.length).toBeGreaterThan(0);
    expect(result.usage).toBeDefined();
    expect(result.latencyMs).toBeGreaterThan(0);
  });

  it('should propagate errors from runtime', async () => {
    const context = createPluginContext({
      pluginId: 'test-plugin',
      manifest: { name: 'test', version: '1.0.0' },
    });

    await expect(context.runtime.llm({
      prompt: '',
      model: 'nonexistent-model',
    })).rejects.toThrow();
  });
});

CLI Verification Commands

After implementation, verify the feature is accessible:

# Check TypeScript compilation includes llm method
npx tsc --noEmit --project packages/plugin-sdk/tsconfig.json

# Verify type exports
grep -n "llm" packages/plugin-sdk/src/runtime/context.ts

# Run SDK tests whose names match "LLM" (Vitest uses -t / --testNamePattern)
cd packages/plugin-sdk && npx vitest run -t "LLM"

Expected Output:

✓ LLM bridge should pass options correctly
✓ LLM bridge should apply default values
✓ Plugin SDK LLM Integration should execute call

⚠️ Common Pitfalls

Implementation Considerations and Edge Cases

1. Context Isolation Violations

Plugins must not be able to bypass rate limits or access credentials directly.

// ❌ INCORRECT: Exposing raw provider access
export class RuntimeContext {
  llm(options: LLMOptions) {
    // Must not expose:
    return this.providerClient.complete({
      apiKey: this.config.apiKey,  // Should never be exposed
      // ...remaining request fields
    });
  }
}

// ✅ CORRECT: Wrapper maintains security boundary
export class RuntimeContext {
  llm(options: LLMOptions) {
    return this.bridge.send('runtime:llm', options);
  }
}

2. Serialization Boundary Crossing

Data crossing the plugin/host boundary must be serializable. Avoid passing functions or complex objects.

// ❌ INCORRECT: Function in options
await api.runtime.llm({
  prompt: '...',
  onToken: (token) => console.log(token),  // Cannot serialize!
});

// ✅ CORRECT: Use streaming for real-time output
for await (const chunk of api.runtime.llm.stream({ prompt: '...' })) {
  console.log(chunk);
}
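
One way to catch this class of mistake early is to validate options before they are handed to the bridge. The sketch below assumes the bridge transports JSON-compatible data, which this document does not confirm, and `assertSerializable` is a hypothetical helper name.

```typescript
// Hypothetical guard: throws if `value` contains anything that would not
// survive a JSON round trip (functions, symbols, bigint, class instances,
// non-finite numbers, or circular references). `undefined` is allowed
// because JSON serialization simply omits it.
function assertSerializable(
  value: unknown,
  path: string = 'options',
  seen: Set<object> = new Set<object>(),
): void {
  if (value === null || value === undefined) return;
  const kind = typeof value;
  if (kind === 'string' || kind === 'boolean') return;
  if (kind === 'number') {
    if (!Number.isFinite(value as number)) {
      throw new TypeError(`${path} is not a finite number`);
    }
    return;
  }
  if (kind !== 'object') {
    throw new TypeError(`${path} is a ${kind} and cannot be serialized`);
  }
  const obj = value as object;
  if (seen.has(obj)) throw new TypeError(`${path} contains a circular reference`);
  seen.add(obj);
  if (Array.isArray(obj)) {
    obj.forEach((item, i) => assertSerializable(item, `${path}[${i}]`, seen));
  } else {
    const proto = Object.getPrototypeOf(obj);
    if (proto !== Object.prototype && proto !== null) {
      throw new TypeError(`${path} is not a plain object`);
    }
    const record = obj as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      assertSerializable(record[key], `${path}.${key}`, seen);
    }
  }
  // Remove on the way out so the same object may appear twice without a cycle.
  seen.delete(obj);
}
```

Calling this at the top of the SDK-side llm() wrapper would turn a confusing transport failure into an error that names the offending field, such as `options.onToken`.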

3. Model Alias Resolution Timing

Model aliases must resolve before hitting the provider. Configuration changes should invalidate caches.

# .env: environment variables for alias mapping
OPENCLAW_MODEL_HAIKU=anthropic/claude-3-haiku-20240307
OPENCLAW_MODEL_SONNET=anthropic/claude-3-5-sonnet-20241022
OPENCLAW_MODEL_GPT4=openai/gpt-4o
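
A minimal sketch of a resolver that caches lookups and drops the cache when the alias table changes is shown below. `AliasResolver`, `ResolvedModel`, and `updateAliases` are hypothetical names; how OpenClaw's ModelRouter actually caches or reloads aliases is not described in this document.

```typescript
interface ResolvedModel {
  provider: string;
  model: string;
}

// Hypothetical resolver: caches parsed aliases and clears the cache
// whenever the alias table is replaced.
class AliasResolver {
  private aliases: Record<string, string>;
  private cache = new Map<string, ResolvedModel>();

  constructor(aliases: Record<string, string>) {
    this.aliases = aliases;
  }

  // Replace the alias table (e.g. after a config reload) and invalidate the cache.
  updateAliases(aliases: Record<string, string>): void {
    this.aliases = aliases;
    this.cache.clear();
  }

  resolve(nameOrAlias: string): ResolvedModel {
    const cached = this.cache.get(nameOrAlias);
    if (cached) return cached;

    // An alias maps to "provider/model"; a name already in that form is used as-is.
    const isAlias = Object.prototype.hasOwnProperty.call(this.aliases, nameOrAlias);
    const target = isAlias ? this.aliases[nameOrAlias] : nameOrAlias;
    const slash = target.indexOf('/');
    if (slash <= 0 || slash === target.length - 1) {
      throw new Error(`MODEL_ALIAS_UNRESOLVED: ${nameOrAlias}`);
    }

    const resolved: ResolvedModel = {
      provider: target.slice(0, slash),
      model: target.slice(slash + 1),
    };
    this.cache.set(nameOrAlias, resolved);
    return resolved;
  }
}
```

Resolution happens entirely before any provider call, so an unknown alias fails fast instead of surfacing as a provider-side error.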

4. Streaming Support Complexity

Implementing stream() requires proper backpressure handling and error recovery.

// Streaming must handle plugin disconnection gracefully
async *stream(options: LLMOptions): AsyncGenerator<string> {
  // Open the stream before the try block so `stream` is in scope for cleanup
  const stream = await this.bridge.sendStream('runtime:llm:stream', options);
  try {
    for await (const chunk of stream) {
      // Skip chunks that carry no text
      if (chunk.text) yield chunk.text;
    }
  } catch (error) {
    // Clean up stream resources
    this.bridge.abortStream(stream.id);
    throw error;
  } finally {
    // Ensure cleanup even on early return
    await this.cleanup();
  }
}

5. Timeout Configuration

Default timeouts must balance responsiveness with long-running inference.

Scenario	Recommended `timeoutMs`
Simple extraction	5,000ms
Classification	8,000ms
Summarization	15,000ms
Complex reasoning	30,000ms+
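
If a plugin wants to apply these recommendations consistently, the table can be encoded as a lookup. This is only a sketch: the scenario keys and the `timeoutFor` helper are hypothetical, and the values are copied from the table above.

```typescript
// Values copied from the table above; the scenario keys are hypothetical names.
const RECOMMENDED_TIMEOUT_MS = {
  extraction: 5000,
  classification: 8000,
  summarization: 15000,
  reasoning: 30000, // the table lists this as a lower bound ("30,000ms+")
} as const;

type Scenario = keyof typeof RECOMMENDED_TIMEOUT_MS;

// Returns the recommended timeout, letting an explicit caller value win.
function timeoutFor(scenario: Scenario, overrideMs?: number): number {
  return overrideMs ?? RECOMMENDED_TIMEOUT_MS[scenario];
}
```

A call might then read `api.runtime.llm({ prompt, timeoutMs: timeoutFor('summarization') })`.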

6. Backward Compatibility

When adding llm() to an existing interface, ensure the addition is non-breaking:

// ✅ Add optional method to existing interface
interface RuntimeContext {
  tts: TTSMethod;
  stt: STTMethod;
  llm?: LLMMethod;  // Optional — existing implementations unaffected
}

// ✅ Or extend with feature detection
const result = api.runtime.llm
  ? await api.runtime.llm(options)
  : await fallbackToDirectCall(options);

Logically Connected Issues and Patterns

PLUGIN_SDK_MISSING_METHOD: Runtime method not found in plugin context
Occurs when `api.runtime.llm()` is invoked but the method is not implemented in the host runtime.

MODEL_ALIAS_UNRESOLVED: Model alias not found in configuration
Triggered when a plugin specifies a model like "haiku" but no alias mapping exists.

INFERENCE_TIMEOUT: LLM request exceeded timeout threshold
The configured timeoutMs was insufficient for the model and prompt complexity.

RATE_LIMIT_EXCEEDED: Rate limit reached for provider/model
Indicates that OpenClaw's rate limiter triggered; the plugin should implement exponential backoff.

CREDENTIAL_ROTATION_FAILED: All API keys exhausted during rotation
Key rotation exhausted all available credentials; requires human intervention or additional keys.

GitHub Issue #342: Add LLM support to plugin runtime
Original feature request tracking this implementation.

GitHub Issue #189: Plugin SDK parity for audio vs text
Historical issue noting the asymmetry between tts/stt and the missing llm.

Design Pattern: api.runtime.* abstraction
Consistent pattern across runtime methods; ensures plugins remain provider-agnostic.