Fireworks AI

Overview

Fireworks AI is an OpenAI-compatible provider offering the same API interface with identical parameter handling. Key features:

Full OpenAI compatibility - Identical request/response format
Streaming support - Server-Sent Events with delta-based updates
Tool calling - Complete function definition and execution support
Native text completions - Served directly via /v1/completions
Embeddings - Served directly via /v1/embeddings

Supported Operations

Operation	Non-Streaming	Streaming	Endpoint
Chat Completions	✅	✅	`/v1/chat/completions`
Responses API	✅	✅	`/v1/responses`
Text Completions	✅	✅	`/v1/completions`
Embeddings	✅	-	`/v1/embeddings`
List Models	✅	-	`/v1/models`
Image Generation	❌	❌	-
Speech (TTS)	❌	❌	-
Transcriptions (STT)	❌	❌	-
Files	❌	❌	-
Batch	❌	❌	-

Configuration

Add Fireworks AI as a provider with a single API key. The default base URL is https://api.fireworks.ai/inference; override it per workspace if you front Fireworks behind a custom gateway.

{
  "provider": "fireworks",
  "keys": [{ "value": "env.FIREWORKS_API_KEY", "models": [], "weight": 1.0 }]
}

Model identifiers use Fireworks’ fully-qualified form, e.g. accounts/fireworks/models/llama-v3p1-70b-instruct.

1. Chat Completions

Request Parameters

Fireworks AI supports all standard OpenAI chat completion parameters. For full parameter reference and behavior, see OpenAI Chat Completions.

2. Responses API

Unlike chat-only OpenAI-compatible providers, Fireworks exposes a native Responses endpoint, so DeepIntShield forwards Responses requests directly to /v1/responses rather than converting them to chat completions.

3. Text Completions

Fireworks supports the legacy completions endpoint natively at /v1/completions, including streaming.

4. Embeddings

Embedding requests are forwarded to /v1/embeddings using the standard OpenAI embedding request/response shape.

5. List Models

Fireworks’ model listing endpoint returns available models with their context lengths and capabilities at /v1/models.