SGLang
Overview
Section titled “Overview”SGL (SGLang) is an OpenAI-compatible local/remote inference engine used for serving models with high throughput. Use it through DeepIntShield with the standard OpenAI-compatible request format. Key features:
- OpenAI API compatibility - Identical request/response format
- Full streaming support - Server-Sent Events with usage tracking
- Tool calling - Complete function definition and execution
- Text embeddings - Support for embedding models
- Parameter filtering - Removes unsupported fields for compatibility
Supported Operations
Section titled “Supported Operations”| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /v1/chat/completions |
| Responses API | ✅ | ✅ | /v1/chat/completions |
| Text Completions | ✅ | ✅ | /v1/completions |
| Embeddings | ✅ | - | /v1/embeddings |
| List Models | ✅ | - | /v1/models |
| Image Generation | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |
1. Chat Completions
Section titled “1. Chat Completions”Request Parameters
Section titled “Request Parameters”SGL supports all standard OpenAI chat completion parameters. For full parameter reference and behavior, see OpenAI Chat Completions.
The following parameters are not supported by SGL and are ignored: prompt_cache_key, verbosity, store, service_tier.
SGL supports all standard OpenAI message types, tools, responses, and streaming formats. For details on message handling, tools, responses, and streaming, refer to OpenAI Chat Completions.
2. Responses API
Section titled “2. Responses API”SGL supports the Responses API with the same parameter support as Chat Completions.
3. Text Completions
Section titled “3. Text Completions”SGL supports legacy text completion format:
| Parameter | Mapping |
|---|---|
prompt | Direct pass-through |
max_tokens | max_tokens |
temperature, top_p | Direct pass-through |
frequency_penalty, presence_penalty | Supported |
4. Embeddings
Section titled “4. Embeddings”SGL supports text embeddings for vector generation:
| Parameter | Notes |
|---|---|
input | Text or array of texts |
model | Embedding model name |
encoding_format | ”float” or “base64” |
dimensions | Model-specific dimension count |
Response returns embedding vectors with usage information.
5. List Models
Section titled “5. List Models”Lists available models from SGL server with capabilities.
Unsupported Features
Section titled “Unsupported Features”| Feature | Reason |
|---|---|
| Speech/TTS | Not offered by SGL API |
| Transcription/STT | Not offered by SGL API |
| Batch Operations | Not offered by SGL API |
| File Management | Not offered by SGL API |
Caveats
Section titled “Caveats”BaseURL Configuration Required
Severity: High Behavior: BaseURL must be explicitly configured Impact: Requests fail without proper configuration
Cache Control Stripped
Severity: Medium Behavior: Cache control directives are removed from messages Impact: Prompt caching features don’t work
Parameter Filtering
Severity: Low Behavior: OpenAI-specific fields filtered out Impact: prompt_cache_key, verbosity, store removed
User Field Size Limit
Severity: Low Behavior: User field > 64 characters silently dropped Impact: Longer user identifiers are lost