SGLang

Overview

SGL (SGLang) is an OpenAI-compatible local/remote inference engine used for serving models with high throughput. Use it through DeepIntShield with the standard OpenAI-compatible request format. Key features:

OpenAI API compatibility - Identical request/response format
Full streaming support - Server-Sent Events with usage tracking
Tool calling - Complete function definition and execution
Text embeddings - Support for embedding models
Parameter filtering - Removes unsupported fields for compatibility

Supported Operations

Operation	Non-Streaming	Streaming	Endpoint
Chat Completions	✅	✅	`/v1/chat/completions`
Responses API	✅	✅	`/v1/chat/completions`
Text Completions	✅	✅	`/v1/completions`
Embeddings	✅	-	`/v1/embeddings`
List Models	✅	-	`/v1/models`
Image Generation	❌	❌	-
Speech (TTS)	❌	❌	-
Transcriptions (STT)	❌	❌	-
Files	❌	❌	-
Batch	❌	❌	-

1. Chat Completions

Request Parameters

SGL supports all standard OpenAI chat completion parameters. For full parameter reference and behavior, see OpenAI Chat Completions.

The following parameters are not supported by SGL and are ignored: prompt_cache_key, verbosity, store, service_tier.

SGL supports all standard OpenAI message types, tools, responses, and streaming formats. For details on message handling, tools, responses, and streaming, refer to OpenAI Chat Completions.

2. Responses API

SGL supports the Responses API with the same parameter support as Chat Completions.

3. Text Completions

SGL supports legacy text completion format:

Parameter	Mapping
`prompt`	Direct pass-through
`max_tokens`	max_tokens
`temperature`, `top_p`	Direct pass-through
`frequency_penalty`, `presence_penalty`	Supported

4. Embeddings

SGL supports text embeddings for vector generation:

Parameter	Notes
`input`	Text or array of texts
`model`	Embedding model name
`encoding_format`	”float” or “base64”
`dimensions`	Model-specific dimension count

Response returns embedding vectors with usage information.

5. List Models

Lists available models from SGL server with capabilities.

Unsupported Features

Feature	Reason
Speech/TTS	Not offered by SGL API
Transcription/STT	Not offered by SGL API
Batch Operations	Not offered by SGL API
File Management	Not offered by SGL API

Caveats

BaseURL Configuration Required

Severity: High Behavior: BaseURL must be explicitly configured Impact: Requests fail without proper configuration

Cache Control Stripped

Severity: Medium Behavior: Cache control directives are removed from messages Impact: Prompt caching features don’t work

Parameter Filtering

Severity: Low Behavior: OpenAI-specific fields filtered out Impact: prompt_cache_key, verbosity, store removed

User Field Size Limit

Severity: Low Behavior: User field > 64 characters silently dropped Impact: Longer user identifiers are lost