Hugging Face
The Hugging Face provider lets you call models hosted across many inference backends (like hf-inference, fal-ai, cerebras, sambanova, etc.) through a single, unified DeepIntShield endpoint.
Overview
Through the Hugging Face provider you get:
- Multiple inference backends: Route requests to 18 different inference providers
- Dynamic model aliasing: Address any backend model with a single composite model ID
- Heterogeneous request formats: JSON, raw binary, and base64-encoded payloads are handled for you
- Provider-specific constraints: Varying payload limits and format restrictions are enforced automatically
Supported Inference Providers
The Hugging Face provider supports routing to the 18 inference backends listed below, shown with their capabilities as of December 2025:
| Provider | Chat | Embedding | Speech (TTS) | Transcription (ASR) | Image Generation | Image Generation (stream) | Image Edit | Image Edit (stream) |
|---|---|---|---|---|---|---|---|---|
| hf-inference | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ |
| cerebras | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| cohere | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| fal-ai | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| featherless-ai | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| fireworks | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| groq | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| hyperbolic | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| nebius | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
| novita | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| nscale | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| ovhcloud-ai-endpoints | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| public-ai | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| replicate | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| sambanova | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| scaleway | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| together | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
| z-ai | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
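If you want to check a provider against this table from a script, one option is a small lookup like the sketch below. The function name hf_supports and the capability keywords (chat, embedding, speech, and so on) are hypothetical labels chosen for this example; the provider lists are transcribed from the table and will go stale if the table changes.

```shell
# Capability lookup transcribed from the table above (as of December 2025).
# Usage: hf_supports PROVIDER CAPABILITY
# The exit status is 0 when the table marks the combination as supported.
# The capability keywords are labels invented for this sketch.
hf_supports() (
  provider=$1
  case "$2" in
    chat)
      list="hf-inference cerebras cohere featherless-ai fireworks groq hyperbolic"
      list="$list nebius novita nscale ovhcloud-ai-endpoints public-ai"
      list="$list sambanova scaleway together z-ai" ;;
    embedding)        list="hf-inference nebius sambanova scaleway" ;;
    speech)           list="fal-ai replicate" ;;
    transcription)    list="hf-inference fal-ai replicate" ;;
    image-generation) list="hf-inference fal-ai nebius together" ;;
    image-generation-stream|image-edit|image-edit-stream)
                      list="fal-ai" ;;
    *)                list="" ;;
  esac
  # Pad with spaces so only whole provider names match.
  case " $list " in
    *" $provider "*) true ;;
    *) false ;;
  esac
)
```

For example, `hf_supports fal-ai transcription && echo supported` prints a line only when the table lists that combination.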
Model Aliases & Identification
Unlike standard providers where model IDs are direct strings (e.g., gpt-4), Hugging Face models in DeepIntShield are identified by a composite key to route requests to the correct inference backend.
Format: huggingface/[inference_provider]/[model_id]
- inference_provider: The backend service (e.g., hf-inference, fal-ai, cerebras).
- model_id: The actual model identifier on Hugging Face Hub (e.g., meta-llama/Meta-Llama-3-8B-Instruct).
Example: huggingface/hf-inference/meta-llama/Meta-Llama-3-8B-Instruct
DeepIntShield reads this model string to route each request to the correct inference backend automatically.
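To make the composite format concrete, here is a minimal client-side sketch that splits a model string into its two parts. It only illustrates the format; it is not DeepIntShield's routing code, and the function names hf_provider and hf_model_id are hypothetical.

```shell
# Split "huggingface/<inference_provider>/<model_id>" into its parts.
# The model_id may itself contain slashes, so only the first segment
# after "huggingface/" is treated as the provider.
hf_provider() (
  rest=${1#huggingface/}          # drop the leading "huggingface/"
  printf '%s\n' "${rest%%/*}"     # keep everything before the first "/"
)

hf_model_id() (
  rest=${1#huggingface/}
  printf '%s\n' "${rest#*/}"      # keep everything after the first "/"
)
```

With the example above, `hf_provider huggingface/hf-inference/meta-llama/Meta-Llama-3-8B-Instruct` yields the provider segment and `hf_model_id` yields the Hub identifier, slashes included.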
Request Handling Differences
The Hugging Face provider handles various tasks (Chat, Speech, Transcription) which often require different request structures depending on the underlying inference provider.
Inference Provider Constraints
Different inference providers have specific limitations and requirements:
Payload Limit
The Hugging Face API enforces a 2 MB request body limit across all request types (Chat, Embedding, Speech, Transcription). This constraint applies to:
- JSON request payloads
- Raw audio bytes in transcription requests
- Any other request body data
Impact: Large audio files, extensive chat histories, or bulk embedding requests may need to be split or compressed before sending.
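A pre-flight size check can catch oversized files before they are sent. The sketch below is one way to do it; it assumes the limit is read conservatively as 2,000,000 bytes (this page does not say whether "2 MB" is decimal or binary), and the names MAX_BODY_BYTES and fits_payload_limit are hypothetical. It measures the file only, so any request framing around the file adds to the real body size.

```shell
# Conservative reading of the 2 MB limit: 2,000,000 bytes (an assumption).
MAX_BODY_BYTES=2000000

# Usage: fits_payload_limit FILE
# Succeeds when the file is no larger than MAX_BODY_BYTES.
fits_payload_limit() (
  # The arithmetic expansion strips any padding that wc adds to its output.
  size=$(( $(wc -c < "$1") ))
  [ "$size" -le "$MAX_BODY_BYTES" ]
)
```

A typical use is `fits_payload_limit clip.mp3 || echo "split or compress clip.mp3 first" >&2`.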
fal-ai Audio Format Restrictions
The fal-ai provider has strict audio format requirements:
- Supported Format: Only MP3 (audio/mpeg) is accepted
- Rejected Formats: WAV (audio/wav) and other formats are rejected with a clear error: fal-ai provider does not support audio/wav format; please use a different format like mp3 or ogg
- Encoding: You can submit standard audio bytes; DeepIntShield base64-encodes them into the Data URI that fal-ai expects.
Speech (Text-to-Speech)
Text-to-Speech (TTS) requests use the standard DeepIntShield /v1/audio/speech endpoint. You provide the text and a huggingface/{provider}/{model} model ID; DeepIntShield builds the backend-specific request for you. No special pipeline tagging is required on your side, even when the model is tagged as text-to-speech on the Hub.
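A request to this endpoint might look like the sketch below. Several details are assumptions rather than facts from this page: the JSON field name input for the text, the DEEPINTSHIELD_API_KEY environment variable, and the helper names tts_body and tts. Your deployment may require additional fields. The base URL and the Bearer header follow the image generation example elsewhere on this page.

```shell
# Sketch only: the "input" field name and DEEPINTSHIELD_API_KEY variable
# are assumptions, not confirmed by this page.
BASE_URL="https://app.deepintshield.com"

# Build the JSON body for a text-to-speech request.
# No JSON escaping is done, so keep quotes and backslashes out of the text.
tts_body() {
  printf '{"model": "%s", "input": "%s"}' "$1" "$2"
}

# Usage: tts MODEL_ID TEXT OUTPUT_FILE
# MODEL_ID uses the huggingface/{provider}/{model} form.
tts() {
  curl -sS -X POST "$BASE_URL/v1/audio/speech" \
    -H "Authorization: Bearer $DEEPINTSHIELD_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(tts_body "$1" "$2")" \
    -o "$3"
}
```

Keeping the body builder separate from the curl call makes it easy to inspect the payload before sending it.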
Transcription (Automatic Speech Recognition)
For transcription, the request format that reaches the backend differs depending on the inference provider you target. DeepIntShield handles this for you - you always send your audio to the standard /v1/audio/transcriptions endpoint.
1. hf-inference (Raw Bytes)
For the standard hf-inference backend, the audio is forwarded as raw bytes.
- Audio Format: Send the audio with its native MIME type (e.g., audio/mpeg).
- Payload Limit: Maximum 2 MB for the raw audio bytes.
- What you do: Submit your audio file to /v1/audio/transcriptions as usual - no special encoding required.
2. fal-ai (Base64 Data URI)
For the fal-ai backend, the audio is forwarded as a base64-encoded Data URI.
- Audio Format Restriction: Only MP3 (audio/mpeg) is supported. WAV files are rejected with a clear error.
- What you do: Submit your MP3 audio file as usual; DeepIntShield base64-encodes it into the Data URI that fal-ai requires.
In both cases you use the same DeepIntShield request - the backend-specific encoding (raw bytes vs. base64 Data URI) is applied for you based on the inference provider in your model ID.
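The sketch below shows what a single client-side call for both backends could look like, with a local guard for the fal-ai MP3 rule. It assumes the endpoint accepts a multipart form with model and file fields, which this page does not confirm; the DEEPINTSHIELD_API_KEY variable and the function names are hypothetical. The guard checks the file extension only, which is a rough stand-in for the real MIME type.

```shell
# Sketch only: the multipart field names "model" and "file" and the
# DEEPINTSHIELD_API_KEY variable are assumptions.
BASE_URL="https://app.deepintshield.com"

# Usage: audio_ok_for MODEL_ID AUDIO_FILE
# Applies only the fal-ai MP3 rule from this page, judged by file extension.
audio_ok_for() {
  case "$1" in
    huggingface/fal-ai/*)
      case "$2" in
        *.mp3) true ;;
        *) false ;;
      esac ;;
    *) true ;;
  esac
}

# Usage: transcribe MODEL_ID AUDIO_FILE
transcribe() {
  if ! audio_ok_for "$1" "$2"; then
    echo "fal-ai accepts only MP3 audio: $2" >&2
    return 1
  fi
  curl -sS -X POST "$BASE_URL/v1/audio/transcriptions" \
    -H "Authorization: Bearer $DEEPINTSHIELD_API_KEY" \
    -F "model=$1" \
    -F "file=@$2"
}
```

The same transcribe call works for hf-inference and fal-ai model IDs; only the guard differs.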
Image Generation
The Hugging Face provider supports image generation through multiple inference providers, each with different request formats and capabilities.
Supported Inference Providers
| Provider | Non-Streaming | Streaming | Notes |
|---|---|---|---|
| hf-inference | ✅ | ❌ | Simple prompt-only format, returns raw image bytes |
| fal-ai | ✅ | ✅ | Full parameter support, supports streaming via Server-Sent Events |
| nebius | ✅ | ❌ | Uses Nebius-specific format with width/height, LoRAs support |
| together | ✅ | ❌ | OpenAI-compatible format |
Supported Parameters by Backend
Send a standard /v1/images/generations request with the model ID huggingface/{provider}/{model_id}. The supported parameters depend on the backend:
1. hf-inference
The simplest backend; it requires only a prompt:
- Supported Input: prompt only.
- Response: A generated image, returned as base64 in b64_json.
- Limitations: No size, quality, or other parameter support.
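As an illustration of the prompt-only flow, the sketch below sends a request and decodes the base64 image to a file. It assumes the response carries a single b64_json string, that the JSON can be scanned line by line with sed, and that your base64 command accepts -d. The DEEPINTSHIELD_API_KEY variable and both function names are hypothetical.

```shell
# Sketch only: DEEPINTSHIELD_API_KEY and the function names are assumptions.
BASE_URL="https://app.deepintshield.com"

# Usage: generate_image MODEL_ID PROMPT   (prints the JSON response)
# No JSON escaping is done, so keep quotes and backslashes out of the prompt.
generate_image() {
  curl -sS -X POST "$BASE_URL/v1/images/generations" \
    -H "Authorization: Bearer $DEEPINTSHIELD_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(printf '{"model": "%s", "prompt": "%s"}' "$1" "$2")"
}

# Usage: save_b64_image FILE   (reads the JSON response on stdin)
# Extracts the b64_json string and writes the decoded bytes to FILE.
save_b64_image() {
  sed -n 's/.*"b64_json"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
    head -n 1 |
    base64 -d > "$1"
}
```

The two pieces chain together as `generate_image MODEL_ID "A red bicycle" | save_b64_image out.img`, where MODEL_ID is a huggingface/hf-inference/{model_id} string.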
2. fal-ai
The most feature-rich backend with extensive parameter support:
- Supported Parameters: n, size (e.g., "1024x1024"), output_format ("jpg" is treated as "jpeg"), response_format, and moderation.
- Extra Parameters: guidance_scale, acceleration, enable_prompt_expansion, enable_safety_checker.
- Response: Images returned in data[] with url and/or b64_json.
3. nebius
Supports LoRAs (see the Nebius provider docs):
- Supported Parameters: size (e.g., "1024x1024"), output_format, seed, negative_prompt.
- Extra Parameters: num_inference_steps, guidance_scale, and loras (provided as a {"url": scale} map or [{"url": "...", "scale": ...}] array).
- Response: Images returned in data[].
4. together
OpenAI-compatible:
- Supported Parameters: prompt, model, response_format, size, n, and num_inference_steps.
- Response: Images returned in data[].
Image Generation Streaming
Only fal-ai supports streaming for Hugging Face image generation. Streaming uses Server-Sent Events (SSE).
To stream, send the same image-generation request with "stream": true. The same parameters as non-streaming apply (prompt, response_format, n, size, etc.). You receive partial image chunks until a final completed chunk delivers the last image, as either a url or b64_json.
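On the client side, an SSE stream arrives as text lines, with each payload on a line that starts with "data:". A minimal sketch for pulling those payloads out of a stream is shown here; the function name sse_data is hypothetical, and the JSON shape of each chunk is not assumed. It does not handle carriage returns or payloads split across several data lines.

```shell
# Print the payload of every SSE "data:" line read from stdin.
# Per the SSE format, one optional space after the colon is dropped.
# Other fields (event:, id:, comments) and blank lines are skipped.
sse_data() {
  sed -n 's/^data: \{0,1\}//p'
}
```

You would pipe a streaming request into it, for example by adding curl's -N (no buffering) flag to a request that sets "stream": true and appending `| sse_data`.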
Example Usage
```shell
# fal-ai
curl -X POST https://app.deepintshield.com/v1/images/generations \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "huggingface/fal-ai/fal-ai/flux/dev",
    "prompt": "A futuristic cityscape at sunset",
    "size": "1024x1024",
    "n": 2,
    "output_format": "png",
    "response_format": "url"
  }'

# Streaming (fal-ai only)
curl -X POST https://app.deepintshield.com/v1/images/generations \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "huggingface/fal-ai/fal-ai/flux/dev",
    "prompt": "A futuristic cityscape at sunset",
    "size": "1024x1024",
    "stream": true
  }'
```
Provider-Specific Notes
- fal-ai:
  - output_format: "jpg" is treated as "jpeg"
  - Set moderation: "low" to disable the safety checker
- nebius:
  - LoRAs can be provided as a {"url": scale} map or [{"url": "...", "scale": ...}] array
- hf-inference:
  - Minimal format, only prompt supported; returns the image as base64
- together:
  - OpenAI-compatible format
Image Edit
Only fal-ai supports image editing for Hugging Face. Image edit requests are routed to the fal-ai inference provider.
Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | Model identifier (must be huggingface/fal-ai/{model_id}) |
| prompt | string | ✅ | Text description of the edit |
| image[] | binary | ✅ | Image file(s) to edit (supports multiple images for some models) |
| n | int | ❌ | Number of images to generate (1-10) |
| size | string | ❌ | Image size: "WxH" format (e.g., "1024x1024") |
| output_format | string | ❌ | Output format: "png", "webp", "jpeg" (note: "jpg" is normalized to "jpeg") |
| seed | int | ❌ | Seed for reproducibility (pass as an extra param) |
| num_inference_steps | int | ❌ | Number of inference steps (pass as an extra param) |
| guidance_scale | float | ❌ | Guidance scale (pass as an extra param) |
| acceleration | string | ❌ | Acceleration mode (pass as an extra param) |
| enable_safety_checker | bool | ❌ | Enable safety checker (pass as an extra param) |
| use_image_urls | bool | ❌ | Override automatic image field selection (pass as an extra param) |
Behavior
- Supported Backend: Only the fal-ai inference provider supports image edit. Requests targeting other backends return an unsupported-operation error.
- Image Handling: Submit your image file(s) directly. DeepIntShield detects the format (image/jpeg, image/webp, image/png) and encodes them into the form the backend expects.
- Single vs. Multiple Images: For multi-image models (e.g., fal-ai/flux-2/edit, fal-ai/flux-2-pro/edit) you can pass several images; single-image models (e.g., fal-ai/flux-pro/kontext, fal-ai/flux/dev/image-to-image) accept one. The correct field is selected automatically, and you can override it with the use_image_urls extra param.
- Normalization: output_format: "jpg" is normalized to "jpeg"; size is given in "WxH" form (e.g., "1024x1024").
Response
- Non-streaming: Edited images are returned in data[] with url and/or b64_json, the same as image generation.
- Streaming: Partial image chunks (type: "image_edit.partial_image") stream until a final type: "image_edit.completed" chunk delivers the last image, as either a url or b64_json.
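An image edit request could be sent as a multipart form, as in the sketch below. This page does not state the image edit endpoint path, so the function takes the full endpoint URL as its first argument instead of guessing one. The multipart encoding is inferred from the binary image[] parameter in the table above, and the DEEPINTSHIELD_API_KEY variable and the function name edit_image are hypothetical.

```shell
# Usage: edit_image ENDPOINT_URL MODEL_ID PROMPT IMAGE_FILE
# ENDPOINT_URL is the full URL of your image edit endpoint; its path is not
# documented on this page. DEEPINTSHIELD_API_KEY is an assumed variable name.
edit_image() {
  # Only fal-ai models support image edit, so reject other backends early.
  case "$2" in
    huggingface/fal-ai/*) ;;
    *)
      echo "image edit requires a huggingface/fal-ai/{model_id} model" >&2
      return 1 ;;
  esac
  # --form-string sends the value literally, so a prompt that starts with
  # "@" or "<" is not mistaken for a file reference.
  curl -sS -X POST "$1" \
    -H "Authorization: Bearer $DEEPINTSHIELD_API_KEY" \
    --form-string "model=$2" \
    --form-string "prompt=$3" \
    -F "image[]=@$4"
}
```

For multi-image models, repeating the `-F "image[]=@..."` argument once per file is the usual multipart pattern, though this page does not spell that out.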
Image Variation
Image variation is not supported by the Hugging Face provider.
Request Format Handling
Different inference backends expect different request shapes for the same operation. You always use the standard DeepIntShield endpoints (for example /v1/embeddings for embeddings) regardless of backend - the backend-specific request format is handled for you.
Model Discovery
DeepIntShield discovers available Hugging Face models for you via the Hugging Face Hub API.
List Models Behavior
When you call GET /v1/models, DeepIntShield:
- Queries multiple backends to gather available models
- Filters by capability (chat, embedding, text-to-speech, transcription, image) so each model is listed under the methods it supports
- Aggregates the results into a single unified list
- Returns model IDs in the huggingface/{provider}/{model_id} format you use in requests
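Because every returned ID carries its provider, a flat list of IDs can be filtered or summarized with ordinary text tools. The sketch below works on IDs supplied one per line on stdin; how you extract the IDs from the GET /v1/models response depends on the response shape, which this page does not describe. Both function names are hypothetical.

```shell
# Usage: models_for_provider PROVIDER   (reads model IDs on stdin, one per line)
# Keeps only the IDs that belong to the given inference provider.
models_for_provider() {
  grep "^huggingface/$1/"
}

# Usage: list_providers   (reads model IDs on stdin, one per line)
# Prints each distinct inference provider once, sorted.
list_providers() {
  grep '^huggingface/' | cut -d/ -f2 | sort -u
}
```

For example, piping a saved ID list through `models_for_provider groq` leaves only the groq-backed models.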
Automatic Re-resolution
DeepIntShield maps each Hugging Face model ID to the identifier the target backend expects (for example, meta-llama/Meta-Llama-3-8B-Instruct resolves to the right per-backend model name for cerebras, groq, and others). If a backend reports that a mapping is stale (HTTP 404), DeepIntShield refreshes it and retries automatically, so you don’t have to handle the retry yourself.
Best Practices
When working with the Hugging Face provider:
- Check Payload Size: Ensure request bodies are under 2 MB
- Audio Format: Use MP3 for fal-ai, avoid WAV files
- Model Aliases: Always specify the provider in the model string: huggingface/{provider}/{model}
- Provider Selection: Use the capability table above to pick a backend that supports your operation (chat, embedding, TTS, ASR, image)
- Verify Capabilities: Confirm the model supports your use case (chat, embedding, TTS, ASR, image) before sending requests