Hugging Face

The Hugging Face provider lets you call models hosted across many inference backends (such as hf-inference, fal-ai, cerebras, and sambanova) through a single, unified DeepIntShield endpoint.

Overview

Through the Hugging Face provider you get:

Multiple inference backends: Route requests to any of the supported inference providers
Dynamic model aliasing: Address any backend model with a single composite model ID
Heterogeneous request formats: JSON, raw binary, and base64-encoded payloads are handled for you
Provider-specific constraints: Varying payload limits and format restrictions are enforced automatically

Supported Inference Providers

The table below lists the inference backends the Hugging Face provider can route to, with their capabilities (as of December 2025):

Provider	Chat	Embedding	Speech (TTS)	Transcription (ASR)	Image Generation	Image Generation (stream)	Image Edit	Image Edit (stream)
`hf-inference`	✅	✅	❌	✅	✅	❌	❌	❌
`cerebras`	✅	❌	❌	❌	❌	❌	❌	❌
`cohere`	✅	❌	❌	❌	❌	❌	❌	❌
`fal-ai`	❌	❌	✅	✅	✅	✅	✅	✅
`featherless-ai`	✅	❌	❌	❌	❌	❌	❌	❌
`fireworks`	✅	❌	❌	❌	❌	❌	❌	❌
`groq`	✅	❌	❌	❌	❌	❌	❌	❌
`hyperbolic`	✅	❌	❌	❌	❌	❌	❌	❌
`nebius`	✅	✅	❌	❌	✅	❌	❌	❌
`novita`	✅	❌	❌	❌	❌	❌	❌	❌
`nscale`	✅	❌	❌	❌	❌	❌	❌	❌
`ovhcloud-ai-endpoints`	✅	❌	❌	❌	❌	❌	❌	❌
`public-ai`	✅	❌	❌	❌	❌	❌	❌	❌
`replicate`	❌	❌	✅	✅	❌	❌	❌	❌
`sambanova`	✅	✅	❌	❌	❌	❌	❌	❌
`scaleway`	✅	✅	❌	❌	❌	❌	❌	❌
`together`	✅	❌	❌	❌	✅	❌	❌	❌
`z-ai`	✅	❌	❌	❌	❌	❌	❌	❌

Model Aliases & Identification

Unlike standard providers where model IDs are direct strings (e.g., gpt-4), Hugging Face models in DeepIntShield are identified by a composite key to route requests to the correct inference backend.

Format: huggingface/{inference_provider}/{model_id}

inference_provider: The backend service (e.g., hf-inference, fal-ai, cerebras).
model_id: The actual model identifier on Hugging Face Hub (e.g., meta-llama/Meta-Llama-3-8B-Instruct).

Example: huggingface/hf-inference/meta-llama/Meta-Llama-3-8B-Instruct

DeepIntShield reads this model string to route each request to the correct inference backend automatically.
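
As a minimal sketch of how the composite ID is put together and taken apart, the two shell helpers below build a model string from its parts and split one back into backend and Hub model ID. The function names hf_model_id and hf_split_model_id are illustrative and not part of DeepIntShield. The split assumes that only the first two slashes act as separators, since Hub model IDs contain slashes of their own.

```shell
# Compose a DeepIntShield model string from a backend name and a Hub model ID.
# (Hypothetical helper, for illustration only.)
hf_model_id() {
  printf 'huggingface/%s/%s\n' "$1" "$2"
}

# Split a composite model string into its backend (line 1) and Hub model ID (line 2).
# Only the first two slashes are separators; the Hub model ID keeps its own slashes.
hf_split_model_id() {
  rest=${1#huggingface/}
  if [ "$rest" = "$1" ]; then
    echo "not a huggingface/ model string: $1" >&2
    return 1
  fi
  printf '%s\n%s\n' "${rest%%/*}" "${rest#*/}"
}

# prints: huggingface/hf-inference/meta-llama/Meta-Llama-3-8B-Instruct
hf_model_id hf-inference meta-llama/Meta-Llama-3-8B-Instruct

# prints "fal-ai" and then "fal-ai/flux/dev" on separate lines
hf_split_model_id huggingface/fal-ai/fal-ai/flux/dev
```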

Request Handling Differences

The Hugging Face provider covers several task types (chat, speech, transcription), and the request structure each one needs often depends on the underlying inference provider.

Inference Provider Constraints

Different inference providers have specific limitations and requirements:

Payload Limit

The Hugging Face API enforces a 2 MB request body limit across all request types (Chat, Embedding, Speech, Transcription). This constraint applies to:

JSON request payloads
Raw audio bytes in transcription requests
Any other request body data

Impact: Large audio files, extensive chat histories, or bulk embedding requests may need to be split or compressed before sending.
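
A client-side size check before sending can catch oversized bodies early. The sketch below is one way to do it, under two assumptions: the helper name hf_check_payload_size is hypothetical, and "2 MB" is read conservatively as 2,000,000 bytes because this page does not say whether the limit is decimal or binary. It measures only the file itself, so any multipart or JSON framing around it adds a little on top.

```shell
# Conservative reading of the documented 2 MB limit (an assumption): 2,000,000 bytes.
HF_MAX_BODY_BYTES=2000000

# Succeeds when the file fits under the limit; otherwise prints a message and fails.
# (Hypothetical helper, not part of DeepIntShield.)
hf_check_payload_size() {
  if [ ! -f "$1" ]; then
    echo "no such file: $1" >&2
    return 1
  fi
  # Some wc implementations pad the count with spaces, so strip whitespace.
  size=$(wc -c < "$1" | tr -d '[:space:]')
  if [ "$size" -gt "$HF_MAX_BODY_BYTES" ]; then
    echo "$1 is $size bytes; limit is $HF_MAX_BODY_BYTES" >&2
    return 1
  fi
}

# Usage (the file name is a placeholder):
#   hf_check_payload_size recording.mp3 && echo "ok to send"
```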

`fal-ai` Audio Format Restrictions

The fal-ai provider has strict audio format requirements:

Supported Format: Only MP3 (audio/mpeg) is accepted
Rejected Formats: WAV (audio/wav) and other formats are rejected with an error such as: "fal-ai provider does not support audio/wav format; please use a different format like mp3 or ogg". The message mentions ogg, but MP3 is the only format listed as accepted here.
Encoding: You can submit standard audio bytes; DeepIntShield base64-encodes them into the Data URI that fal-ai expects.

Speech (Text-to-Speech)

Text-to-Speech (TTS) requests use the standard DeepIntShield /v1/audio/speech endpoint. You provide the text and a huggingface/{provider}/{model} model ID; DeepIntShield builds the backend-specific request for you. No special pipeline tagging is required on your side, even when the model is tagged as text-to-speech on the Hub.
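
A TTS call could look like the sketch below. Several details are assumptions rather than documented facts: the JSON field name input follows the common OpenAI-style convention, the environment variables DEEPINTSHIELD_URL and DEEPINTSHIELD_KEY and the helper names are made up for this example, and the host is taken from the image-generation example on this page.

```shell
# Base URL taken from the image-generation example; the variable name is hypothetical.
: "${DEEPINTSHIELD_URL:=https://app.deepintshield.com}"

# Build the JSON body for a TTS request.
# Escapes backslashes and double quotes only, so the text must not contain
# newlines or control characters. The "input" field name is an assumption.
hf_speech_body() {   # usage: hf_speech_body MODEL_STRING TEXT
  text=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"model": "%s", "input": "%s"}\n' "$1" "$text"
}

# Send the request and write the returned audio to a file.
hf_speech() {        # usage: hf_speech MODEL_STRING TEXT OUTPUT_FILE
  hf_speech_body "$1" "$2" |
    curl -sS --fail -X POST "$DEEPINTSHIELD_URL/v1/audio/speech" \
      -H "Authorization: Bearer $DEEPINTSHIELD_KEY" \
      -H "Content-Type: application/json" \
      --data-binary @- -o "$3"
}

# Usage (replace <tts-model-id> with a real Hub model ID):
#   hf_speech "huggingface/fal-ai/<tts-model-id>" "Hello there" speech.out
```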

Transcription (Automatic Speech Recognition)

For transcription, the request format that reaches the backend differs depending on the inference provider you target. DeepIntShield handles this for you - you always send your audio to the standard /v1/audio/transcriptions endpoint.

1. `hf-inference` (Raw Bytes)

For the standard hf-inference backend, the audio is forwarded as raw bytes.

Audio Format: Send the audio with its native mime type (e.g., audio/mpeg).
Payload Limit: Maximum 2 MB for the raw audio bytes.
What you do: Submit your audio file to /v1/audio/transcriptions as usual - no special encoding required.

2. `fal-ai` (Base64 Data URI)

For the fal-ai backend, the audio is forwarded as a base64-encoded Data URI.

Audio Format Restriction: Only MP3 (audio/mpeg) is supported. WAV files are rejected with a clear error.
What you do: Submit your MP3 audio file as usual; DeepIntShield base64-encodes it into the Data URI that fal-ai requires.

In both cases you use the same DeepIntShield request - the backend-specific encoding (raw bytes vs. base64 Data URI) is applied for you based on the inference provider in your model ID.
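
Because the request is the same for both backends, the only client-side difference is the format check for fal-ai. The sketch below illustrates this with a hypothetical preflight helper and a multipart upload. The form field names model and file are assumed to follow the OpenAI-style convention, the environment variable names are invented for this example, and the check looks at the file extension only, which is a rough stand-in for the real MIME type.

```shell
# Base URL taken from the image-generation example; the variable name is hypothetical.
: "${DEEPINTSHIELD_URL:=https://app.deepintshield.com}"

# Preflight: fal-ai accepts only MP3, other backends are not restricted here.
# Checks the extension only, not the actual audio content.
hf_audio_ok() {      # usage: hf_audio_ok BACKEND AUDIO_FILE
  case "$1" in
    fal-ai)
      case "$2" in
        *.mp3|*.MP3) return 0 ;;
        *) echo "fal-ai accepts only MP3 (audio/mpeg): $2" >&2; return 1 ;;
      esac ;;
    *) return 0 ;;
  esac
}

# Upload the audio; DeepIntShield applies the backend-specific encoding.
hf_transcribe() {    # usage: hf_transcribe BACKEND HUB_MODEL_ID AUDIO_FILE
  hf_audio_ok "$1" "$3" || return 1
  curl -sS --fail -X POST "$DEEPINTSHIELD_URL/v1/audio/transcriptions" \
    -H "Authorization: Bearer $DEEPINTSHIELD_KEY" \
    -F "model=huggingface/$1/$2" \
    -F "file=@$3"
}

# Usage (replace <asr-model-id> with a real Hub model ID):
#   hf_transcribe hf-inference "<asr-model-id>" meeting.mp3
```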

Image Generation

The Hugging Face provider supports image generation through multiple inference providers, each with different request formats and capabilities.

Supported Inference Providers

Provider	Non-Streaming	Streaming	Notes
`hf-inference`	✅	❌	Simple prompt-only format, returns raw image bytes
`fal-ai`	✅	✅	Full parameter support, supports streaming via Server-Sent Events
`nebius`	✅	❌	Uses Nebius-specific format with width/height, LoRAs support
`together`	✅	❌	OpenAI-compatible format

Supported Parameters by Backend

Send a standard /v1/images/generations request with the model ID huggingface/{provider}/{model_id}. The supported parameters depend on the backend:

1. `hf-inference`

The simplest backend; it requires only a prompt:

Supported Input: prompt only.
Response: A generated image, returned as base64 in b64_json.
Limitations: No size, quality, or other parameter support.

2. `fal-ai`

The most feature-rich backend with extensive parameter support:

Supported Parameters: n, size (e.g., "1024x1024"), output_format ("jpg" is treated as "jpeg"), response_format, and moderation.
Extra Parameters: guidance_scale, acceleration, enable_prompt_expansion, enable_safety_checker.
Response: Images returned in data[] with url and/or b64_json.

3. `nebius`

Supports LoRAs (see the Nebius provider docs):

Supported Parameters: size (e.g., "1024x1024"), output_format, seed, negative_prompt.
Extra Parameters: num_inference_steps, guidance_scale, and loras (provided as a {"url": scale} map or [{"url": "...", "scale": ...}] array).
Response: Images returned in data[].
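
To make the two loras shapes concrete, the sketch below prints one request body for each form. Treat it as illustrative: the model ID and the LoRA URL are placeholders, and placing the extra parameters at the top level of the JSON body is an assumption about how extra params are passed.

```shell
# Request body with loras given as a {"url": scale} map.
# Model ID and LoRA URL are placeholders; top-level extra params are an assumption.
nebius_body_lora_map() {
  cat <<'EOF'
{
  "model": "huggingface/nebius/black-forest-labs/FLUX.1-dev",
  "prompt": "A futuristic cityscape at sunset",
  "size": "1024x1024",
  "seed": 42,
  "num_inference_steps": 28,
  "loras": {"https://example.com/my-lora.safetensors": 0.8}
}
EOF
}

# The same request with loras given as an array of {"url", "scale"} objects.
nebius_body_lora_array() {
  cat <<'EOF'
{
  "model": "huggingface/nebius/black-forest-labs/FLUX.1-dev",
  "prompt": "A futuristic cityscape at sunset",
  "size": "1024x1024",
  "seed": 42,
  "num_inference_steps": 28,
  "loras": [{"url": "https://example.com/my-lora.safetensors", "scale": 0.8}]
}
EOF
}

# Either body can be piped to curl with --data-binary @- against /v1/images/generations.
nebius_body_lora_map
```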

4. `together`

OpenAI-compatible:

Supported Parameters: prompt, model, response_format, size, n, and num_inference_steps.
Response: Images returned in data[].

Image Generation Streaming

Only fal-ai supports streaming for Hugging Face image generation. Streaming uses Server-Sent Events (SSE).

To stream, send the same image-generation request with "stream": true. The same parameters as non-streaming apply (prompt, response_format, n, size, etc.). You receive partial image chunks until a final completed chunk delivers the last image, as either a url or b64_json.

Example Usage

# fal-ai
curl -X POST https://app.deepintshield.com/v1/images/generations \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "huggingface/fal-ai/fal-ai/flux/dev",
    "prompt": "A futuristic cityscape at sunset",
    "size": "1024x1024",
    "n": 2,
    "output_format": "png",
    "response_format": "url"
  }'

# Streaming (fal-ai only)
curl -X POST https://app.deepintshield.com/v1/images/generations \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "huggingface/fal-ai/fal-ai/flux/dev",
    "prompt": "A futuristic cityscape at sunset",
    "size": "1024x1024",
    "stream": true
  }'

Provider-Specific Notes

fal-ai:
- output_format: "jpg" is treated as "jpeg"
- Set moderation: "low" to disable the safety checker
nebius:
- LoRAs can be provided as a {"url": scale} map or [{"url": "...", "scale": ...}] array
hf-inference:
- Minimal format, only prompt supported; returns the image as base64
together:
- OpenAI-compatible format

Image Edit

Only fal-ai supports image editing for Hugging Face, so image edit requests are routed to the fal-ai inference provider.

Request Parameters

Parameter	Type	Required	Notes
`model`	string	✅	Model identifier (must be `huggingface/fal-ai/{model_id}`)
`prompt`	string	✅	Text description of the edit
`image[]`	binary	✅	Image file(s) to edit (supports multiple images for some models)
`n`	int	❌	Number of images to generate (1-10)
`size`	string	❌	Image size: `"WxH"` format (e.g., `"1024x1024"`)
`output_format`	string	❌	Output format: `"png"`, `"webp"`, `"jpeg"` (note: `"jpg"` is normalized to `"jpeg"`)
`seed`	int	❌	Seed for reproducibility (pass as an extra param)
`num_inference_steps`	int	❌	Number of inference steps (pass as an extra param)
`guidance_scale`	float	❌	Guidance scale (pass as an extra param)
`acceleration`	string	❌	Acceleration mode (pass as an extra param)
`enable_safety_checker`	bool	❌	Enable safety checker (pass as an extra param)
`use_image_urls`	bool	❌	Override automatic image field selection (pass as an extra param)

Behavior

Supported Backend: Only the fal-ai inference provider supports image edit. Requests targeting other backends return an unsupported-operation error.
Image Handling: Submit your image file(s) directly. DeepIntShield detects the format (image/jpeg, image/webp, image/png) and encodes them into the form the backend expects.
Single vs. Multiple Images: For multi-image models (e.g., fal-ai/flux-2/edit, fal-ai/flux-2-pro/edit) you can pass several images; single-image models (e.g., fal-ai/flux-pro/kontext, fal-ai/flux/dev/image-to-image) accept one. The correct field is selected automatically, and you can override it with the use_image_urls extra param.
Normalization: output_format: "jpg" is normalized to "jpeg"; size is given in "WxH" form (e.g., "1024x1024").
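
An image edit request could be sent as a multipart form, as in the sketch below. This page does not name the edit endpoint, so /v1/images/edits is an assumption based on the OpenAI-style convention. The helper names and environment variables are also hypothetical. The image[] field name and the fal-ai/flux-pro/kontext model come from the text above.

```shell
# Base URL taken from the image-generation example; the variable name is hypothetical.
: "${DEEPINTSHIELD_URL:=https://app.deepintshield.com}"

# Image edit only works with huggingface/fal-ai/{model_id}; reject anything else early.
hf_is_edit_model() {   # usage: hf_is_edit_model MODEL_STRING
  case "$1" in
    huggingface/fal-ai/?*) return 0 ;;
    *) echo "image edit requires huggingface/fal-ai/{model_id}: $1" >&2; return 1 ;;
  esac
}

# Submit one image plus a prompt. The endpoint path is an assumption.
hf_image_edit() {      # usage: hf_image_edit MODEL_STRING PROMPT IMAGE_FILE
  hf_is_edit_model "$1" || return 1
  curl -sS --fail -X POST "$DEEPINTSHIELD_URL/v1/images/edits" \
    -H "Authorization: Bearer $DEEPINTSHIELD_KEY" \
    -F "model=$1" \
    --form-string "prompt=$2" \
    -F "image[]=@$3" \
    -F "output_format=png"
}

# Usage:
#   hf_image_edit huggingface/fal-ai/fal-ai/flux-pro/kontext "Make the sky purple" photo.png
```

The prompt is passed with --form-string so that a prompt beginning with @ or < is sent as literal text rather than being read as a file reference.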

Response

Non-streaming: Edited images are returned in data[] with url and/or b64_json, the same as image generation.
Streaming: Partial image chunks (type: "image_edit.partial_image") stream until a final type: "image_edit.completed" chunk delivers the last image, as either a url or b64_json.
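
On the client, consuming the stream means reading data: lines until the completed chunk arrives. The filter below is a rough sketch of that loop. It matches the two documented type values by substring rather than parsing JSON, and the function name sse_edit_events is hypothetical.

```shell
# Read an SSE stream on stdin. Print "partial" for each partial chunk and
# "completed <payload>" for the final chunk, then stop.
# Substring matching on the type value is a shortcut, not real JSON parsing.
sse_edit_events() {
  cr=$(printf '\r')
  while IFS= read -r line; do
    line=${line%"$cr"}                 # tolerate CRLF line endings
    case "$line" in
      data:*) payload=${line#data:}; payload=${payload# } ;;
      *) continue ;;                   # skip blank lines, comments, other fields
    esac
    case "$payload" in
      *'"image_edit.completed"'*)
        printf 'completed %s\n' "$payload"
        return 0 ;;
      *'"image_edit.partial_image"'*)
        printf 'partial\n' ;;
    esac
  done
  return 1                             # stream ended without a completed chunk
}

# Usage: pipe a streaming curl call into the filter, for example
#   curl -sS -N ... | sse_edit_events
```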

Image Variation

The Hugging Face provider does not support image variation.

Request Format Handling

Different inference backends expect different request shapes for the same operation. You always use the standard DeepIntShield endpoints (for example /v1/embeddings for embeddings) regardless of backend - the backend-specific request format is handled for you.

Model Discovery

DeepIntShield discovers available Hugging Face models for you via the Hugging Face Hub API.

List Models Behavior

When you call GET /v1/models, DeepIntShield:

Queries multiple backends to gather available models
Filters by capability (chat, embedding, text-to-speech, transcription, image) so each model is listed under the methods it supports
Aggregates the results into a single unified list
Returns model IDs in the huggingface/{provider}/{model_id} format you use in requests
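
Since the returned IDs carry the backend in their second path segment, narrowing the list to one backend is a prefix match. The sketch below assumes an OpenAI-style response with IDs under data[].id and assumes jq is installed. The helper names and environment variables are hypothetical.

```shell
# Base URL taken from the image-generation example; the variable name is hypothetical.
: "${DEEPINTSHIELD_URL:=https://app.deepintshield.com}"

# Print one model ID per line. Assumes the response has the shape {"data":[{"id":...}]}.
hf_list_model_ids() {
  curl -sS --fail "$DEEPINTSHIELD_URL/v1/models" \
    -H "Authorization: Bearer $DEEPINTSHIELD_KEY" |
    jq -r '.data[].id'
}

# Keep only the IDs (read from stdin) that belong to one Hugging Face backend.
hf_models_for_backend() {   # usage: ... | hf_models_for_backend BACKEND
  awk -v prefix="huggingface/$1/" 'index($0, prefix) == 1'
}

# Usage:
#   hf_list_model_ids | hf_models_for_backend groq
```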

Automatic Re-resolution

DeepIntShield maps each Hugging Face model ID to the identifier the target backend expects (for example, meta-llama/Meta-Llama-3-8B-Instruct resolves to the right per-backend model name for cerebras, groq, and others). If a backend reports that a mapping is stale (HTTP 404), DeepIntShield refreshes it and retries automatically, so you don’t have to handle the retry yourself.

Best Practices

When working with the Hugging Face provider:

Check Payload Size: Ensure request bodies are under 2 MB
Audio Format: Use MP3 for fal-ai, avoid WAV files
Model Aliases: Always specify the provider in the model string: huggingface/{provider}/{model}
Provider Selection: Use the capability table above to pick a backend that supports your operation (chat, embedding, TTS, ASR, image)
Verify Capabilities: Confirm the model supports your use case (chat, embedding, TTS, ASR, image) before sending requests
