Google Gemini

Overview

Call Google Gemini models through DeepIntShield using the same OpenAI-compatible Chat Completions and Responses APIs, plus Gemini’s native audio, image, video, embeddings, files, and batch capabilities. Gemini’s underlying API differs from OpenAI’s, so a few things are worth knowing:

Parameters - some fields use Gemini-native names (e.g. max_completion_tokens maps to maxOutputTokens, stop to stopSequences); see the parameter sections below
Multimodal input - text, images, video, and code execution are supported as message content
Reasoning - the reasoning object enables Gemini thinking
Tools - function calling is supported, including tool call IDs and thought signatures
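
The renaming described above can be pictured with a small Python sketch. Only the two pairs named in this section come from the documentation; the function name and the pass-through of unknown keys are assumptions made for illustration, not DeepIntShield's actual implementation.

```python
# Only these two pairs are taken from the documentation above.
OPENAI_TO_GEMINI = {
    "max_completion_tokens": "maxOutputTokens",
    "stop": "stopSequences",
}


def to_gemini_names(params: dict) -> dict:
    """Rename known OpenAI-style keys; leave every other key untouched (assumption)."""
    return {OPENAI_TO_GEMINI.get(key, key): value for key, value in params.items()}


print(to_gemini_names({"max_completion_tokens": 256, "stop": ["###"], "temperature": 0.2}))
```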

Supported Operations

Operation	Non-Streaming	Streaming	Endpoint
Chat Completions	✅	✅	`/v1beta/models/{model}:generateContent`
Responses API	✅	✅	`/v1beta/models/{model}:generateContent`
Speech (TTS)	✅	✅	`/v1beta/models/{model}:generateContent`
Transcriptions (STT)	✅	✅	`/v1beta/models/{model}:generateContent`
Image Generation	✅	-	`/v1beta/models/{model}:generateContent` or `/v1beta/models/{model}:predict` (Imagen)
Image Edit	✅	-	`/v1beta/models/{model}:generateContent` or `/v1beta/models/{model}:predict` (Imagen)
Video Generation	✅	-	`/v1beta/models/{model}:predictLongRunning`
Image Variation	❌	-	Not supported
Embeddings	✅	-	`/v1beta/models/{model}:embedContent`
Files	✅	-	`/upload/storage/v1beta/files`
Batch	✅	-	`/v1beta/batchJobs`
List Models	✅	-	`/v1beta/models`

Authentication

Gemini supports API key authentication in addition to OAuth2 Bearer token authentication. The appropriate method is used automatically based on the endpoint type.

API Key Authentication

API key authentication is supported via two methods:

Header Method (standard Gemini endpoints):
- Format: x-goog-api-key: YOUR_API_KEY header
- Used for: Standard Gemini endpoints (e.g., /v1beta/models/{model}:generateContent)
Query Parameter Method (Imagen and custom endpoints):
- Format: ?key=YOUR_API_KEY appended to request URLs
- Used for: Imagen models and custom endpoints
- Example: https://generativelanguage.googleapis.com/v1beta/models/imagen-4.0-generate-001:predict?key=YOUR_API_KEY

DeepIntShield automatically selects the appropriate authentication method based on the endpoint type.
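
To make the two API key methods concrete, here is a minimal Python sketch of how a client calling Gemini directly might choose between them. The rule used here (paths ending in `:predict` get the query parameter, everything else gets the header) is an assumption based on the examples above; how DeepIntShield detects the endpoint type internally is not documented here, and `build_auth` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://generativelanguage.googleapis.com"


def build_auth(path: str, api_key: str) -> tuple[str, dict]:
    """Return (url, headers) for a direct Gemini request.

    Assumption: ':predict' paths (Imagen) take the key as a query parameter;
    all other paths take it in the x-goog-api-key header.
    """
    if path.endswith(":predict"):
        return f"{BASE_URL}{path}?{urlencode({'key': api_key})}", {}
    return f"{BASE_URL}{path}", {"x-goog-api-key": api_key}


url, headers = build_auth("/v1beta/models/imagen-4.0-generate-001:predict", "YOUR_API_KEY")
print(url, headers)
```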

1. Chat Completions

Request Parameters

Send standard OpenAI-compatible Chat Completions requests. The following are supported:

max_completion_tokens, temperature, top_p, stop
response_format for JSON / structured output
tools and tool_choice for function calling (see Tools)
reasoning for Gemini thinking (see Reasoning / Thinking)

Ignored Parameters

The following are not supported by Gemini and are ignored: logit_bias, logprobs, top_logprobs, parallel_tool_calls, service_tier.

Extra Parameters

Gemini-specific fields such as top_k, presence_penalty, frequency_penalty, and seed can be passed directly in the request body:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40,
    "stop_sequences": ["###"]
  }'

Reasoning / Thinking

Documentation: See DeepIntShield Reasoning Reference

Use the reasoning object to enable Gemini thinking:

reasoning.effort sets the thinking level - "minimal" and "low" map to low thinking; "medium" and "high" map to high thinking
reasoning.max_tokens sets the thinking token budget (-1 for dynamic, 0 to disable, or a specific budget)

{"reasoning": {"effort": "high", "max_tokens": 10000}}
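
The two rules above can be restated as a short Python sketch. The helper names are hypothetical and the code only mirrors the mapping described in this section; it is not the gateway's implementation.

```python
def thinking_level(effort: str) -> str:
    """Map reasoning.effort to the thinking level described above."""
    if effort in ("minimal", "low"):
        return "low"
    if effort in ("medium", "high"):
        return "high"
    raise ValueError(f"unsupported effort: {effort!r}")


def describe_budget(max_tokens: int) -> str:
    """Interpret reasoning.max_tokens: -1 is dynamic, 0 disables thinking."""
    if max_tokens == -1:
        return "dynamic"
    if max_tokens == 0:
        return "disabled"
    if max_tokens > 0:
        return f"fixed budget of {max_tokens} tokens"
    raise ValueError("max_tokens must be -1, 0, or a positive budget")


reasoning = {"effort": "high", "max_tokens": 10000}
print(thinking_level(reasoning["effort"]), "/", describe_budget(reasoning["max_tokens"]))
```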

Multimodal Input

Message content can include text, images (URL or base64 data URL), and video. Video content can carry metadata such as fps and start/end offsets.
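
As an illustration, the following Python sketch builds a user message that mixes text with a base64 data-URL image. It assumes the standard OpenAI content-part layout (`type: "image_url"`), which this page implies but does not spell out; `image_part` is a hypothetical helper, and video parts are omitted because their exact field names are not given here.

```python
import base64


def image_part(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an OpenAI-style image_url part with a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        image_part(b"<image bytes go here>"),  # placeholder bytes, not a real image
    ],
}
print(message["content"][1]["image_url"]["url"][:22])
```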

Tools

Standard OpenAI-style tool definitions are supported. tool_choice accepts:

`tool_choice`	Behavior
`"auto"`	Model decides (default)
`"none"`	No tool calls
`"required"`	Must call a tool
Specific tool	Restricts to the named function

The function.strict field is not supported by Gemini and is ignored.
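
A request body with one tool and a specific `tool_choice` might look like the Python sketch below. The layout follows the standard OpenAI tool format that this section refers to; the `get_weather` function and its schema are invented for illustration.

```python
import json


def weather_tool() -> dict:
    """A hypothetical function tool in the standard OpenAI format."""
    return {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }


payload = {
    "model": "gemini/gemini-2.0-flash",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [weather_tool()],
    # Restrict the model to the named function; "auto", "none", and
    # "required" are the string alternatives listed in the table above.
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
print(json.dumps(payload, indent=2))
```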

Response

Responses come back in the standard OpenAI-compatible shape:

finish_reason - stop, length, tool_calls, or content_filter (the latter for Gemini safety/recitation blocks)
message.content for text, message.tool_calls for function calls (with arguments as a JSON string)
usage.prompt_tokens / completion_tokens / total_tokens, plus prompt_tokens_details.cached_tokens and completion_tokens_details.reasoning_tokens
reasoning for Gemini thinking output (when reasoning is enabled)
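
Because tool-call arguments arrive as a JSON string, client code has to decode them before use. The Python sketch below shows one way to do that; the sample response is hand-written for illustration and the helper name is hypothetical.

```python
import json


def extract_tool_calls(response: dict) -> list:
    """Return (function_name, decoded_arguments) pairs from a chat response."""
    calls = []
    for choice in response.get("choices", []):
        message = choice.get("message", {})
        for call in message.get("tool_calls") or []:
            function = call["function"]
            # arguments is a JSON string, so it must be parsed explicitly
            calls.append((function["name"], json.loads(function["arguments"])))
    return calls


# Hand-written sample in the OpenAI-compatible shape, not real output.
sample_response = {
    "choices": [{
        "finish_reason": "tool_calls",
        "message": {
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
            }],
        },
    }],
}
print(extract_tool_calls(sample_response))
```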

Streaming

Set "stream": true to receive output as standard OpenAI-compatible streaming chunks. Content text, reasoning, and tool-call arguments arrive incrementally; finish_reason and usage appear on the final chunk.
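
A client can rebuild the full message by concatenating the deltas, as in this Python sketch. It assumes the usual OpenAI server-sent-events framing (`data: {json}` lines ending with `data: [DONE]`); the sample lines are hand-written, and reasoning and tool-call deltas are left out to keep the example short.

```python
import json


def accumulate_stream(lines):
    """Collect text, finish_reason, and usage from OpenAI-style SSE lines."""
    text_parts, finish_reason, usage = [], None, None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            text_parts.append(delta.get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
        usage = chunk.get("usage") or usage
    return "".join(text_parts), finish_reason, usage


sample_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}, "finish_reason": null}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}, "finish_reason": "stop"}], "usage": {"total_tokens": 7}}',
    "data: [DONE]",
]
print(accumulate_stream(sample_lines))
```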

2. Responses API

Gemini supports the OpenAI-style Responses API on the same models as Chat Completions.

Request Parameters

Send standard OpenAI Responses requests. The following are supported:

max_output_tokens, temperature, top_p
instructions for system instructions
input as a string or array
tools and tool_choice (see Chat Completions)
reasoning for Gemini thinking (see Reasoning / Thinking)
text for structured output

Gemini-specific fields such as stop and top_k can be passed directly in the request body:

curl -X POST https://app.deepintshield.com/v1/responses \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash",
    "input": "Hello, how are you?",
    "instructions": "You are a helpful assistant.",
    "top_k": 40
  }'

Tool Support

Supported types: function, computer_use_preview, web_search, mcp. Tool behavior is the same as Chat Completions.

Response

Responses come back in the standard OpenAI Responses shape:

status (completed, or incomplete when blocked by a Gemini safety stop)
output items: assistant text as message, tool calls as function_call, Gemini thinking as reasoning
usage token counts, including cache tokens under *_tokens_details.cached_tokens
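
The three output item kinds can be separated with a few lines of Python. This sketch assumes the item fields follow the OpenAI Responses format (`content` parts of type `output_text`, and `name` / `arguments` on `function_call` items); the sample response is hand-written and the helper name is hypothetical.

```python
import json


def summarize_output(response: dict) -> dict:
    """Split a Responses-style output list into text, function calls, and reasoning."""
    summary = {"text": "", "function_calls": [], "reasoning_items": 0}
    for item in response.get("output", []):
        kind = item.get("type")
        if kind == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    summary["text"] += part.get("text", "")
        elif kind == "function_call":
            summary["function_calls"].append((item["name"], json.loads(item["arguments"])))
        elif kind == "reasoning":
            summary["reasoning_items"] += 1
    return summary


# Hand-written sample, not real output.
sample = {
    "status": "completed",
    "output": [
        {"type": "reasoning", "summary": []},
        {"type": "message", "role": "assistant",
         "content": [{"type": "output_text", "text": "It is sunny."}]},
        {"type": "function_call", "name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
    ],
}
print(summarize_output(sample))
```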

Streaming

Set "stream": true to receive output as standard OpenAI Responses streaming events.

3. Speech (Text-to-Speech)

Gemini can synthesize speech from text.

Request Parameters

Parameter	Notes
`input`	Text to synthesize
`voice`	Voice name (see Supported Voices)
`response_format`	Only `"wav"` is supported (default)

curl -X POST https://app.deepintshield.com/v1/audio/speech \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.5-flash-preview-tts",
    "input": "Hello, how are you?",
    "voice": "Chant-Female",
    "response_format": "wav"
  }'

Response

Returns WAV audio when response_format is "wav". If response_format is omitted, raw audio is returned.
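
Since the body may be WAV or raw audio, a client may want to check which one it received before saving it. The Python sketch below inspects the standard RIFF/WAVE header; the helper names are hypothetical, and the in-memory sample stands in for a real response body.

```python
import io
import wave


def looks_like_wav(data: bytes) -> bool:
    """True when the bytes start with a RIFF container holding WAVE data."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def make_silent_wav(frames: int = 100) -> bytes:
    """Build a tiny mono 16-bit WAV in memory (a stand-in for a real response)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(24000)  # sample rate chosen arbitrarily for the demo
        wav_file.writeframes(b"\x00\x00" * frames)
    return buffer.getvalue()


body = make_silent_wav()
print("wav" if looks_like_wav(body) else "raw audio")
```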

Supported Voices

Common Gemini voices include:

Chant-Female - Female voice
Chant-Male - Male voice
Additional voices depend on model capabilities

Check the model documentation for the complete list of supported voices.

4. Transcriptions (Speech-to-Text)

Gemini transcribes audio to text using the standard /v1/audio/transcriptions endpoint.

Request Parameters

Parameter	Notes
`file`	Audio bytes to transcribe
`prompt`	Instructions (defaults to “Generate a transcript of the speech.”)
`language`	Language code, if supported by the model

Safety settings and caching can be passed directly in the request body.

curl -X POST https://app.deepintshield.com/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash",
    "file": "<binary-audio-data>",
    "prompt": "Transcribe this audio in the original language."
  }'

Response

Returns the transcribed text along with token usage.

5. Embeddings

Request Parameters:

input - text to embed (single string or array)
dimensions - output embedding dimensionality
Task type and title can be passed directly in the request body

Response: Standard embeddings response with the embedding array and token usage.

6. Batch API

Request formats: Inline requests array or file-based input

Pagination: Token-based with pageToken

Endpoints:

POST /v1beta/batchJobs - Create
GET /v1beta/batchJobs?pageSize={limit}&pageToken={token} - List
GET /v1beta/batchJobs/{batch_id} - Retrieve
POST /v1beta/batchJobs/{batch_id}:cancel - Cancel

Response:

Batch statuses: in_progress, completed, failed, cancelling, cancelled, expired
Results are returned inline or as a JSONL output file, depending on how the batch was created
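
Token-based pagination can be driven by a small loop like the one below. This is a sketch under assumptions: the response key for the next token is taken to be `nextPageToken`, the key holding the items is passed in by the caller because it is not specified here, and `fetch_page` is a hypothetical callable that performs the actual list request.

```python
def list_all(fetch_page, items_key, page_size=50):
    """Follow page tokens until the server stops returning one.

    fetch_page(page_size, page_token) must return a dict for one page.
    """
    items, token = [], None
    while True:
        page = fetch_page(page_size, token)
        items.extend(page.get(items_key, []))
        token = page.get("nextPageToken")  # assumed field name
        if not token:
            return items


# Fake two-page backend used only to demonstrate the loop.
FAKE_PAGES = {
    None: {"batches": ["batch-1", "batch-2"], "nextPageToken": "t1"},
    "t1": {"batches": ["batch-3"]},
}


def fake_fetch(page_size, page_token):
    return FAKE_PAGES[page_token]


print(list_all(fake_fetch, items_key="batches"))
```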

7. Files API

Upload: Multipart/form-data with file (binary) and filename (optional)

Endpoints:

POST /upload/storage/v1beta/files - Upload
GET /v1beta/files?limit={limit}&pageToken={token} - List (cursor pagination)
GET /v1beta/files/{file_id} - Retrieve
DELETE /v1beta/files/{file_id} - Delete
GET /v1beta/files/{file_id}/content - Download

8. Image Generation

Gemini supports two image generation formats depending on the model:

Standard Gemini Format: Uses the /v1beta/models/{model}:generateContent endpoint
Imagen Format: Uses the /v1beta/models/{model}:predict endpoint for Imagen models (detected automatically)

The right endpoint is selected automatically based on the model name (Imagen models use the predict endpoint).

Request Parameters

Parameter	Notes
`prompt`	Text description of the image to generate
`n`	Number of images to generate
`size`	Image size in WxH format (e.g., `"1024x1024"`)
`output_format`	Output format: `"png"`, `"jpeg"`, `"webp"` (Imagen does not support webp)
`seed`	Seed for reproducible generation
`negative_prompt`	Negative prompt

Extra Parameters

Gemini-specific fields can be passed directly in the request body:

Parameter	Type	Notes
`personGeneration`	string	Person generation setting (Imagen only)
`language`	string	Language code (Imagen only)
`enhancePrompt`	bool	Prompt enhancement flag (Imagen only)
`safetySettings` / `safety_settings`	string/array	Safety settings configuration
`cachedContent` / `cached_content`	string	Cached content ID
`labels`	object	Custom labels map

curl -X POST https://app.deepintshield.com/v1/images/generations \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/imagen-4.0-generate-001",
    "prompt": "A sunset over the mountains",
    "size": "1024x1024",
    "n": 2,
    "output_format": "png"
  }'

Imagen Size Support

For Imagen models, size maps to a supported image size and aspect ratio:

Supported image sizes: up to "1k" (≤1024) or "2k" (≤2048); larger sizes are not supported
Supported aspect ratios: "1:1", "3:4", "4:3", "9:16", "16:9"
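
One plausible way to derive both values from a WxH string is sketched below in Python. It assumes the longest side selects the size bucket and that the reduced width:height ratio must match a supported ratio exactly; the gateway may instead pick the nearest ratio, so treat this as an illustration rather than the actual mapping.

```python
from math import gcd

SUPPORTED_RATIOS = {"1:1", "3:4", "4:3", "9:16", "16:9"}


def map_imagen_size(size: str) -> tuple[str, str]:
    """Return (image_size, aspect_ratio) for a 'WxH' string, or raise ValueError."""
    width, height = (int(part) for part in size.lower().split("x"))
    longest = max(width, height)
    if longest <= 1024:
        image_size = "1k"
    elif longest <= 2048:
        image_size = "2k"
    else:
        raise ValueError(f"{size} exceeds the largest supported size")
    divisor = gcd(width, height)
    ratio = f"{width // divisor}:{height // divisor}"
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"aspect ratio {ratio} is not supported")
    return image_size, ratio


print(map_imagen_size("1024x1024"))
print(map_imagen_size("1920x1080"))
```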

Response

Generated images are returned in the response array (as b64_json for Imagen), each with an index and the output format reported as a file extension. Token usage is included.

Streaming

Image generation streaming is not supported by Gemini.

9. Image Edit

Gemini supports image editing for both Gemini and Imagen models. The right endpoint is selected automatically based on the model name.

Request Parameters

Parameter	Type	Required	Notes
`model`	string	✅	Model identifier (Gemini or Imagen model)
`prompt`	string	✅	Text description of the edit
`image[]`	binary	✅	Image file(s) to edit (supports multiple images)
`mask`	binary	❌	Mask image file
`type`	string	❌	Edit type: `"inpainting"`, `"outpainting"`, `"inpaint_removal"`, `"bgswap"` (Imagen only)
`n`	int	❌	Number of images to generate (1-10)
`output_format`	string	❌	Output format: `"png"`, `"webp"`, `"jpeg"`
`output_compression`	int	❌	Compression level (0-100%)
`seed`	int	❌	Seed for reproducibility (pass as an extra param)
`negative_prompt`	string	❌	Negative prompt (pass `negativePrompt` as an extra param)
`guidanceScale`	int	❌	Guidance scale (pass as an extra param, Imagen only)
`baseSteps`	int	❌	Base steps (pass as an extra param, Imagen only)
`maskMode`	string	❌	Mask mode (pass as an extra param, Imagen only): `"MASK_MODE_USER_PROVIDED"`, `"MASK_MODE_BACKGROUND"`, `"MASK_MODE_FOREGROUND"`, `"MASK_MODE_SEMANTIC"`
`dilation`	float	❌	Mask dilation (pass as an extra param, Imagen only): Range [0, 1]
`maskClasses`	int[]	❌	Mask classes (pass as an extra param, Imagen only): For `MASK_MODE_SEMANTIC`

Behavior

Gemini Models

Send your prompt and one or more images (image/jpeg, image/webp, image/png).
safetySettings (safety_settings), cachedContent (cached_content), and labels can be passed directly in the request body.

Imagen Models

Mask: Provide a mask or set maskMode. When mask data is present, maskMode defaults to user-provided. dilation (range [0, 1]) and maskClasses (for semantic masks) can be set as extra params.
Edit type: The type parameter selects the edit mode: "inpainting", "outpainting", "inpaint_removal", or "bgswap". Alternatively, set editMode directly as an extra param.
Parameters: n, output_format, and output_compression are supported. seed, negativePrompt, guidanceScale, and baseSteps can be set as extra params. Additional Imagen-specific extra params: addWatermark, includeRaiReason, includeSafetyAttributes, personGeneration, safetySetting, language, storageUri.

Response

Same response format as image generation (see Image Generation section).

Streaming

Image edit streaming is not supported by Gemini.

Image Variation

Image variation is not supported by Gemini.

10. List Models

Request: GET /v1/models (no body)

Returns available Gemini models with IDs in the gemini/{model} format, along with their display name, description, and token limits.

11. Video Generation

Generate (`POST /v1/videos`)

Requests use a JSON body (application/json).

Request Parameters

Parameter	Type	Required	Notes
`model`	string	✅	Veo model (e.g., `veo-3.1-generate-preview`)
`prompt`	string	✅	Text description of the video
`input_reference`	string	❌	Input image for image-to-video
`seconds`	string	❌	Duration of the video
`size`	string	❌	Resolution, mapped to an aspect ratio (`1280x720` → `16:9`, `720x1280` → `9:16`)
`negative_prompt`	string	❌	What to avoid in the video
`seed`	int	❌	Seed for reproducibility
`audio`	bool	❌	Enable audio generation
`video_uri`	string	❌	GCS video URI for video extension

Extra Params (any unrecognized JSON field can be passed directly in the request body)

Key	Notes
`aspectRatio`	Override the aspect ratio directly (e.g., `"16:9"`, `"9:16"`). Takes precedence over `size`
`resolution`	Native Gemini resolution string
`sampleCount`	Number of samples to generate
`personGeneration`	Person generation policy
`numberOfVideos`	Number of videos to generate
`storageURI`	GCS bucket for output storage
`compressionQuality`	Output compression quality
`enhancePrompt`	Auto-enhance the prompt
`resizeMode`	How to handle size mismatches
`reference_images`	Style/asset reference image objects
`lastFrame`	Last frame image object for interpolation

Response: DeepIntShieldVideoGenerationResponse - id, status, videos[]

If Gemini filters content for safety, status is failed and content_filter describes the reason.

Job Statuses: in_progress → completed / failed

Retrieve / Download

Operation	Endpoint	Notes
Get status	`GET /v1/videos/{id}`	Polls the long-running operation
Download	`GET /v1/videos/{id}/content`	Downloads from GCS URI or decodes base64 video
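
Because generation is a long-running job, clients typically poll the status endpoint until the job leaves in_progress. The Python sketch below shows such a loop; `get_status` is a hypothetical callable wrapping `GET /v1/videos/{id}`, and the interval and attempt limit are arbitrary defaults.

```python
import time


def wait_for_video(get_status, video_id, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Poll until the job status is completed or failed, then return the job dict."""
    for _ in range(max_attempts):
        job = get_status(video_id)
        if job["status"] in ("completed", "failed"):
            return job
        sleep(interval)
    raise TimeoutError(f"video {video_id} still in progress after {max_attempts} polls")


# Fake status source used only to demonstrate the loop.
_statuses = iter(["in_progress", "in_progress", "completed"])


def fake_get_status(video_id):
    return {"id": video_id, "status": next(_statuses)}


job = wait_for_video(fake_get_status, "video-123", interval=0, sleep=lambda seconds: None)
print(job["status"])
```

A failed job should be inspected for content_filter before retrying, since a safety block will not succeed on a second attempt with the same prompt.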

Video Delete, List, and Remix are not supported.

Content Type Support

DeepIntShield supports the following content modalities through Gemini:

Content Type	Support	Notes
Text	✅	Full support
Images (URL/Base64)	✅	URL and base64 data-URL images
Video	✅	With fps, start/end offset metadata
Audio	⚠️	Via file references only
PDF	✅	Via file references
Code Execution	✅	Auto-executed with results returned
Thinking/Reasoning	✅	Returned as reasoning output when enabled
Function Calls	✅	Supported

Caveats

Function Call Arguments Serialization

Severity: Low
Behavior: Tool call arguments are returned as a JSON string in tool_calls
Impact: Requires JSON parsing to access arguments

Streaming Finish Reason Timing

Severity: Medium
Behavior: finish_reason and usage are only present in the final stream chunk
Impact: Cannot determine completion until the end of the stream

Cached Content Token Reporting

Severity: Low
Behavior: Cached tokens are reported in prompt_tokens_details.cached_tokens; cache creation vs read cannot be distinguished
Impact: Billing estimates may be approximate
