Google Gemini

Call Google Gemini models through DeepIntShield using the same OpenAI-compatible Chat Completions and Responses APIs, plus Gemini’s native audio, image, video, embeddings, files, and batch capabilities. Gemini’s underlying API differs from OpenAI’s, so a few things are worth knowing:

  • Parameters - some fields use Gemini-native names (e.g. max_completion_tokens maps to maxOutputTokens, stop to stopSequences); see the parameter sections below
  • Multimodal input - text, images, video, and code execution are supported as message content
  • Reasoning - the reasoning object enables Gemini thinking
  • Tools - function calling is supported, including tool call IDs and thought signatures

| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | Yes | Yes | /v1beta/models/{model}:generateContent |
| Responses API | Yes | Yes | /v1beta/models/{model}:generateContent |
| Speech (TTS) | Yes | | /v1beta/models/{model}:generateContent |
| Transcriptions (STT) | Yes | | /v1beta/models/{model}:generateContent |
| Image Generation | Yes | - | /v1beta/models/{model}:generateContent or /v1beta/models/{model}:predict (Imagen) |
| Image Edit | Yes | - | /v1beta/models/{model}:generateContent or /v1beta/models/{model}:predict (Imagen) |
| Video Generation | Yes | - | /v1beta/models/{model}:predictLongRunning |
| Image Variation | No | - | Not supported |
| Embeddings | Yes | - | /v1beta/models/{model}:embedContent |
| Files | Yes | - | /upload/storage/v1beta/files |
| Batch | Yes | - | /v1beta/batchJobs |
| List Models | Yes | - | /v1beta/models |

Gemini supports two authentication schemes: API keys and OAuth2 Bearer tokens.

API key authentication is supported via two methods:

  1. Header Method (standard Gemini endpoints):

    • Format: x-goog-api-key: YOUR_API_KEY header
    • Used for: Standard Gemini endpoints (e.g., /v1beta/models/{model}:generateContent)
  2. Query Parameter Method (Imagen and custom endpoints):

    • Format: ?key=YOUR_API_KEY appended to request URLs
    • Used for: Imagen models and custom endpoints
    • Example: https://generativelanguage.googleapis.com/v1beta/models/imagen-4.0-generate-001:predict?key=YOUR_API_KEY

DeepIntShield automatically selects the appropriate authentication method based on the endpoint type.
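
The two API key placements above can be sketched as a small helper. This is only an illustration of the header and query-parameter formats; the function name and the rule of detecting Imagen endpoints by a `:predict` suffix are assumptions, not DeepIntShield's actual selection logic.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def apply_gemini_auth(url: str, api_key: str) -> tuple[str, dict[str, str]]:
    """Return (url, headers) with the API key attached in one of the two styles."""
    parts = urlsplit(url)
    # Assumption: Imagen endpoints are recognised by a path ending in ":predict".
    if parts.path.endswith(":predict"):
        key_param = urlencode({"key": api_key})
        query = f"{parts.query}&{key_param}" if parts.query else key_param
        return urlunsplit(parts._replace(query=query)), {}
    # Standard Gemini endpoints carry the key in a request header instead.
    return url, {"x-goog-api-key": api_key}
```

Calling it with the Imagen URL from the example above yields the same URL with `?key=...` appended and no extra headers; a `:generateContent` URL is returned unchanged together with an `x-goog-api-key` header.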


Send standard OpenAI-compatible Chat Completions requests. The following are supported:

  • max_completion_tokens, temperature, top_p, stop
  • response_format for JSON / structured output
  • tools and tool_choice for function calling (see Tools)
  • reasoning for Gemini thinking (see Reasoning / Thinking)

The following are not supported by Gemini and are ignored: logit_bias, logprobs, top_logprobs, parallel_tool_calls, service_tier.

Gemini-specific fields such as top_k, presence_penalty, frequency_penalty, and seed can be passed directly in the request body:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40,
    "stop_sequences": ["###"]
  }'

Documentation: See DeepIntShield Reasoning Reference

Use the reasoning object to enable Gemini thinking:

  • reasoning.effort sets the thinking level: "minimal" or "low" selects low thinking, and "medium" or "high" selects high thinking
  • reasoning.max_tokens sets the thinking token budget (-1 for dynamic, 0 to disable, or a specific budget)
{"reasoning": {"effort": "high", "max_tokens": 10000}}
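
As a rough sketch, a request body with the reasoning object could be assembled like this. The helper name and its validation rules are illustrative assumptions based on the two bullets above; the model name passed in is up to the caller.

```python
ALLOWED_EFFORTS = {"minimal", "low", "medium", "high"}


def build_chat_request(model, prompt, effort=None, thinking_budget=None):
    """Build a Chat Completions body, adding `reasoning` only when requested."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    reasoning = {}
    if effort is not None:
        if effort not in ALLOWED_EFFORTS:
            raise ValueError(f"unknown effort: {effort!r}")
        reasoning["effort"] = effort
    if thinking_budget is not None:
        # -1 requests a dynamic budget, 0 disables thinking, >0 is a fixed budget.
        if thinking_budget < -1:
            raise ValueError("thinking_budget must be -1, 0, or a positive number")
        reasoning["max_tokens"] = thinking_budget
    if reasoning:
        body["reasoning"] = reasoning
    return body
```

For example, `build_chat_request("gemini/gemini-2.0-flash", "Hello", thinking_budget=0)` produces a body whose reasoning object is `{"max_tokens": 0}`, which disables thinking.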

Message content can include text, images (URL or base64 data URL), and video. Video content can carry metadata such as fps and start/end offsets.
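
A minimal sketch of a message that mixes text with a base64 data-URL image is shown below. It assumes the OpenAI-style content-part layout (`type: "text"` and `type: "image_url"`); the helper names are hypothetical. Video parts are left out because their exact field names are not given here.

```python
import base64


def image_part_from_bytes(data: bytes, mime_type: str = "image/png") -> dict:
    """Encode raw image bytes as an OpenAI-style image_url part with a data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}


def image_part_from_url(url: str) -> dict:
    """Reference a remotely hosted image by URL."""
    return {"type": "image_url", "image_url": {"url": url}}


def user_message(text: str, *parts: dict) -> dict:
    """Combine a text prompt with any number of image parts in one user message."""
    return {"role": "user", "content": [{"type": "text", "text": text}, *parts]}
```

The resulting dictionary goes into the `messages` array of a Chat Completions request.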

Standard OpenAI-style tool definitions are supported. tool_choice accepts:

| tool_choice | Behavior |
|---|---|
| "auto" | Model decides (default) |
| "none" | No tool calls |
| "required" | Must call a tool |
| Specific tool | Restricts to the named function |

The function.strict field is not supported by Gemini and is ignored.
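
For illustration, a tool definition and a "specific tool" choice might look like the following. The shapes follow the OpenAI function-calling format; the `get_weather` tool and both helper names are made up for the example.

```python
def function_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap a JSON Schema in an OpenAI-style function tool definition."""
    return {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": parameters},
    }


def choose_tool(name: str) -> dict:
    """tool_choice value that restricts the model to one named function."""
    return {"type": "function", "function": {"name": name}}


# Hypothetical tool used only for this example.
weather_tool = function_tool(
    "get_weather",
    "Look up the current weather for a city.",
    {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
)

request_body = {
    "model": "gemini/gemini-2.0-flash",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [weather_tool],
    "tool_choice": choose_tool("get_weather"),  # or "auto", "none", "required"
}
```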

Responses come back in the standard OpenAI-compatible shape:

  • finish_reason - stop, length, tool_calls, or content_filter (the latter for Gemini safety/recitation blocks)
  • message.content for text, message.tool_calls for function calls (with arguments as a JSON string)
  • usage.prompt_tokens / completion_tokens / total_tokens, plus prompt_tokens_details.cached_tokens and completion_tokens_details.reasoning_tokens
  • reasoning for Gemini thinking output (when reasoning is enabled)
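
The fields listed above could be read out of a response roughly as follows. This is a sketch that assumes the usual OpenAI `choices[0].message` layout; the function name and the returned dictionary keys are illustrative.

```python
import json


def summarize_choice(response: dict) -> dict:
    """Extract text, decoded tool calls, and token counts from the first choice."""
    choice = response["choices"][0]
    message = choice["message"]
    calls = [
        {
            "id": call.get("id"),
            "name": call["function"]["name"],
            # Arguments arrive as a JSON string and must be decoded before use.
            "arguments": json.loads(call["function"]["arguments"]),
        }
        for call in message.get("tool_calls") or []
    ]
    usage = response.get("usage", {})
    return {
        "finish_reason": choice.get("finish_reason"),
        "text": message.get("content"),
        "tool_calls": calls,
        "cached_tokens": usage.get("prompt_tokens_details", {}).get("cached_tokens", 0),
        "reasoning_tokens": usage.get("completion_tokens_details", {}).get("reasoning_tokens", 0),
    }
```

A `finish_reason` of `content_filter` in the result signals a Gemini safety or recitation block, so callers may want to check it before using `text`.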

Set "stream": true to receive output as standard OpenAI-compatible streaming chunks. Content text, reasoning, and tool-call arguments arrive incrementally; finish_reason and usage appear on the final chunk.
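
A minimal sketch of consuming such a stream is shown below. It assumes the chunks arrive as server-sent-event lines of the form `data: {...}` ending with a `data: [DONE]` sentinel, as OpenAI streams do; that framing and the function name are assumptions.

```python
import json


def collect_stream(lines):
    """Accumulate streamed text and return (text, finish_reason, usage)."""
    text_parts = []
    finish_reason = None
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                text_parts.append(delta["content"])
            # finish_reason is only set on the final chunk.
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), finish_reason, usage
```

Because `finish_reason` and `usage` only appear at the end, the caller cannot tell whether the response completed normally until the stream has been fully read.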


Gemini supports the OpenAI-style Responses API on the same models as Chat Completions.

Send standard OpenAI Responses requests. The following are supported:

  • max_output_tokens, temperature, top_p
  • instructions for system instructions
  • input as a string or array
  • tools and tool_choice (see Chat Completions)
  • reasoning for Gemini thinking (see Reasoning / Thinking)
  • text for structured output

Gemini-specific fields such as stop and top_k can be passed directly in the request body:

curl -X POST https://app.deepintshield.com/v1/responses \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash",
    "input": "Hello, how are you?",
    "instructions": "You are a helpful assistant.",
    "top_k": 40
  }'

Supported tool types: function, computer_use_preview, web_search, mcp. Tool behavior is the same as in Chat Completions.

Responses come back in the standard OpenAI Responses shape:

  • status (completed, or incomplete when blocked by a Gemini safety stop)
  • output items: assistant text as message, tool calls as function_call, Gemini thinking as reasoning
  • usage token counts, including cache tokens under *_tokens_details.cached_tokens
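
The output items described above might be separated like this. The sketch assumes the standard OpenAI Responses item shapes (`message` items holding `output_text` parts, and `function_call` items with `name` and a JSON-string `arguments`); the function name is hypothetical.

```python
import json


def split_output(response: dict) -> dict:
    """Group Responses API output items into text, tool calls, and reasoning."""
    texts, calls, reasoning = [], [], []
    for item in response.get("output", []):
        kind = item.get("type")
        if kind == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    texts.append(part["text"])
        elif kind == "function_call":
            calls.append({"name": item["name"], "arguments": json.loads(item["arguments"])})
        elif kind == "reasoning":
            reasoning.append(item)
    return {
        "status": response.get("status"),  # "incomplete" indicates a safety stop
        "text": "".join(texts),
        "calls": calls,
        "reasoning": reasoning,
    }
```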

Set "stream": true to receive output as standard OpenAI Responses streaming events.


Gemini can synthesize speech from text.

| Parameter | Notes |
|---|---|
| input | Text to synthesize |
| voice | Voice name (see Supported Voices) |
| response_format | Only "wav" is supported |

curl -X POST https://app.deepintshield.com/v1/audio/speech \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.5-flash-preview-tts",
    "input": "Hello, how are you?",
    "voice": "Chant-Female",
    "response_format": "wav"
  }'

Returns WAV audio when response_format is "wav". If response_format is omitted, raw audio is returned.

Common Gemini voices include:

  • Chant-Female - Female voice
  • Chant-Male - Male voice
  • Additional voices depend on model capabilities

Check model documentation for complete list of supported voices.


Gemini transcribes audio to text using the standard /v1/audio/transcriptions endpoint.

| Parameter | Notes |
|---|---|
| file | Audio bytes to transcribe |
| prompt | Instructions (defaults to “Generate a transcript of the speech.”) |
| language | Language code, if supported by the model |

Safety settings and caching can be passed directly in the request body.

curl -X POST https://app.deepintshield.com/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash",
    "file": "<binary-audio-data>",
    "prompt": "Transcribe this audio in the original language."
  }'

Returns the transcribed text along with token usage.


Request Parameters:

  • input - text to embed (single string or array)
  • dimensions - output embedding dimensionality
  • Task type and title can be passed directly in the request body

Response: Standard embeddings response with the embedding array and token usage.
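
As an illustration, the embeddings in a response could be pulled out and compared as below. The sketch assumes the OpenAI-style response layout (`data[i].embedding` with an `index` field); both function names are hypothetical.

```python
import math


def embedding_vectors(response: dict) -> list[list[float]]:
    """Return the embedding vectors in input order."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm
```

When `dimensions` is set in the request, every returned vector should have that length, so vectors from requests with different `dimensions` values cannot be compared directly.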


Request formats: Inline requests array or file-based input

Pagination: Token-based with pageToken

Endpoints:

  • POST /v1beta/batchJobs - Create
  • GET /v1beta/batchJobs?pageSize={limit}&pageToken={token} - List
  • GET /v1beta/batchJobs/{batch_id} - Retrieve
  • POST /v1beta/batchJobs/{batch_id}:cancel - Cancel

Response:

  • Batch statuses: in_progress, completed, failed, cancelling, cancelled, expired
  • Results are returned inline or as a JSONL output file, depending on how the batch was created
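
Token-based pagination over the list endpoint can be sketched with a generator. The `fetch_page` callable stands in for whatever HTTP client is used, and the response keys `batchJobs` and `nextPageToken` are assumptions for illustration; check the actual list response for the real field names.

```python
from typing import Callable, Iterator, Optional

# Statuses after which a batch is assumed not to change any further.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}


def iter_batches(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every batch job, following page tokens until none is returned."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)  # e.g. GET /v1beta/batchJobs?pageToken={token}
        yield from page.get("batchJobs", [])  # assumed key name
        token = page.get("nextPageToken")  # assumed key name
        if not token:
            return


def is_finished(status: str) -> bool:
    """True once a batch has reached a terminal status."""
    return status in TERMINAL_STATUSES
```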

Upload: Multipart/form-data with file (binary) and filename (optional)

Endpoints:

  • POST /upload/storage/v1beta/files - Upload
  • GET /v1beta/files?limit={limit}&pageToken={token} - List (cursor pagination)
  • GET /v1beta/files/{file_id} - Retrieve
  • DELETE /v1beta/files/{file_id} - Delete
  • GET /v1beta/files/{file_id}/content - Download

Gemini supports two image generation formats depending on the model:

  1. Standard Gemini Format: Uses the /v1beta/models/{model}:generateContent endpoint
  2. Imagen Format: Uses the /v1beta/models/{model}:predict endpoint for Imagen models (detected automatically)

The right endpoint is selected automatically based on the model name (Imagen models use the predict endpoint).

| Parameter | Notes |
|---|---|
| prompt | Text description of the image to generate |
| n | Number of images to generate |
| size | Image size in WxH format (e.g., "1024x1024") |
| output_format | Output format: "png", "jpeg", "webp" (Imagen does not support webp) |
| seed | Seed for reproducible generation |
| negative_prompt | Negative prompt |

Gemini-specific fields can be passed directly in the request body:

| Parameter | Type | Notes |
|---|---|---|
| personGeneration | string | Person generation setting (Imagen only) |
| language | string | Language code (Imagen only) |
| enhancePrompt | bool | Prompt enhancement flag (Imagen only) |
| safetySettings / safety_settings | string/array | Safety settings configuration |
| cachedContent / cached_content | string | Cached content ID |
| labels | object | Custom labels map |

curl -X POST https://app.deepintshield.com/v1/images/generations \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/imagen-4.0-generate-001",
    "prompt": "A sunset over the mountains",
    "size": "1024x1024",
    "n": 2,
    "output_format": "png"
  }'

For Imagen models, size maps to a supported image size and aspect ratio:

  • Supported image sizes: up to "1k" (≤1024) or "2k" (≤2048); larger sizes are not supported
  • Supported aspect ratios: "1:1", "3:4", "4:3", "9:16", "16:9"

Generated images are returned in the response array (as b64_json for Imagen), each with an index and the output format reported as a file extension. Token usage is included.
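
One way to picture the Imagen size mapping is the sketch below. It is an assumption about how a WxH string could be reduced to an image size and aspect ratio, not DeepIntShield's actual algorithm: it uses the longest side to pick "1k" or "2k" and requires the reduced ratio to be one of the supported values.

```python
from math import gcd

SUPPORTED_RATIOS = {"1:1", "3:4", "4:3", "9:16", "16:9"}


def imagen_size(size: str) -> tuple[str, str]:
    """Map a "WxH" string to an (image_size, aspect_ratio) pair."""
    width_text, height_text = size.lower().split("x")
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= 1024:
        image_size = "1k"
    elif longest <= 2048:
        image_size = "2k"
    else:
        raise ValueError("sizes above 2048 pixels are not supported")
    divisor = gcd(width, height)
    ratio = f"{width // divisor}:{height // divisor}"
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {ratio}")
    return image_size, ratio
```

Under these assumptions, "1024x1024" maps to ("1k", "1:1") and "2048x1152" maps to ("2k", "16:9").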

Image generation streaming is not supported by Gemini.


Gemini supports image editing for both Gemini and Imagen models. The right endpoint is selected automatically based on the model name.

Request Parameters

| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | | Model identifier (Gemini or Imagen model) |
| prompt | string | | Text description of the edit |
| image[] | binary | | Image file(s) to edit (supports multiple images) |
| mask | binary | | Mask image file |
| type | string | | Edit type: "inpainting", "outpainting", "inpaint_removal", "bgswap" (Imagen only) |
| n | int | | Number of images to generate (1-10) |
| output_format | string | | Output format: "png", "webp", "jpeg" |
| output_compression | int | | Compression level (0-100%) |
| seed | int | | Seed for reproducibility (pass as an extra param) |
| negative_prompt | string | | Negative prompt (pass negativePrompt as an extra param) |
| guidanceScale | int | | Guidance scale (pass as an extra param, Imagen only) |
| baseSteps | int | | Base steps (pass as an extra param, Imagen only) |
| maskMode | string | | Mask mode (pass as an extra param, Imagen only): "MASK_MODE_USER_PROVIDED", "MASK_MODE_BACKGROUND", "MASK_MODE_FOREGROUND", "MASK_MODE_SEMANTIC" |
| dilation | float | | Mask dilation (pass as an extra param, Imagen only): Range [0, 1] |
| maskClasses | int[] | | Mask classes (pass as an extra param, Imagen only): For MASK_MODE_SEMANTIC |

Behavior

  • Send your prompt and one or more images (image/jpeg, image/webp, image/png).
  • safetySettings (safety_settings), cachedContent (cached_content), and labels can be passed directly in the request body.
  • Mask: Provide a mask or set maskMode. When mask data is present, maskMode defaults to user-provided. dilation (range [0, 1]) and maskClasses (for semantic masks) can be set as extra params.
  • Edit type: The type parameter selects the edit mode: "inpainting", "outpainting", "inpaint_removal", or "bgswap". Alternatively, set editMode directly as an extra param.
  • Parameters: n, output_format, and output_compression are supported. seed, negativePrompt, guidanceScale, and baseSteps can be set as extra params. Additional Imagen-specific extra params: addWatermark, includeRaiReason, includeSafetyAttributes, personGeneration, safetySetting, language, storageUri.

Response

Same response format as image generation (see Image Generation section).

Streaming

Image edit streaming is not supported by Gemini.

Image Variation

Image variation is not supported by Gemini.


Request: GET /v1/models (no body)

Returns available Gemini models with IDs in the gemini/{model} format, along with their display name, description, and token limits.


Requests use JSON body (application/json).

Request Parameters

| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | | Veo model (e.g., veo-3.1-generate-preview) |
| prompt | string | | Text description of the video |
| input_reference | string | | Input image for image-to-video |
| seconds | string | | Duration of the video |
| size | string | | Resolution, mapped to an aspect ratio (1280x720 → 16:9, 720x1280 → 9:16) |
| negative_prompt | string | | What to avoid in the video |
| seed | int | | Seed for reproducibility |
| audio | bool | | Enable audio generation |
| video_uri | string | | GCS video URI for video extension |

Extra Params (any unrecognized JSON field can be passed directly in the request body)

| Key | Notes |
|---|---|
| aspectRatio | Override the aspect ratio directly (e.g., "16:9", "9:16"). Takes precedence over size |
| resolution | Native Gemini resolution string |
| sampleCount | Number of samples to generate |
| personGeneration | Person generation policy |
| numberOfVideos | Number of videos to generate |
| storageURI | GCS bucket for output storage |
| compressionQuality | Output compression quality |
| enhancePrompt | Auto-enhance the prompt |
| resizeMode | How to handle size mismatches |
| reference_images | Style/asset reference image objects |
| lastFrame | Last frame image object for interpolation |

Response: DeepIntShieldVideoGenerationResponse - id, status, videos[]

If Gemini filters content for safety, status is failed and content_filter describes the reason.

Job Statuses: in_progress → completed / failed

| Operation | Endpoint | Notes |
|---|---|---|
| Get status | GET /v1/videos/{id} | Polls the long-running operation |
| Download | GET /v1/videos/{id}/content | Downloads from GCS URI or decodes base64 video |

Video Delete, List, and Remix are not supported.
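
Because video generation is a long-running job, a client typically polls the status endpoint until the job leaves in_progress. The loop below is a sketch: `get_status` is a placeholder for a call to GET /v1/videos/{id}, and the interval and poll limit are arbitrary example values.

```python
import time
from typing import Callable


def wait_for_video(
    get_status: Callable[[], dict],
    interval: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job is completed or failed, then return the final job."""
    for _ in range(max_polls):
        job = get_status()
        if job.get("status") in {"completed", "failed"}:
            return job
        sleep(interval)  # still in_progress; wait before asking again
    raise TimeoutError("video job did not finish within the polling limit")
```

A caller would check the returned status: on completed, fetch GET /v1/videos/{id}/content; on failed, inspect content_filter for a safety reason.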


DeepIntShield supports the following content modalities through Gemini:

| Content Type | Support | Notes |
|---|---|---|
| Text | Yes | Full support |
| Images (URL/Base64) | Yes | URL and base64 data-URL images |
| Video | Yes | With fps, start/end offset metadata |
| Audio | ⚠️ | Via file references only |
| PDF | Yes | Via file references |
| Code Execution | Yes | Auto-executed with results returned |
| Thinking/Reasoning | Yes | Returned as reasoning output when enabled |
| Function Calls | Yes | Supported |

Function Call Arguments Serialization

Severity: Low
Behavior: Tool call arguments are returned as a JSON string in tool_calls
Impact: Requires JSON parsing to access arguments

Streaming Finish Reason Timing

Severity: Medium
Behavior: finish_reason and usage are only present in the final stream chunk
Impact: Cannot determine completion until the end of the stream

Cached Content Token Reporting

Severity: Low
Behavior: Cached tokens are reported in prompt_tokens_details.cached_tokens; cache creation vs read cannot be distinguished
Impact: Billing estimates may be approximate