Google Gemini
Overview
Call Google Gemini models through DeepIntShield using the same OpenAI-compatible Chat Completions and Responses APIs, plus Gemini’s native audio, image, video, embeddings, files, and batch capabilities. Gemini’s underlying API differs from OpenAI’s, so a few things are worth knowing:
- Parameters - some fields use Gemini-native names (e.g. max_completion_tokens maps to maxOutputTokens, stop to stopSequences); see the parameter sections below
- Multimodal input - text, images, video, and code execution are supported as message content
- Reasoning - the reasoning object enables Gemini thinking
- Tools - function calling is supported, including tool call IDs and thought signatures
Supported Operations
| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /v1beta/models/{model}:generateContent |
| Responses API | ✅ | ✅ | /v1beta/models/{model}:generateContent |
| Speech (TTS) | ✅ | ✅ | /v1beta/models/{model}:generateContent |
| Transcriptions (STT) | ✅ | ✅ | /v1beta/models/{model}:generateContent |
| Image Generation | ✅ | - | /v1beta/models/{model}:generateContent or /v1beta/models/{model}:predict (Imagen) |
| Image Edit | ✅ | - | /v1beta/models/{model}:generateContent or /v1beta/models/{model}:predict (Imagen) |
| Video Generation | ✅ | - | /v1beta/models/{model}:predictLongRunning |
| Image Variation | ❌ | - | Not supported |
| Embeddings | ✅ | - | /v1beta/models/{model}:embedContent |
| Files | ✅ | - | /upload/storage/v1beta/files |
| Batch | ✅ | - | /v1beta/batchJobs |
| List Models | ✅ | - | /v1beta/models |
Authentication
Gemini supports API key authentication in addition to OAuth2 Bearer token authentication. The appropriate method is used automatically based on the endpoint type.
API Key Authentication
API key authentication is supported via two methods:
- Header Method (standard Gemini endpoints):
  - Format: x-goog-api-key: YOUR_API_KEY header
  - Used for: Standard Gemini endpoints (e.g., /v1beta/models/{model}:generateContent)
- Query Parameter Method (Imagen and custom endpoints):
  - Format: ?key=YOUR_API_KEY appended to request URLs
  - Used for: Imagen models and custom endpoints
  - Example: https://generativelanguage.googleapis.com/v1beta/models/imagen-4.0-generate-001:predict?key=YOUR_API_KEY
DeepIntShield automatically selects the appropriate authentication method based on the endpoint type.
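The two key placements above can be pictured with a small sketch. This is only an illustration of the upstream request shapes; DeepIntShield handles the selection for you. The helper name and the rule "the predict method uses the query parameter, everything else uses the header" are assumptions made for the example, not the gateway's actual logic.

```python
from urllib.parse import urlencode

GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta"


def build_upstream_request(model: str, method: str, api_key: str) -> tuple[str, dict[str, str]]:
    """Return (url, headers) for an upstream Gemini call.

    Assumption: `predict` (Imagen) carries the key as a query parameter,
    and every other method carries it in the x-goog-api-key header.
    """
    url = f"{GEMINI_BASE}/models/{model}:{method}"
    if method == "predict":
        query = urlencode({"key": api_key})
        return f"{url}?{query}", {}
    return url, {"x-goog-api-key": api_key}


url, headers = build_upstream_request("gemini-2.0-flash", "generateContent", "YOUR_API_KEY")
print(url, headers)
```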
1. Chat Completions
Request Parameters
Send standard OpenAI-compatible Chat Completions requests. The following are supported:
- max_completion_tokens, temperature, top_p, stop
- response_format for JSON / structured output
- tools and tool_choice for function calling (see Tools)
- reasoning for Gemini thinking (see Reasoning / Thinking)
Ignored Parameters
The following are not supported by Gemini and are ignored: logit_bias, logprobs, top_logprobs, parallel_tool_calls, service_tier.
Extra Parameters
Gemini-specific fields such as top_k, presence_penalty, frequency_penalty, and seed can be passed directly in the request body:

    curl -X POST https://app.deepintshield.com/v1/chat/completions \
      -H "Authorization: Bearer sk-bf-your-virtual-key" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gemini/gemini-2.0-flash",
        "messages": [{"role": "user", "content": "Hello"}],
        "top_k": 40,
        "stop_sequences": ["###"]
      }'

Reasoning / Thinking
Documentation: See DeepIntShield Reasoning Reference
Use the reasoning object to enable Gemini thinking:
- reasoning.effort sets the thinking level - "low" / "minimal" for low, "medium" / "high" for high
- reasoning.max_tokens sets the thinking token budget (-1 for dynamic, 0 to disable, or a specific budget)

    {"reasoning": {"effort": "high", "max_tokens": 10000}}

Multimodal Input
Message content can include text, images (URL or base64 data URL), and video. Video content can carry metadata such as fps and start/end offsets.
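As a rough sketch, the reasoning object and an image part can be combined in one Chat Completions request. The example assumes the OpenAI-style content-part shape (type "text" and "image_url") is what the gateway expects; the helper name and the example image URL are hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://app.deepintshield.com/v1"


def build_chat_request(api_key: str, model: str, text: str, image_url: str) -> urllib.request.Request:
    """Build (but do not send) a chat request with an image part and thinking enabled."""
    body = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": text},
                # Assumed OpenAI-style image part; the value may be an https
                # URL or a base64 data URL (data:image/png;base64,...).
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        # max_tokens: -1 requests a dynamic budget, 0 disables thinking.
        "reasoning": {"effort": "high", "max_tokens": 10000},
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "sk-bf-your-virtual-key",
    "gemini/gemini-2.0-flash",
    "Describe this image",
    "https://example.com/cat.png",
)
print(request.full_url)
```

Passing the result to urllib.request.urlopen would send it; whether a particular model supports thinking is not stated here, so the model name is left as an argument.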
Tools
Standard OpenAI-style tool definitions are supported. tool_choice accepts:
| tool_choice | Behavior |
|---|---|
| "auto" | Model decides (default) |
| "none" | No tool calls |
| "required" | Must call a tool |
| Specific tool | Restricts to the named function |
The function.strict field is not supported by Gemini and is ignored.
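A minimal sketch of a request body that defines one function tool and restricts the model to it. The get_weather tool and both helper names are hypothetical; the tool and tool_choice shapes follow the standard OpenAI format that the text above says is supported.

```python
import json


def weather_tool() -> dict:
    """A hypothetical function tool in the OpenAI tool-definition format."""
    return {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }


def force_tool(name: str) -> dict:
    """tool_choice value that restricts the model to one named function."""
    return {"type": "function", "function": {"name": name}}


body = {
    "model": "gemini/gemini-2.0-flash",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [weather_tool()],
    # Alternatives: "auto" (default), "none", or "required".
    "tool_choice": force_tool("get_weather"),
}
print(json.dumps(body, indent=2))
```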
Response
Responses come back in the standard OpenAI-compatible shape:
- finish_reason - stop, length, tool_calls, or content_filter (the latter for Gemini safety/recitation blocks)
- message.content for text, message.tool_calls for function calls (with arguments as a JSON string)
- usage.prompt_tokens / completion_tokens / total_tokens, plus prompt_tokens_details.cached_tokens and completion_tokens_details.reasoning_tokens
- reasoning for Gemini thinking output (when reasoning is enabled)
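The fields listed above could be read along these lines. The helper name and the sample response are invented for illustration; the one detail taken from the text is that tool call arguments arrive as a JSON string and need decoding before use.

```python
import json


def summarize_choice(response: dict) -> dict:
    """Extract text, decoded tool calls, and token counts from a chat response."""
    choice = response["choices"][0]
    message = choice["message"]
    calls = [
        {
            "id": call.get("id"),
            "name": call["function"]["name"],
            # Arguments are a JSON string, so decode them before use.
            "arguments": json.loads(call["function"]["arguments"]),
        }
        for call in message.get("tool_calls") or []
    ]
    usage = response.get("usage", {})
    return {
        "finish_reason": choice["finish_reason"],
        "text": message.get("content"),
        "tool_calls": calls,
        # content_filter signals a Gemini safety or recitation block.
        "blocked": choice["finish_reason"] == "content_filter",
        "total_tokens": usage.get("total_tokens"),
    }


# Hypothetical response, used only to exercise the helper.
SAMPLE_RESPONSE = {
    "choices": [{
        "finish_reason": "tool_calls",
        "message": {
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
            }],
        },
    }],
    "usage": {"prompt_tokens": 12, "completion_tokens": 5, "total_tokens": 17},
}
print(summarize_choice(SAMPLE_RESPONSE))
```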
Streaming
Set "stream": true to receive output as standard OpenAI-compatible streaming chunks. Content text, reasoning, and tool-call arguments arrive incrementally; finish_reason and usage appear on the final chunk.
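A sketch of accumulating already-decoded chunks, assuming the usual OpenAI chunk layout (choices[].delta.content). The function name and sample chunks are hypothetical, and reading the "data:" lines off the HTTP stream is left out. Because finish_reason and usage only arrive at the end, the helper returns them after the whole stream has been consumed.

```python
from typing import Iterable, Optional


def collect_stream(chunks: Iterable[dict]) -> tuple[str, Optional[str], Optional[dict]]:
    """Join streamed text deltas and pick up finish_reason and usage."""
    text_parts: list[str] = []
    finish_reason = None
    usage = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                text_parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), finish_reason, usage


# Hypothetical chunks in the assumed OpenAI streaming shape.
SAMPLE_CHUNKS = [
    {"choices": [{"delta": {"role": "assistant", "content": "Hel"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "lo"}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}], "usage": {"total_tokens": 7}},
]
print(collect_stream(SAMPLE_CHUNKS))
```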
2. Responses API
Gemini supports the OpenAI-style Responses API on the same models as Chat Completions.
Request Parameters
Send standard OpenAI Responses requests. The following are supported:
- max_output_tokens, temperature, top_p
- instructions for system instructions
- input as a string or array
- tools and tool_choice (see Chat Completions)
- reasoning for Gemini thinking (see Reasoning / Thinking)
- text for structured output
Gemini-specific fields such as stop and top_k can be passed directly in the request body:

    curl -X POST https://app.deepintshield.com/v1/responses \
      -H "Authorization: Bearer sk-bf-your-virtual-key" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gemini/gemini-2.0-flash",
        "input": "Hello, how are you?",
        "instructions": "You are a helpful assistant.",
        "top_k": 40
      }'

Tool Support
Supported types: function, computer_use_preview, web_search, mcp. Tool behavior is the same as Chat Completions.
Response
Responses come back in the standard OpenAI Responses shape:
- status (completed, or incomplete when blocked by a Gemini safety stop)
- output items: assistant text as message, tool calls as function_call, Gemini thinking as reasoning
- usage token counts, including cache tokens under *_tokens_details.cached_tokens
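One way to walk the output items, sketched under the assumption that item fields follow the OpenAI Responses layout (output_text parts inside message items, name and arguments on function_call items). The helper name and sample payload are hypothetical.

```python
import json


def split_output(response: dict) -> dict:
    """Group Responses API output items by type."""
    result = {"text": [], "function_calls": [], "reasoning_items": 0}
    for item in response.get("output", []):
        kind = item.get("type")
        if kind == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    result["text"].append(part["text"])
        elif kind == "function_call":
            # Arguments are assumed to be a JSON string, as in Chat Completions.
            result["function_calls"].append((item["name"], json.loads(item["arguments"])))
        elif kind == "reasoning":
            result["reasoning_items"] += 1
    # "incomplete" indicates a Gemini safety stop.
    result["incomplete"] = response.get("status") == "incomplete"
    return result


# Hypothetical response, used only to exercise the helper.
SAMPLE_OUTPUT = {
    "status": "completed",
    "output": [
        {"type": "reasoning", "summary": []},
        {"type": "message", "content": [{"type": "output_text", "text": "Hi there"}]},
        {"type": "function_call", "name": "get_weather", "arguments": '{"city": "Paris"}'},
    ],
}
print(split_output(SAMPLE_OUTPUT))
```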
Streaming
Set "stream": true to receive output as standard OpenAI Responses streaming events.
3. Speech (Text-to-Speech)
Gemini can synthesize speech from text.
Request Parameters
| Parameter | Notes |
|---|---|
| input | Text to synthesize |
| voice | Voice name (see Supported Voices) |
| response_format | Only "wav" is supported (default) |

    curl -X POST https://app.deepintshield.com/v1/audio/speech \
      -H "Authorization: Bearer sk-bf-your-virtual-key" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gemini/gemini-2.5-flash-preview-tts",
        "input": "Hello, how are you?",
        "voice": "Chant-Female",
        "response_format": "wav"
      }'

Response
Returns WAV audio (default). If response_format is omitted, raw audio is returned.
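Since the body may be WAV or raw audio, a client might check the container header before saving the bytes with a .wav extension. The sketch below builds the speech request without sending it and adds a small RIFF/WAVE check; both helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://app.deepintshield.com/v1"


def build_speech_request(api_key: str, text: str, voice: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request."""
    body = {
        "model": "gemini/gemini-2.5-flash-preview-tts",
        "input": text,
        "voice": voice,
        "response_format": "wav",
    }
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def looks_like_wav(audio: bytes) -> bool:
    """True when the bytes start with a RIFF/WAVE container header."""
    return len(audio) >= 12 and audio[:4] == b"RIFF" and audio[8:12] == b"WAVE"


speech_request = build_speech_request("sk-bf-your-virtual-key", "Hello, how are you?", "Chant-Female")
print(speech_request.full_url)
# To send it: urllib.request.urlopen(speech_request).read() returns the audio bytes.
```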
Supported Voices
Common Gemini voices include:
- Chant-Female - Female voice
- Chant-Male - Male voice
- Additional voices depend on model capabilities
Check the model documentation for the complete list of supported voices.
4. Transcriptions (Speech-to-Text)
Gemini transcribes audio to text using the standard /v1/audio/transcriptions endpoint.
Request Parameters
| Parameter | Notes |
|---|---|
| file | Audio bytes to transcribe |
| prompt | Instructions (defaults to “Generate a transcript of the speech.”) |
| language | Language code, if supported by the model |
Safety settings and caching can be passed directly in the request body.

    curl -X POST https://app.deepintshield.com/v1/audio/transcriptions \
      -H "Authorization: Bearer sk-bf-your-virtual-key" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gemini/gemini-2.0-flash",
        "file": "<binary-audio-data>",
        "prompt": "Transcribe this audio in the original language."
      }'

Response
Returns the transcribed text along with token usage.
5. Embeddings
Request Parameters:
- input - text to embed (single string or array)
- dimensions - output embedding dimensionality
- Task type and title can be passed directly in the request body
Response: Standard embeddings response with the embedding array and token usage.
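A small sketch of the request body and of reading vectors back out. It assumes the standard OpenAI embeddings response layout (a data array of objects with index and embedding); the model name is a placeholder and both helper names are hypothetical. The field names for task type and title are not given here, so they are left out.

```python
from typing import Optional


def build_embedding_body(model: str, texts: list[str], dimensions: Optional[int] = None) -> dict:
    """Request body for an embeddings call; dimensions is optional."""
    body: dict = {"model": model, "input": texts}
    if dimensions is not None:
        body["dimensions"] = dimensions
    return body


def extract_vectors(response: dict) -> list[list[float]]:
    """Return embeddings ordered by their index (assumed OpenAI layout)."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# "gemini/your-embedding-model" is a placeholder, not a real model name.
embedding_body = build_embedding_body("gemini/your-embedding-model", ["first text", "second text"], dimensions=256)
print(embedding_body)
```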
6. Batch API
Request formats: Inline requests array or file-based input
Pagination: Token-based with pageToken
Endpoints:
- POST /v1beta/batchJobs - Create
- GET /v1beta/batchJobs?pageSize={limit}&pageToken={token} - List
- GET /v1beta/batchJobs/{batch_id} - Retrieve
- POST /v1beta/batchJobs/{batch_id}:cancel - Cancel
Response:
- Batch statuses: in_progress, completed, failed, cancelling, cancelled, expired
- Results are returned inline or as a JSONL output file, depending on how the batch was created
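Token-based pagination usually reduces to a loop that feeds each page's token into the next request. The sketch below takes the page fetcher as a function so it runs without network access. The response field names (batches, nextPageToken) and the split of statuses into terminal and non-terminal are assumptions, not confirmed by the text above.

```python
from typing import Callable, Iterator, Optional

# Assumption: these statuses mean the batch will not change any further.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}


def iter_batches(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every batch, following page tokens until none is returned."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("batches", [])
        token = page.get("nextPageToken")
        if not token:
            break


def is_terminal(status: str) -> bool:
    """True when a batch status is treated as final."""
    return status in TERMINAL_STATUSES


# Hypothetical two-page listing keyed by the page token that requests it.
FAKE_PAGES = {
    None: {"batches": [{"id": "b1", "status": "in_progress"}], "nextPageToken": "t1"},
    "t1": {"batches": [{"id": "b2", "status": "completed"}]},
}
for batch in iter_batches(lambda token: FAKE_PAGES[token]):
    print(batch["id"], is_terminal(batch["status"]))
```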
7. Files API
Upload: Multipart/form-data with file (binary) and filename (optional)
Endpoints:
- POST /upload/storage/v1beta/files - Upload
- GET /v1beta/files?limit={limit}&pageToken={token} - List (cursor pagination)
- GET /v1beta/files/{file_id} - Retrieve
- DELETE /v1beta/files/{file_id} - Delete
- GET /v1beta/files/{file_id}/content - Download
8. Image Generation
Gemini supports two image generation formats depending on the model:
- Standard Gemini Format: Uses the /v1beta/models/{model}:generateContent endpoint
- Imagen Format: Uses the /v1beta/models/{model}:predict endpoint for Imagen models (detected automatically)
The right endpoint is selected automatically based on the model name (Imagen models use the predict endpoint).
Request Parameters
| Parameter | Notes |
|---|---|
| prompt | Text description of the image to generate |
| n | Number of images to generate |
| size | Image size in WxH format (e.g., "1024x1024") |
| output_format | Output format: "png", "jpeg", "webp" (Imagen does not support webp) |
| seed | Seed for reproducible generation |
| negative_prompt | Negative prompt |
Extra Parameters
Gemini-specific fields can be passed directly in the request body:
| Parameter | Type | Notes |
|---|---|---|
| personGeneration | string | Person generation setting (Imagen only) |
| language | string | Language code (Imagen only) |
| enhancePrompt | bool | Prompt enhancement flag (Imagen only) |
| safetySettings / safety_settings | string/array | Safety settings configuration |
| cachedContent / cached_content | string | Cached content ID |
| labels | object | Custom labels map |

    curl -X POST https://app.deepintshield.com/v1/images/generations \
      -H "Authorization: Bearer sk-bf-your-virtual-key" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gemini/imagen-4.0-generate-001",
        "prompt": "A sunset over the mountains",
        "size": "1024x1024",
        "n": 2,
        "output_format": "png"
      }'

Imagen Size Support
For Imagen models, size maps to a supported image size and aspect ratio:
- Supported image sizes: up to "1k" (≤1024) or "2k" (≤2048); larger sizes are not supported
- Supported aspect ratios: "1:1", "3:4", "4:3", "9:16", "16:9"
Response
Generated images are returned in the response array (as b64_json for Imagen), each with an index and the output format reported as a file extension. Token usage is included.
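To make the Imagen size rules and the b64_json response concrete, here is a sketch of one possible WxH mapping plus a decoder for returned images. The exact-ratio rule is an assumption (the gateway may instead snap to the nearest supported ratio), the data key follows the OpenAI images layout, and both helper names are hypothetical.

```python
import base64
from fractions import Fraction

SUPPORTED_RATIOS = {"1:1", "3:4", "4:3", "9:16", "16:9"}


def map_imagen_size(size: str) -> tuple[str, str]:
    """Map a WxH string to (image size bucket, aspect ratio).

    Assumption: the ratio must reduce exactly to a supported value, and the
    bucket is chosen from the longest side.
    """
    width, height = (int(part) for part in size.lower().split("x"))
    ratio = Fraction(width, height)
    aspect = f"{ratio.numerator}:{ratio.denominator}"
    if aspect not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported aspect ratio {aspect}")
    longest = max(width, height)
    if longest <= 1024:
        return "1k", aspect
    if longest <= 2048:
        return "2k", aspect
    raise ValueError("sizes above 2048 pixels are not supported")


def decode_images(response: dict) -> list[bytes]:
    """Decode b64_json entries into raw image bytes (assumed data array)."""
    return [base64.b64decode(item["b64_json"]) for item in response.get("data", [])]


print(map_imagen_size("1024x1024"))
print(map_imagen_size("2048x1152"))
```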
Streaming
Image generation streaming is not supported by Gemini.
9. Image Edit
Gemini supports image editing for both Gemini and Imagen models. The right endpoint is selected automatically based on the model name.
Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | Model identifier (Gemini or Imagen model) |
| prompt | string | ✅ | Text description of the edit |
| image[] | binary | ✅ | Image file(s) to edit (supports multiple images) |
| mask | binary | ❌ | Mask image file |
| type | string | ❌ | Edit type: "inpainting", "outpainting", "inpaint_removal", "bgswap" (Imagen only) |
| n | int | ❌ | Number of images to generate (1-10) |
| output_format | string | ❌ | Output format: "png", "webp", "jpeg" |
| output_compression | int | ❌ | Compression level (0-100%) |
| seed | int | ❌ | Seed for reproducibility (pass as an extra param) |
| negative_prompt | string | ❌ | Negative prompt (pass negativePrompt as an extra param) |
| guidanceScale | int | ❌ | Guidance scale (pass as an extra param, Imagen only) |
| baseSteps | int | ❌ | Base steps (pass as an extra param, Imagen only) |
| maskMode | string | ❌ | Mask mode (pass as an extra param, Imagen only): "MASK_MODE_USER_PROVIDED", "MASK_MODE_BACKGROUND", "MASK_MODE_FOREGROUND", "MASK_MODE_SEMANTIC" |
| dilation | float | ❌ | Mask dilation (pass as an extra param, Imagen only): Range [0, 1] |
| maskClasses | int[] | ❌ | Mask classes (pass as an extra param, Imagen only): For MASK_MODE_SEMANTIC |
Behavior
Gemini Models
- Send your prompt and one or more images (image/jpeg, image/webp, image/png).
- safetySettings (safety_settings), cachedContent (cached_content), and labels can be passed directly in the request body.
Imagen Models
- Mask: Provide a mask or set maskMode. When mask data is present, maskMode defaults to user-provided. dilation (range [0, 1]) and maskClasses (for semantic masks) can be set as extra params.
- Edit type: The type parameter selects the edit mode: "inpainting", "outpainting", "inpaint_removal", or "bgswap". Alternatively, set editMode directly as an extra param.
- Parameters: n, output_format, and output_compression are supported. seed, negativePrompt, guidanceScale, and baseSteps can be set as extra params. Additional Imagen-specific extra params: addWatermark, includeRaiReason, includeSafetyAttributes, personGeneration, safetySetting, language, storageUri.
Response
Same response format as image generation (see Image Generation section).
Streaming
Image edit streaming is not supported by Gemini.
Image Variation
Image variation is not supported by Gemini.
10. List Models
Request: GET /v1/models (no body)
Returns available Gemini models with IDs in the gemini/{model} format, along with their display name, description, and token limits.
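A short sketch of picking the Gemini entries out of a listing and stripping the gemini/ prefix. It assumes the listing uses the OpenAI-style data array of objects with an id field; the helper name and sample listing are hypothetical.

```python
def gemini_model_names(listing: dict) -> list[str]:
    """Return model names with the 'gemini/' provider prefix removed."""
    prefix = "gemini/"
    return [
        entry["id"][len(prefix):]
        for entry in listing.get("data", [])
        if entry["id"].startswith(prefix)
    ]


# Hypothetical listing; only the id format comes from the text above.
SAMPLE_LISTING = {
    "data": [
        {"id": "gemini/gemini-2.0-flash"},
        {"id": "gemini/imagen-4.0-generate-001"},
        {"id": "other-provider/some-model"},
    ]
}
print(gemini_model_names(SAMPLE_LISTING))
```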
11. Video Generation
Section titled “11. Video Generation”Generate (POST /v1/videos)
Requests use a JSON body (application/json).
Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | Veo model (e.g., veo-3.1-generate-preview) |
| prompt | string | ✅ | Text description of the video |
| input_reference | string | ❌ | Input image for image-to-video |
| seconds | string | ❌ | Duration of the video |
| size | string | ❌ | Resolution, mapped to an aspect ratio (1280x720 → 16:9, 720x1280 → 9:16) |
| negative_prompt | string | ❌ | What to avoid in the video |
| seed | int | ❌ | Seed for reproducibility |
| audio | bool | ❌ | Enable audio generation |
| video_uri | string | ❌ | GCS video URI for video extension |
Extra Params (any unrecognized JSON field can be passed directly in the request body)
| Key | Notes |
|---|---|
| aspectRatio | Override the aspect ratio directly (e.g., "16:9", "9:16"). Takes precedence over size |
| resolution | Native Gemini resolution string |
| sampleCount | Number of samples to generate |
| personGeneration | Person generation policy |
| numberOfVideos | Number of videos to generate |
| storageURI | GCS bucket for output storage |
| compressionQuality | Output compression quality |
| enhancePrompt | Auto-enhance the prompt |
| resizeMode | How to handle size mismatches |
| reference_images | Style/asset reference image objects |
| lastFrame | Last frame image object for interpolation |
Response: DeepIntShieldVideoGenerationResponse - id, status, videos[]
If Gemini filters content for safety, status is failed and content_filter describes the reason.
Job Statuses: in_progress → completed / failed
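The interaction between size and the aspectRatio extra param can be sketched as follows. The two size mappings and the precedence rule come from the tables above; the helper names are hypothetical, and this mirrors the documented behavior rather than the gateway's actual code.

```python
from typing import Optional

# Mappings stated in the request parameter table.
SIZE_TO_ASPECT = {"1280x720": "16:9", "720x1280": "9:16"}


def build_video_body(
    model: str,
    prompt: str,
    size: Optional[str] = None,
    seconds: Optional[int] = None,
    aspect_ratio: Optional[str] = None,
) -> dict:
    """JSON body for POST /v1/videos with only the fields that were set."""
    body: dict = {"model": model, "prompt": prompt}
    if size is not None:
        body["size"] = size
    if seconds is not None:
        body["seconds"] = str(seconds)  # seconds is documented as a string
    if aspect_ratio is not None:
        body["aspectRatio"] = aspect_ratio  # extra param passed in the body
    return body


def effective_aspect(body: dict) -> Optional[str]:
    """aspectRatio takes precedence over the ratio derived from size."""
    return body.get("aspectRatio") or SIZE_TO_ASPECT.get(body.get("size"))


video_body = build_video_body("veo-3.1-generate-preview", "A drone shot of a coastline", size="1280x720", seconds=8)
print(video_body, effective_aspect(video_body))
```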
Retrieve / Download
| Operation | Endpoint | Notes |
|---|---|---|
| Get status | GET /v1/videos/{id} | Polls the long-running operation |
| Download | GET /v1/videos/{id}/content | Downloads from GCS URI or decodes base64 video |
Video Delete, List, and Remix are not supported.
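Because generation is a long-running job, a client typically polls the status endpoint until the job leaves in_progress. Below is a minimal polling sketch; the status fetcher is injected so the loop can be exercised without network access, and the interval and poll limit are arbitrary illustrative values.

```python
import time
from typing import Callable


def wait_for_video(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job status is completed or failed, then return the job."""
    for _ in range(max_polls):
        job = fetch_status()
        if job["status"] in ("completed", "failed"):
            return job
        sleep(interval)
    raise TimeoutError("video job did not finish within the polling limit")


# Simulated status sequence standing in for GET /v1/videos/{id}.
states = iter([
    {"id": "vid_1", "status": "in_progress"},
    {"id": "vid_1", "status": "in_progress"},
    {"id": "vid_1", "status": "completed", "videos": []},
])
finished = wait_for_video(lambda: next(states), interval=0, sleep=lambda _: None)
print(finished["status"])
```

In real use, fetch_status would call GET /v1/videos/{id} and return the decoded JSON; a failed job should be checked for content_filter before retrying.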
Content Type Support
DeepIntShield supports the following content modalities through Gemini:
| Content Type | Support | Notes |
|---|---|---|
| Text | ✅ | Full support |
| Images (URL/Base64) | ✅ | URL and base64 data-URL images |
| Video | ✅ | With fps, start/end offset metadata |
| Audio | ⚠️ | Via file references only |
|  | ✅ | Via file references |
| Code Execution | ✅ | Auto-executed with results returned |
| Thinking/Reasoning | ✅ | Returned as reasoning output when enabled |
| Function Calls | ✅ | Supported |
Caveats
Function Call Arguments Serialization
Severity: Low
Behavior: Tool call arguments are returned as a JSON string in tool_calls
Impact: Requires JSON parsing to access arguments
Streaming Finish Reason Timing
Severity: Medium
Behavior: finish_reason and usage are only present in the final stream chunk
Impact: Cannot determine completion until the end of the stream
Cached Content Token Reporting
Severity: Low
Behavior: Cached tokens are reported in prompt_tokens_details.cached_tokens; cache creation vs read cannot be distinguished
Impact: Billing estimates may be approximate