Vertex AI
Overview
Section titled “Overview”Vertex AI is Google’s unified ML platform providing access to Gemini models, Anthropic Claude models, and other third-party LLMs through a single API. Call them through DeepIntShield using the same OpenAI-compatible Chat Completions, Responses, and Embeddings APIs. A few things are useful to know when you set up and call Vertex:
- Model selection - the right behavior is applied automatically from the model name you pass (Gemini vs. Claude)
- OAuth2 authentication - sign in with a GCP service account; tokens are refreshed for you (see Setup & Configuration)
- Project & region - endpoints are built from your configured GCP project and region
- Embeddings - vector generation with task type and truncation options
- Model listing - List Models returns your custom models plus any foundation models you configure (see Custom vs Non-Custom Models)
Supported Operations
Section titled “Supported Operations”| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /generate |
| Responses API | ✅ | ✅ | /messages |
| Embeddings | ✅ | - | /embeddings |
| Image Generation | ✅ | - | /generateContent or /predict (Imagen) |
| Image Edit | ✅ | - | /generateContent or /predict (Imagen) |
| Video Generation | ✅ | - | /predictLongRunning (Veo models only) |
| Image Variation | ❌ | - | Not supported |
| List Models | ✅ | - | /models |
| Text Completions | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |
1. Chat Completions
Section titled “1. Chat Completions”Request Parameters
Section titled “Request Parameters”Send standard OpenAI-compatible Chat Completions requests. Parameters supported by the underlying model (Gemini or Claude) apply - see the Gemini and Anthropic provider pages. The correct behavior is applied automatically based on the model name.
Key Configuration
Section titled “Key Configuration”The key configuration for Vertex requires Google Cloud credentials:
{ "vertex_key_config": { "project_id": "my-gcp-project", "region": "us-central1", "auth_credentials": "{service-account-json}" }}Configuration Details:
project_id- GCP project ID (required)region- GCP region for API endpoints (required)- Examples:
us-central1,us-west1,eu-west1,global
- Examples:
auth_credentials- Service account JSON credentials (optional if using default credentials)
Authentication Methods
Section titled “Authentication Methods”-
Service Account JSON (recommended for production)
{"auth_credentials": "{full-service-account-json}"} -
Application Default Credentials (for local development)
- Requires
GOOGLE_APPLICATION_CREDENTIALSenvironment variable - Leave
auth_credentialsempty
- Requires
Gemini Models
Section titled “Gemini Models”When using Google’s Gemini models, all Gemini-compatible parameters are supported, including system prompts, tool/function calling, and streaming. See the Gemini provider page for details.
Anthropic Models (Claude)
Section titled “Anthropic Models (Claude)”When using Anthropic (Claude) models through Vertex AI, all standard Anthropic parameters are supported, including reasoning/thinking, system messages, and tools. See the Anthropic provider page for details.
A few Vertex-specific notes for Claude:
- The minimum reasoning budget is 1024 tokens.
- The
anthropic_versionis set automatically; you don’t need to provide it.
Region Selection
Section titled “Region Selection”The region determines the API endpoint:
| Region | Endpoint | Purpose |
|---|---|---|
us-central1 | us-central1-aiplatform.googleapis.com | US Central |
us-west1 | us-west1-aiplatform.googleapis.com | US West |
eu-west1 | eu-west1-aiplatform.googleapis.com | Europe West |
global | aiplatform.googleapis.com | Global (no region prefix) |
Availability varies by region. Check GCP documentation for model availability.
Streaming
Section titled “Streaming”Streaming format depends on model type:
- Gemini models: Standard Gemini streaming with server-sent events
- Anthropic models: Anthropic message streaming format
2. Responses API
Section titled “2. Responses API”The Responses API is available for both Anthropic (Claude) and Gemini models on Vertex AI.
Request Parameters
Section titled “Request Parameters”Send standard OpenAI Responses requests with instructions, input (string or array), max_output_tokens, and other parameters supported by the underlying model. The correct behavior is applied automatically based on the model name (Gemini or Claude).
curl -X POST https://app.deepintshield.com/v1/responses \ -H "Authorization: Bearer sk-bf-your-virtual-key" \ -H "Content-Type: application/json" \ -d '{ "model": "vertex/claude-3-5-sonnet", "input": "What is AI?", "instructions": "You are a helpful assistant", "project_id": "my-gcp-project", "region": "us-central1" }'For parameter details, see the Gemini and Anthropic Responses API pages.
3. Embeddings
Section titled “3. Embeddings”Embeddings are supported for Gemini and other models that support embedding generation.
Request Parameters
Section titled “Request Parameters”Core Parameters
Section titled “Core Parameters”| Parameter | Notes |
|---|---|
input | Text to embed (single string or array) |
dimensions | Optional output embedding size |
Advanced Parameters
Section titled “Advanced Parameters”Embedding-specific options can be passed directly in the request body:
curl -X POST https://app.deepintshield.com/v1/embeddings \ -H "Authorization: Bearer sk-bf-your-virtual-key" \ -H "Content-Type: application/json" \ -d '{ "model": "text-embedding-004", "input": ["text to embed"], "dimensions": 256, "task_type": "RETRIEVAL_DOCUMENT", "title": "Document title", "project_id": "my-gcp-project", "region": "us-central1", "autoTruncate": true }'Embedding Parameters
Section titled “Embedding Parameters”| Parameter | Type | Description |
|---|---|---|
task_type | string | Task type hint: RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING (optional) |
title | string | Optional title to help model produce better embeddings (used with task_type) |
autoTruncate | boolean | Auto-truncate input to max tokens (defaults to true) |
Task Type Effects
Section titled “Task Type Effects”Different task types optimize embeddings for specific use cases:
RETRIEVAL_DOCUMENT- Optimized for documents in retrieval systemsRETRIEVAL_QUERY- Optimized for queries searching documentsSEMANTIC_SIMILARITY- Optimized for semantic similarity tasksCLASSIFICATION- For classification tasksCLUSTERING- For clustering tasks
Response
Section titled “Response”The embeddings response includes vectors and truncation information:
{ "embeddings": [ { "values": [0.1234, -0.5678, ...], "statistics": { "token_count": 15, "truncated": false } } ]}Response Fields:
values- Embedding vector as floatsstatistics.token_count- Input token countstatistics.truncated- Whether input was truncated due to length
4. Image Generation
Section titled “4. Image Generation”Image Generation is supported for Gemini and Imagen models on Vertex AI. The right behavior and endpoint are selected automatically based on the model name.
Request Parameters
Section titled “Request Parameters”The same parameters as Gemini image generation apply, depending on the model:
- Gemini Models: See Gemini Image Generation
- Imagen Models: Imagen-specific parameters with size / aspect-ratio support
The region field is used to route the request and is not part of the prompt.
curl -X POST https://app.deepintshield.com/v1/images/generations \ -H "Authorization: Bearer sk-bf-your-virtual-key" \ -H "Content-Type: application/json" \ -d '{ "model": "vertex/imagen-4.0-generate-001", "prompt": "A sunset over the mountains", "size": "1024x1024", "n": 2, "project_id": "my-gcp-project", "region": "us-central1" }'Response
Section titled “Response”Same response format as standard Gemini / Imagen image generation.
Streaming
Section titled “Streaming”Image generation streaming is not supported by Vertex AI.
5. Image Edit
Section titled “5. Image Edit”Image Edit is supported for Gemini and Imagen models on Vertex AI. The provider automatically routes to the appropriate format based on the model type.
Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
model | string | ✅ | Model identifier (must be Gemini or Imagen model) |
prompt | string | ✅ | Text description of the edit |
image[] | binary | ✅ | Image file(s) to edit (supports multiple images) |
mask | binary | ❌ | Mask image file |
type | string | ❌ | Edit type: "inpainting", "outpainting", "inpaint_removal", "bgswap" (Imagen only) |
n | int | ❌ | Number of images to generate (1-10) |
output_format | string | ❌ | Output format: "png", "webp", "jpeg" |
output_compression | int | ❌ | Compression level (0-100%) |
seed | int | ❌ | Seed for reproducibility (pass as an extra param) |
negative_prompt | string | ❌ | Negative prompt (pass negativePrompt as an extra param) |
maskMode | string | ❌ | Mask mode (pass as an extra param, Imagen only): "MASK_MODE_USER_PROVIDED", "MASK_MODE_BACKGROUND", "MASK_MODE_FOREGROUND", "MASK_MODE_SEMANTIC" |
dilation | float | ❌ | Mask dilation (pass as an extra param, Imagen only): Range [0, 1] |
maskClasses | int[] | ❌ | Mask classes (pass as an extra param, Imagen only): For MASK_MODE_SEMANTIC |
Behavior
Vertex supports the same image edit behavior as Gemini:
- Gemini Models: See Gemini Image Edit
- Imagen Models: Imagen-specific edit types and mask configuration (see Gemini Image Edit)
Only Gemini and Imagen models are supported; other models return a configuration error. The region field is used to route the request and is not part of the prompt.
Response
Same response format as standard Gemini / Imagen image generation.
Streaming
Image edit streaming is not supported by Vertex AI.
Image Variation
Image variation is not supported by Vertex AI.
6. List Models
Section titled “6. List Models”Request Parameters
Section titled “Request Parameters”None required. Automatically uses project_id and region from key config.
Response
Section titled “Response”Lists models available in the specified project and region with metadata and deployment information:
{ "models": [ { "name": "projects/{project}/locations/{region}/models/gemini-2.0-flash", "display_name": "Gemini 2.0 Flash", "description": "Fast multimodal model", "version_id": "1", "version_aliases": ["latest", "stable"], "capabilities": [...], "deployed_models": [...] } ], "next_page_token": "..."}Custom vs Non-Custom Models
Section titled “Custom vs Non-Custom Models”So that the model list is complete, DeepIntShield combines the API results with the foundation models you configure. The returned list includes:
- Custom fine-tuned models deployed to your project (returned by the Vertex List Models API).
- Foundation models from your
deploymentsconfiguration (e.g.gemini-2.0-flash,claude-3-5-sonnet). - Foundation models from your
allowedModelslist that aren’t indeployments- add a model here to make it appear in the list.
To control which models appear:
- Leave
allowedModelsempty to list everything (custom models plus all configured foundation models). - Set
allowedModelsto restrict the list to only those models.
Duplicate model IDs are removed automatically.
Foundation models from your deployments and allowed models are given a readable display name in the list (for example, gemini-pro is shown as “Gemini Pro”).
Example Configuration
Section titled “Example Configuration”{ "vertex_key_config": { "project_id": "my-project", "region": "us-central1", "deployments": { "my-gemini-ft": "1234567890", "my-claude-ft": "9876543210" } }}This returns only your custom fine-tuned models from the API.
{ "vertex_key_config": { "project_id": "my-project", "region": "us-central1", "deployments": { "gemini-2.0-flash": "gemini-2.0-flash", "claude-3-5-sonnet": "claude-3-5-sonnet-v2@20241022" } }}This returns both custom models AND foundation models from deployments.
{ "vertex_key_config": { "project_id": "my-project", "region": "us-central1", "deployments": { "gemini-2.0-flash": "gemini-2.0-flash", "claude-3-5-sonnet": "claude-3-5-sonnet-v2@20241022", "gemini-2.5-pro": "gemini-2.5-pro" }, "allowedModels": ["gemini-2.0-flash", "claude-3-5-sonnet"] }}Only returns gemini-2.0-flash and claude-3-5-sonnet, excluding gemini-2.5-pro.
Pagination
Section titled “Pagination”Model listing is paginated automatically. If more than 100 models exist, next_page_token will be present. DeepIntShield handles pagination internally.
Caveats
Section titled “Caveats”Project ID and Region Required
Severity: High Behavior: Both project_id and region required for all operations Impact: Request fails without valid GCP project/region configuration
OAuth2 Token Management
Severity: Medium Behavior: Tokens cached and automatically refreshed when expired Impact: First request slightly slower due to auth; cached for subsequent requests
Automatic Model Handling
Severity: Medium Behavior: Gemini and Claude models are detected automatically from the model name across Chat and Responses APIs Impact: You don’t need to indicate the underlying model family
Anthropic Version Lock
Severity: Low Behavior: The Anthropic version is set automatically for Claude on Vertex Impact: You cannot override the Anthropic version for Claude on Vertex
List Models API Returns Only Custom Models
Severity: High
Behavior: Vertex AI’s List Models API only returns custom fine-tuned models, NOT foundation models
Impact: To see foundation models in the list, add them to your deployments or allowedModels configuration
Why: This is a Vertex AI API limitation - foundation models must be explicitly configured
Configuration
Section titled “Configuration”HTTP Settings: OAuth2 authentication with automatic token refresh | Region-specific endpoints | Max Connections 5000 | Max Idle 60 seconds
Scope: https://www.googleapis.com/auth/cloud-platform
Endpoint Format: https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/{resource}
Note: For global region, endpoint is https://aiplatform.googleapis.com/v1/projects/{project}/locations/global/{resource}
Setup & Configuration
Section titled “Setup & Configuration”Vertex AI requires project configuration, region selection, and Google Cloud authentication. For detailed instructions, see Provider-Specific Authentication - Google Vertex in the Gateway Quickstart for configuration steps in the Web UI.
Video Generation
Section titled “Video Generation”Vertex AI routes video generation through Gemini’s Veo models using the predictLongRunning endpoint. All parameters are identical to Gemini Video Generation.
Supported Operations
| Operation | Supported | Notes |
|---|---|---|
| Generate | ✅ | POST /v1/videos |
| Retrieve | ✅ | GET /v1/videos/{id} |
| Download | ✅ | GET /v1/videos/{id}/content |
| Delete | ❌ | Not supported |
| List | ❌ | Not supported |
| Remix | ❌ | Not supported |