Vertex AI

Vertex AI is Google’s unified ML platform providing access to Gemini models, Anthropic Claude models, and other third-party LLMs through a single API. Call them through DeepIntShield using the same OpenAI-compatible Chat Completions, Responses, and Embeddings APIs. A few things are useful to know when you set up and call Vertex:

  • Model selection - the right behavior is applied automatically from the model name you pass (Gemini vs. Claude)
  • OAuth2 authentication - sign in with a GCP service account; tokens are refreshed for you (see Setup & Configuration)
  • Project & region - endpoints are built from your configured GCP project and region
  • Embeddings - vector generation with task type and truncation options
  • Model listing - List Models returns your custom models plus any foundation models you configure (see Custom vs Non-Custom Models)
| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | | | /generate |
| Responses API | | | /messages |
| Embeddings | | - | /embeddings |
| Image Generation | | - | /generateContent or /predict (Imagen) |
| Image Edit | | - | /generateContent or /predict (Imagen) |
| Video Generation | | - | /predictLongRunning (Veo models only) |
| Image Variation | | - | Not supported |
| List Models | | - | /models |
| Text Completions | | | - |
| Speech (TTS) | | | - |
| Transcriptions (STT) | | | - |
| Files | | | - |
| Batch | | | - |

Send standard OpenAI-compatible Chat Completions requests. Parameters supported by the underlying model (Gemini or Claude) apply - see the Gemini and Anthropic provider pages. The correct behavior is applied automatically based on the model name.

The key configuration for Vertex requires Google Cloud credentials:

{
  "vertex_key_config": {
    "project_id": "my-gcp-project",
    "region": "us-central1",
    "auth_credentials": "{service-account-json}"
  }
}

Configuration Details:

  • project_id - GCP project ID (required)
  • region - GCP region for API endpoints (required)
    • Examples: us-central1, us-west1, eu-west1, global
  • auth_credentials - Service account JSON credentials (optional if using default credentials)

Authentication methods:

  1. Service Account JSON (recommended for production)

    {"auth_credentials": "{full-service-account-json}"}

  2. Application Default Credentials (for local development)

    • Requires GOOGLE_APPLICATION_CREDENTIALS environment variable
    • Leave auth_credentials empty
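
As a rough illustration of the configuration rules above, the following sketch assembles a vertex_key_config object in Python. The helper name build_vertex_key_config is hypothetical; only the field names (project_id, region, auth_credentials) come from this page. It assumes that leaving auth_credentials empty is how Application Default Credentials are selected, as the list above suggests.

```python
import json


def build_vertex_key_config(project_id, region, service_account_json=None):
    """Return a vertex_key_config dict shaped like the example above.

    Hypothetical helper: only the field names come from the documentation.
    """
    if not project_id:
        raise ValueError("project_id is required")
    if not region:
        raise ValueError("region is required")

    # An empty string stands for "use Application Default Credentials".
    auth_credentials = ""
    if service_account_json is not None:
        # Fail early if the supplied credentials are not valid JSON.
        json.loads(service_account_json)
        auth_credentials = service_account_json

    return {
        "vertex_key_config": {
            "project_id": project_id,
            "region": region,
            "auth_credentials": auth_credentials,
        }
    }


config = build_vertex_key_config("my-gcp-project", "us-central1")
print(json.dumps(config, indent=2))
```

Validating the two required fields up front surfaces a missing project or region before any request is sent.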

When using Google’s Gemini models, all Gemini-compatible parameters are supported, including system prompts, tool/function calling, and streaming. See the Gemini provider page for details.

When using Anthropic (Claude) models through Vertex AI, all standard Anthropic parameters are supported, including reasoning/thinking, system messages, and tools. See the Anthropic provider page for details.

A few Vertex-specific notes for Claude:

  • The minimum reasoning budget is 1024 tokens.
  • The anthropic_version is set automatically; you don’t need to provide it.
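
A client-side guard for the minimum reasoning budget might look like the sketch below. The function name check_reasoning_budget is hypothetical, and this page does not say how the gateway itself reacts to a smaller value; the only fact taken from it is the 1024-token minimum.

```python
MIN_REASONING_BUDGET_TOKENS = 1024  # minimum stated above for Claude on Vertex


def check_reasoning_budget(budget_tokens):
    """Return the budget unchanged, or raise if it is below the minimum."""
    if budget_tokens < MIN_REASONING_BUDGET_TOKENS:
        raise ValueError(
            f"reasoning budget must be at least {MIN_REASONING_BUDGET_TOKENS} "
            f"tokens, got {budget_tokens}"
        )
    return budget_tokens


print(check_reasoning_budget(2048))
```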

The region determines the API endpoint:

| Region | Endpoint | Purpose |
|---|---|---|
| us-central1 | us-central1-aiplatform.googleapis.com | US Central |
| us-west1 | us-west1-aiplatform.googleapis.com | US West |
| eu-west1 | eu-west1-aiplatform.googleapis.com | Europe West |
| global | aiplatform.googleapis.com | Global (no region prefix) |

Availability varies by region. Check GCP documentation for model availability.

Streaming format depends on model type:

  • Gemini models: Standard Gemini streaming with server-sent events
  • Anthropic models: Anthropic message streaming format

The Responses API is available for both Anthropic (Claude) and Gemini models on Vertex AI.

Send standard OpenAI Responses requests with instructions, input (string or array), max_output_tokens, and other parameters supported by the underlying model. The correct behavior is applied automatically based on the model name (Gemini or Claude).

curl -X POST https://app.deepintshield.com/v1/responses \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vertex/claude-3-5-sonnet",
    "input": "What is AI?",
    "instructions": "You are a helpful assistant",
    "project_id": "my-gcp-project",
    "region": "us-central1"
  }'

For parameter details, see the Gemini and Anthropic Responses API pages.
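
The same Responses request can be prepared from Python. This is a minimal sketch using only the standard library; the helper name build_responses_request is hypothetical, while the URL, headers, and body fields mirror the curl example. The request is built but not sent.

```python
import json
import urllib.request

RESPONSES_URL = "https://app.deepintshield.com/v1/responses"


def build_responses_request(virtual_key, model, input_text, instructions=None,
                            max_output_tokens=None, project_id=None, region=None):
    """Build (but do not send) a POST request for the Responses API."""
    payload = {"model": model, "input": input_text}
    optional = {
        "instructions": instructions,
        "max_output_tokens": max_output_tokens,
        "project_id": project_id,
        "region": region,
    }
    # Only include optional fields that were actually provided.
    payload.update({k: v for k, v in optional.items() if v is not None})

    return urllib.request.Request(
        RESPONSES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {virtual_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_responses_request(
    "sk-bf-your-virtual-key",
    "vertex/claude-3-5-sonnet",
    "What is AI?",
    instructions="You are a helpful assistant",
    project_id="my-gcp-project",
    region="us-central1",
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```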


Embeddings are supported for Gemini and other models that support embedding generation.

| Parameter | Notes |
|---|---|
| input | Text to embed (single string or array) |
| dimensions | Optional output embedding size |

Embedding-specific options can be passed directly in the request body:

curl -X POST https://app.deepintshield.com/v1/embeddings \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-004",
    "input": ["text to embed"],
    "dimensions": 256,
    "task_type": "RETRIEVAL_DOCUMENT",
    "title": "Document title",
    "project_id": "my-gcp-project",
    "region": "us-central1",
    "autoTruncate": true
  }'

| Parameter | Type | Description |
|---|---|---|
| task_type | string | Task type hint: RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING (optional) |
| title | string | Optional title to help the model produce better embeddings (used with task_type) |
| autoTruncate | boolean | Auto-truncate input to max tokens (defaults to true) |

Different task types optimize embeddings for specific use cases:

  • RETRIEVAL_DOCUMENT - Optimized for documents in retrieval systems
  • RETRIEVAL_QUERY - Optimized for queries searching documents
  • SEMANTIC_SIMILARITY - Optimized for semantic similarity tasks
  • CLASSIFICATION - For classification tasks
  • CLUSTERING - For clustering tasks
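
To show how these options fit together, here is a small sketch that builds an embeddings request body and rejects unknown task types before anything is sent. The helper build_embedding_payload is hypothetical; the field names and the five task types are the ones listed on this page.

```python
import json

TASK_TYPES = {
    "RETRIEVAL_QUERY",
    "RETRIEVAL_DOCUMENT",
    "SEMANTIC_SIMILARITY",
    "CLASSIFICATION",
    "CLUSTERING",
}


def build_embedding_payload(model, texts, task_type=None, title=None,
                            dimensions=None, auto_truncate=True):
    """Return a request body for /v1/embeddings as a plain dict."""
    if task_type is not None and task_type not in TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type!r}")

    payload = {"model": model, "input": list(texts), "autoTruncate": auto_truncate}
    if task_type is not None:
        payload["task_type"] = task_type
    if title is not None:
        payload["title"] = title
    if dimensions is not None:
        payload["dimensions"] = dimensions
    return payload


payload = build_embedding_payload(
    "text-embedding-004",
    ["text to embed"],
    task_type="RETRIEVAL_DOCUMENT",
    title="Document title",
    dimensions=256,
)
print(json.dumps(payload, indent=2))
```

A typical retrieval setup would embed stored documents with RETRIEVAL_DOCUMENT and incoming searches with RETRIEVAL_QUERY.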

The embeddings response includes vectors and truncation information:

{
  "embeddings": [
    {
      "values": [0.1234, -0.5678, ...],
      "statistics": {
        "token_count": 15,
        "truncated": false
      }
    }
  ]
}

Response Fields:

  • values - Embedding vector as floats
  • statistics.token_count - Input token count
  • statistics.truncated - Whether input was truncated due to length
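
Reading these fields from a response could be done along the following lines. The function summarize_embeddings is a hypothetical helper, and the sample response is made up for illustration; it only follows the shape shown above.

```python
import json


def summarize_embeddings(response_text):
    """Return one summary dict per embedding in the response body."""
    data = json.loads(response_text)
    summaries = []
    for item in data["embeddings"]:
        stats = item.get("statistics", {})
        summaries.append({
            "dimensions": len(item["values"]),
            "token_count": stats.get("token_count"),
            "truncated": stats.get("truncated", False),
        })
    return summaries


# Illustrative response body, not real model output.
sample = json.dumps({
    "embeddings": [
        {"values": [0.1234, -0.5678, 0.9], "statistics": {"token_count": 15, "truncated": False}}
    ]
})

for summary in summarize_embeddings(sample):
    if summary["truncated"]:
        print("warning: input was truncated")
    print(summary)
```

Checking statistics.truncated is worthwhile when autoTruncate is left at its default, since a truncated input still returns a vector.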

Image Generation is supported for Gemini and Imagen models on Vertex AI. The right behavior and endpoint are selected automatically based on the model name.

The same parameters as Gemini image generation apply, depending on the model:

  • Gemini Models: See Gemini Image Generation
  • Imagen Models: Imagen-specific parameters with size / aspect-ratio support

The region field is used to route the request and is not part of the prompt.

curl -X POST https://app.deepintshield.com/v1/images/generations \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vertex/imagen-4.0-generate-001",
    "prompt": "A sunset over the mountains",
    "size": "1024x1024",
    "n": 2,
    "project_id": "my-gcp-project",
    "region": "us-central1"
  }'

Same response format as standard Gemini / Imagen image generation.

Image generation streaming is not supported by Vertex AI.


Image Edit is supported for Gemini and Imagen models on Vertex AI. The provider automatically routes to the appropriate format based on the model type.

Request Parameters

| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | | Model identifier (must be Gemini or Imagen model) |
| prompt | string | | Text description of the edit |
| image[] | binary | | Image file(s) to edit (supports multiple images) |
| mask | binary | | Mask image file |
| type | string | | Edit type: "inpainting", "outpainting", "inpaint_removal", "bgswap" (Imagen only) |
| n | int | | Number of images to generate (1-10) |
| output_format | string | | Output format: "png", "webp", "jpeg" |
| output_compression | int | | Compression level (0-100%) |
| seed | int | | Seed for reproducibility (pass as an extra param) |
| negative_prompt | string | | Negative prompt (pass negativePrompt as an extra param) |
| maskMode | string | | Mask mode (pass as an extra param, Imagen only): "MASK_MODE_USER_PROVIDED", "MASK_MODE_BACKGROUND", "MASK_MODE_FOREGROUND", "MASK_MODE_SEMANTIC" |
| dilation | float | | Mask dilation (pass as an extra param, Imagen only): Range [0, 1] |
| maskClasses | int[] | | Mask classes (pass as an extra param, Imagen only): For MASK_MODE_SEMANTIC |
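
The value ranges in the table can be checked on the client before uploading images. The sketch below is a hypothetical pre-flight validator (validate_image_edit_params is not part of any documented SDK); it encodes only the ranges and enumerations listed above and says nothing about how the gateway reports invalid values.

```python
EDIT_TYPES = {"inpainting", "outpainting", "inpaint_removal", "bgswap"}
OUTPUT_FORMATS = {"png", "webp", "jpeg"}


def validate_image_edit_params(params):
    """Return a list of problems found in an image edit parameter dict."""
    problems = []

    edit_type = params.get("type")
    if edit_type is not None and edit_type not in EDIT_TYPES:
        problems.append(f"type must be one of {sorted(EDIT_TYPES)}")

    n = params.get("n")
    if n is not None and not 1 <= n <= 10:
        problems.append("n must be between 1 and 10")

    output_format = params.get("output_format")
    if output_format is not None and output_format not in OUTPUT_FORMATS:
        problems.append(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")

    compression = params.get("output_compression")
    if compression is not None and not 0 <= compression <= 100:
        problems.append("output_compression must be between 0 and 100")

    dilation = params.get("dilation")
    if dilation is not None and not 0.0 <= dilation <= 1.0:
        problems.append("dilation must be in the range [0, 1]")

    return problems


print(validate_image_edit_params({"type": "inpainting", "n": 2, "dilation": 0.05}))
print(validate_image_edit_params({"n": 11, "output_format": "gif"}))
```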

Behavior

Vertex supports the same image edit behavior as Gemini. Only Gemini and Imagen models are supported; other models return a configuration error. The region field is used to route the request and is not part of the prompt.

Response

Same response format as standard Gemini / Imagen image generation.

Streaming

Image edit streaming is not supported by Vertex AI.

Image Variation

Image variation is not supported by Vertex AI.


List Models takes no request parameters. It automatically uses project_id and region from the key configuration.

Lists models available in the specified project and region with metadata and deployment information:

{
  "models": [
    {
      "name": "projects/{project}/locations/{region}/models/gemini-2.0-flash",
      "display_name": "Gemini 2.0 Flash",
      "description": "Fast multimodal model",
      "version_id": "1",
      "version_aliases": ["latest", "stable"],
      "capabilities": [...],
      "deployed_models": [...]
    }
  ],
  "next_page_token": "..."
}

So that the model list is complete, DeepIntShield combines the API results with the foundation models you configure. The returned list includes:

  • Custom fine-tuned models deployed to your project (returned by the Vertex List Models API).
  • Foundation models from your deployments configuration (e.g. gemini-2.0-flash, claude-3-5-sonnet).
  • Foundation models from your allowedModels list that aren’t in deployments - add a model here to make it appear in the list.

To control which models appear:

  • Leave allowedModels empty to list everything (custom models plus all configured foundation models).
  • Set allowedModels to restrict the list to only those models.

Duplicate model IDs are removed automatically.

Foundation models from your deployments and allowed models are given a readable display name in the list (for example, gemini-pro is shown as “Gemini Pro”).
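
The merge rules above can be pictured with a short sketch. This is not DeepIntShield's implementation: merge_model_ids and display_name are hypothetical helpers, the inputs are plain lists of model IDs, and the display-name rule (split on hyphens, capitalize each part) is an assumption that happens to reproduce the "Gemini Pro" example.

```python
def merge_model_ids(api_models, deployment_models, allowed_models):
    """Combine the three sources, drop duplicates, then apply allowedModels."""
    merged = []
    seen = set()
    for model_id in list(api_models) + list(deployment_models) + list(allowed_models):
        if model_id in seen:
            continue  # duplicate IDs are removed
        if allowed_models and model_id not in allowed_models:
            continue  # a non-empty allowedModels list restricts the result
        seen.add(model_id)
        merged.append(model_id)
    return merged


def display_name(model_id):
    """Assumed naming rule: 'gemini-pro' becomes 'Gemini Pro'."""
    return " ".join(part.capitalize() for part in model_id.split("-"))


everything = merge_model_ids(["my-gemini-ft"], ["gemini-2.0-flash", "my-gemini-ft"], [])
restricted = merge_model_ids(["my-gemini-ft"], ["gemini-2.0-flash"], ["gemini-2.0-flash"])
print(everything)
print(restricted)
print(display_name("gemini-pro"))
```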

{
  "vertex_key_config": {
    "project_id": "my-project",
    "region": "us-central1",
    "deployments": {
      "my-gemini-ft": "1234567890",
      "my-claude-ft": "9876543210"
    }
  }
}

This returns only your custom fine-tuned models from the API.

Model listing is paginated automatically. If more than 100 models exist, next_page_token will be present. DeepIntShield handles pagination internally.
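
For readers curious what following next_page_token involves, here is a generic sketch of the loop. It is not the gateway's code: list_all_models and the fetch_page callable are hypothetical, and the pages below are stubbed in memory instead of coming from Vertex AI.

```python
def list_all_models(fetch_page):
    """Collect models from every page by following next_page_token.

    fetch_page(token) must return a dict with "models" and, when more
    pages exist, "next_page_token". Pass None to request the first page.
    """
    models = []
    token = None
    while True:
        page = fetch_page(token)
        models.extend(page.get("models", []))
        token = page.get("next_page_token")
        if not token:
            return models


# In-memory stand-in for the paginated API.
PAGES = {
    None: {"models": [{"name": "model-a"}, {"name": "model-b"}], "next_page_token": "page-2"},
    "page-2": {"models": [{"name": "model-c"}]},
}

all_models = list_all_models(lambda token: PAGES[token])
print([m["name"] for m in all_models])
```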


Project ID and Region Required

Severity: High
Behavior: Both project_id and region are required for all operations.
Impact: Requests fail without a valid GCP project and region configuration.

OAuth2 Token Management

Severity: Medium
Behavior: Tokens are cached and automatically refreshed when they expire.
Impact: The first request is slightly slower because of authentication; the cached token is reused for subsequent requests.

Automatic Model Handling

Severity: Medium
Behavior: Gemini and Claude models are detected automatically from the model name across Chat and Responses APIs.
Impact: You don’t need to indicate the underlying model family.

Anthropic Version Lock

Severity: Low
Behavior: The Anthropic version is set automatically for Claude on Vertex.
Impact: You cannot override the Anthropic version for Claude on Vertex.

List Models API Returns Only Custom Models

Severity: High
Behavior: Vertex AI’s List Models API only returns custom fine-tuned models, NOT foundation models.
Impact: To see foundation models in the list, add them to your deployments or allowedModels configuration.
Why: This is a Vertex AI API limitation - foundation models must be explicitly configured.


HTTP Settings: OAuth2 authentication with automatic token refresh | Region-specific endpoints | Max Connections 5000 | Max Idle 60 seconds

Scope: https://www.googleapis.com/auth/cloud-platform
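
The token caching described under OAuth2 Token Management follows a common pattern, sketched below. This is an illustration only, not the gateway's implementation: TokenCache, the fetch_token callable, and the 60-second refresh margin are all assumptions. A real fetch_token would exchange the service account credentials for an access token with the scope shown above.

```python
import time


class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, clock=time.monotonic, refresh_margin=60.0):
        # fetch_token() must return (token, lifetime_in_seconds).
        self._fetch_token = fetch_token
        self._clock = clock
        self._refresh_margin = refresh_margin
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._refresh_margin:
            # Slow path: only taken on first use or near expiry.
            self._token, lifetime = self._fetch_token()
            self._expires_at = now + lifetime
        return self._token


def fake_fetch_token():
    # Placeholder for a real OAuth2 token exchange.
    return "example-access-token", 3600.0


cache = TokenCache(fake_fetch_token)
print(cache.get())  # fetches a token
print(cache.get())  # served from the cache
```

Refreshing slightly before expiry avoids sending a request with a token that lapses in flight.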

Endpoint Format: https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/{resource}

Note: For global region, endpoint is https://aiplatform.googleapis.com/v1/projects/{project}/locations/global/{resource}
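
The endpoint format and the global exception can be expressed as a small function. This is a sketch of the rule as stated above; the function name vertex_endpoint is hypothetical.

```python
def vertex_endpoint(project, region, resource):
    """Build a Vertex AI URL from project, region, and resource path."""
    if region == "global":
        host = "aiplatform.googleapis.com"  # no region prefix
    else:
        host = f"{region}-aiplatform.googleapis.com"
    return f"https://{host}/v1/projects/{project}/locations/{region}/{resource}"


print(vertex_endpoint("my-gcp-project", "us-central1", "models"))
print(vertex_endpoint("my-gcp-project", "global", "models"))
```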

Vertex AI requires project configuration, region selection, and Google Cloud authentication. For step-by-step configuration in the Web UI, see Provider-Specific Authentication - Google Vertex in the Gateway Quickstart.


Vertex AI routes video generation through Gemini’s Veo models using the predictLongRunning endpoint. All parameters are identical to Gemini Video Generation.

Supported Operations

| Operation | Supported | Notes |
|---|---|---|
| Generate | | POST /v1/videos |
| Retrieve | | GET /v1/videos/{id} |
| Download | | GET /v1/videos/{id}/content |
| Delete | | Not supported |
| List | | Not supported |
| Remix | | Not supported |
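
A client could map these operations to requests as in the sketch below. The helper video_request is hypothetical, and the base URL is an assumption borrowed from the curl examples on this page; the methods and paths are the ones listed in the table.

```python
BASE_URL = "https://app.deepintshield.com"  # assumption: same host as the curl examples

VIDEO_OPERATIONS = {
    "generate": ("POST", "/v1/videos"),
    "retrieve": ("GET", "/v1/videos/{id}"),
    "download": ("GET", "/v1/videos/{id}/content"),
}


def video_request(operation, video_id=None):
    """Return (method, url) for a video operation listed with an endpoint."""
    if operation not in VIDEO_OPERATIONS:
        raise ValueError(f"{operation!r} is not supported for Vertex AI video")
    method, path = VIDEO_OPERATIONS[operation]
    if "{id}" in path:
        if not video_id:
            raise ValueError(f"{operation!r} requires a video id")
        path = path.replace("{id}", video_id)
    return method, BASE_URL + path


print(video_request("generate"))
print(video_request("download", "video-123"))
```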