Vertex AI

Overview

Vertex AI is Google’s unified ML platform providing access to Gemini models, Anthropic Claude models, and other third-party LLMs through a single API. Call them through DeepIntShield using the same OpenAI-compatible Chat Completions, Responses, and Embeddings APIs. A few things are useful to know when you set up and call Vertex:

Model selection - the right behavior is applied automatically from the model name you pass (Gemini vs. Claude)
OAuth2 authentication - sign in with a GCP service account; tokens are refreshed for you (see Setup & Configuration)
Project & region - endpoints are built from your configured GCP project and region
Embeddings - vector generation with task type and truncation options
Model listing - List Models returns your custom models plus any foundation models you configure (see Custom vs Non-Custom Models)

Supported Operations

Operation	Non-Streaming	Streaming	Endpoint
Chat Completions	✅	✅	`/generate`
Responses API	✅	✅	`/messages`
Embeddings	✅	-	`/embeddings`
Image Generation	✅	-	`/generateContent` or `/predict` (Imagen)
Image Edit	✅	-	`/generateContent` or `/predict` (Imagen)
Video Generation	✅	-	`/predictLongRunning` (Veo models only)
Image Variation	❌	-	Not supported
List Models	✅	-	`/models`
Text Completions	❌	❌	-
Speech (TTS)	❌	❌	-
Transcriptions (STT)	❌	❌	-
Files	❌	❌	-
Batch	❌	❌	-

1. Chat Completions

Request Parameters

Send standard OpenAI-compatible Chat Completions requests. Parameters supported by the underlying model (Gemini or Claude) apply - see the Gemini and Anthropic provider pages. The correct behavior is applied automatically based on the model name.

Key Configuration

The key configuration for Vertex requires Google Cloud credentials:

{
  "vertex_key_config": {
    "project_id": "my-gcp-project",
    "region": "us-central1",
    "auth_credentials": "{service-account-json}"
  }
}

Configuration Details:

project_id - GCP project ID (required)
region - GCP region for API endpoints (required)
- Examples: us-central1, us-west1, eu-west1, global
auth_credentials - Service account JSON credentials (optional if using default credentials)

Authentication Methods

Service Account JSON (recommended for production)

{"auth_credentials": "{full-service-account-json}"}

Application Default Credentials (for local development)
- Requires GOOGLE_APPLICATION_CREDENTIALS environment variable
- Leave auth_credentials empty

Gemini Models

When using Google’s Gemini models, all Gemini-compatible parameters are supported, including system prompts, tool/function calling, and streaming. See the Gemini provider page for details.

Anthropic Models (Claude)

When using Anthropic (Claude) models through Vertex AI, all standard Anthropic parameters are supported, including reasoning/thinking, system messages, and tools. See the Anthropic provider page for details.

A few Vertex-specific notes for Claude:

The minimum reasoning budget is 1024 tokens.
The anthropic_version is set automatically; you don’t need to provide it.

Region Selection

The region determines the API endpoint:

Region	Endpoint	Purpose
`us-central1`	`us-central1-aiplatform.googleapis.com`	US Central
`us-west1`	`us-west1-aiplatform.googleapis.com`	US West
`eu-west1`	`eu-west1-aiplatform.googleapis.com`	Europe West
`global`	`aiplatform.googleapis.com`	Global (no region prefix)

Availability varies by region. Check GCP documentation for model availability.

Streaming

Streaming format depends on model type:

Gemini models: Standard Gemini streaming with server-sent events
Anthropic models: Anthropic message streaming format

2. Responses API

The Responses API is available for both Anthropic (Claude) and Gemini models on Vertex AI.

Request Parameters

Send standard OpenAI Responses requests with instructions, input (string or array), max_output_tokens, and other parameters supported by the underlying model. The correct behavior is applied automatically based on the model name (Gemini or Claude).

curl -X POST https://app.deepintshield.com/v1/responses \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vertex/claude-3-5-sonnet",
    "input": "What is AI?",
    "instructions": "You are a helpful assistant",
    "project_id": "my-gcp-project",
    "region": "us-central1"
  }'

For parameter details, see the Gemini and Anthropic Responses API pages.

3. Embeddings

Embeddings are supported for Gemini and other models that support embedding generation.

Request Parameters

Core Parameters

Parameter	Notes
`input`	Text to embed (single string or array)
`dimensions`	Optional output embedding size

Advanced Parameters

Embedding-specific options can be passed directly in the request body:

curl -X POST https://app.deepintshield.com/v1/embeddings \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-004",
    "input": ["text to embed"],
    "dimensions": 256,
    "task_type": "RETRIEVAL_DOCUMENT",
    "title": "Document title",
    "project_id": "my-gcp-project",
    "region": "us-central1",
    "autoTruncate": true
  }'

Embedding Parameters

Parameter	Type	Description
`task_type`	string	Task type hint: `RETRIEVAL_QUERY`, `RETRIEVAL_DOCUMENT`, `SEMANTIC_SIMILARITY`, `CLASSIFICATION`, `CLUSTERING` (optional)
`title`	string	Optional title to help model produce better embeddings (used with task_type)
`autoTruncate`	boolean	Auto-truncate input to max tokens (defaults to true)

Task Type Effects

Different task types optimize embeddings for specific use cases:

RETRIEVAL_DOCUMENT - Optimized for documents in retrieval systems
RETRIEVAL_QUERY - Optimized for queries searching documents
SEMANTIC_SIMILARITY - Optimized for semantic similarity tasks
CLASSIFICATION - For classification tasks
CLUSTERING - For clustering tasks

Response

The embeddings response includes vectors and truncation information:

{
  "embeddings": [
    {
      "values": [0.1234, -0.5678, ...],
      "statistics": {
        "token_count": 15,
        "truncated": false
      }
    }
  ]
}

Response Fields:

values - Embedding vector as floats
statistics.token_count - Input token count
statistics.truncated - Whether input was truncated due to length

4. Image Generation

Image Generation is supported for Gemini and Imagen models on Vertex AI. The right behavior and endpoint are selected automatically based on the model name.

Request Parameters

The same parameters as Gemini image generation apply, depending on the model:

Gemini Models: See Gemini Image Generation
Imagen Models: Imagen-specific parameters with size / aspect-ratio support

The region field is used to route the request and is not part of the prompt.

curl -X POST https://app.deepintshield.com/v1/images/generations \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vertex/imagen-4.0-generate-001",
    "prompt": "A sunset over the mountains",
    "size": "1024x1024",
    "n": 2,
    "project_id": "my-gcp-project",
    "region": "us-central1"
  }'

Response

Same response format as standard Gemini / Imagen image generation.

Streaming

Image generation streaming is not supported by Vertex AI.

5. Image Edit

Image Edit is supported for Gemini and Imagen models on Vertex AI. The provider automatically routes to the appropriate format based on the model type.

Request Parameters

Parameter	Type	Required	Notes
`model`	string	✅	Model identifier (must be Gemini or Imagen model)
`prompt`	string	✅	Text description of the edit
`image[]`	binary	✅	Image file(s) to edit (supports multiple images)
`mask`	binary	❌	Mask image file
`type`	string	❌	Edit type: `"inpainting"`, `"outpainting"`, `"inpaint_removal"`, `"bgswap"` (Imagen only)
`n`	int	❌	Number of images to generate (1-10)
`output_format`	string	❌	Output format: `"png"`, `"webp"`, `"jpeg"`
`output_compression`	int	❌	Compression level (0-100%)
`seed`	int	❌	Seed for reproducibility (pass as an extra param)
`negative_prompt`	string	❌	Negative prompt (pass `negativePrompt` as an extra param)
`maskMode`	string	❌	Mask mode (pass as an extra param, Imagen only): `"MASK_MODE_USER_PROVIDED"`, `"MASK_MODE_BACKGROUND"`, `"MASK_MODE_FOREGROUND"`, `"MASK_MODE_SEMANTIC"`
`dilation`	float	❌	Mask dilation (pass as an extra param, Imagen only): Range [0, 1]
`maskClasses`	int[]	❌	Mask classes (pass as an extra param, Imagen only): For `MASK_MODE_SEMANTIC`

Behavior

Vertex supports the same image edit behavior as Gemini:

Gemini Models: See Gemini Image Edit
Imagen Models: Imagen-specific edit types and mask configuration (see Gemini Image Edit)

Only Gemini and Imagen models are supported; other models return a configuration error. The region field is used to route the request and is not part of the prompt.

Response

Same response format as standard Gemini / Imagen image generation.

Streaming

Image edit streaming is not supported by Vertex AI.

Image Variation

Image variation is not supported by Vertex AI.

6. List Models

Request Parameters

None required. Automatically uses project_id and region from key config.

Response

Lists models available in the specified project and region with metadata and deployment information:

{
  "models": [
    {
      "name": "projects/{project}/locations/{region}/models/gemini-2.0-flash",
      "display_name": "Gemini 2.0 Flash",
      "description": "Fast multimodal model",
      "version_id": "1",
      "version_aliases": ["latest", "stable"],
      "capabilities": [...],
      "deployed_models": [...]
    }
  ],
  "next_page_token": "..."
}

Custom vs Non-Custom Models

So that the model list is complete, DeepIntShield combines the API results with the foundation models you configure. The returned list includes:

Custom fine-tuned models deployed to your project (returned by the Vertex List Models API).
Foundation models from your deployments configuration (e.g. gemini-2.0-flash, claude-3-5-sonnet).
Foundation models from your allowedModels list that aren’t in deployments - add a model here to make it appear in the list.

To control which models appear:

Leave allowedModels empty to list everything (custom models plus all configured foundation models).
Set allowedModels to restrict the list to only those models.

Duplicate model IDs are removed automatically.

Foundation models from your deployments and allowed models are given a readable display name in the list (for example, gemini-pro is shown as “Gemini Pro”).

Example Configuration

{
  "vertex_key_config": {
    "project_id": "my-project",
    "region": "us-central1",
    "deployments": {
      "my-gemini-ft": "1234567890",
      "my-claude-ft": "9876543210"
    }
  }
}

This returns only your custom fine-tuned models from the API.

{
  "vertex_key_config": {
    "project_id": "my-project",
    "region": "us-central1",
    "deployments": {
      "gemini-2.0-flash": "gemini-2.0-flash",
      "claude-3-5-sonnet": "claude-3-5-sonnet-v2@20241022"
    }
  }
}

This returns both custom models AND foundation models from deployments.

{
  "vertex_key_config": {
    "project_id": "my-project",
    "region": "us-central1",
    "deployments": {
      "gemini-2.0-flash": "gemini-2.0-flash",
      "claude-3-5-sonnet": "claude-3-5-sonnet-v2@20241022",
      "gemini-2.5-pro": "gemini-2.5-pro"
    },
    "allowedModels": ["gemini-2.0-flash", "claude-3-5-sonnet"]
  }
}

Only returns gemini-2.0-flash and claude-3-5-sonnet, excluding gemini-2.5-pro.

Pagination

Model listing is paginated automatically. If more than 100 models exist, next_page_token will be present. DeepIntShield handles pagination internally.

Caveats

Project ID and Region Required

Severity: High Behavior: Both project_id and region required for all operations Impact: Request fails without valid GCP project/region configuration

OAuth2 Token Management

Severity: Medium Behavior: Tokens cached and automatically refreshed when expired Impact: First request slightly slower due to auth; cached for subsequent requests

Automatic Model Handling

Severity: Medium Behavior: Gemini and Claude models are detected automatically from the model name across Chat and Responses APIs Impact: You don’t need to indicate the underlying model family

Anthropic Version Lock

Severity: Low Behavior: The Anthropic version is set automatically for Claude on Vertex Impact: You cannot override the Anthropic version for Claude on Vertex

List Models API Returns Only Custom Models

Severity: High Behavior: Vertex AI’s List Models API only returns custom fine-tuned models, NOT foundation models Impact: To see foundation models in the list, add them to your deployments or allowedModels configuration Why: This is a Vertex AI API limitation - foundation models must be explicitly configured

Configuration

HTTP Settings: OAuth2 authentication with automatic token refresh | Region-specific endpoints | Max Connections 5000 | Max Idle 60 seconds

Scope: https://www.googleapis.com/auth/cloud-platform

Endpoint Format: https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/{resource}

Note: For global region, endpoint is https://aiplatform.googleapis.com/v1/projects/{project}/locations/global/{resource}

Setup & Configuration

Vertex AI requires project configuration, region selection, and Google Cloud authentication. For detailed instructions, see Provider-Specific Authentication - Google Vertex in the Gateway Quickstart for configuration steps in the Web UI.

Video Generation

Vertex AI routes video generation through Gemini’s Veo models using the predictLongRunning endpoint. All parameters are identical to Gemini Video Generation.

Supported Operations

Operation	Supported	Notes
Generate	✅	`POST /v1/videos`
Retrieve	✅	`GET /v1/videos/{id}`
Download	✅	`GET /v1/videos/{id}/content`
Delete	❌	Not supported
List	❌	Not supported
Remix	❌	Not supported