Cohere

Overview

Call Cohere (Command, Embed) models through DeepIntShield's OpenAI-compatible Chat Completions, Responses, and Embeddings APIs. You send standard OpenAI-style requests and DeepIntShield translates them to and from Cohere’s native format. A few Cohere-specific behaviors are worth knowing:

Reasoning - the reasoning object enables Cohere thinking (see Reasoning / Thinking)
Response format - text, json_object, and json_schema are supported (see Response Format)
Cohere-specific fields - pass extras like top_k or safety_mode via extra_params

Supported Operations

Operation	Non-Streaming	Streaming	Cohere Endpoint
Chat Completions	✅	✅	`/v2/chat`
Responses API	✅	✅	`/v2/chat`
Embeddings	✅	-	`/v2/embed`
List Models	✅	-	`/v1/models`
Text Completions	❌	❌	-
Image Generation	❌	❌	-
Speech (TTS)	❌	❌	-
Transcriptions (STT)	❌	❌	-
Files	❌	❌	-
Batch	❌	❌	-

1. Chat Completions

Request Parameters

Send standard OpenAI-compatible Chat Completions requests. temperature, frequency_penalty, and presence_penalty pass through directly. The following parameters are not supported by Cohere and are ignored: logit_bias, logprobs, top_logprobs, seed, parallel_tool_calls, service_tier.

Extra Parameters

Use extra_params (SDK) or pass them directly in the request body (Gateway) to set Cohere-specific fields such as top_k, safety_mode, log_probs, and strict_tool_choice:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
  -d '{
    "model": "cohere/command-r-plus",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40,
    "safety_mode": "STRICT",
    "log_probs": true,
    "strict_tool_choice": false
  }'
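
The same request can be assembled in Python. This is a minimal sketch using only the standard library: the helper name build_chat_request is hypothetical, the model and header are copied from the curl call above, and the request is built but not sent.

```python
import json
import os
import urllib.request

GATEWAY_URL = "https://app.deepintshield.com/v1/chat/completions"

def build_chat_request(prompt, virtual_key, **cohere_fields):
    """Build (but do not send) a Chat Completions request for the gateway."""
    body = {
        "model": "cohere/command-r-plus",
        "messages": [{"role": "user", "content": prompt}],
    }
    # Cohere-specific fields sit at the top level of the body, as in the curl call.
    body.update(cohere_fields)
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-bf-vk": virtual_key},
        method="POST",
    )

req = build_chat_request(
    "Hello",
    os.environ.get("DEEPINTSHIELD_VIRTUAL_KEY", "<virtual-key>"),
    top_k=40,
    safety_mode="STRICT",
)
# To send it, pass req to urllib.request.urlopen and json.load the response.
```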

Reasoning / Thinking

Documentation: See DeepIntShield Reasoning Reference

Use the reasoning object to enable Cohere’s thinking:

reasoning.effort enables or disables thinking
reasoning.max_tokens sets the token budget for thinking

Constraints

Minimum budget: 1 token. Setting max_tokens to 0 disables thinking
Dynamic budget: a max_tokens of -1 is converted to 1 automatically

{"reasoning": {"effort": "high", "max_tokens": 2048}}
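
The budget rules above can be summarised in a few lines. This only illustrates the documented behaviour and is not DeepIntShield's code: normalize_thinking_budget is a hypothetical name, returning None to mean "thinking disabled" is a convention of this sketch, and rejecting other negative values is an assumption the document does not state.

```python
def normalize_thinking_budget(max_tokens):
    """Map a requested reasoning.max_tokens to the budget described above.

    Returns None when thinking is disabled (a convention of this sketch).
    """
    if max_tokens == 0:
        return None  # a budget of 0 disables thinking
    if max_tokens == -1:
        return 1     # the dynamic budget (-1) is converted to 1
    if max_tokens < 1:
        # Assumption: other negative values are treated as invalid here.
        raise ValueError("reasoning.max_tokens must be >= 1, or 0 / -1")
    return max_tokens
```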

Message Content

String content: Messages can have simple string content
Content blocks: Messages can have arrays of content blocks (text, images, thinking)
Images: image_url blocks with a URL are supported
Tools: Assistant tool calls and tool-result messages (with tool_call_id) are supported
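
As an illustration, the message shapes listed above might look like this in Python. The helper names, the get_weather tool, the call id, and the example URL are all hypothetical; the dictionaries follow the usual OpenAI Chat Completions layout.

```python
import json

def user_message_with_image(text, image_url):
    """A user message made of content blocks: one text block, one image_url block."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

def tool_result_message(tool_call_id, result):
    """A tool-result message; tool_call_id links it to the assistant's tool call."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": json.dumps(result)}

messages = [
    # Plain string content.
    {"role": "user", "content": "What is the weather in Paris?"},
    # Assistant tool call (arguments are a JSON string).
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": json.dumps({"city": "Paris"})},
        }],
    },
    tool_result_message("call_1", {"temp_c": 18}),
    user_message_with_image("Does this photo match?", "https://example.com/paris.jpg"),
]
```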

Tool Support

Standard OpenAI-style tool definitions are supported. tool_choice values none, auto, and required are supported; forcing a specific tool by name resolves to a required tool call. Strict tool mode (strict: true) is not supported by Cohere and is ignored.
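
A request body with one tool could be put together as follows. This is a sketch in the standard OpenAI tool format; get_weather and its parameters are made up for illustration.

```python
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request_body = {
    "model": "cohere/command-r-plus",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [get_weather_tool],
    "tool_choice": "required",  # one of "none", "auto", "required"
}

# Forcing a tool by name uses the OpenAI object form; per the text above,
# this resolves to a required tool call.
forced_tool_choice = {"type": "function", "function": {"name": "get_weather"}}
```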

Response Format

Supported formats:

text - Plain text response
json_object - Structured JSON response
json_schema - JSON with schema validation

Pass a JSON schema via the response_format.json_schema field.
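
For example, a json_schema response format might be built like this. The sketch follows the OpenAI response_format layout; json_schema_format, the schema name, and the schema itself are illustrative.

```python
def json_schema_format(name, schema):
    """Wrap a JSON schema in the OpenAI-style response_format object."""
    return {"type": "json_schema", "json_schema": {"name": name, "schema": schema}}

city_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "population": {"type": "integer"},
    },
    "required": ["city", "population"],
}

request_body = {
    "model": "cohere/command-r-plus",
    "messages": [{"role": "user", "content": "Give me facts about Paris as JSON."}],
    "response_format": json_schema_format("city_facts", city_schema),
}
```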

Response

Responses come back in the standard OpenAI-compatible shape:

finish_reason (stop, length, tool_calls)
usage.prompt_tokens / usage.completion_tokens, plus prompt_tokens_details.cached_tokens when present
Tool call arguments returned as a JSON string in tool_calls
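
Because tool call arguments arrive as a JSON string, they need to be decoded before use. A minimal sketch, assuming the standard OpenAI response layout; extract_tool_calls and the sample response are illustrative, not captured output.

```python
import json

def extract_tool_calls(response):
    """Return (name, arguments_dict) pairs from the first choice of a response."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # arguments is a JSON string, so decode it into a dict.
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls

sample_response = {
    "choices": [{
        "finish_reason": "tool_calls",
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
            }],
        },
    }],
    "usage": {"prompt_tokens": 12, "completion_tokens": 7},
}

tool_calls = extract_tool_calls(sample_response)
```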

Streaming

Set "stream": true to receive output as standard OpenAI-compatible streaming chunks (content text, reasoning, and tool-call arguments arrive incrementally).
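
On the client side, the streamed text can be reassembled from the chunks. The sketch below assumes the usual OpenAI server-sent-events framing ("data: {...}" lines ending with "data: [DONE]"), which this page does not spell out; collect_stream_text is a hypothetical helper that takes already-decoded lines.

```python
import json

def collect_stream_text(lines):
    """Concatenate delta.content from OpenAI-style streaming chunk lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)
```

Reasoning and tool-call argument deltas could be accumulated the same way from their own delta fields.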

2. Responses API

The Responses API uses the same underlying Cohere /v2/chat endpoint as Chat Completions. Send standard OpenAI Responses requests; DeepIntShield handles the Cohere format.

Request Parameters

temperature passes through directly. Use extra_params (SDK) or pass them directly in the request body (Gateway) to set fields that the standard Responses request does not define, such as top_k, stop, frequency_penalty, and presence_penalty:

curl -X POST https://app.deepintshield.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
  -d '{
    "model": "cohere/command-r-plus",
    "input": "Hello, how are you?",
    "top_k": 40,
    "stop": [".", "!"]
  }'

Tool Support

Supported types: function. Tool behavior is the same as Chat Completions.

Response

Responses come back in the standard OpenAI Responses shape:

output items: assistant text as message, tool calls as function_call
usage.input_tokens / usage.output_tokens, including cached-token details when present
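
A small sketch of reading these output items, assuming the standard OpenAI Responses layout (message items carrying output_text parts, function_call items carrying name and a JSON-string arguments field). split_output is a hypothetical helper.

```python
import json

def split_output(response):
    """Return (assistant_text, [(tool_name, arguments_dict), ...]) from a response."""
    texts, calls = [], []
    for item in response.get("output", []):
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    texts.append(part["text"])
        elif item.get("type") == "function_call":
            calls.append((item["name"], json.loads(item["arguments"])))
    return "".join(texts), calls
```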

Streaming

Set "stream": true to receive output as standard OpenAI Responses streaming events.
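
Text can be rebuilt from the event stream in a similar way. This sketch assumes the OpenAI Responses event names response.output_text.delta and response.completed, and takes events that have already been parsed into dictionaries; collect_response_text is a hypothetical helper.

```python
def collect_response_text(events):
    """Concatenate text deltas from parsed Responses streaming events."""
    parts = []
    for event in events:
        kind = event.get("type")
        if kind == "response.output_text.delta":
            parts.append(event.get("delta", ""))
        elif kind == "response.completed":
            break  # the response is finished; stop reading
    return "".join(parts)
```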

3. Embeddings

Request Parameters

Send standard OpenAI-compatible embeddings requests. Cohere v3+ models require an input_type; when you omit it, "search_document" is used. Use extra_params (SDK) or pass them directly in the request body (Gateway) to set Cohere-specific options such as input_type, embedding_types, and truncate:

curl -X POST https://app.deepintshield.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
  -d '{
    "model": "cohere/embed-english-v3.0",
    "input": ["text to embed"],
    "input_type": "search_query",
    "embedding_types": ["float"],
    "truncate": "START"
  }'

Notes

Input type required: Cohere v3+ models need the input_type parameter; it defaults to "search_document" when omitted
Embedding types: Specify which embedding types to return (e.g., "float", "int8")

The response is returned in standard OpenAI-compatible embeddings shape with token usage.
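
As a usage sketch, the vectors can be pulled out of the response and compared with cosine similarity. It assumes the standard OpenAI embeddings layout (a data list whose items carry index and embedding); both helper names are hypothetical.

```python
import math

def vectors_from_response(response):
    """Return embeddings in input order from an OpenAI-style embeddings response."""
    ordered = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in ordered]

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

A typical retrieval setup would embed stored texts with input_type "search_document" and the user's query with "search_query", then rank documents by this score.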

4. List Models

Request: GET /v1/models (cursor-based pagination with next_page_token)

Note: endpoint and default_only filters are available via extra_params.
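
Cursor-based pagination can be followed with a simple loop. In this sketch, fetch_page is a caller-supplied function (hypothetical) that performs the GET /v1/models call for a given token; the assumption that each page holds its models in a data list is not confirmed by this page, and how the token is sent back is left to fetch_page.

```python
def list_all_models(fetch_page):
    """Collect models across pages.

    fetch_page(page_token) must return a dict with a list of models under
    "data" (assumed key) and, when more pages exist, "next_page_token".
    """
    models, token = [], None
    while True:
        page = fetch_page(token)
        models.extend(page.get("data", []))
        token = page.get("next_page_token")
        if not token:
            return models
```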