Cohere

Call Cohere models (Command, Embed) through DeepIntShield using the OpenAI-compatible Chat Completions, Responses, and Embeddings APIs. You send standard OpenAI-style requests, and DeepIntShield translates them to Cohere’s native format. A few Cohere-specific behaviors are worth knowing:

  • Reasoning - the reasoning object enables Cohere thinking (see Reasoning / Thinking)
  • Response format - text, json_object, and json_schema are supported (see Response Format)
  • Cohere-specific fields - pass extras like top_k or safety_mode via extra_params

Supported operations:

Operation             Non-Streaming  Streaming  Endpoint
Chat Completions      Yes            Yes        /v2/chat
Responses API         Yes            Yes        /v2/chat
Embeddings            Yes            -          /v2/embed
List Models           Yes            -          /v1/models
Text Completions      No             No         -
Image Generation      No             No         -
Speech (TTS)          No             No         -
Transcriptions (STT)  No             No         -
Files                 No             No         -
Batch                 No             No         -

Send standard OpenAI-compatible Chat Completions requests. temperature, frequency_penalty, and presence_penalty pass through directly. The following parameters are not supported by Cohere and are ignored: logit_bias, logprobs, top_logprobs, seed, parallel_tool_calls, service_tier.
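
Because ignored parameters produce no error, it can help to check a request body before sending it. The sketch below is a client-side convenience only, not part of DeepIntShield; the helper name find_ignored_params is hypothetical, and the parameter set is copied from the list above.

```python
# OpenAI parameters this page lists as ignored for Cohere models.
IGNORED_FOR_COHERE = {
    "logit_bias", "logprobs", "top_logprobs",
    "seed", "parallel_tool_calls", "service_tier",
}


def find_ignored_params(body):
    """Return the request keys that would be silently ignored, sorted."""
    return sorted(IGNORED_FOR_COHERE.intersection(body))


body = {
    "model": "cohere/command-r-plus",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.3,  # passes through
    "seed": 7,           # ignored for Cohere
    "logprobs": True,    # ignored for Cohere
}
ignored = find_ignored_params(body)
```

Logging or dropping the returned keys avoids relying on settings that never reach the model.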

Pass Cohere-specific fields such as top_k, safety_mode, and strict_tool_choice through extra_params (SDK) or directly in the request body (Gateway):

curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
  -d '{
    "model": "cohere/command-r-plus",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40,
    "safety_mode": "STRICT",
    "log_probs": true,
    "strict_tool_choice": false
  }'
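
The same Gateway request can be sketched in Python with only the standard library. The helper name build_chat_request is hypothetical; the URL, header, and body fields come from the curl example above. Nothing is sent until urlopen is called.

```python
import json
import os
import urllib.request

BASE_URL = "https://app.deepintshield.com"  # gateway URL from the curl example


def build_chat_request(messages, **cohere_extras):
    """Build (but do not send) a Chat Completions request.

    Hypothetical helper: cohere_extras are merged into the top level of the
    body, the way the Gateway example passes top_k and safety_mode.
    """
    body = {"model": "cohere/command-r-plus", "messages": messages, **cohere_extras}
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-bf-vk": os.environ.get("DEEPINTSHIELD_VIRTUAL_KEY", ""),
        },
        method="POST",
    )


req = build_chat_request(
    [{"role": "user", "content": "Hello"}],
    top_k=40,
    safety_mode="STRICT",
)
# To actually send it: urllib.request.urlopen(req)
```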

Documentation: See DeepIntShield Reasoning Reference

Use the reasoning object to enable Cohere’s thinking:

  • reasoning.effort enables or disables thinking
  • reasoning.max_tokens sets the token budget for thinking
  • Minimum budget: 1 token required; requests with 0 tokens disable thinking
  • Dynamic budget: -1 is converted to 1 automatically

{"reasoning": {"effort": "high", "max_tokens": 2048}}
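
The budget rules above can be read as a small normalization step. The function below is an illustrative sketch of those rules, not DeepIntShield's actual implementation; the name normalize_thinking_budget is hypothetical, and rejecting values below -1 is an assumption the page does not state.

```python
def normalize_thinking_budget(max_tokens):
    """Return the effective thinking budget, or None when thinking is disabled.

    Mirrors the documented rules: 0 disables thinking, -1 (dynamic) becomes 1,
    and any positive value is used as-is.
    """
    if max_tokens == 0:
        return None  # 0 tokens disables thinking
    if max_tokens == -1:
        return 1  # dynamic budget is converted to the 1-token minimum
    if max_tokens < -1:
        # Assumption: other negative values are treated as invalid.
        raise ValueError("max_tokens must be -1, 0, or a positive integer")
    return max_tokens


reasoning = {"effort": "high", "max_tokens": 2048}
budget = normalize_thinking_budget(reasoning["max_tokens"])
```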

Supported message content:

  • String content: Messages can have simple string content
  • Content blocks: Messages can have arrays of content blocks (text, images, thinking)
  • Images: image_url blocks with a URL are supported
  • Tools: Assistant tool calls and tool-result messages (with tool_call_id) are supported
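
As a sketch, a single conversation can mix these message forms. The field names follow the OpenAI Chat Completions message format; the image URL, tool name, and call ID are placeholders, and content_block_types is a hypothetical helper.

```python
messages = [
    # Plain string content
    {"role": "system", "content": "You are a concise assistant."},
    # Content blocks: text plus an image referenced by URL
    {"role": "user", "content": [
        {"type": "text", "text": "What is the weather like where this photo was taken?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
    ]},
    # Assistant tool call, then the matching tool result linked by tool_call_id
    {"role": "assistant", "content": None, "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    }]},
    {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 18}'},
]


def content_block_types(message):
    """List block types for array content; plain strings count as one text block."""
    content = message.get("content")
    if isinstance(content, str):
        return ["text"]
    return [block["type"] for block in content or []]


user_blocks = content_block_types(messages[1])
```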

Standard OpenAI-style tool definitions are supported. tool_choice values none, auto, and required are supported; forcing a specific tool by name resolves to a required tool call. Strict tool mode (strict: true) is not supported by Cohere and is ignored.
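
A minimal request body with one tool might look like the following sketch. The get_weather tool is hypothetical; the structure follows the OpenAI tool-definition format.

```python
import json

get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

body = {
    "model": "cohere/command-r-plus",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [get_weather_tool],
    # Naming a specific tool; per the text above, this resolves to a
    # required tool call rather than forcing this exact tool.
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
payload = json.dumps(body)
```

Since a named tool only guarantees that some tool is called, it is worth checking the tool name in the response when more than one tool is defined.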

Supported formats:

  • text - Plain text response
  • json_object - Structured JSON response
  • json_schema - JSON with schema validation

Pass a JSON schema via the response_format.json_schema field.
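
For example, a json_schema request could be assembled as below. This is a sketch assuming the OpenAI response_format layout (a name plus a schema inside json_schema); the person schema itself is made up for illustration.

```python
import json

# Illustrative schema; any JSON Schema object could be used here.
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

body = {
    "model": "cohere/command-r-plus",
    "messages": [{"role": "user", "content": "Describe a fictional person as JSON."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "person", "schema": person_schema},
    },
}
payload = json.dumps(body)
```

The model's reply arrives as a JSON string in the message content, so it still needs json.loads on the client side.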

Responses come back in the standard OpenAI-compatible shape:

  • finish_reason (stop, length, tool_calls)
  • usage.prompt_tokens / usage.completion_tokens, plus prompt_tokens_details.cached_tokens when present
  • Tool call arguments returned as a JSON string in tool_calls
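
The sketch below reads these fields from a response. The sample dictionary is made up to match the OpenAI Chat Completions shape and is not real gateway output; summarize_response is a hypothetical helper.

```python
import json

# Made-up sample in the OpenAI Chat Completions shape, for illustration only.
sample = {
    "choices": [{
        "finish_reason": "tool_calls",
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
            }],
        },
    }],
    "usage": {
        "prompt_tokens": 12,
        "completion_tokens": 9,
        "prompt_tokens_details": {"cached_tokens": 4},
    },
}


def summarize_response(response):
    """Extract finish_reason, decoded tool calls, and cached-token count."""
    choice = response["choices"][0]
    calls = [
        # arguments is a JSON string, so decode it before use
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in choice["message"].get("tool_calls") or []
    ]
    usage = response.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}  # only present sometimes
    return {
        "finish_reason": choice["finish_reason"],
        "tool_calls": calls,
        "cached_tokens": details.get("cached_tokens", 0),
    }


summary = summarize_response(sample)
```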

Set "stream": true to receive output as standard OpenAI-compatible streaming chunks (content text, reasoning, and tool-call arguments arrive incrementally).
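
A client can rebuild the full text by concatenating the content deltas. The sketch below assumes the usual OpenAI server-sent-event framing ("data: {...}" lines ending with "data: [DONE]"); the sample lines are made up, and collect_stream_text is a hypothetical helper.

```python
import json


def collect_stream_text(lines):
    """Join delta.content from OpenAI-style SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel (assumed, per OpenAI convention)
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


# Made-up chunks, for illustration only.
sample_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = collect_stream_text(sample_lines)
```

Reasoning text and tool-call arguments would need their own accumulators, since they arrive in separate delta fields.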


The Responses API uses the same underlying /v2/chat endpoint. Send standard OpenAI Responses requests; DeepIntShield handles the Cohere format.

temperature passes through directly. Pass Cohere-specific fields such as top_k, stop, frequency_penalty, and presence_penalty through extra_params (SDK) or directly in the request body (Gateway):

curl -X POST https://app.deepintshield.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
  -d '{
    "model": "cohere/command-r-plus",
    "input": "Hello, how are you?",
    "top_k": 40,
    "stop": [".", "!"]
  }'

Supported tool types: function. Tool behavior is the same as for Chat Completions.

Responses come back in the standard OpenAI Responses shape:

  • output items: assistant text as message, tool calls as function_call
  • usage.input_tokens / usage.output_tokens, including cached-token details when present
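
Separating text from tool calls in a Responses result could look like the sketch below. The sample follows the OpenAI Responses shape (output_text parts inside message items) and is made up for illustration; split_output is a hypothetical helper.

```python
import json

# Made-up sample in the OpenAI Responses shape, for illustration only.
sample = {
    "output": [
        {"type": "message", "role": "assistant",
         "content": [{"type": "output_text", "text": "Checking the weather."}]},
        {"type": "function_call", "call_id": "call_1",
         "name": "get_weather", "arguments": '{"city": "Paris"}'},
    ],
    "usage": {"input_tokens": 12, "output_tokens": 9},
}


def split_output(response):
    """Return (assistant_text, [(tool_name, decoded_arguments), ...])."""
    texts, calls = [], []
    for item in response.get("output", []):
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    texts.append(part["text"])
        elif item.get("type") == "function_call":
            calls.append((item["name"], json.loads(item["arguments"])))
    return "".join(texts), calls


text, calls = split_output(sample)
```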

Set "stream": true to receive output as standard OpenAI Responses streaming events.


Send standard OpenAI-compatible embeddings requests. Cohere v3+ models require an input_type; if you omit it, it defaults to "search_document". Pass Cohere-specific options such as input_type, embedding_types, and truncate through extra_params (SDK) or directly in the request body (Gateway):

curl -X POST https://app.deepintshield.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
  -d '{
    "model": "cohere/embed-english-v3.0",
    "input": ["text to embed"],
    "input_type": "search_query",
    "embedding_types": ["float"],
    "truncate": "START"
  }'

  • Input type required: Cohere v3+ models require the input_type parameter (defaults to "search_document")
  • Embedding types: Specify which embedding types to return (e.g., "float", "int8")
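
A small builder makes the input_type choice explicit for each call. This is a sketch: build_embedding_body is a hypothetical helper, and its default only mirrors the "search_document" default described above.

```python
def build_embedding_body(texts, input_type="search_document",
                         embedding_types=("float",), truncate=None):
    """Assemble a Gateway-style embeddings body with Cohere options at the top level."""
    body = {
        "model": "cohere/embed-english-v3.0",
        # Accept a single string or any iterable of strings
        "input": [texts] if isinstance(texts, str) else list(texts),
        "input_type": input_type,
        "embedding_types": list(embedding_types),
    }
    if truncate is not None:
        body["truncate"] = truncate  # e.g. "START", as in the curl example
    return body


doc_body = build_embedding_body(["first passage", "second passage"])
query_body = build_embedding_body("what is a passage?", input_type="search_query")
```

Using "search_document" when indexing and "search_query" at query time keeps the two sides of a retrieval setup distinct.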

The response is returned in standard OpenAI-compatible embeddings shape with token usage.


Request: GET /v1/models (cursor-based pagination with next_page_token)

Note: endpoint and default_only filters are available via extra_params.
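
Cursor-based pagination usually means repeating the request until no token comes back. The sketch below makes several assumptions this page does not confirm: that models arrive in a "data" list, that next_page_token sits at the top level of the response, and that the caller knows how to send the token back (wrapped here in a caller-supplied fetch_page function). The fake pages stand in for real responses.

```python
def list_all_models(fetch_page):
    """Collect models across pages.

    fetch_page(token) must return one decoded response page; token is None
    for the first request. How the token is sent is left to the caller.
    """
    models, token = [], None
    while True:
        page = fetch_page(token)
        models.extend(page.get("data", []))
        token = page.get("next_page_token")
        if not token:
            return models  # no token means this was the last page


# Fake pages keyed by the token that requests them, for illustration only.
FAKE_PAGES = {
    None: {"data": [{"id": "model-a"}], "next_page_token": "t1"},
    "t1": {"data": [{"id": "model-b"}]},
}
model_ids = [m["id"] for m in list_all_models(lambda token: FAKE_PAGES[token])]
```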