Cohere
Overview
Call Cohere (Command, Embed) models through DeepIntShield using the same OpenAI-compatible Chat Completions, Responses, and Embeddings APIs. You send standard OpenAI-style requests, and DeepIntShield handles Cohere’s native format. A few Cohere-specific behaviors are worth knowing:
- Reasoning: the reasoning object enables Cohere thinking (see Reasoning / Thinking)
- Response format: text, json_object, and json_schema are supported (see Response Format)
- Cohere-specific fields: pass extras such as top_k or safety_mode via extra_params
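A minimal Python sketch (standard library only) of what a standard OpenAI-style request looks like through the Gateway: the body goes to /v1/chat/completions with the x-bf-vk virtual-key header, and Cohere-specific fields such as top_k sit at the top level of the body. The helper name build_chat_request is hypothetical; the URL, header, and model ID are taken from the examples on this page.

```python
import json
import os
import urllib.request

GATEWAY_URL = "https://app.deepintshield.com/v1/chat/completions"


def build_chat_request(model, messages, virtual_key, **extra_fields):
    """Build (but do not send) an OpenAI-style chat request for the Gateway.

    Cohere-specific fields (e.g. top_k, safety_mode) are merged into the
    top level of the JSON body, as in the Gateway examples on this page.
    """
    body = {"model": model, "messages": messages, **extra_fields}
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-bf-vk": virtual_key},
        method="POST",
    )


req = build_chat_request(
    "cohere/command-r-plus",
    [{"role": "user", "content": "Hello"}],
    os.environ.get("DEEPINTSHIELD_VIRTUAL_KEY", "<virtual-key>"),
    top_k=40,
    safety_mode="STRICT",
)
# To send it: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the sketch testable offline; any HTTP client can send the same body and headers.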
Supported Operations
| Operation | Non-Streaming | Streaming | Cohere Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /v2/chat |
| Responses API | ✅ | ✅ | /v2/chat |
| Embeddings | ✅ | - | /v2/embed |
| List Models | ✅ | - | /v1/models |
| Text Completions | ❌ | ❌ | - |
| Image Generation | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |
1. Chat Completions
Request Parameters
Send standard OpenAI-compatible Chat Completions requests. The temperature, frequency_penalty, and presence_penalty parameters pass through directly. The following parameters are not supported by Cohere and are ignored: logit_bias, logprobs, top_logprobs, seed, parallel_tool_calls, and service_tier.
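As a client-side illustration of the rules above, the hypothetical helper below splits an OpenAI-style body into fields that take effect and fields this page lists as ignored. It models only the documented list and is not DeepIntShield's implementation; it could serve as a lint step to spot parameters that will have no effect.

```python
# Parameters this page lists as unsupported by Cohere (ignored by DeepIntShield).
IGNORED_FOR_COHERE = {
    "logit_bias",
    "logprobs",
    "top_logprobs",
    "seed",
    "parallel_tool_calls",
    "service_tier",
}


def split_params(body):
    """Return (effective, ignored) dicts for an OpenAI-style request body."""
    effective = {k: v for k, v in body.items() if k not in IGNORED_FOR_COHERE}
    ignored = {k: v for k, v in body.items() if k in IGNORED_FOR_COHERE}
    return effective, ignored


request_body = {
    "model": "cohere/command-r-plus",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.3,
    "seed": 7,
    "logit_bias": {"50256": -100},
}
effective, ignored = split_params(request_body)
# effective keeps model, messages, temperature; ignored holds seed and logit_bias
```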
Extra Parameters
Use extra_params (SDK), or set the fields directly in the request body (Gateway), for Cohere-specific fields such as top_k, safety_mode, and strict_tool_choice:
```bash
curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
  -d '{
    "model": "cohere/command-r-plus",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40,
    "safety_mode": "STRICT",
    "log_probs": true,
    "strict_tool_choice": false
  }'
```
Reasoning / Thinking
Documentation: See the DeepIntShield Reasoning Reference.
Use the reasoning object to enable Cohere’s thinking:
- reasoning.effort enables or disables thinking
- reasoning.max_tokens sets the token budget for thinking
Constraints
- Minimum budget: at least 1 token is required; a budget of 0 disables thinking
- Dynamic budget: -1 is converted to 1 automatically
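The budget constraints above can be sketched as a small normalization function. The name normalize_thinking_budget and the (enabled, budget) return shape are hypothetical, and rejecting negative values other than -1 is an assumption this page does not specify.

```python
def normalize_thinking_budget(max_tokens):
    """Apply the documented budget rules; returns (thinking_enabled, budget).

    - 0 disables thinking
    - -1 (dynamic budget) is converted to 1
    - positive values are kept as-is
    Other negative values are rejected here (an assumption, not documented).
    """
    if max_tokens == 0:
        return False, None
    if max_tokens == -1:
        return True, 1
    if max_tokens < 0:
        raise ValueError(f"unsupported reasoning budget: {max_tokens}")
    return True, max_tokens


print(normalize_thinking_budget(2048))  # (True, 2048)
print(normalize_thinking_budget(-1))    # (True, 1)
print(normalize_thinking_budget(0))     # (False, None)
```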
Example request fragment:
```json
{"reasoning": {"effort": "high", "max_tokens": 2048}}
```
Message Content
- String content: Messages can have simple string content
- Content blocks: Messages can have arrays of content blocks (text, images, thinking)
- Images: image_url blocks with a URL are supported
- Tools: Assistant tool calls and tool-result messages (with tool_call_id) are supported
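The message shapes listed above are the standard OpenAI Chat Completions ones. A sketch of a conversation that uses each of them; the image URL, the get_weather tool, and the call ID are hypothetical placeholders.

```python
import json

messages = [
    # Plain string content
    {"role": "system", "content": "You are a concise assistant."},
    # Array of content blocks: text plus an image referenced by URL
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this picture, then check the weather in Paris."},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    },
    # Assistant turn that called a tool (arguments are a JSON string)
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_weather", "arguments": json.dumps({"city": "Paris"})},
            }
        ],
    },
    # Tool result, linked back to the call via tool_call_id
    {"role": "tool", "tool_call_id": "call_1", "content": json.dumps({"temp_c": 18})},
]
```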
Tool Support
Standard OpenAI-style tool definitions are supported. The tool_choice values none, auto, and required are supported; forcing a specific tool by name resolves to a required tool call. Strict tool mode (strict: true) is not supported by Cohere and is ignored.
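A hypothetical client-side sketch of the tool rules above, to make the mapping easy to see: none, auto, and required pass through, a named-function choice becomes required, and a strict flag on a function definition has no effect (modeled here by dropping it). The function names and the get_weather tool are illustrative, not DeepIntShield's code.

```python
import copy


def resolve_tool_choice(tool_choice):
    """Map an OpenAI-style tool_choice to the behavior described on this page."""
    if tool_choice in ("none", "auto", "required"):
        return tool_choice
    if isinstance(tool_choice, dict) and tool_choice.get("type") == "function":
        # Forcing a specific tool by name resolves to a required tool call.
        return "required"
    raise ValueError(f"unsupported tool_choice: {tool_choice!r}")


def drop_strict(tools):
    """Return a copy of the tool definitions without the (ignored) strict flag."""
    cleaned = copy.deepcopy(tools)
    for tool in cleaned:
        tool.get("function", {}).pop("strict", None)
    return cleaned


tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
            "strict": True,
        },
    }
]
forced = {"type": "function", "function": {"name": "get_weather"}}
print(resolve_tool_choice(forced))  # required
```

Following the rule above, a named choice makes a tool call mandatory but does not pin it to the named tool; when that matters, one option is to send only the tool you want in tools.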
Response Format
Supported formats:
- text: plain text response
- json_object: structured JSON response
- json_schema: JSON with schema validation
Pass a JSON schema via the response_format.json_schema field.
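Assuming the standard OpenAI response_format shape, a json_schema request looks like the sketch below; the schema name and its fields are hypothetical placeholders.

```python
import json

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "city_weather",
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "temp_c": {"type": "number"},
            },
            "required": ["city", "temp_c"],
        },
    },
}

body = {
    "model": "cohere/command-r-plus",
    "messages": [{"role": "user", "content": "Weather in Paris as JSON"}],
    "response_format": response_format,
}
print(json.dumps(body, indent=2))
```

For the other two formats the field is simply {"type": "json_object"} or {"type": "text"}.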
Response
Responses come back in the standard OpenAI-compatible shape:
- finish_reason (stop, length, tool_calls)
- usage.prompt_tokens / usage.completion_tokens, plus prompt_tokens_details.cached_tokens when present
- Tool call arguments returned as a JSON string in tool_calls
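A sketch of reading those fields from a response body. The sample payload is illustrative and trimmed to the fields discussed here; it is not captured output, and the get_weather call is a placeholder.

```python
import json

# Illustrative, trimmed Chat Completions response (not real output).
sample = {
    "choices": [
        {
            "finish_reason": "tool_calls",
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "id": "call_1",
                        "type": "function",
                        "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
                    }
                ],
            },
        }
    ],
    "usage": {
        "prompt_tokens": 42,
        "completion_tokens": 12,
        "prompt_tokens_details": {"cached_tokens": 0},
    },
}


def summarize(response):
    """Return finish_reason, token counts, cached tokens, and parsed tool calls."""
    choice = response["choices"][0]
    usage = response["usage"]
    # cached_tokens is only present when reported, so default to 0
    cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
    # Tool call arguments arrive as a JSON string and must be decoded
    calls = [
        (c["function"]["name"], json.loads(c["function"]["arguments"]))
        for c in choice["message"].get("tool_calls") or []
    ]
    return choice["finish_reason"], usage["prompt_tokens"], usage["completion_tokens"], cached, calls


print(summarize(sample))
```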
Streaming
Set "stream": true to receive output as standard OpenAI-compatible streaming chunks (content text, reasoning, and tool-call arguments arrive incrementally).
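Assuming the stream uses the usual OpenAI server-sent-events framing (data: lines carrying JSON chunks, ending with data: [DONE]), the sketch below accumulates text deltas from a list of raw lines. The sample lines are illustrative; reasoning and tool-call deltas arrive in other delta fields and are not handled here.

```python
import json


def collect_stream_text(lines):
    """Concatenate delta.content from OpenAI-style SSE lines until [DONE]."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo!"}}]}',
    'data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample_lines))  # Hello!
```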
2. Responses API
The Responses API uses the same underlying /v2/chat endpoint. Send standard OpenAI Responses requests; DeepIntShield handles the Cohere format.
Request Parameters
The temperature parameter passes through directly. Use extra_params (SDK), or set the fields directly in the request body (Gateway), for Cohere-specific fields such as top_k, stop, frequency_penalty, and presence_penalty:
```bash
curl -X POST https://app.deepintshield.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
  -d '{
    "model": "cohere/command-r-plus",
    "input": "Hello, how are you?",
    "top_k": 40,
    "stop": [".", "!"]
  }'
```
Tool Support
Supported tool types: function. Tool behavior is otherwise the same as in Chat Completions.
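Assuming standard OpenAI Responses requests, function tools use the flatter Responses shape: name and parameters sit directly on the tool object rather than under a function key. A sketch of a request body with a hypothetical get_weather tool:

```python
import json

body = {
    "model": "cohere/command-r-plus",
    "input": "What is the weather in Paris?",
    "tools": [
        {
            # Responses-style function tool: no nested "function" object
            "type": "function",
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
    "tool_choice": "auto",
}
print(json.dumps(body, indent=2))
```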
Response
Responses come back in the standard OpenAI Responses shape:
- output items: assistant text as message, tool calls as function_call
- usage.input_tokens / usage.output_tokens, including cached-token details when present
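A sketch of walking the output items described above. The sample payload is illustrative and trimmed (not captured output), and the function call is a placeholder.

```python
import json

# Illustrative, trimmed Responses API payload (not real output).
sample = {
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "Let me check."}],
        },
        {
            "type": "function_call",
            "call_id": "call_1",
            "name": "get_weather",
            "arguments": "{\"city\": \"Paris\"}",
        },
    ],
    "usage": {"input_tokens": 30, "output_tokens": 15},
}


def read_output(response):
    """Split Responses output items into assistant text and function calls."""
    texts, calls = [], []
    for item in response.get("output", []):
        if item.get("type") == "message":
            texts.extend(
                part["text"]
                for part in item.get("content", [])
                if part.get("type") == "output_text"
            )
        elif item.get("type") == "function_call":
            # arguments is a JSON string, as in Chat Completions
            calls.append((item["call_id"], item["name"], json.loads(item["arguments"])))
    return "".join(texts), calls


text, calls = read_output(sample)
```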
Streaming
Set "stream": true to receive output as standard OpenAI Responses streaming events.
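Assuming the standard OpenAI Responses event names, text arrives in response.output_text.delta events and the stream finishes with response.completed. A sketch that joins the text deltas from raw server-sent-event lines (the sample lines are illustrative):

```python
import json


def collect_response_text(lines):
    """Join output_text deltas from OpenAI Responses-style SSE lines."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # "event:" lines repeat the type that is also in the JSON
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # defensive: not expected in Responses streams
        event = json.loads(payload)
        if event.get("type") == "response.output_text.delta":
            parts.append(event["delta"])
        elif event.get("type") == "response.completed":
            break
    return "".join(parts)


sample_lines = [
    "event: response.created",
    'data: {"type": "response.created"}',
    "event: response.output_text.delta",
    'data: {"type": "response.output_text.delta", "delta": "Hi "}',
    "event: response.output_text.delta",
    'data: {"type": "response.output_text.delta", "delta": "there"}',
    "event: response.completed",
    'data: {"type": "response.completed"}',
]
print(collect_response_text(sample_lines))  # Hi there
```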
3. Embeddings
Request Parameters
Send standard OpenAI-compatible embeddings requests. Use extra_params (SDK), or set the fields directly in the request body (Gateway), for Cohere-specific options such as input_type, embedding_types, and truncate:
```bash
curl -X POST https://app.deepintshield.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
  -d '{
    "model": "cohere/embed-english-v3.0",
    "input": ["text to embed"],
    "input_type": "search_query",
    "embedding_types": ["float"],
    "truncate": "START"
  }'
```
- Input type required: Cohere v3+ models require the input_type parameter (defaults to "search_document" when omitted)
- Embedding types: specify which embedding types to return (e.g., "float", "int8")
The response is returned in the standard OpenAI-compatible embeddings shape, including token usage.
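A minimal sketch that builds the embeddings request with the Cohere-specific fields shown above and reads vectors back from an OpenAI-style response (data items carrying index and embedding). The helper names are hypothetical, and the sample response is illustrative, not real output.

```python
import json
import urllib.request

EMBEDDINGS_URL = "https://app.deepintshield.com/v1/embeddings"


def build_embeddings_request(texts, virtual_key, input_type="search_document",
                             model="cohere/embed-english-v3.0"):
    """Build (but do not send) an embeddings request with Cohere-specific fields."""
    body = {
        "model": model,
        "input": texts,
        "input_type": input_type,  # required by Cohere v3+ models
        "embedding_types": ["float"],
    }
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-bf-vk": virtual_key},
        method="POST",
    )


def vectors_from(response):
    """Return embeddings in input order from an OpenAI-style embeddings response."""
    return [item["embedding"] for item in sorted(response["data"], key=lambda d: d["index"])]


# Search queries and stored documents typically use different input types.
req = build_embeddings_request(["what is a virtual key?"], "<virtual-key>", input_type="search_query")

# Illustrative, trimmed response (real vectors are much longer).
sample = {
    "object": "list",
    "data": [{"object": "embedding", "index": 0, "embedding": [0.01, -0.02, 0.03]}],
    "usage": {"prompt_tokens": 6, "total_tokens": 6},
}
```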
4. List Models
Request: GET /v1/models (cursor-based pagination with next_page_token)
Note: endpoint and default_only filters are available via extra_params.
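A sketch of walking cursor-based pages. How the cursor is sent back (query parameter or header name) is not specified here, so the fetch step is a hypothetical callable passed in. The sketch also assumes each page carries a data list of models and a next_page_token that is empty or missing on the last page.

```python
from typing import Callable, Iterator, Optional


def iter_models(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield models across pages until next_page_token is missing or empty.

    fetch_page(token) is a hypothetical callable that performs
    GET /v1/models for the given cursor (None for the first page).
    """
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("data", [])
        token = page.get("next_page_token")
        if not token:
            break


# Fake two-page listing for illustration only.
fake_pages = {
    None: {"data": [{"id": "cohere/command-r-plus"}], "next_page_token": "abc"},
    "abc": {"data": [{"id": "cohere/embed-english-v3.0"}], "next_page_token": None},
}
ids = [m["id"] for m in iter_models(fake_pages.__getitem__)]
print(ids)  # ['cohere/command-r-plus', 'cohere/embed-english-v3.0']
```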