December 11, 2023



Today, Mistral released Mixtral 8x7B, a high-quality sparse mixture of experts model (SMoE) with open weights.

Mixtral-8x7b-32kseqlen and DiscoLM-mixtral-8x7b-v2 are now live on our inference platform! We have optimized the Together Inference Engine for Mixtral, and it is available at up to 100 tokens/s for $0.0006/1K tokens. To our knowledge, this is the fastest performance at the lowest price!
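
To put the quoted figures in concrete terms, here is a small back-of-the-envelope sketch in Python. It assumes the $0.0006 per 1K tokens price applies uniformly to every token in a request, and it treats 100 tokens/s as a best case, since the rate is quoted as "up to". The function names are illustrative only.

```python
PRICE_PER_1K_TOKENS_USD = 0.0006  # price quoted above
PEAK_TOKENS_PER_SECOND = 100      # "up to" rate quoted above, so a best case


def estimate_cost_usd(num_tokens: int) -> float:
    """Cost of num_tokens at the quoted per-1K-token price."""
    return num_tokens / 1000 * PRICE_PER_1K_TOKENS_USD


def min_generation_seconds(num_tokens: int) -> float:
    """Lower bound on generation time if the peak rate were sustained."""
    return num_tokens / PEAK_TOKENS_PER_SECOND
```

Under these assumptions, one million tokens comes to roughly $0.60, and a 512-token completion takes at least about five seconds to generate.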

Chat with it in our playground, or use this code snippet:

curl -X POST \
      -H 'Content-Type: application/json' \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -d '{
      "model": "DiscoResearch/DiscoLM-mixtral-8x7b-v2",
      "max_tokens": 512,
      "prompt": "<|im_start|>user\nTell me about San Francisco<|im_end|>\n<|im_start|>assistant",
      "temperature": 0.7,
      "top_p": 0.7,
      "top_k": 50,
      "repetition_penalty": 1,
      "stream_tokens": true,
      "stop": [

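For readers who prefer Python, the same request body can be sketched as follows. This is a minimal sketch rather than official client code: the helper names are made up for illustration, the stop sequence is an assumption based on the ChatML-style markers in the prompt, and the endpoint URL is left as a parameter because it is not given in the snippet above.

```python
import json
import urllib.request


def build_chatml_prompt(user_message: str) -> str:
    """Wrap a user message in the ChatML-style markers used in the curl prompt."""
    return f"<|im_start|>user\n{user_message}<|im_end|>\n<|im_start|>assistant"


def build_payload(user_message: str) -> dict:
    """Request body mirroring the fields of the curl example."""
    return {
        "model": "DiscoResearch/DiscoLM-mixtral-8x7b-v2",
        "max_tokens": 512,
        "prompt": build_chatml_prompt(user_message),
        "temperature": 0.7,
        "top_p": 0.7,
        "top_k": 50,
        "repetition_penalty": 1,
        "stream_tokens": True,
        # Assumption: stop at the end-of-turn marker that appears in the prompt.
        "stop": ["<|im_end|>"],
    }


def build_request(url: str, api_key: str, user_message: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request; the caller supplies the URL."""
    body = json.dumps(build_payload(user_message)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would send it. Because `stream_tokens` is true, the response should be expected to arrive incrementally rather than as a single JSON document.
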
More on Mixtral

Mixtral is licensed under Apache 2.0 and outperforms Llama 2 70B on most benchmarks. It is the strongest open-weight model with a permissive license and the best model overall in terms of cost/performance trade-offs. In particular, it matches or outperforms GPT-3.5 on most standard benchmarks.

Mixtral has the following capabilities:
  • Handles a context of 32k tokens.
  • Handles English, French, Italian, German and Spanish.
  • Shows strong performance in code generation.
  • Can be fine-tuned into an instruction-following model that achieves a score of 8.3 on MT-Bench.

Transitioning from OpenAI?

Here’s how simple it is to switch from OpenAI to Together’s Mixtral serverless endpoint:

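import openai
import os

# The base URL and model name are not given here, so both are read from
# hypothetical environment variables (TOGETHER_BASE_URL, TOGETHER_MODEL).
client = openai.OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url=os.environ["TOGETHER_BASE_URL"],
)

chat_completion = client.chat.completions.create(
    model=os.environ["TOGETHER_MODEL"],
    messages=[{"role": "user", "content": "Tell me about San Francisco"}],
)
print(chat_completion.choices[0].message.content)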

Simply add your "TOGETHER_API_KEY", change the base URL to Together's API endpoint, and set the model name to one of our 100+ open-source models, and you'll be off to the races!
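
One way to see how little changes is to isolate the three settings named above. The sketch below is an illustration, not Together's official client code: `ProviderConfig`, `client_kwargs`, `chat_kwargs`, and `ask` are hypothetical names, and the base URL and model name are left to the caller because they are not spelled out here. It assumes the `openai` Python package, version 1 or later, whose `OpenAI` constructor accepts `api_key` and `base_url`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderConfig:
    api_key_env: str         # name of the environment variable holding the key
    base_url: Optional[str]  # None keeps the openai package default
    model: str               # model name to request


def client_kwargs(cfg: ProviderConfig, env: Optional[Mapping[str, str]] = None) -> dict:
    """Keyword arguments for openai.OpenAI(...) under the given configuration."""
    env = os.environ if env is None else env
    kwargs = {"api_key": env[cfg.api_key_env]}
    if cfg.base_url is not None:
        kwargs["base_url"] = cfg.base_url
    return kwargs


def chat_kwargs(cfg: ProviderConfig, user_message: str) -> dict:
    """Keyword arguments for client.chat.completions.create(...)."""
    return {
        "model": cfg.model,
        "messages": [{"role": "user", "content": user_message}],
    }


def ask(cfg: ProviderConfig, user_message: str) -> str:
    """Send one chat message and return the reply text (needs the openai package)."""
    import openai  # imported lazily so the helpers above work without it

    client = openai.OpenAI(**client_kwargs(cfg))
    completion = client.chat.completions.create(**chat_kwargs(cfg, user_message))
    return completion.choices[0].message.content
```

With this layout, switching providers means constructing a different `ProviderConfig`; the calling code in `ask` stays the same.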

