
Can you feel the MoE? Mixtral available with over 100 tokens per second through Together Platform!

December 11, 2023

By Together

Today, Mistral released Mixtral 8x7B, a high-quality sparse mixture of experts model (SMoE) with open weights.
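In a sparse MoE layer, a small router scores every expert for each token, and only the top few experts run. Mixtral's router picks 2 of its 8 experts per layer. The toy, pure-Python sketch illustrates top-k gating only. `sparse_moe_layer`, the router weights, and the expert functions are made up for illustration and are not Mixtral's implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: Sequence[float]) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_moe_layer(
    x: Vector, router: List[Vector], experts: List[Expert], k: int = 2
) -> Tuple[Vector, List[int], Vector]:
    """Toy top-k routing of one token vector x (illustrative, not Mixtral's code).

    Returns (output, chosen_expert_indices, gate_weights)."""
    # 1. Router logits: one score per expert (a bias-free linear layer).
    logits = [dot(w, x) for w in router]
    # 2. Keep only the k highest-scoring experts.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # 3. Softmax over the selected logits only, so the k gates sum to 1.
    gates = softmax([logits[i] for i in top])
    # 4. Only the chosen experts run; their outputs are mixed by gate weight.
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)
        out = [o + g * v for o, v in zip(out, y)]
    return out, top, gates


if __name__ == "__main__":
    # 8 hypothetical experts; expert i simply scales its input by i + 1.
    experts = [(lambda v, s=i + 1: [s * t for t in v]) for i in range(8)]
    router = [[float(i), 0.0] for i in range(8)]
    out, top, gates = sparse_moe_layer([1.0, 0.0], router, experts, k=2)
    print(top, gates, out)
```

Only 2 of the 8 expert functions are evaluated per token. That is why a sparse MoE can have many parameters while the compute per token stays much lower.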

Mixtral-8x7b-32kseqlen and DiscoLM-mixtral-8x7b-v2 are now live on our inference platform! We have optimized the Together Inference Engine for Mixtral, and it is available at up to 100 tokens/s for $0.0006/1K tokens: to our knowledge, the fastest performance at the lowest price!
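To put those numbers in perspective, here is a back-of-the-envelope estimator. It is a rough sketch that assumes the $0.0006/1K rate applies to prompt and completion tokens alike, and that generation runs at the full 100 tokens/s. Real throughput and billing may differ, and time to first token and network latency are ignored. Both function names are hypothetical.

```python
PRICE_PER_1K_TOKENS_USD = 0.0006  # rate quoted above (assumed to cover prompt + completion)
TOKENS_PER_SECOND = 100           # "up to" figure quoted above; treat as a best case


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of one request, billing every token at the same rate."""
    return (prompt_tokens + completion_tokens) / 1000 * PRICE_PER_1K_TOKENS_USD


def estimate_generation_seconds(completion_tokens: int) -> float:
    """Best-case time to stream the completion, ignoring latency to first token."""
    return completion_tokens / TOKENS_PER_SECOND


if __name__ == "__main__":
    # A 512-token answer to a 100-token prompt.
    print(f"${estimate_cost_usd(100, 512):.6f}")
    print(f"{estimate_generation_seconds(512):.2f} s")
```

At this rate, one million tokens comes to about $0.60.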

Chat with it in our playground:

Or use this code snippet: 

curl -X POST https://api.together.xyz/inference \
      -H 'Content-Type: application/json' \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -d '{
      "model": "DiscoResearch/DiscoLM-mixtral-8x7b-v2",
      "max_tokens": 512,
      "prompt": "<|im_start|>user\nTell me about San Francisco<|im_end|>\n<|im_start|>assistant",
      "temperature": 0.7,
      "top_p": 0.7,
      "top_k": 50,
      "repetition_penalty": 1,
      "stream_tokens": true,
      "stop": [
        "<|im_end|>",
        "<|im_start|>"
      ]
    }'
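The `prompt` field in the request above uses a ChatML-style template (`<|im_start|>role ... <|im_end|>`). The two stop sequences end generation when the model closes its turn or starts a new one. If you build requests programmatically, a small helper can render a message list into that template and let `json.dumps` handle the escaping. `to_chatml` and `build_request` are hypothetical helpers, and the payload simply mirrors the fields in the curl example.

```python
import json
from typing import Dict, List

Message = Dict[str, str]


def to_chatml(messages: List[Message]) -> str:
    """Render messages in the ChatML-style template used by the curl example."""
    turns = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    # Open an assistant turn (no trailing newline, matching the request above)
    # so the model writes the reply.
    turns.append("<|im_start|>assistant")
    return "\n".join(turns)


def build_request(messages: List[Message], max_tokens: int = 512) -> str:
    """JSON body with the same fields as the curl example."""
    payload = {
        "model": "DiscoResearch/DiscoLM-mixtral-8x7b-v2",
        "max_tokens": max_tokens,
        "prompt": to_chatml(messages),
        "temperature": 0.7,
        "top_p": 0.7,
        "top_k": 50,
        "repetition_penalty": 1,
        "stream_tokens": True,
        # Stop as soon as the model closes its turn or starts a new one.
        "stop": ["<|im_end|>", "<|im_start|>"],
    }
    return json.dumps(payload)


if __name__ == "__main__":
    print(build_request([{"role": "user", "content": "Tell me about San Francisco"}]))
```

The printed JSON can be piped into the same curl command by replacing the inline body with `-d @-`, which tells curl to read the request body from stdin.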

More on Mixtral

Mixtral is licensed under Apache 2.0 and outperforms Llama 2 70B on most benchmarks. It is the strongest open-weight model with a permissive license and the best model overall in terms of cost/performance trade-offs. In particular, it matches or outperforms GPT-3.5 on most standard benchmarks.

Mixtral...

  • Handles a context of 32k tokens.
  • Handles English, French, Italian, German and Spanish.
  • Shows strong performance in code generation.
  • Can be fine-tuned into an instruction-following model that achieves a score of 8.3 on MT-Bench.
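Because the prompt and the generated tokens share the 32k-token window, it can help to cap `max_tokens` before sending a request. This is a minimal sketch. It assumes "32k" means 32,768 tokens and that you already have a prompt token count from Mixtral's tokenizer. `clamp_max_tokens` is a hypothetical helper.

```python
CONTEXT_WINDOW = 32_768  # assumption: "32k" read as 32,768 tokens


def clamp_max_tokens(prompt_tokens: int, requested: int, window: int = CONTEXT_WINDOW) -> int:
    """Largest completion length that still fits in the context window.

    Prompt and completion tokens share the same window, so whatever the
    prompt uses is no longer available for the reply."""
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = window - prompt_tokens
    if remaining <= 0:
        raise ValueError(f"prompt of {prompt_tokens} tokens already fills the {window}-token window")
    return min(requested, remaining)


if __name__ == "__main__":
    print(clamp_max_tokens(1_000, 512))   # short prompt: request passes through
    print(clamp_max_tokens(32_500, 512))  # long prompt: request is trimmed
```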

Transitioning from OpenAI?

Here’s how simple it is to switch from OpenAI to Together’s Mixtral serverless endpoint:


import os

import openai

# Point the OpenAI Python client (v1+) at Together's OpenAI-compatible API.
client = openai.OpenAI(
    api_key=os.environ.get("TOGETHER_API_KEY"),
    base_url="https://api.together.xyz/v1",
)

# Same chat.completions call you would make against OpenAI; only the model changes.
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Tell me about San Francisco",
        }
    ],
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
)

print(chat_completion.choices[0].message.content)

Simply set your TOGETHER_API_KEY environment variable (which you can find here), change the base URL to https://api.together.xyz/v1, and swap the model name for one of our 100+ open-source models. You'll be off to the races!
