The Together AI Python SDK is officially out of beta with the v1 release! It provides OpenAI-compatible APIs to:
- Run inference on chat, language, code, moderation, and image models
- Fine-tune models (including Llama 3) with your own data
- Generate embeddings from text for RAG applications
v1 comes with several improvements, including a more intuitive, fully OpenAI-compatible API, async support, messages support, more thorough tests, and better error handling. Upgrade to v1 by running `pip install --upgrade together`.
Chat Completions
To use any of the 60+ chat models we support, you can run the following code:
```python
import os

from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

response = client.chat.completions.create(
    model="meta-llama/Llama-3-8b-chat-hf",
    messages=[{"role": "user", "content": "tell me about new york"}],
)
print(response.choices[0].message.content)
```
Streaming
To stream back a response, simply specify `stream=True`:
```python
import os

from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

stream = client.chat.completions.create(
    model="meta-llama/Llama-3-8b-chat-hf",
    messages=[{"role": "user", "content": "tell me about new york"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
Completions
To run completions on our code and language models, do the following:
```python
import os

from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

response = client.completions.create(
    model="codellama/CodeLlama-70b-Python-hf",
    prompt="def bubble_sort(): ",
)
print(response.choices[0].text)
```
Image Models
To use our image models, run the following:
```python
import os

from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

response = client.images.generate(
    prompt="space robots",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    steps=10,
    n=2,
)
print(response.data[0].b64_json)
```
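The `b64_json` field holds the generated image as base64-encoded bytes (an assumption based on the field name and the OpenAI-compatible schema), so you can decode it and write the image straight to disk. A minimal sketch, with a stand-in string in place of a real `response.data[0].b64_json` value:

```python
import base64

# Stand-in for response.data[0].b64_json from the call above:
b64_json = base64.b64encode(b"<image bytes>").decode()

# Decode the base64 payload and write the raw image bytes to disk.
image_bytes = base64.b64decode(b64_json)
with open("space_robots.png", "wb") as f:
    f.write(image_bytes)
```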
Embeddings
To generate embeddings with any of our embedding models, do the following:
```python
import os

from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

text = "Our solar system orbits the Milky Way galaxy at about 515,000 mph"
embedding_model = "togethercomputer/m2-bert-80M-8k-retrieval"

embeddings = client.embeddings.create(model=embedding_model, input=text)
print(embeddings)
```
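In an OpenAI-compatible response, the vector itself should live at `embeddings.data[0].embedding` (an assumption based on the compatible schema). A common next step in a RAG pipeline is scoring how similar two embeddings are; here's a dependency-free cosine similarity sketch using hypothetical stand-in vectors:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical stand-ins for embeddings.data[0].embedding values:
query_vec = [0.1, 0.3, 0.5]
doc_vec = [0.2, 0.1, 0.4]
score = cosine_similarity(query_vec, doc_vec)
print(score)
```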
Async Support
We now have async support! Here’s what that looks like for chat completions:
```python
import asyncio
import os

from together import AsyncTogether

async_client = AsyncTogether(api_key=os.environ.get("TOGETHER_API_KEY"))

messages = [
    "What are the top things to do in San Francisco?",
    "What country is Paris in?",
]

async def async_chat_completion(messages):
    tasks = [
        async_client.chat.completions.create(
            model="meta-llama/Llama-3-70b-chat-hf",
            messages=[{"role": "user", "content": message}],
        )
        for message in messages
    ]
    responses = await asyncio.gather(*tasks)
    for response in responses:
        print(response.choices[0].message.content)

asyncio.run(async_chat_completion(messages))
```
See this example for async support for completions.
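If you fan out many requests at once, you can run into provider rate limits. Nothing SDK-specific is needed: a generic asyncio pattern caps the number of in-flight requests with a semaphore. A sketch, using a dummy coroutine as a stand-in for `async_client.chat.completions.create(...)`:

```python
import asyncio

async def bounded_gather(coros, limit=4):
    # Allow at most `limit` coroutines to run concurrently.
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:
            return await coro

    # gather preserves the order of the input coroutines.
    return await asyncio.gather(*(run(c) for c in coros))

async def fake_api_call(i):
    # Stand-in for an actual API request.
    await asyncio.sleep(0.01)
    return f"response {i}"

results = asyncio.run(bounded_gather([fake_api_call(i) for i in range(10)], limit=3))
print(results)
```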
Fine-tuning
We also provide the ability to fine-tune models through our SDK or CLI, including the newly released Llama 3 models. Simply upload a file in JSONL format and create a fine-tuning job as seen in the code below:
```python
import os

from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

# Upload a JSONL file and get back its ID
uploaded_file = client.files.upload(file="somedata.jsonl")

# Create a fine-tuning job
client.fine_tuning.create(
    training_file=uploaded_file.id,
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    n_epochs=3,
    # wandb_api_key="1a2b3c4d5e.......",
)
```
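Each line of the training file is one standalone JSON object. The exact field names are covered in the fine-tuning docs; the sketch below uses a simple `{"text": ...}` shape as an assumed example and writes a valid JSONL file:

```python
import json

# Assumed example schema; check the fine-tuning docs for the exact fields.
examples = [
    {"text": "New York City is in the state of New York."},
    {"text": "Paris is the capital of France."},
]

# JSONL: one JSON object per line, newline-separated.
with open("somedata.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```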
For more about fine-tuning, including data formats, check out our fine-tuning docs.
Learn more in our documentation and in our Python library on GitHub. We're also actively working on a similar TypeScript SDK that will be out in the coming weeks!