The Together AI Python SDK is officially out of beta with the v1 release! It provides OpenAI-compatible APIs to:
- Run inference on chat, language, code, moderation, and image models
- Fine-tune models (including Llama 3) with your own data
- Generate embeddings from text for RAG applications
v1 comes with several improvements, including a more intuitive, fully OpenAI-compatible API, async support, messages support, more thorough tests, and better error handling. Upgrade to v1 by running `pip install --upgrade together`.
Chat Completions
To use any of the 60+ chat models we support, you can run the following code:
```python
import os

from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

response = client.chat.completions.create(
    model="meta-llama/Llama-3-8b-chat-hf",
    messages=[{"role": "user", "content": "tell me about new york"}],
)
print(response.choices[0].message.content)
```
Streaming
To stream back a response, simply specify `stream=True`:
```python
import os

from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

stream = client.chat.completions.create(
    model="meta-llama/Llama-3-8b-chat-hf",
    messages=[{"role": "user", "content": "tell me about new york"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
Completions
To run completions on our code and language models, do the following:
```python
import os

from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

response = client.completions.create(
    model="codellama/CodeLlama-70b-Python-hf",
    prompt="def bubble_sort(): ",
)
print(response.choices[0].text)
```
Image Models
To use our image models, run the following:
```python
import os

from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

response = client.images.generate(
    prompt="space robots",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    steps=10,
    n=2,
)
print(response.data[0].b64_json)
```
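The `b64_json` field holds the generated image as base64-encoded bytes (an assumption based on the field name and the OpenAI-compatible schema), so you can decode it and write the image straight to disk. A minimal sketch, with a stand-in string in place of a real `response.data[0].b64_json` value:

```python
import base64

# Stand-in for response.data[0].b64_json from the call above:
b64_json = base64.b64encode(b"<image bytes>").decode()

# Decode the base64 payload and write the raw image bytes to disk.
image_bytes = base64.b64decode(b64_json)
with open("space_robots.png", "wb") as f:
    f.write(image_bytes)
```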
Embeddings
To generate embeddings with any of our embedding models, do the following:
```python
import os

from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

text = "Our solar system orbits the Milky Way galaxy at about 515,000 mph"
embedding_model = "togethercomputer/m2-bert-80M-8k-retrieval"

embeddings = client.embeddings.create(model=embedding_model, input=text)
print(embeddings)
```
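In an OpenAI-compatible response, the vector itself should live at `embeddings.data[0].embedding` (an assumption based on the compatible schema). A common next step in a RAG pipeline is scoring how similar two embeddings are; here's a dependency-free cosine similarity sketch using hypothetical stand-in vectors:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical stand-ins for embeddings.data[0].embedding values:
query_vec = [0.1, 0.3, 0.5]
doc_vec = [0.2, 0.1, 0.4]
score = cosine_similarity(query_vec, doc_vec)
print(score)
```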
Async Support
We now have async support! Here’s what that looks like for chat completions:
```python
import asyncio
import os

from together import AsyncTogether

async_client = AsyncTogether(api_key=os.environ.get("TOGETHER_API_KEY"))

messages = [
    "What are the top things to do in San Francisco?",
    "What country is Paris in?",
]

async def async_chat_completion(messages):
    tasks = [
        async_client.chat.completions.create(
            model="meta-llama/Llama-3-70b-chat-hf",
            messages=[{"role": "user", "content": message}],
        )
        for message in messages
    ]
    responses = await asyncio.gather(*tasks)
    for response in responses:
        print(response.choices[0].message.content)

asyncio.run(async_chat_completion(messages))
```
See this example for async support for completions.
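If you fan out many requests at once, you can run into provider rate limits. Nothing SDK-specific is needed: a generic asyncio pattern caps the number of in-flight requests with a semaphore. A sketch, using a dummy coroutine as a stand-in for `async_client.chat.completions.create(...)`:

```python
import asyncio

async def bounded_gather(coros, limit=4):
    # Allow at most `limit` coroutines to run concurrently.
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:
            return await coro

    # gather preserves the order of the input coroutines.
    return await asyncio.gather(*(run(c) for c in coros))

async def fake_api_call(i):
    # Stand-in for an actual API request.
    await asyncio.sleep(0.01)
    return f"response {i}"

results = asyncio.run(bounded_gather([fake_api_call(i) for i in range(10)], limit=3))
print(results)
```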
Fine-tuning
We also provide the ability to fine-tune models through our SDK or CLI, including the newly released Llama 3 models. Simply upload a file in JSONL format and create a fine-tuning job as seen in the code below:
```python
import os

from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

# Upload a JSONL file and get back its ID
uploaded_file = client.files.upload(file="somedata.jsonl")

# Create a fine-tuning job
client.fine_tuning.create(
    training_file=uploaded_file.id,
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    n_epochs=3,
    # wandb_api_key="1a2b3c4d5e.......",
)
```
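Each line of the training file is one standalone JSON object. The exact field names are covered in the fine-tuning docs; the sketch below uses a simple `{"text": ...}` shape as an assumed example and writes a valid JSONL file:

```python
import json

# Assumed example schema; check the fine-tuning docs for the exact fields.
examples = [
    {"text": "New York City is in the state of New York."},
    {"text": "Paris is the capital of France."},
]

# JSONL: one JSON object per line, newline-separated.
with open("somedata.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```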
For more about fine-tuning, including data formats, check out our fine-tuning docs.
Learn more in our documentation and in our Python library on GitHub. We're also actively working on a similar TypeScript SDK that will be out in the coming weeks!