
Orpheus TTS API

Human-level speech generation with natural emotion and intonation


This model is not currently supported on Together AI.

Visit our Models page to view all the latest models.

Orpheus TTS is a breakthrough speech-LLM family built on Llama-3B that achieves human-level speech generation with natural emotion and intonation. Trained on 100k+ hours of English speech data, Orpheus demonstrates that open-source TTS can finally compete with—and surpass—closed-source models like ElevenLabs and PlayHT in real-world quality.

~200ms Streaming Latency: realtime conversational AI (reducible to ~25-50ms with input streaming)
100K+ Training Hours: English speech data, plus billions of text tokens
Zero-Shot Voice Cloning: emergent capability without explicit training
Why Orpheus?
• Superior to Closed-Source: Beats ElevenLabs & PlayHT in natural intonation and emotion
• Production-Ready Streaming: Faster than realtime playback on an A100 for conversational AI
• Flexible Model Sizes: 3B, 1B, 400M, and 150M parameters; deploy anywhere from edge to cloud
• Guided Emotion Control: Simple tags for laugh, sigh, cry, gasp; no complex prompt engineering

Orpheus TTS API Usage

Endpoint

curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "canopylabs/orpheus-tts-0-1-finetune-prod",
    "messages": [
      {
        "role": "user",
        "content": "What are some fun things to do in New York?"
      }
    ]
}'
curl -X POST "https://api.together.xyz/v1/images/generations" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "canopylabs/orpheus-tts-0-1-finetune-prod",
    "prompt": "Draw an anime style version of this image.",
    "width": 1024,
    "height": 768,
    "steps": 28,
    "n": 1,
    "response_format": "url",
    "image_url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
  }'
curl -X POST https://api.together.xyz/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -d '{
    "model": "canopylabs/orpheus-tts-0-1-finetune-prod",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe what you see in this image."},
        {"type": "image_url", "image_url": {"url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"}}
      ]
    }],
    "max_tokens": 512
  }'
curl -X POST https://api.together.xyz/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -d '{
    "model": "canopylabs/orpheus-tts-0-1-finetune-prod",
    "messages": [{
      "role": "user",
      "content": "Given two binary strings `a` and `b`, return their sum as a binary string"
    }]
  }'
curl -X POST https://api.together.xyz/v1/rerank \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -d '{
    "model": "canopylabs/orpheus-tts-0-1-finetune-prod",
    "query": "What animals can I find near Peru?",
    "documents": [
      "The giant panda (Ailuropoda melanoleuca), also known as the panda bear or simply panda, is a bear species endemic to China.",
      "The llama is a domesticated South American camelid, widely used as a meat and pack animal by Andean cultures since the pre-Columbian era.",
      "The wild Bactrian camel (Camelus ferus) is an endangered species of camel endemic to Northwest China and southwestern Mongolia.",
      "The guanaco is a camelid native to South America, closely related to the llama. Guanacos are one of two wild South American camelids; the other species is the vicuña, which lives at higher elevations."
    ],
    "top_n": 2
  }'
curl -X POST https://api.together.xyz/v1/embeddings \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Our solar system orbits the Milky Way galaxy at about 515,000 mph.",
    "model": "canopylabs/orpheus-tts-0-1-finetune-prod"
  }'
curl -X POST https://api.together.xyz/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -d '{
    "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    "prompt": "A horse is a horse",
    "max_tokens": 32,
    "temperature": 0.1,
    "safety_model": "canopylabs/orpheus-tts-0-1-finetune-prod"
  }'
curl --location 'https://api.together.ai/v1/audio/generations' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $TOGETHER_API_KEY" \
  --output speech.mp3 \
  --data '{
    "input": "Today is a wonderful day to build something people love!",
    "voice": "helpful woman",
    "response_format": "mp3",
    "sample_rate": 44100,
    "stream": false,
    "model": "canopylabs/orpheus-tts-0-1-finetune-prod"
  }'
curl -X POST "https://api.together.xyz/v1/audio/transcriptions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -F "model=canopylabs/orpheus-tts-0-1-finetune-prod" \
  -F "language=en" \
  -F "response_format=json" \
  -F "timestamp_granularities=segment"
curl --request POST \
  --url https://api.together.xyz/v2/videos \
  --header "Authorization: Bearer $TOGETHER_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "canopylabs/orpheus-tts-0-1-finetune-prod",
    "prompt": "some penguins building a snowman"
  }'
curl --request POST \
  --url https://api.together.xyz/v2/videos \
  --header "Authorization: Bearer $TOGETHER_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "canopylabs/orpheus-tts-0-1-finetune-prod",
    "frame_images": [{"input_image": "https://cdn.pixabay.com/photo/2020/05/20/08/27/cat-5195431_1280.jpg"}]
  }'

from together import Together

client = Together()

speech_file_path = "speech.mp3"

# Generate speech with one of the hosted voices
response = client.audio.speech.create(
  model="canopylabs/orpheus-tts-0-1-finetune-prod",
  input="Today is a wonderful day to build something people love!",
  voice="helpful woman",
)

# Save the generated audio to speech.mp3
response.stream_to_file(speech_file_path)

import Together from 'together-ai';
import { createWriteStream } from 'node:fs';
import { Readable } from 'node:stream';

const together = new Together();

async function generateAudio() {
   const res = await together.audio.create({
    input: 'Today is a wonderful day to build something people love!',
    voice: 'helpful woman',
    response_format: 'mp3',
    sample_rate: 44100,
    stream: false,
    model: 'canopylabs/orpheus-tts-0-1-finetune-prod',
  });

  if (res.body) {
    console.log(res.body);
    const nodeStream = Readable.from(res.body as ReadableStream);
    const fileStream = createWriteStream('./speech.mp3');

    nodeStream.pipe(fileStream);
  }
}

generateAudio();

import Together from "together-ai";

const together = new Together();

const response = await together.audio.transcriptions.create(
  model: "canopylabs/orpheus-tts-0-1-finetune-prod",
  language: "en",
  response_format: "json",
  timestamp_granularities: "segment"
});
console.log(response)
import Together from "together-ai";

const together = new Together();

async function main() {
  // Create a video generation job
  const job = await together.videos.create({
    prompt: "A serene sunset over the ocean with gentle waves",
    model: "canopylabs/orpheus-tts-0-1-finetune-prod"
  });
import Together from "together-ai";

const together = new Together();

const job = await together.videos.create({
  model: "canopylabs/orpheus-tts-0-1-finetune-prod",
  frame_images: [
    {
      input_image: "https://cdn.pixabay.com/photo/2020/05/20/08/27/cat-5195431_1280.jpg",
    }
  ]
});

How to use Orpheus TTS

Model details

Architecture Overview:
• Llama-3B backbone architecture adapted for speech-LLM applications
• Trained on 100k+ hours of English speech data and billions of text tokens
• SNAC audio tokenizer with 7 tokens per frame decoded as flattened sequence (regrouping sketched after this list)
• CNN-based detokenizer with sliding window modification for streaming without popping
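
The framing and sliding-window ideas above can be pictured with a short sketch. This is a minimal illustration, not the reference implementation: it assumes a flat stream of audio token ids from the LLM, regroups them 7 per SNAC frame, and decodes overlapping windows so streamed chunks join without popping. The decode_frames callable, the window and hop sizes, and samples_per_frame are hypothetical placeholders.

# Minimal sketch (not the reference implementation) of regrouping the model's
# flat audio-token stream into 7-token SNAC frames and decoding overlapping
# windows so streamed chunks join without popping. `decode_frames`, the window
# sizes, and `samples_per_frame` are hypothetical placeholders.

FRAME_SIZE = 7       # audio tokens per SNAC frame, per the notes above
HOP_FRAMES = 24      # new frames accumulated before each decode (illustrative)
OVERLAP_FRAMES = 4   # context frames re-decoded to smooth the seam (illustrative)

def frames_from_stream(token_stream):
    """Yield complete 7-token frames as tokens arrive from the LLM."""
    buffer = []
    for token in token_stream:
        buffer.append(token)
        if len(buffer) == FRAME_SIZE:
            yield buffer
            buffer = []

def stream_audio(token_stream, decode_frames, samples_per_frame):
    """Decode overlapping frame windows and emit only not-yet-played samples."""
    frames, emitted_frames = [], 0
    for frame in frames_from_stream(token_stream):
        frames.append(frame)
        if len(frames) - emitted_frames >= HOP_FRAMES:
            start = max(0, emitted_frames - OVERLAP_FRAMES)
            audio = decode_frames(frames[start:])          # includes overlap context
            new_samples = (len(frames) - emitted_frames) * samples_per_frame
            yield audio[-new_samples:]                     # drop already-played part
            emitted_frames = len(frames)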

Training Methodology:
• Pretrained on massive scale speech and text data to maintain language understanding
• Text token training boosts TTS performance while preserving semantic reasoning ability
• Trained exclusively on permissive/non-copyrighted audio data
• Fine-tuned models available for production use with 8 distinct voices (tara, leah, jess, leo, dan, mia, zac, zoe)
• Supports custom fine-tuning with as few as 50 examples per speaker
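
Because fine-tuning is described as working from as few as 50 examples per speaker, a small dataset can be packaged with the Hugging Face datasets library. This is a hypothetical sketch: the column names ("text", "audio"), the "maya" speaker name, the file paths, and the 24 kHz sampling rate are assumptions rather than a confirmed Orpheus training schema.

# Hypothetical sketch: package ~50 (text, audio) clips for one custom speaker
# as a Hugging Face dataset. Column names, paths, speaker name, and sample rate
# are assumptions, not a confirmed Orpheus training schema.
from datasets import Audio, Dataset

examples = [
    {"text": "maya: Welcome back! <chuckle> I was starting to miss you.",
     "audio": "clips/maya_001.wav"},
    # ... roughly 50 clips of the same speaker ...
]

ds = Dataset.from_list(examples).cast_column("audio", Audio(sampling_rate=24000))
ds.save_to_disk("orpheus-finetune-maya")  # or ds.push_to_hub(...) for remote training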

Performance Characteristics:
• Handles disfluencies naturally without artifacts
• Streaming inference faster than real-time playback on A100 40GB for 3B parameter model
• vLLM implementation enables efficient GPU utilization
• Supports realtime streaming with ~200ms latency, reducible to ~25-50ms with input streaming
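
A quick way to sanity-check the quoted streaming latency is to time the first audio bytes returned by the hosted endpoint. The sketch below is illustrative only: it assumes the audio generation endpoint accepts "stream": true and returns audio bytes incrementally; consult the Together docs for the actual streaming response format.

# Rough time-to-first-audio-byte check against the hosted endpoint. Assumes
# "stream": true returns audio bytes incrementally; verify the real streaming
# format in the Together docs before relying on this.
import os
import time

import requests

start = time.monotonic()
resp = requests.post(
    "https://api.together.ai/v1/audio/generations",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "canopylabs/orpheus-tts-0-1-finetune-prod",
        "input": "Latency check: how quickly does the first chunk arrive?",
        "voice": "helpful woman",
        "response_format": "mp3",
        "stream": True,
    },
    stream=True,
    timeout=60,
)
for chunk in resp.iter_content(chunk_size=4096):
    if chunk:
        print(f"first audio bytes after {(time.monotonic() - start) * 1000:.0f} ms")
        break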

Prompting Orpheus TTS

API Integration:
• Simple Python package installation via pip install orpheus-speech (usage sketched after this list)
• Built on vLLM for fast inference with standard LLM generation arguments
• Supports streaming and non-streaming modes for flexible deployment
• Compatible with Baseten for optimized fp8 and fp16 inference
• Available through multiple integration options including OpenAI-compatible APIs
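
For self-hosted inference, the upstream orpheus-speech package wraps the vLLM-based pipeline. The sketch below follows the project's published usage; the OrpheusModel class, the generate_speech signature, and the 24 kHz output are taken from its README and may change between releases.

# Self-hosted sketch using the upstream package (pip install orpheus-speech).
# Class and method names follow the project's README and may change.
import wave

from orpheus_tts import OrpheusModel

model = OrpheusModel(model_name="canopylabs/orpheus-tts-0.1-finetune-prod")

# generate_speech yields audio chunks as they are decoded (streaming mode)
syn_tokens = model.generate_speech(
    prompt="tara: Hey there! <chuckle> Streaming really is faster than playback.",
    voice="tara",
)

with wave.open("output.wav", "wb") as wf:
    wf.setnchannels(1)       # mono
    wf.setsampwidth(2)       # 16-bit PCM
    wf.setframerate(24000)   # SNAC decodes to 24 kHz audio
    for audio_chunk in syn_tokens:
        wf.writeframes(audio_chunk)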

Prompting Format:
• Finetuned model format: {voice_name}: Your text here (voices: tara, leah, jess, leo, dan, mia, zac, zoe)
• Emotion tags: <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, <gasp> (example prompts after this list)
• Pretrained model supports zero-shot voice cloning via conditioning on text-speech pairs in prompt
• Standard LLM generation args: temperature, top_p, repetition_penalty≥1.1 required for stable generations
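
Putting the format together, a finetuned-model prompt is just the voice name, a colon, and the text, with emotion tags inline. The prompts and sampling values below are illustrative examples, not tuned recommendations.

# Illustrative prompts in the "{voice_name}: text" format with inline emotion tags
prompts = [
    "tara: You remembered my birthday! <laugh> That's so sweet of you.",
    "leo: Long day... <sigh> let's pick this up again tomorrow.",
    "zoe: Wait <gasp> is that really the final score?",
]

# Standard LLM-style sampling arguments; repetition_penalty >= 1.1 is required
# for stable generations, and raising it (with temperature) speeds up speech.
generation_args = {"temperature": 0.6, "top_p": 0.9, "repetition_penalty": 1.1}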

Advanced Techniques:
• Zero-shot voice cloning emerges from large pretraining data without explicit training objective
• Multiple text-speech pairs in prompt improve voice cloning reliability
• Increasing repetition_penalty and temperature makes the model speak faster
• Supports multilingual models in research preview (7 language pairs)

Optimization Strategies:
• Realtime output streaming with very low latency ~200ms
• Input streaming into KV cache reduces latency to ~25-50ms
• Simple fine-tuning process analogous to LLM tuning with Trainer and Transformers
• LoRA fine-tuning support for efficient adaptation (schematic sketch after this list)
• Custom dataset preparation via HuggingFace datasets format
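
Since fine-tuning is described as analogous to ordinary LLM tuning, a LoRA run can be sketched with peft and the Transformers Trainer. This is schematic only: it assumes the (text, audio) examples from the earlier sketch have already been converted into the model's token ids (columns "input_ids" and "labels"), and the checkpoint name and hyperparameters are assumptions rather than recommendations.

# Schematic LoRA fine-tuning sketch, analogous to ordinary causal-LM tuning.
# Assumes a dataset already tokenized into the model's text + audio token ids
# ("input_ids"/"labels" columns); checkpoint name and hyperparameters are
# assumptions, not recommendations.
from datasets import load_from_disk
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, Trainer, TrainingArguments,
                          default_data_collator)

base = AutoModelForCausalLM.from_pretrained("canopylabs/orpheus-3b-0.1-pretrained")
model = get_peft_model(base, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="orpheus-lora",
        per_device_train_batch_size=1,
        num_train_epochs=3,
        learning_rate=2e-4,
        bf16=True,
    ),
    train_dataset=load_from_disk("orpheus-tokenized-maya"),  # pre-tokenized examples
    data_collator=default_data_collator,
)
trainer.train()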

Applications & Use Cases

Conversational AI & Virtual Assistants:
• Low-latency streaming enables natural conversational experiences
• Emotional intelligence and empathy expression for human-like interactions
• Multiple voice options for personalized assistant experiences
• Handles natural disfluencies and conversational patterns

Voice Cloning & Customization:
• Zero-shot voice cloning without prior fine-tuning
• Custom voice creation with 50+ training examples for high quality results
• Production-ready finetuned models with 8 distinct voices
• Sample fine-tuning scripts provided for easy customization

Content Creation & Media:
• Audiobook narration with natural emotion and intonation
• Podcast generation with multiple speaker voices
• Video voiceovers with guided emotion control
• Character voices for gaming and animation

Enterprise & Production Applications:
• Contact center automation with empathetic customer service voices
• E-learning and training content with engaging narration
• Accessibility applications for text-to-speech needs
• Real-time translation and dubbing services

Creative Applications:
• Guided emotion and intonation for dramatic readings
• Role-playing and character voice generation
• Music and audio production with vocal synthesis
• Interactive storytelling with dynamic voice expressions

Looking for production scale? Deploy on a dedicated endpoint

Deploy Orpheus TTS on a dedicated endpoint with custom hardware configuration, as many instances as you need, and auto-scaling.
