Qwen2.5-VL 72B Instruct
Vision-language model with advanced visual reasoning, video understanding, structured outputs, and agentic capabilities.
About model
Qwen2.5-VL 72B Instruct analyzes images and texts, recognizing objects, charts, and graphics, and generates structured outputs, making it suitable for developers and applications requiring advanced vision-language understanding.
- TypeChatVision
- Main use casesChatVision
- FeaturesJSON Mode
- DeploymentOn-Demand DedicatedMonthly Reserved
- Parameters72B
- Input price
$1.95 / 1M tokens
- Output price
$8 / 1M tokens
- Input modalitiesTextImage
- Output modalitiesText
- ReleasedJanuary 26, 2025
- Last updatedJanuary 4, 2026
- External link
- CategoryVision