Qwen2.5-VL 72B Instruct

Vision-language model with advanced visual reasoning, video understanding, structured outputs, and agentic capabilities.

About model

Qwen2.5-VL 72B Instruct analyzes images and text, recognizes objects, charts, and graphics, and generates structured outputs. It is suited to applications that require advanced vision-language understanding.

    • Model provider
      Qwen
    • Type
      Chat
      Vision
    • Main use cases
      Chat
      Vision
    • Features
      JSON Mode
    • Deployment
      On-Demand Dedicated
      Monthly Reserved
    • Parameters
      72B
    • Input price
      $1.95 / 1M tokens
    • Output price
      $8 / 1M tokens

    • Input modalities
      Text
      Image
    • Output modalities
      Text
    • Released
      January 26, 2025
    • Last updated
      January 4, 2026
    • Category
      Vision
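
To make the listed prices concrete, the sketch below estimates the cost of a single request from its token counts. The two rates come from the listing above; the function name `estimate_cost_usd` and the example token counts are illustrative assumptions, and the page does not say how image inputs are converted to tokens, so the input count here is assumed to already include them.

```python
from decimal import Decimal

# Rates taken from the listing above (USD per 1M tokens).
INPUT_PRICE_PER_M = Decimal("1.95")
OUTPUT_PRICE_PER_M = Decimal("8")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate request cost in USD (hypothetical helper, not a provider API).

    Assumes input_tokens already includes any tokens billed for images.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    )
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # Example: 200,000 input tokens and 50,000 output tokens.
    print(f"${estimate_cost_usd(200_000, 50_000):.2f}")
```

Decimal is used instead of float so that per-token prices accumulate without binary rounding drift. Actual billing may differ from this estimate, for example under the dedicated or reserved deployment options listed above.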