
NIM Llama 3.2 11B Vision Instruct

NVIDIA NIM for GPU-accelerated Llama 3.2 11B Vision Instruct inference through OpenAI-compatible APIs.

About model

NIM Llama 3.2 11B Vision Instruct processes multimodal inputs, combining text and vision capabilities. It excels at tasks that require both language understanding and visual context, making it suitable for developers and researchers who need advanced multimodal processing.

To run this model, you first need to deploy it on a Dedicated Endpoint.
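Since the model is served through OpenAI-compatible APIs, a request mixes text and image content parts in a single user message. The sketch below builds such a request body using only the standard library; the endpoint URL and model id are placeholders, so substitute the URL of your deployed Dedicated Endpoint and the model id it reports.

```python
import base64
import json

# Placeholder values for illustration only; replace with your deployed
# Dedicated Endpoint URL and the model id your deployment exposes.
ENDPOINT = "https://example.com/v1/chat/completions"
MODEL = "meta/llama-3.2-11b-vision-instruct"  # assumed id


def build_request(prompt: str, image_bytes: bytes) -> dict:
    """Build an OpenAI-style chat request combining text and one image.

    The image is embedded inline as a base64 data URL, the common
    pattern for OpenAI-compatible vision endpoints.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MODEL,
        "max_tokens": 256,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
    }


# Fake image bytes stand in for a real PNG read from disk.
payload = build_request("Describe this image.", b"\x89PNG fake bytes")
print(json.dumps(payload)[:60])
```

POSTing this JSON body to the endpoint's `/v1/chat/completions` route (with your API key in the `Authorization` header) returns a standard chat-completion response whose `choices[0].message.content` holds the model's text answer.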

    • Model provider
      Meta
    • Type
      Vision
    • Main use cases
      Small & Fast
      Vision
    • Deployment
      On-Demand Dedicated
      Monthly Reserved
    • Parameters
      11B
    • Context length
      128K
    • Input modalities
      Text
      Image
    • Output modalities
      Text
    • Released
      September 24, 2024
    • Last updated
      August 26, 2025