
NIM Llama 3.2 11B Vision Instruct

NVIDIA NIM for GPU-accelerated Llama 3.2 11B Vision Instruct inference through OpenAI-compatible APIs.

About model

NIM Llama 3.2 11B Vision Instruct processes multimodal inputs, combining text and vision capabilities. It excels at tasks that require both language understanding and visual context, making it suitable for developers and researchers who need advanced multimodal processing.

To run this model, you first need to deploy it on a Dedicated Endpoint.
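Since the model is served through OpenAI-compatible APIs, a request mixes text and image content parts in a single user message. The sketch below builds such a request body using only the standard library; the endpoint URL and model id are placeholders, so substitute the URL of your deployed Dedicated Endpoint and the model id it reports.

```python
import base64
import json

# Placeholder values for illustration only; replace with your deployed
# Dedicated Endpoint URL and the model id your deployment exposes.
ENDPOINT = "https://example.com/v1/chat/completions"
MODEL = "meta/llama-3.2-11b-vision-instruct"  # assumed id


def build_request(prompt: str, image_bytes: bytes) -> dict:
    """Build an OpenAI-style chat request combining text and one image.

    The image is embedded inline as a base64 data URL, the common
    pattern for OpenAI-compatible vision endpoints.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MODEL,
        "max_tokens": 256,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
    }


# Fake image bytes stand in for a real PNG read from disk.
payload = build_request("Describe this image.", b"\x89PNG fake bytes")
print(json.dumps(payload)[:60])
```

POSTing this JSON body to the endpoint's `/v1/chat/completions` route (with your API key in the `Authorization` header) returns a standard chat-completion response whose `choices[0].message.content` holds the model's text answer.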

    • Model provider
      Meta
    • Type
      Vision
    • Main use cases
      Small & Fast
      Vision
    • Deployment
      On-Demand Dedicated
      Monthly Reserved
    • Parameters
      11B
    • Context length
      128K
    • Input modalities
      Text
      Image
    • Output modalities
      Text
    • Released
      September 24, 2024
    • Last updated
      August 26, 2025