Models / Chat / NIM Llama 3.1 Nemotron 70B Instruct API
NIM Llama 3.1 Nemotron 70B Instruct API
Chat
NVIDIA NIM for GPU accelerated Llama 3.1 Nemotron 70B Instruct inference through OpenAI compatible APIs.
Deploy this NIM model

Models / Chat / NIM Llama 3.1 Nemotron 70B Instruct API
NVIDIA NIM for GPU accelerated Llama 3.1 Nemotron 70B Instruct inference through OpenAI compatible APIs.
Endpoint
RUN INFERENCE
This model is available as a Together Dedicated Endpoints deployment.
Follow our Docs to configure an endpoint via our API or CLI.
JSON RESPONSE
RUN INFERENCE
This model is available as a Together Dedicated Endpoints deployment.
Follow our Docs to configure an endpoint via our API or CLI.
JSON RESPONSE
RUN INFERENCE
This model is available as a Together Dedicated Endpoints deployment.
Follow our Docs to configure an endpoint via our API or CLI.
JSON RESPONSE
Model Provider:
Meta
Type:
Chat
Variant:
Instruct
Parameters:
70B
Deployment:
✔️ Dedicated
Quantization
Context length:
128K
Pricing:
Run in playground
Deploy model
Quickstart docs
Quickstart docs
Deploy NIM Llama 3.1 Nemotron 70B Instruct on a dedicated endpoint with custom hardware configuration, as many instances as you need, and auto-scaling.