Models / Chat / NIM Llama 3.1 70B Instruct API
NIM Llama 3.1 70B Instruct API
Chat
NVIDIA NIM for GPU accelerated Llama 3.1 70B Instruct inference through OpenAI compatible APIs.
Deploy this NIM model

API Usage
How to use NIM Llama 3.1 70B InstructModel CardPrompting NIM Llama 3.1 70B InstructApplications & Use CasesNIM Llama 3.1 70B Instruct API Usage
Endpoint
nim/meta/llama-3.1-70b-instruct
RUN INFERENCE
This model is available as a Together Dedicated Endpoints deployment.
Follow our Docs to configure an endpoint via our API or CLI.
JSON RESPONSE
RUN INFERENCE
This model is available as a Together Dedicated Endpoints deployment.
Follow our Docs to configure an endpoint via our API or CLI.
JSON RESPONSE
RUN INFERENCE
This model is available as a Together Dedicated Endpoints deployment.
Follow our Docs to configure an endpoint via our API or CLI.
JSON RESPONSE
Model Provider:
Meta
Type:
Chat
Variant:
Instruct
Parameters:
70B
Deployment:
✔️ Dedicated
Quantization
Context length:
128K
Pricing:
Run in playground
Deploy model
Quickstart docs
Quickstart docs
How to use NIM Llama 3.1 70B Instruct
Model details
Prompting NIM Llama 3.1 70B Instruct
Applications & Use Cases
Looking for production scale? Deploy on a dedicated endpoint
Deploy NIM Llama 3.1 70B Instruct on a dedicated endpoint with custom hardware configuration, as many instances as you need, and auto-scaling.
