Models / Chat / LLaMA-2 Chat (7B) API
LLaMA-2 Chat (7B) API

Models / Chat / LLaMA-2 Chat (7B) API
Endpoint
RUN INFERENCE
JSON RESPONSE
RUN INFERENCE
JSON RESPONSE
RUN INFERENCE
JSON RESPONSE
Model Provider:
Meta
Type:
Chat
Variant:
Parameters:
7B
Deployment:
✔ Serverless
✔️ On-Demand Dedicated
Quantization
Context length:
4K
Pricing:
$0.20
Check pricing
Run in playground
Deploy model
Quickstart docs
Quickstart docs
Deploy LLaMA-2 Chat (7B) on a dedicated endpoint with custom hardware configuration, as many instances as you need, and auto-scaling.