Model family
Llama on FlexAI. Every variant. One key.
Llama is Meta's open model line, served on FlexAI. Two variants run serverless on the OpenAI-compatible API, and four more are available as dedicated endpoints; the family covers chat workloads. One API key serves every variant.
Variants
Every served variant in the family, with live serverless pricing.
Serverless · pay per token
| Model | Context | Price | Status |
|---|---|---|---|
| Llama 3.3 70B Instruct | 128K | $0.09 / $0.288 per M | Serving |
| Llama 3.1 8B Instruct | 16K | $0.018 / $0.027 per M | Serving |
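To get a feel for per-token pricing, the sketch below estimates the cost of a workload from the table above. It assumes the two figures in each Price cell are the input and output prices in USD per million tokens; the table does not label them, so treat that reading as an assumption. The function name `estimate_cost` is hypothetical.

```shell
# estimate_cost <input_tokens> <output_tokens> <input_price_per_M> <output_price_per_M>
# Assumption: the price pair in the table is "input / output" USD per million tokens.
estimate_cost() {
  LC_ALL=C awk -v in_tok="$1" -v out_tok="$2" -v in_price="$3" -v out_price="$4" \
    'BEGIN { printf "%.4f\n", (in_tok * in_price + out_tok * out_price) / 1000000 }'
}

# 1M input tokens and 200K output tokens on each serverless variant:
estimate_cost 1000000 200000 0.09 0.288    # Llama 3.3 70B Instruct
estimate_cost 1000000 200000 0.018 0.027   # Llama 3.1 8B Instruct
```

Under that assumption the same workload comes to about $0.15 on the 70B variant and about $0.02 on the 8B variant.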
Dedicated endpoints · reserved GPUs
| Model | Context | Price | Status |
|---|---|---|---|
| Meta Llama 3 70B Instruct | 8K | Dedicated | Dedicated |
| Llama 3.1 8B Base | 128K | Dedicated | Dedicated |
| Meta Llama 3 8B | 8K | Dedicated | Dedicated |
| Meta Llama 3 8B Instruct | 8K | Dedicated | Dedicated |
Which variant for what
Pick by the role you're filling. Same key for all of them.
Flagship
Llama 3.3 70B Instruct
Llama 3.3 70B Instruct is the largest variant served serverless. Reach for it first.
Fast & economical
Llama 3.1 8B Instruct
Llama 3.1 8B Instruct is the smallest serverless variant. Lowest latency and cost.
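One way to make the choice mechanical is to pick the cheapest serverless variant whose context window fits the request. The sketch below is illustrative only: the function name `pick_variant` is hypothetical, and it assumes the "16K" and "128K" context figures can be treated conservatively as 16000 and 128000 tokens.

```shell
# pick_variant <needed_tokens>
# Prints the cheapest serverless variant whose context window fits the
# prompt plus the expected output. Returns 1 if neither variant fits.
# Assumption: "16K" and "128K" are treated as 16000 and 128000 tokens.
pick_variant() {
  needed=$1
  if [ "$needed" -le 16000 ]; then
    echo "Llama 3.1 8B Instruct"
  elif [ "$needed" -le 128000 ]; then
    echo "Llama 3.3 70B Instruct"
  else
    echo "none"
    return 1
  fi
}

pick_variant 4000
pick_variant 60000
```

This rule optimizes for cost. If answer quality matters more than price, start from the flagship and only drop to the 8B variant for latency-sensitive or high-volume work.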
Llama 3.3 70B Instruct powers answers in private copilots.
Call the flagship
OpenAI-compatible. Swap the model id for any variant above.
curl https://tokens.flex.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLEXAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-3.3-70B-Instruct-FP8",
    "messages": [{"role": "user", "content": "Hello from FlexAI"}]
  }'
Where it runs
Use cases that put Llama to work in a pipeline.
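For a shell pipeline, the chat completions request shown earlier could be wrapped in a small function that takes the model id as a parameter. This is a minimal sketch, not an official client: `chat_body` and `flexai_chat` are hypothetical names, and the body builder assumes the model id and prompt contain no double quotes, backslashes, or newlines, so it does no JSON escaping.

```shell
# chat_body <model_id> <prompt>
# Prints the JSON request body for a single-turn chat.
# Assumption: neither argument needs JSON escaping (no quotes, backslashes, newlines).
chat_body() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}\n' "$1" "$2"
}

# flexai_chat <model_id> <prompt>
# Sends the body to the chat completions endpoint using $FLEXAI_API_KEY.
flexai_chat() {
  curl -sS https://tokens.flex.ai/v1/chat/completions \
    -H "Authorization: Bearer $FLEXAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(chat_body "$1" "$2")"
}

# Inspect the request body without sending anything:
chat_body "Llama-3.3-70B-Instruct-FP8" "Hello from FlexAI"
```

Calling `flexai_chat "Llama-3.3-70B-Instruct-FP8" "Hello from FlexAI"` would send the same request as the curl example. To target another variant, pass its model id as listed in the FlexAI catalog; only the flagship id appears on this page.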
Run Llama on one API key
Every Llama variant, serverless and dedicated, behind one OpenAI-compatible key.
$10/month in free credits for your first 3 months