Model family

Llama on FlexAI. Every variant. One key.

Llama is Meta's open model line, served on FlexAI. Two variants run serverless on the OpenAI-compatible API, and seven more are available as dedicated endpoints, spanning chat, multimodal, and rerank. One API key serves every variant.

Variants

Every served variant in the family, with live serverless pricing.

Serverless · pay per token

Model	Context	Price	Status
Llama 3.3 70B Instruct	64K	$0.10 / $0.32 per M tokens	Serving
Llama 3.1 8B Instruct	128K	$0.02 / $0.03 per M tokens	Serving
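
To make the per-token pricing concrete, here is a rough cost sketch. It assumes the two figures in each Price cell are the input and output prices per million tokens, which the table does not state explicitly. The function name `estimate_cost` and the token counts are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Estimate the cost in dollars of one request from per-million-token prices.
# Assumption: the first listed price applies to input tokens, the second to output tokens.
estimate_cost() {
  # $1 = input price per M tokens, $2 = output price per M tokens
  # $3 = input token count,        $4 = output token count
  awk -v pin="$1" -v pout="$2" -v tin="$3" -v tout="$4" \
    'BEGIN { printf "%.6f\n", (tin * pin + tout * pout) / 1000000 }'
}

# Hypothetical workload: 2,000 input tokens and 500 output tokens per request.
estimate_cost 0.10 0.32 2000 500   # Llama 3.3 70B Instruct
estimate_cost 0.02 0.03 2000 500   # Llama 3.1 8B Instruct
```

Under those assumptions the request costs about $0.000360 on the 70B variant and $0.000055 on the 8B variant.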

Dedicated endpoints · reserved GPUs

Model	Context	Price	Status
Llama 4 Maverick	1.0M	Dedicated	Dedicated
Llama 4 Scout 17B 16E	9.5M	Dedicated	Dedicated
Meta Llama 3 70B Instruct	8K	Dedicated	Dedicated
Llama 3.1 8B Base	128K	Dedicated	Dedicated
Meta Llama 3 8B	8K	Dedicated	Dedicated
Meta Llama 3 8B Instruct	8K	Dedicated	Dedicated
Llama Nemotron Rerank VL 1B v2	128K	Dedicated	Dedicated

Which variant for what

Pick by the role you're filling. Same key for all of them.

Flagship

Llama 3.3 70B Instruct

Llama 3.3 70B Instruct is the largest variant served serverless. Reach for it first.

Fast & economical

Llama 3.1 8B Instruct

Llama 3.1 8B Instruct is the smallest serverless variant, with the lowest latency and cost.

Llama 3.1 8B Instruct runs the "respond" step in support agents · Llama 3.3 70B Instruct runs the "answer" step in private copilots

Call the flagship

The API is OpenAI-compatible. Swap the model ID for any variant above.

curl https://tokens.flex.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLEXAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-3.3-70B-Instruct-FP8",
    "messages": [{"role": "user", "content": "Hello from FlexAI"}]
  }'
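
Swapping the model ID is easier when it lives in a variable. The following is a minimal sketch of the same call wrapped in shell functions. The names `chat_payload`, `chat`, and `MODEL` are hypothetical helpers, not part of FlexAI, and IDs for variants other than the one shown above are not listed in this section.

```shell
#!/bin/sh
# Build the JSON request body for a model ID and a prompt.
# Limitation of this sketch: the prompt must not contain double quotes,
# backslashes, or newlines, because no JSON escaping is done here.
chat_payload() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' "$1" "$2"
}

# POST the body to the chat completions endpoint from the example above.
# curl reads the request body from stdin via "-d @-".
chat() {
  chat_payload "$1" "$2" |
    curl -sS https://tokens.flex.ai/v1/chat/completions \
      -H "Authorization: Bearer $FLEXAI_API_KEY" \
      -H "Content-Type: application/json" \
      -d @-
}

# MODEL is a variable of this sketch. It defaults to the ID used in the example above.
MODEL="${MODEL:-Llama-3.3-70B-Instruct-FP8}"

# Send the request only when a key is set; otherwise print the body as a dry run.
if [ -n "${FLEXAI_API_KEY:-}" ]; then
  chat "$MODEL" "Hello from FlexAI"
else
  chat_payload "$MODEL" "Hello from FlexAI"
fi
```

Setting `MODEL` in the environment before running the script switches variants without editing the request itself.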

Where it runs

Use cases that put Llama to work in a pipeline.

Support agents

Agents that triage, answer, and resolve customer conversations.

Private copilots

In-house assistants grounded in your own documents and data.

Run Llama on one API key

Every Llama variant, serverless and dedicated, behind one OpenAI-compatible key.

$10/month in free credits for your first 3 months
