
    Model family

    Llama on FlexAI. Every variant. One key.

    Llama is Meta's open model line on FlexAI. Two variants run serverless on the OpenAI-compatible API, and four more are available as dedicated endpoints, all for chat. One API key serves every variant.

    Variants

    Every served variant in the family, with live serverless pricing.

    Serverless · pay per token

    Model                     Context   Price                     Status
    Llama 3.3 70B Instruct    128K      $0.09 / $0.288 per M      Serving
    Llama 3.1 8B Instruct     16K       $0.018 / $0.027 per M     Serving
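
To make the per-token pricing concrete, here is a minimal sketch of a cost estimate for a single request. It assumes the two figures in the Price column are the input and output rates per million tokens, in that order; the page does not label them, so treat the split as an assumption. The `PRICES_PER_M` table and `estimate_cost` helper are illustrative names, not part of any FlexAI SDK.

```python
# Prices copied from the serverless table, in USD per million tokens.
# ASSUMPTION: the first figure is the input (prompt) rate and the second
# is the output (completion) rate; the page does not label them.
PRICES_PER_M = {
    "Llama 3.3 70B Instruct": (0.09, 0.288),
    "Llama 3.1 8B Instruct": (0.018, 0.027),
}


def estimate_cost(variant: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request for the given variant."""
    input_rate, output_rate = PRICES_PER_M[variant]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Compare both serverless variants for a 2,000-token prompt and 500-token reply.
for name in PRICES_PER_M:
    print(f"{name}: ${estimate_cost(name, 2_000, 500):.6f}")
```

Because the rates are live, a real integration would likely read them from the pricing page or an account dashboard rather than hard-coding them.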

    Dedicated endpoints · reserved GPUs

    Model                       Context   Price       Status
    Meta Llama 3 70B Instruct   8K        Dedicated   Dedicated
    Llama 3.1 8B Base           128K      Dedicated   Dedicated
    Meta Llama 3 8B             8K        Dedicated   Dedicated
    Meta Llama 3 8B Instruct    8K        Dedicated   Dedicated

    Which variant for what

    Pick by the role you're filling. Same key for all of them.

    Flagship

    Llama 3.3 70B Instruct

    Llama 3.3 70B Instruct is the largest served serverless variant. Reach for it first.

    Fast & economical

    Llama 3.1 8B Instruct

    Llama 3.1 8B Instruct is the smallest serverless variant. Lowest latency and cost.
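
The choice between the two serverless variants can be sketched as a small routing rule: use the 8B model when the request fits its context window, and fall back to the 70B flagship otherwise. The sketch below is illustrative only. It assumes "16K" and "128K" mean roughly 16,000 and 128,000 tokens, and that the context window must hold the prompt plus the generated output; `pick_variant` is a hypothetical helper, not a FlexAI API.

```python
# Context windows from the serverless table.
# ASSUMPTION: "K" is read conservatively as 1,000 tokens; the page does
# not say whether it means 1,000 or 1,024.
SMALL = "Llama 3.1 8B Instruct"
FLAGSHIP = "Llama 3.3 70B Instruct"
SERVERLESS_CONTEXT = {SMALL: 16_000, FLAGSHIP: 128_000}


def pick_variant(prompt_tokens: int, max_output_tokens: int,
                 need_flagship: bool = False) -> str:
    """Pick a serverless variant whose context fits prompt plus output."""
    # ASSUMPTION: prompt and output tokens share one context window.
    needed = prompt_tokens + max_output_tokens
    if needed > SERVERLESS_CONTEXT[FLAGSHIP]:
        raise ValueError(f"{needed} tokens exceeds every serverless context window")
    if need_flagship or needed > SERVERLESS_CONTEXT[SMALL]:
        return FLAGSHIP
    return SMALL
```

The `need_flagship` flag stands in for any quality requirement that overrides the cost-first default.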

    Llama 3.3 70B Instruct runs answer in private copilots

    Call the flagship

    OpenAI-compatible. Swap the model id for any variant above.

    curl https://tokens.flex.ai/v1/chat/completions \
      -H "Authorization: Bearer $FLEXAI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "Llama-3.3-70B-Instruct-FP8",
        "messages": [{"role": "user", "content": "Hello from FlexAI"}]
      }'
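
The same call can be sketched in Python using only the standard library. The endpoint URL, model id, and `FLEXAI_API_KEY` variable come from the curl example above; the `build_request` and `chat` helpers are hypothetical names, and the response parsing assumes the usual OpenAI chat-completions shape (`choices[0].message.content`), which this page implies but does not show.

```python
import json
import os
import urllib.request

# Endpoint and default model id are taken from the curl example.
API_URL = "https://tokens.flex.ai/v1/chat/completions"
DEFAULT_MODEL = "Llama-3.3-70B-Instruct-FP8"


def build_request(prompt, model=DEFAULT_MODEL, api_key=None):
    """Build the POST request without sending it."""
    key = api_key or os.environ["FLEXAI_API_KEY"]
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt, **kwargs), timeout=60) as resp:
        payload = json.load(resp)
    # ASSUMPTION: the response follows the OpenAI chat-completions schema.
    return payload["choices"][0]["message"]["content"]
```

With `FLEXAI_API_KEY` set in the environment, `chat("Hello from FlexAI")` would send the same request as the curl command. Passing a different `model` argument switches variants without changing the key.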

    Run Llama on one API key

    Every Llama variant, serverless and dedicated, behind one OpenAI-compatible key.

    $10/month in free credits for your first 3 months