    Model family

    Nemotron on FlexAI. Every variant. One key.

    Nemotron is NVIDIA's open model line on FlexAI. One variant runs serverless on the OpenAI-compatible API, and four more are available as dedicated endpoints, spanning chat and transcription. One API key serves every variant.

    Variants

    Every served variant in the family, with live serverless pricing.

    Serverless · pay per token

    Model                        Context   Price                   Status
    Nemotron 3 Super 120B A12B   256K      $0.081 / $0.405 per M   Serving
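To make the per-token pricing concrete, here is a rough cost sketch. It assumes the first figure ($0.081) is the price per million input tokens and the second ($0.405) is the price per million output tokens; the table does not label the two figures, so treat that split as an assumption. The function name `estimate_cost_usd` is hypothetical.

```python
# Prices copied from the serverless table, in USD per million tokens.
# Assumption: the first figure is the input-token price and the second
# is the output-token price; the page itself does not label them.
INPUT_USD_PER_M = 0.081
OUTPUT_USD_PER_M = 0.405


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate the cost of a workload in USD under the assumption above."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Example workload: 2M prompt tokens and 0.5M completion tokens.
print(f"${estimate_cost_usd(2_000_000, 500_000):.4f}")
```

Pricing is described as live, so the constants are a snapshot and should be checked against the current table before budgeting.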

    Dedicated endpoints · reserved GPUs

    Model                                Context   Price       Status
    Nemotron 3 Ultra 550B A55B           256K      Dedicated   Dedicated
    NVIDIA Nemotron 3 Nano 30B A3B FP8   256K      Dedicated   Dedicated
    Nemotron Nano 9B v2                  128K      Dedicated   Dedicated
    Nemotron Speech Streaming 0.6B       -         Dedicated   Dedicated

    Which variant for what

    Pick by the role you're filling. Same key for all of them.

    Flagship

    Nemotron 3 Super 120B A12B

    Nemotron 3 Super 120B A12B is the one variant served serverless, so reach for it first.

    Call the flagship

    OpenAI-compatible. Swap the model id for any variant above.

    curl https://tokens.flex.ai/v1/chat/completions \
      -H "Authorization: Bearer $FLEXAI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "Nemotron-3-Super-120B-A12B",
        "messages": [{"role": "user", "content": "Hello from FlexAI"}]
      }'
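The same request can be sketched in Python using only the standard library. The endpoint, header names, and model id are copied from the curl example above; the helper names `build_chat_request` and `send` are hypothetical, and the model id is a parameter so another variant's id can be passed in once you know it.

```python
import json
import os
import urllib.request

# Endpoint and model id copied from the curl example.
API_URL = "https://tokens.flex.ai/v1/chat/completions"
DEFAULT_MODEL = "Nemotron-3-Super-120B-A12B"


def build_chat_request(prompt, model=DEFAULT_MODEL, api_key=None):
    """Build (but do not send) a chat-completions request.

    Hypothetical helper: falls back to the FLEXAI_API_KEY environment
    variable, mirroring the curl example.
    """
    key = api_key if api_key is not None else os.environ.get("FLEXAI_API_KEY", "")
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With `FLEXAI_API_KEY` set, `send(build_chat_request("Hello from FlexAI"))` performs the call. If the response follows the usual OpenAI chat-completions shape, which is an assumption here, the reply text would be at `["choices"][0]["message"]["content"]`.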

    Run Nemotron on one API key

    Every Nemotron variant, serverless and dedicated, behind one OpenAI-compatible key.

    $10/month in free credits for your first 3 months