Model family
Nemotron on FlexAI. Every variant. One key.
Nemotron is NVIDIA's open model line on FlexAI. One variant runs serverless on the OpenAI-compatible API, and four more are available as dedicated endpoints, spanning chat and speech transcription. One API key serves every variant.
Variants
Every served variant in the family, with live serverless pricing.
Serverless · pay per token
| Model | Context | Price (input / output, per 1M tokens) | Status |
|---|---|---|---|
| Nemotron 3 Super 120B A12B | 256K | $0.081 / $0.405 | Serving |
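As a rough worked example, assume the two serverless prices are the input and output rates per million tokens. The cost of one request is then input_tokens / 1,000,000 × $0.081 + output_tokens / 1,000,000 × $0.405. The short sketch below does that arithmetic with awk. The function name `estimate_cost` is hypothetical, and the result is an estimate only, not a billing figure.

```shell
#!/bin/sh
# Hypothetical helper: estimate the serverless cost (USD) of one request.
# ASSUMPTION: $0.081 is the input price and $0.405 the output price,
# both per million tokens, as listed for Nemotron 3 Super 120B A12B.
estimate_cost() {
  input_tokens=$1
  output_tokens=$2
  awk -v i="$input_tokens" -v o="$output_tokens" \
    'BEGIN { printf "%.6f\n", i / 1e6 * 0.081 + o / 1e6 * 0.405 }'
}

# 10,000 input tokens and 2,000 output tokens; prints 0.001620
estimate_cost 10000 2000
```

At these rates, output tokens cost five times as much as input tokens, so long generations dominate the bill even when prompts are large.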
Dedicated endpoints · reserved GPUs
| Model | Context | Price | Status |
|---|---|---|---|
| Nemotron 3 Ultra 550B A55B | 256K | Dedicated | Dedicated |
| NVIDIA Nemotron 3 Nano 30B A3B FP8 | 256K | Dedicated | Dedicated |
| Nemotron Nano 9B v2 | 128K | Dedicated | Dedicated |
| Nemotron Speech Streaming 0.6B | — | Dedicated | Dedicated |
Which variant for what
Pick by the role you're filling. Same key for all of them.
Flagship
Nemotron 3 Super 120B A12B
Nemotron 3 Super 120B A12B is the largest variant served serverless, and currently the only one. Reach for it first.
Call the flagship
OpenAI-compatible. Swap the model id for any variant above.
```shell
curl https://tokens.flex.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLEXAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Nemotron-3-Super-120B-A12B",
    "messages": [{"role": "user", "content": "Hello from FlexAI"}]
  }'
```

Run Nemotron on one API key
Every Nemotron variant, serverless and dedicated, behind one OpenAI-compatible key.
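Because one key and one OpenAI-compatible request shape cover the family, the flagship call can be wrapped in a small shell function that takes the model id as an argument. This is a minimal sketch. The function names `build_body` and `flexai_chat` are hypothetical. It assumes `FLEXAI_API_KEY` is exported and that the chosen variant is reachable at the same `https://tokens.flex.ai/v1/chat/completions` endpoint. The only model id confirmed on this page is `Nemotron-3-Super-120B-A12B`.

```shell
#!/bin/sh
# Hypothetical helper: build the chat-completions JSON body.
# ASSUMPTION: the prompt has no double quotes or backslashes;
# no JSON escaping is done here.
build_body() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' "$1" "$2"
}

# Hypothetical wrapper: send one prompt to the given model id,
# reusing the same key and endpoint as the flagship example.
flexai_chat() {
  curl -sS https://tokens.flex.ai/v1/chat/completions \
    -H "Authorization: Bearer $FLEXAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(build_body "$1" "$2")"
}
```

Usage: `flexai_chat Nemotron-3-Super-120B-A12B "Hello from FlexAI"`. For prompts with arbitrary text, build the body with a JSON-aware tool instead of `printf`.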
$10/month in free credits for your first 3 months