Model family

Nemotron on FlexAI. Every variant. One key.

Nemotron is NVIDIA's open model line on FlexAI. One variant runs serverless on the OpenAI-compatible API, and seven more are available as dedicated endpoints, spanning chat, transcription, embeddings, and multimodal. One API key serves every variant.


Variants

Every served variant in the family, with live serverless pricing.

Serverless · pay per token

Model	Context	Price (input / output per 1M tokens)	Status
Nemotron 3 Super 120B A12B	256K	$0.085 / $0.40	Serving
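As a rough worked example of per-token billing, the sketch below assumes the two serverless prices are the input (prompt) and output (completion) rates per million tokens. `estimate_cost` is a hypothetical helper, not part of any FlexAI tooling. A request with 2,000 input tokens and 500 output tokens works out to 2,000 × $0.085/1M + 500 × $0.40/1M ≈ $0.00037.

```shell
# Sketch: estimate the serverless cost of a single request.
# Assumption: "$0.085 / $0.40" means $0.085 per 1M input tokens and
# $0.40 per 1M output tokens. estimate_cost is a hypothetical helper.
estimate_cost() {
  # $1 = input (prompt) tokens, $2 = output (completion) tokens
  awk -v in_tok="$1" -v out_tok="$2" 'BEGIN {
    in_price = 0.085   # USD per 1M input tokens (assumed)
    out_price = 0.40   # USD per 1M output tokens (assumed)
    printf "%.6f\n", in_tok / 1e6 * in_price + out_tok / 1e6 * out_price
  }'
}

# 2,000 prompt tokens and 500 completion tokens
estimate_cost 2000 500   # prints 0.000370
```

Actual token counts come back in each response's usage data, so real bills depend on what the model generates, not just on the prompt length.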

Dedicated endpoints · reserved GPUs

Model	Context	Price	Status
Nemotron 3 Ultra 550B A55B	256K	Dedicated	Dedicated
Nemotron 3 Nano Omni 30B A3B Reasoning	256K	Dedicated	Dedicated
NVIDIA Nemotron 3 Nano 30B A3B FP8	256K	Dedicated	Dedicated
Nemotron Nano 9B v2	128K	Dedicated	Dedicated
Nemotron 3 Embed 1B	256K	Dedicated	Dedicated
Nemotron Speech Streaming 0.6B	—	Dedicated	Dedicated
Nemotron 3.5 ASR Streaming 0.6B	—	Dedicated	Dedicated

Which variant for what

Pick by the role you're filling. Same key for all of them.

Flagship

Nemotron 3 Super 120B A12B

Nemotron 3 Super 120B A12B is the largest variant available serverless. Reach for it first.

Nemotron 3 Super 120B A12B handles the reasoning step in research agents.

Call the flagship

OpenAI-compatible. Swap the model id for any variant above.

curl https://tokens.flex.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLEXAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Nemotron-3-Super-120B-A12B",
    "messages": [{"role": "user", "content": "Hello from FlexAI"}]
  }'

Where it runs

Use cases that put Nemotron to work in a pipeline.

Research agents

Agents that retrieve, reason over, and synthesize large source sets.


Run Nemotron on one API key

Every Nemotron variant, serverless and dedicated, behind one OpenAI-compatible key.


$10/month in free credits for your first 3 months
