Nemotron 3 Super 120B A12B

Chat

Nemotron-3-Super-120B-A12B

Nemotron 3 Super 120B A12B is an NVIDIA LLM released under the NVIDIA Open Model License. FlexAI serves it through the OpenAI-compatible Token Factory, billed at the live market-tracked rate.

Recommended for

Research agents · as Reason

Pricing

Input: $0.085 / M tokens
Output: $0.4 / M tokens
Cached input: $0.0128 / M tokens

Context: 256K tokens
API endpoint: /v1/chat/completions
Compatibility: OpenAI
Parameters: 120B MoE (12B active)
License: NVIDIA Open Model License
Hardware: 2× H100
Quantization: FP8

Estimate your monthly cost

Input tokens / month: 10M tokens
Output tokens / month: 2M tokens
Cache hit rate: 25%

Share of input tokens served from prompt cache: repeated system prompts, long documents, and multi-turn context.

7.5M × $0.085/M input: $0.64
2.5M × $0.0128/M cached input: $0.03
2M × $0.4/M output: $0.80
Estimated monthly cost: $1.47
Saved vs. no caching: $0.18 (11%)

Estimate only, at the current market-tracked rate. Usage-based; no minimums.
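The arithmetic behind the estimate can be reproduced with a short script. This is a minimal sketch, not FlexAI's own calculator: the function name `estimate_monthly_cost` and its return keys are illustrative, the rates are copied from the Pricing section and will drift with the market-tracked rate, and the example volumes (10M input tokens, 2M output tokens, 25% cache hit rate) are inferred from the breakdown above (7.5M uncached plus 2.5M cached input).

```python
# Rates in USD per million tokens, copied from the Pricing section.
# They are market-tracked, so treat these constants as a snapshot.
INPUT_RATE = 0.085
CACHED_INPUT_RATE = 0.0128
OUTPUT_RATE = 0.4


def estimate_monthly_cost(input_m, output_m, cache_hit_rate):
    """Estimate monthly cost in USD.

    input_m / output_m are monthly volumes in millions of tokens;
    cache_hit_rate is the share of input tokens served from prompt cache (0..1).
    """
    cached_m = input_m * cache_hit_rate      # billed at the cached-input rate
    uncached_m = input_m - cached_m          # billed at the full input rate
    total = (
        uncached_m * INPUT_RATE
        + cached_m * CACHED_INPUT_RATE
        + output_m * OUTPUT_RATE
    )
    # Baseline: every input token billed at the full input rate.
    no_cache = input_m * INPUT_RATE + output_m * OUTPUT_RATE
    return {"total": total, "no_cache": no_cache, "saved": no_cache - total}


# Volumes inferred from the breakdown above: 10M input, 2M output, 25% cached.
estimate = estimate_monthly_cost(10, 2, 0.25)
print(f"Estimated monthly cost: ${estimate['total']:.2f}")
print(f"Saved vs. no caching: ${estimate['saved']:.2f} "
      f"({estimate['saved'] / estimate['no_cache']:.0%})")
```

With those inputs the script matches the figures above: $1.47 per month, with $0.18 (11%) saved relative to no caching.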

Get an API key

Quick Start

Nemotron-3-Super-120B-A12B

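from openai import OpenAI

# Point the standard OpenAI client at the FlexAI Token Factory.
client = OpenAI(
    base_url="https://tokens.flex.ai/v1",
    api_key="your-api-key",  # placeholder: replace with your own key
)

# Send a single-turn chat request to /v1/chat/completions.
response = client.chat.completions.create(
    model="Nemotron-3-Super-120B-A12B",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)

# The reply text is on the first choice.
print(response.choices[0].message.content)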
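Because the endpoint is listed as OpenAI-compatible, the same request can also be sketched without the SDK, using only the Python standard library. The helper names `build_chat_request` and `send` are hypothetical. The `Authorization: Bearer` header and the response shape follow the OpenAI convention and are assumed, not confirmed, for this endpoint.

```python
import json
import urllib.request

BASE_URL = "https://tokens.flex.ai/v1"   # base URL from the Quick Start
MODEL = "Nemotron-3-Super-120B-A12B"     # model ID from the Quick Start


def build_chat_request(api_key, prompt):
    """Build (but do not send) a POST request to /v1/chat/completions."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: bearer-token auth, as in the OpenAI API.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the reply text.

    Assumes an OpenAI-style response body with choices[0].message.content.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `send(build_chat_request("your-api-key", "Hello!"))` would then perform the same request as the SDK snippet. Keeping request construction separate from sending makes the payload easy to inspect before any tokens are billed.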
