Mistral Nemo

Chat

Mistral-Nemo-Instruct-2407-FP8

Mistral Nemo on FlexAI: a 12B-parameter LLM from Mistral AI, released under the Apache 2.0 license and served in FP8 through FlexAI's OpenAI-compatible Token Factory, billed at the live market-tracked rate.

Recommended for

Workflow automation (as the model for Execute steps)

Pricing

Input

$0.018 / M tokens

Output

$0.03 / M tokens

Cached input

$0.0027 / M tokens

Context

32K tokens

API endpoint

/v1/chat/completions

Compatibility

OpenAI

Parameters

12B

License

Apache 2.0

Hardware

H100

Quantization

FP8

Estimate your monthly cost

Example: 10M input tokens / month, 2M output tokens / month, 25% cache hit rate.

The cache hit rate is the share of input tokens served from the prompt cache: repeated system prompts, long documents, and multi-turn context.

7.5M uncached input tokens × $0.018/M = $0.13

2.5M cached input tokens × $0.0027/M = $0.01

2M output tokens × $0.03/M = $0.06

Estimated monthly cost: $0.20

Saved vs. no caching: $0.04 (about 16%)

Estimate only, at the current market-tracked rate. Usage-based; no minimums.
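The estimate is simple arithmetic: cached input tokens are billed at the cached rate, the remaining input tokens at the standard input rate, and output tokens at the output rate. A minimal Python sketch of that calculation, assuming the per-million rates from the pricing table (they track the market, so treat them as a snapshot) and a cache hit rate expressed as a fraction of input tokens. The function and variable names are illustrative and not part of any FlexAI API.

```python
from dataclasses import dataclass

# Per-million-token rates in USD, copied from the pricing table.
# Assumption: these are a snapshot; the live rate is market-tracked.
INPUT_PER_M = 0.018
CACHED_INPUT_PER_M = 0.0027
OUTPUT_PER_M = 0.03


@dataclass
class CostEstimate:
    uncached_input: float
    cached_input: float
    output: float

    @property
    def total(self) -> float:
        return self.uncached_input + self.cached_input + self.output


def estimate_monthly_cost(input_m: float, output_m: float, cache_hit_rate: float) -> CostEstimate:
    """Estimate monthly spend; token volumes are given in millions of tokens."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached_m = input_m * cache_hit_rate   # input tokens served from the prompt cache
    uncached_m = input_m - cached_m       # input tokens billed at the full input rate
    return CostEstimate(
        uncached_input=uncached_m * INPUT_PER_M,
        cached_input=cached_m * CACHED_INPUT_PER_M,
        output=output_m * OUTPUT_PER_M,
    )


with_cache = estimate_monthly_cost(10, 2, 0.25)
no_cache = estimate_monthly_cost(10, 2, 0.0)
saved = no_cache.total - with_cache.total
print(f"with cache: ${with_cache.total:.2f}")
print(f"saved: ${saved:.2f} ({saved / no_cache.total:.0%})")
```

With 10M input tokens, 2M output tokens, and a 25% hit rate, this gives $0.20 with caching versus $0.24 without. Computed without intermediate rounding, the saving is about $0.038, roughly 16% of the no-cache cost.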


Quick Start


from openai import OpenAI

client = OpenAI(
    base_url="https://tokens.flex.ai/v1",  # FlexAI Token Factory (OpenAI-compatible)
    api_key="your-api-key",  # replace with your FlexAI API key
)

response = client.chat.completions.create(
    model="Mistral-Nemo-Instruct-2407-FP8",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)

print(response.choices[0].message.content)
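Because the endpoint is OpenAI-compatible, the SDK is optional. The following sketch sends the same request using only the Python standard library. It assumes the service accepts the standard OpenAI chat-completions JSON body with Bearer-token authentication at the base URL plus `/chat/completions`. `FLEXAI_API_KEY` is a hypothetical environment variable name, and `max_tokens` is included only as an example of an optional OpenAI parameter.

```python
import json
import os
import urllib.request

# Assumption: standard OpenAI wire format (Bearer auth, JSON body).
BASE_URL = "https://tokens.flex.ai/v1"
MODEL = "Mistral-Nemo-Instruct-2407-FP8"


def build_chat_request(prompt: str, api_key: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/chat/completions."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Hypothetical variable name; only sends a request if a key is set.
    key = os.environ.get("FLEXAI_API_KEY")
    if key:
        print(send(build_chat_request("Hello!", key)))
```

Keeping request construction separate from sending lets you inspect or log the payload before any network call is made.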
