DeepSeek V4 Flash

Chat

Model ID: DeepSeek-V4-Flash

DeepSeek V4 Flash on FlexAI: a DeepSeek LLM released under the MIT license, served through FlexAI's OpenAI-compatible Token Factory and billed at the live, market-tracked rate.

Recommended for: research agents · summarization

Pricing
Input: $0.09 / M tokens
Output: $0.179 / M tokens
Cached input: $0.0134 / M tokens

Context: 768K tokens
API endpoint: /v1/chat/completions
Compatibility: OpenAI API
Parameters: 284B MoE (13B active per token)
License: MIT
Hardware: 4× H100
Quantization: FP8
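As a rough sanity check on the hardware line, the sketch below estimates whether the FP8 weights fit in the combined GPU memory. It assumes the 80 GB H100 variant, one byte per FP8 weight, and decimal gigabytes, and it counts weights only: KV cache for long contexts, activations, and serving overhead are ignored, so it shows that the weights fit, not that a full 768K-token context does.

```python
# Back-of-envelope check: do FP8 weights for a 284B-parameter MoE fit on 4x H100?
# Assumptions (not stated on the model card): 1 byte per FP8 weight,
# 80 GB of HBM per H100, decimal GB, weights only (no KV cache or activations).

TOTAL_PARAMS = 284e9      # total parameters (from the card)
ACTIVE_PARAMS = 13e9      # parameters active per token (from the card)
BYTES_PER_PARAM_FP8 = 1   # FP8 stores each weight in one byte
GPUS = 4                  # from the card
HBM_PER_GPU_GB = 80       # assumed H100 80 GB variant


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in decimal GB."""
    return params * bytes_per_param / 1e9


weights_gb = weight_memory_gb(TOTAL_PARAMS, BYTES_PER_PARAM_FP8)
total_hbm_gb = GPUS * HBM_PER_GPU_GB
headroom_gb = total_hbm_gb - weights_gb
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"FP8 weights: {weights_gb:.0f} GB of {total_hbm_gb} GB HBM")
print(f"Left for KV cache, activations, overhead: {headroom_gb:.0f} GB")
print(f"Active per token: {active_fraction:.1%} of parameters")
```

Under these assumptions the weights take about 284 GB of 320 GB, and only about 4.6% of the parameters do work for any given token, which is why a model this large can still be relatively cheap per token.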

Estimate your monthly cost

Example: 10M input tokens / month, 2M output tokens / month, 25% cache hit rate. The cache hit rate is the share of input tokens served from the prompt cache: repeated system prompts, long documents, and multi-turn context.

Uncached input: 7.5M × $0.0896/M = $0.67
Cached input: 2.5M × $0.0134/M = $0.03
Output: 2M × $0.1792/M = $0.36
Estimated monthly cost: $1.06
Saved vs. no caching: $0.19 (about 15%)

Estimate only, at the current market-tracked rate. Usage-based; no minimums.
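The calculator's arithmetic can be reproduced in a few lines. This is a minimal sketch, assuming cached tokens are billed at the cached-input rate and all other input tokens at the full input rate, using the per-million rates from the cost breakdown; the `Rates` class and `monthly_cost` function are hypothetical names, and the real rates move with the market.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-million-token prices in USD (market-tracked; these values will drift)."""
    input_per_m: float = 0.0896
    cached_input_per_m: float = 0.0134
    output_per_m: float = 0.1792


def monthly_cost(input_m: float, output_m: float, cache_hit_rate: float,
                 rates: Rates = Rates()) -> float:
    """Estimated USD cost for token volumes given in millions."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached_m = input_m * cache_hit_rate      # input served from the prompt cache
    uncached_m = input_m - cached_m          # input billed at the full rate
    return (uncached_m * rates.input_per_m
            + cached_m * rates.cached_input_per_m
            + output_m * rates.output_per_m)


with_cache = monthly_cost(10, 2, 0.25)
without_cache = monthly_cost(10, 2, 0.0)
saved = without_cache - with_cache
print(f"Estimated monthly cost: ${with_cache:.2f}")
print(f"Saved vs. no caching: ${saved:.2f} ({saved / without_cache:.0%})")
```

For 10M input and 2M output tokens at a 25% hit rate this gives about $1.06, and caching saves about $0.19 (roughly 15%) compared with the $1.25 uncached cost.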


Quick Start


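from openai import OpenAI

# FlexAI's Token Factory speaks the OpenAI API, so the official OpenAI SDK
# works once base_url points at it.
client = OpenAI(
    base_url="https://tokens.flex.ai/v1",
    api_key="your-api-key",  # replace with your FlexAI API key
)

# Send a single-turn request to /v1/chat/completions.
response = client.chat.completions.create(
    model="DeepSeek-V4-Flash",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)

# The reply text is on the first choice.
print(response.choices[0].message.content)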
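For long responses it can help to print tokens as they are generated. The sketch below assumes the Token Factory honors the OpenAI SDK's `stream=True` option, which this card does not state; `FLEXAI_API_KEY` is a hypothetical environment variable name, and `collect_stream` is a hypothetical helper. The network request only runs when that variable is set.

```python
import os
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any], echo: bool = True) -> str:
    """Concatenate text deltas from an OpenAI-style chat completion stream."""
    parts = []
    for chunk in chunks:
        # Some chunks (for example a trailing usage chunk) carry no choices.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # role-only and finish chunks have content=None
            parts.append(delta)
            if echo:
                print(delta, end="", flush=True)
    if echo:
        print()
    return "".join(parts)


def main() -> None:
    # Imported here so collect_stream can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://tokens.flex.ai/v1",
        api_key=os.environ["FLEXAI_API_KEY"],  # hypothetical variable name
    )
    stream = client.chat.completions.create(
        model="DeepSeek-V4-Flash",
        messages=[{"role": "user", "content": "Summarize Hamlet in two sentences."}],
        stream=True,  # assumed to be supported by the Token Factory
    )
    collect_stream(stream)


if __name__ == "__main__" and "FLEXAI_API_KEY" in os.environ:
    main()
```

Keeping the chunk handling in its own function makes it easy to test without a network call and to reuse with any OpenAI-compatible stream.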
