Skip to content

    Predictable pricing.Auditable per model

    Every serverless model priced from public market sources and repriced automatically, with each row linking its source.

    Serverless: Token Factory

    Pay for what you use, one OpenAI-compatible key, every model in the catalog.

    Pricing an agent workload means more than the sticker rate: average tokens per run, tool calls, fallback rate, and whether steady traffic should move to dedicated endpoints.

    ModelInput $/MOutput $/MContextSource
    DeepSeek V3.2$0.225$0.225160Ksource
    GPT-OSS 120B$0.035$0.090128Ksource
    GPT-OSS 20B$0.027$0.117128Ksource
    Llama 3.1 8B Instruct$0.018$0.027128Ksource
    Llama 3.3 70B Instruct$0.090$0.288128Ksource
    Mistral Nemo 12B$0.018$0.027128Ksource
    Qwen3 Coder 30B A3B$0.063$0.234128Ksource
    Qwen3.6-35B-A3B$0.090$0.135128Ksource
    GLM 4.7$0.360$1.386128Ksource
    Gemma 4 31B IT$0.108$0.315128Ksource
    Gemma 4 26B A4B$0.054$0.27064Ksource
    PaddleOCR-VL 0.9B$0.126$0.720128Ksource
    Mistral Small 3.1 24B$0.045$0.090128Ksource
    Nemotron 3 Super 120B A12B$0.081$0.40532Ksource
    GLM 5$0.540$1.728128Ksource
    GLM 4.7 Flash$0.054$0.360128Ksource
    MiniMax M2.7$0.225$0.900195Ksource
    Qwen3.5 9B$0.050$0.150256Ksource
    Phi 4 Multimodal$0.045$0.090128Ksource
    GLM 4.5 Air FP8$0.113$0.765128Ksource
    DeepSeek R1 Distill 32B$0.243$0.24332Ksource
    DeepSeek V4 Flash$0.081$0.162256Ksource
    Qwen 3 8B$0.050$0.15032Ksource
    Qwen 2.5 32B$0.800$0.80032Ksource
    Qwen 3 30B Thinking$0.100$0.150128Ksource
    Qwen 3 Coder Next$0.063$0.243128Ksource
    Nemotron 3 Nano 30B A3B FP8$0.100$0.10032Ksource
    Kimi K2.5$0.315$1.70164Ksource
    Llama 4 Maverick$0.135$0.540256Ksource
    MiniMax M2.5$0.135$0.810128Ksource
    Qwen3 235B A22B 2507$0.064$0.090128Ksource
    Qwen3 Coder 480B A35B$0.162$0.90032Ksource
    ModelInput $/MContextSource
    BGE-M3$0.0098Ksource
    ModelPer imageContextSource
    FLUX.1 Schnell$0.0005256source
    Stable Diffusion 3.5 Large$0.0585source
    FLUX.2 [dev]$0.0400source
    ModelRateContextSource
    Whisper Large V3 Turbo$0.0006 / min448source
    Parakeet TDT 0.6B v3$0.0014 / min448source
    Kokoro-82M$9 / M chars512source
    Voxtral 4B TTS$180 / M charssource
    ModelRateContextSource
    Wan 2.2 T2V 14B$1.94 / videosource

    Find your serverless to dedicated break-even point

    Get an API key

    $10/month in free credits for your first 3 months.

    Dedicated GPU pricing

    Your own endpoints, fine-tuned models, or whole clusters, priced per GPU-hour and metered per second, on FlexAI or your own infrastructure. On-demand and reserved options; managed fine-tuning available.

    On FlexAI

    Compute hosted on FlexAI's NVIDIA and AMD fleets, with up to 99.9% SLA by tier.

    • State-of-the-art NVIDIA GPUs with NVLink and InfiniBand
    • On-demand and reserved pricing
    • Per-second metering
    • Region selection
    • Managed fine-tuning: per-M training tokens + storage $/GB-mo
    • One-click setup: get started in minutes
    Per GPU-hour, on-demand*
    B200$6.25/hr
    H200$3.15/hr
    H100$2.10/hr
    A100$1.80/hr
    L40S$1.50/hr

    *Contact sales for Essential and Custom package rates.

    Contact Us

    Custom: AI Factory & Enterprise

    Private cloud on your hardware, with the same managed AI stack.

    • Custom pricing on your workload. Talk to us.
    • VPC, on-prem, and air-gapped options
    • SLA 99.9% with geo redundancy
    • Dedicated CSM
    Proven on FlexAI
    • 75% lower compute cost
    • €22.5K total training cost

    LegML fine-tuned a 32B legal LLM on FlexAI H100.

    Cloud Savings Calculator

    Estimate your savings against AWS Bedrock, Azure OpenAI, GCP Vertex, Together, Fireworks, OpenAI, or Anthropic, at FlexAI's published rates.

    TierPriceIncludesSLA
    Starter$10/month in free credits for your first 3 months, card required at signup, then pay-as-you-go at our published per-token rates2 workspace seats. OpenAI-compatible API. Dedicated endpoints on demand. Playground access.99%
    Essential$100 matched ($100 deposit → $200 credit). Contact us today; self-serve soon.8 seats. Concurrency + multi-fractional. Architecture call at $50 spend. SE access. HIPAA, DORA.99.5%
    CustomCustom pricing on your workload. Talk to us.VPC, on-prem, air-gapped. Dedicated CSM. Geo redundancy.99.9%

    Building for production? See the FlexAI Startup Program. Apply-only, three stages, scaling on one account as your workload grows.

    Models with royalty obligations or pricing floors (currently MiniMax M2.7 and FLUX.2) are excluded from Startup Program introductory discounts.

    How our pricing works

    How the rate is set

    For each model we track the cheapest credible market rate (pricepertoken.com, Artificial Analysis, and provider pages) and price at that rate × 0.90. When a source moves, a scheduled refresh reprices automatically.

    Auditable per row

    Every row links the public source its rate derives from. Click through to check the current market rate for yourself.

    Why this matters

    Most token APIs reprice reactively when they lose share. A rate pinned to a public source, at a fixed margin below it, is predictable: you can budget against it.

    Frequently Asked Questions

    Coming from OpenAI or Claude? Read the migration guide

    Comparing providers? See how FlexAI compares with OpenAI, Fireworks, AWS Bedrock, and CoreWeave.