When volume proves out.
Reserved throughput.Same API key
Dedicated GPUs for your models and fine-tunes, using the same account and API key.
When your agent loop becomes production traffic: move to isolated GPUs with stable latency, predictable cost, and the same key.
Find your break-even
Serverless per-token vs a reserved GPU at a flat hourly rate. Pick a model and your monthly volume.
At 500M tokens/month, serverless is cheaper by $1,523/month.
Crossover: dedicated wins above ~74.06B tokens/month for this model.
When dedicated wins
Steady-volume economics
Past your break-even volume, a reserved GPU at a flat hourly rate undercuts per-token serverless pricing. Per-second GPU billing, with no extra idle surcharge.
Consistent latency and isolation
Isolated GPUs with predictable performance and no shared rate limits, so latency-sensitive workloads stay steady under load.
Custom and fine-tuned models
Deploy your own fine-tuned checkpoints on the same managed endpoints, beyond the open catalog.
What you get
The dedicated mechanics, exactly as priced on the Dedicated tab today.
- Reserved NVIDIA GPUs with NVLink and InfiniBand, on-demand or reserved
- Per-second GPU billing, with no extra idle surcharge
- The same OpenAI-compatible endpoint and API key as serverless
- Region selection and one-click setup
- Managed fine-tuning: per-M training tokens plus storage
- Observability and audit on every endpoint
Models on dedicated
Open models sized for reserved deployment, plus your own fine-tuned models.
Deploy your own fine-tunes and LoRA adapters on dedicated endpoints. Stack and swap adapters without retraining the base model.
How it works
Pick model and hardware
Choose an open model or your fine-tune, and the GPU configuration the catalog sizes for it.
FlexAI provisions and manages
We stand up the reserved endpoint, scale it, and keep it healthy. No infrastructure to run.
Same API, your endpoint
Call it through the same OpenAI-compatible key. Move from serverless without migrating.
Same account for serverless inference and training.
Economics
Per GPU-hour on FlexAI's NVIDIA and AMD fleets, with per-second GPU billing.
- B200$6.25/hr
- H200$3.15/hr
- H100$2.10/hr
One account, the whole way
Dedicated endpoints are rung three. Same account from your first request to your private cloud.
Built for production workloads
“FlexAI provides a much more cost-effective & hassle-free experience for training & deploying my models.”
Frequently Asked Questions
Reserved throughput, when you're ready for it
Reserved GPUs when volume proves out, serverless until then. One account throughout.
$10/month in free credits for your first 3 months