Managed inference for builders: 20+ open models behind one OpenAI-compatible API, with a path to fine-tuning, dedicated GPUs, and private cloud on one account.

How is FlexAI priced?

Token Factory is usage-priced per unit: per token for text models, and for media models per image, per million characters of speech, per minute of audio, or per generated video clip. Each rate is set against the credible market rate and repriced automatically as the market moves. Dedicated endpoints are priced per GPU-hour, and AI Factory is priced per deployment. Full rate tables are on the pricing page under the Serverless, Dedicated, and Custom tabs.

Which models can I run on FlexAI today?

20+ open-weight models across text, vision, image, and audio — see the models page for the live catalog, which updates as we deploy new models. Point the OpenAI SDK at https://tokens.flex.ai/v1 and your existing code runs unchanged.

Do I need an account to try it?

No. The in-browser demo is ungated. Sign up when you're ready to build: $10/month in free credits for your first 3 months (card required to create an API key).

How is FlexAI different from AWS Bedrock or Fireworks?

One OpenAI-compatible API with infrastructure you can grow into (dedicated endpoints, fine-tuning, and private cloud) without ecosystem lock-in. You can leave with a config change.

AI infrastructure that adapts as you grow

The platform foragent-native AI

Bring your agents.One OpenAI-compatible inference key.Scale to dedicated endpoints and private AI cloud.

Token FactoryAgent SDKAI Factory

Get an API keyTry without signup

$10/month in free credits for your first 3 months

Image generation

Example

A neon-lit cyberpunk city at dusk, rain-slicked streets reflecting holographic billboards, cinematic wide shot

Open the full playground

Image generation

Generate images in seconds

Fast open-weight diffusion on H100 GPUs. Describe what you want and watch it appear, through the same OpenAI-compatible API your apps already use.

Near-instant few-step diffusion
Same OpenAI-compatible API for your apps
No signup needed to try it right now

Chat completion

Talk to any model instantly

From coding assistants to reasoning models, try them all with zero setup. Streaming responses, tool calling, and thinking models included.

Open models across text, vision, code, and reasoning
Drop-in replacement for the OpenAI SDK
No API key needed to try it right here

Text generation

Example

Explain how a transformer model works in three sentences.

A transformer processes input as a sequence of tokens, then uses self-attention to let every token weigh how much each other token should influence its own representation. Stacked layers refine those representations through repeated attention + feed-forward steps. The output is a context-aware embedding per token, which the model uses to predict the next token, classify, or otherwise complete the task.

Open the full playground