Getting started

Switch to FlexAI from OpenAI or Claude

Your code already works, so point it at FlexAI and keep going: 20+ open models behind the same OpenAI-compatible surface, at competitive performance and pricing.

Keep your agent loop: prompts, RAG, tools, sessions, evals, approvals. FlexAI swaps the model layer first, then the whole workload when you're ready.

From the OpenAI SDK

Change two values. Everything else stays.

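import os

from openai import OpenAI

# Point the standard OpenAI client at FlexAI's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://tokens.flex.ai/v1",
    api_key=os.environ["FLEXAI_API_KEY"],  # FlexAI key, read from the environment
)

resp = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct-FP8",  # a FlexAI model name instead of an OpenAI one
    messages=[{"role": "user", "content": "Hello from FlexAI"}],
)
print(resp.choices[0].message.content)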

What carries over unchanged: chat completions, streaming, tool calls, structured output, vision inputs. What changes: the model name. See the model mapping below.
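
Streaming is one of the features listed as carrying over, so a streamed call can look roughly like the sketch below. It assumes the endpoint emits standard OpenAI-style chunks (choices[0].delta.content); stream_reply is a hypothetical helper name, and client is the OpenAI client configured above.

```python
def stream_reply(client, prompt: str, model: str = "Meta-Llama-3.1-8B-Instruct-FP8") -> str:
    """Print a streamed chat completion as it arrives and return the full text.

    `client` is expected to be an openai.OpenAI instance pointed at FlexAI.
    Assumption: chunks follow the usual OpenAI streaming shape.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # ask for incremental chunks instead of one response
    )
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # some chunks carry no choices (for example usage-only chunks)
        delta = chunk.choices[0].delta.content
        if delta:  # content is None on role-only and final chunks
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)
```

Call it as stream_reply(client, "Hello from FlexAI"). Only the stream=True flag and the loop differ from the non-streaming call.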

From the Claude API

The Anthropic Messages API differs from the OpenAI shape, so the move is a small refactor, not a config change. The mapping: the system prompt moves from the top-level system parameter into a system-role message; text content blocks flatten to plain message strings (vision inputs keep a structured content list); tool definitions move from Anthropic's input_schema format to the OpenAI function-calling schema; max_tokens stays required. If your framework already talks to providers through an OpenAI-compatible layer (Vercel AI SDK, LangChain), switch the provider to FlexAI's OpenAI-compatible endpoint and you are done.

Before (Anthropic)

import anthropic

client = anthropic.Anthropic()

resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.content[0].text)

After (FlexAI)

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://tokens.flex.ai/v1",
    api_key=os.environ["FLEXAI_API_KEY"],
)

resp = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct-FP8",
    max_tokens=1024,
    messages=[
        # system prompt becomes a system-role message
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
)
print(resp.choices[0].message.content)
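
If you have many call sites, the mapping described above can be sketched as one translation helper. This is a minimal illustration, not an official FlexAI utility: the function names are hypothetical, it handles only text content and tool definitions, and it raises on image, tool_use, and tool_result blocks rather than guessing at their conversion.

```python
from typing import Any


def flatten_content(content: Any) -> str:
    """Collapse Anthropic text blocks into one plain string."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if block.get("type") != "text":
            # Images, tool_use and tool_result blocks need their own mapping.
            raise ValueError(f"unsupported content block type: {block.get('type')!r}")
        parts.append(block["text"])
    # Joining with newlines is a choice made for this sketch.
    return "\n".join(parts)


def to_openai_tool(tool: dict) -> dict:
    """Wrap an Anthropic tool definition in the OpenAI function-calling shape."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["input_schema"],  # the JSON Schema itself is unchanged
        },
    }


def to_openai_request(anthropic_kwargs: dict, model: str) -> dict:
    """Translate messages.create(...) kwargs into chat.completions.create(...) kwargs."""
    messages = []
    system = anthropic_kwargs.get("system")
    if system:
        # The top-level system prompt becomes the first, system-role message.
        messages.append({"role": "system", "content": flatten_content(system)})
    for msg in anthropic_kwargs["messages"]:
        messages.append({"role": msg["role"], "content": flatten_content(msg["content"])})

    request = {
        "model": model,  # a FlexAI model name replaces the Claude model name
        "max_tokens": anthropic_kwargs["max_tokens"],
        "messages": messages,
    }
    if anthropic_kwargs.get("tools"):
        request["tools"] = [to_openai_tool(t) for t in anthropic_kwargs["tools"]]
    return request
```

With the client from the After example, an existing call becomes client.chat.completions.create(**to_openai_request(old_kwargs, model="Meta-Llama-3.1-8B-Instruct-FP8")).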

Which model should I pick?

A starting point. Browse the full catalog for the rest.

Choosing models for agents: coding agents → reasoning and coding models · RAG agents → context window and retrieval quality · interactive agents → balance latency and cost.

Coming from	Try on FlexAI	Why
gpt-4o-mini (small, fast chat)	Llama 3.1 8B Instruct	Small and fast for high-volume classification, summarization, and RAG.
claude-haiku class (fast, low-cost)	Qwen3.5 9B	A small, fast model for high-volume, latency-sensitive calls.
gpt-4 / gpt-4o class (frontier chat)	GLM 5.2	A large open flagship for the hardest reasoning tasks.
text-embedding-3 / ada (embeddings)	BGE-M3	Multilingual retrieval embeddings.
DALL·E / image generation	FLUX.1 [schnell]	Fast open image generation.

Verify your first call

A successful call returns the standard OpenAI response shape. A 402 means your key's budget is exhausted; a 429 means you have hit a rate limit. The status page shows per-model availability.
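
One way to act on those two status codes is sketched below: retry a 429 with exponential backoff and fail fast on a 402, since retrying cannot restore budget. call_with_retry is a hypothetical helper. It assumes the raised exception exposes a status_code attribute, as the OpenAI SDK's APIStatusError does, and the retry count and delays are illustrative defaults rather than FlexAI recommendations.

```python
import time


def call_with_retry(make_request, max_attempts: int = 4, base_delay: float = 1.0, sleep=time.sleep):
    """Run make_request(), retrying only on HTTP 429 with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return make_request()
        except Exception as exc:
            # Assumption: HTTP errors carry the status on `status_code`.
            status = getattr(exc, "status_code", None)
            if status == 402:
                # Budget exhausted: waiting and retrying will not help.
                raise RuntimeError("key budget exhausted (HTTP 402)") from exc
            if status == 429 and attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
                continue
            raise  # any other error, or 429 on the final attempt
```

Wrap a call as call_with_retry(lambda: client.chat.completions.create(...)). The OpenAI SDK also has its own max_retries client option, so you may prefer to tune that instead of adding a second retry layer.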

What you keep

Your app loop, prompts, RAG, tools, sessions, eval harness, and approval logic. FlexAI replaces the model layer, not your application. When you want FlexAI to run whole agent workloads, see the Agent SDK.

Switch in two values

Point the OpenAI SDK at tokens.flex.ai, swap the model, and keep shipping.

$10/month in free credits for your first 3 months