Getting started
Switch to FlexAI from OpenAI or Claude
Your code already works, so point it at FlexAI and keep going: 20+ open models behind the same OpenAI-compatible surface, at competitive performance and pricing.
Keep your agent loop: prompts, RAG, tools, sessions, evals, approvals. FlexAI swaps the model layer first, the whole workload when you're ready.
From the OpenAI SDK
Change two values. Everything else stays.
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://tokens.flex.ai/v1",
    api_key=os.environ["FLEXAI_API_KEY"],
)
resp = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct-FP8",
    messages=[{"role": "user", "content": "Hello from FlexAI"}],
)
print(resp.choices[0].message.content)

What carries over unchanged: chat completions, streaming, tool calls, structured output, vision inputs. What changes: the model name. See the model mapping below.
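Streaming is one of the features listed as carrying over. A minimal sketch, assuming the endpoint emits the same chunk objects as the OpenAI SDK (`chunk.choices[0].delta.content`) when `stream=True` is passed. The helper names `join_stream` and `stream_reply` are illustrative and not part of any SDK.

```python
from typing import Any, Iterable


def join_stream(chunks: Iterable[Any]) -> str:
    """Concatenate the text deltas from a stream of chat-completion chunks."""
    parts = []
    for chunk in chunks:
        # Some chunks (for example a trailing usage chunk) may carry no choices.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # content is None on role-only or final chunks
            parts.append(delta)
    return "".join(parts)


def stream_reply(
    client: Any,
    prompt: str,
    model: str = "Meta-Llama-3.1-8B-Instruct-FP8",
) -> str:
    """Request a streamed completion and return the assembled text."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # standard OpenAI SDK flag: yields chunks instead of one response
    )
    return join_stream(stream)
```

Pass in the `client` built above, for example `stream_reply(client, "Hello from FlexAI")`. To show tokens as they arrive, print each delta inside the loop instead of collecting them.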
From the Claude API
The Anthropic Messages API differs from the OpenAI shape, so the move is a small refactor, not a config change. The mapping: the system prompt moves from the top-level system parameter into a system-role message; content blocks flatten to message strings unless you use vision; tool definitions move from input_schema to the OpenAI function-calling schema; max_tokens stays required. If you use the OpenAI-compatible layer in your framework (Vercel AI SDK, LangChain), switch the provider to an OpenAI-compatible endpoint and you are done.
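The mapping can be sketched as a small conversion helper. This is an illustration under stated assumptions, not an official migration tool: the function names are hypothetical, only text content is handled (image blocks and tool_use / tool_result blocks need their own translation), and it builds keyword arguments for `client.chat.completions.create` from the arguments you would have passed to `client.messages.create`.

```python
from typing import Any


def flatten_content(content: Any) -> str:
    """Flatten Anthropic content (a string or a list of blocks) to a plain string."""
    if isinstance(content, str):
        return content
    # Keep only text blocks; other block types are out of scope for this sketch.
    return "".join(b["text"] for b in content if b.get("type") == "text")


def convert_tool(tool: dict) -> dict:
    """Map an Anthropic tool definition to the OpenAI function-calling schema."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["input_schema"],  # the JSON Schema moves over unchanged
        },
    }


def to_openai_request(anthropic_kwargs: dict, model: str) -> dict:
    """Build chat.completions.create kwargs from messages.create kwargs."""
    messages = []
    system = anthropic_kwargs.get("system")
    if system:
        # The top-level system parameter becomes a system-role message.
        messages.append({"role": "system", "content": flatten_content(system)})
    for m in anthropic_kwargs["messages"]:
        messages.append({"role": m["role"], "content": flatten_content(m["content"])})

    request = {
        "model": model,  # the model name is supplied by the caller, not translated
        "max_tokens": anthropic_kwargs["max_tokens"],
        "messages": messages,
    }
    if anthropic_kwargs.get("tools"):
        request["tools"] = [convert_tool(t) for t in anthropic_kwargs["tools"]]
    return request
```

The result can be splatted into the call, for example `client.chat.completions.create(**to_openai_request(old_kwargs, model="Meta-Llama-3.1-8B-Instruct-FP8"))`.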
Before (Anthropic)
import anthropic

client = anthropic.Anthropic()
resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.content[0].text)

After (FlexAI)
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://tokens.flex.ai/v1",
    api_key=os.environ["FLEXAI_API_KEY"],
)
resp = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct-FP8",
    max_tokens=1024,
    messages=[
        # system prompt becomes a system-role message
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
)
print(resp.choices[0].message.content)

Which model should I pick?
A starting point. Browse the full catalog for the rest.
| Coming from | Try on FlexAI | Why |
|---|---|---|
| gpt-4o-mini (small, fast chat) | Llama 3.1 8B Instruct | Small and fast for high-volume classification, summarization, and RAG. |
| claude-haiku class (fast, low-cost) | Qwen3.5 9B | A small, fast model for high-volume, latency-sensitive calls. |
| claude-sonnet class (balanced) | Qwen3.6-35B-A3B | A capable, balanced model for everyday production work. |
| gpt-4 / gpt-4o class (frontier chat) | DeepSeek V3.2 | A large open flagship for the hardest reasoning tasks. |
| text-embedding-3 / ada (embeddings) | BGE-M3 | Multilingual retrieval embeddings. |
| DALL·E / image generation | FLUX.1 [schnell] | Fast open image generation. |
Verify your first call
A successful call returns the standard OpenAI response shape. A 402 means your key's budget is exhausted; a 429 means you hit a rate limit. The status page shows per-model availability.
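The two status codes call for different handling: a 429 is worth retrying after a pause, while a 402 will not clear on its own. A minimal sketch, assuming the raised exception exposes a `status_code` attribute, as the OpenAI SDK's `APIStatusError` does. `call_with_retry` and `BudgetExhausted` are hypothetical names, and the backoff values are placeholders, not FlexAI recommendations.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class BudgetExhausted(RuntimeError):
    """Hypothetical error for a 402 response: retrying will not help."""


def call_with_retry(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), backing off exponentially on 429 and failing fast on 402."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status == 402:
                raise BudgetExhausted("key budget exhausted") from exc
            if status == 429 and attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
                continue
            raise  # other errors, or 429 on the final attempt
    raise AssertionError("unreachable")
```

Wrap the request in a lambda, for example `call_with_retry(lambda: client.chat.completions.create(model=..., messages=...))`. The OpenAI SDK client also has its own `max_retries` setting, so you may want to lower one of the two to avoid stacking retries.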
What you keep
Your app loop, prompts, RAG, tools, sessions, eval harness, and approval logic. FlexAI replaces the model layer, not your application. When you want FlexAI to run whole agent workloads, see the Agent SDK.
Switch in two values
Point the OpenAI SDK at tokens.flex.ai, swap the model, and keep shipping.
$10/month in free credits for your first 3 months