Use case
Coding agents. One key, many models
Agents that generate, review, and repair code across your repo.
Why open models fit
Coding agents run constantly, so per-token cost decides whether they are viable. Open models keep that cost low, and one key serves both the generator and a fast reviewer.
The pipeline
One OpenAI-compatible key runs the whole pipeline. Swap any step's model without touching your loop.
- 1 · Serving · Generate · Qwen3 Coder 30B A3B
Code-specialized chat model with a 262,144-token (256K) context for whole-repo edits.
256K ctx · $0.063 / $0.234 per M
- 2 · Serving · Review & fast edits · GPT-OSS 20B
A small, fast chat model for quick review passes and inline fixes.
128K ctx · $0.027 / $0.117 per M
The Review & fast edits step loops back to Generate: most runs iterate a few times before the code passes review cleanly.
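To make the per-token economics concrete, here is a rough cost sketch for one run of the loop. The prices are the ones listed above; it assumes the first figure is the input price and the second the output price, and the token counts and iteration count are hypothetical placeholders, not measurements.

```typescript
// Prices copied from the pipeline list, in USD per million tokens.
// Assumption: first figure = input price, second figure = output price.
interface Price {
  inputPerM: number;
  outputPerM: number;
}

const PRICES: Record<string, Price> = {
  "Qwen3-Coder-30B-A3B-Instruct-FP8": { inputPerM: 0.063, outputPerM: 0.234 },
  "gpt-oss-20b": { inputPerM: 0.027, outputPerM: 0.117 },
};

// Cost in USD of a single call to `model`.
function stepCost(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (!price) throw new Error(`no price listed for ${model}`);
  return (inputTokens * price.inputPerM + outputTokens * price.outputPerM) / 1_000_000;
}

// Hypothetical workload: each iteration sends 20k tokens to the generator
// (2k back) and 4k tokens to the reviewer (500 back).
function runCost(iterations: number): number {
  const generate = stepCost("Qwen3-Coder-30B-A3B-Instruct-FP8", 20_000, 2_000);
  const review = stepCost("gpt-oss-20b", 4_000, 500);
  return iterations * (generate + review);
}

console.log(`Estimated cost for 3 iterations: $${runCost(3).toFixed(4)}`);
```

Swap in your own token counts to see how the estimate moves; under these assumptions the generator's input tokens dominate the bill.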
Call the whole pipeline
Point the OpenAI SDK at FlexAI once. Every step names its own model on the same key: no per-model clients, endpoints, or keys to juggle.
import OpenAI from "openai";
// One client for every step: the base URL and key are shared.
const client = new OpenAI({
  baseURL: "https://tokens.flex.ai/v1",
  apiKey: process.env.FLEXAI_API_KEY,
});
const prompt = "Describe what you want the agent to do.";
// 1. Generate: same key, swap the model anytime
const generate = await client.chat.completions.create({
  model: "Qwen3-Coder-30B-A3B-Instruct-FP8",
  messages: [{ role: "user", content: prompt }],
});
// message.content is nullable in the SDK types, so fall back to an empty string.
const draft = generate.choices[0].message.content ?? "";
// 2. Review & fast edits: same key, swap the model anytime
const reviewFastEdits = await client.chat.completions.create({
  model: "gpt-oss-20b",
  messages: [
    // Illustrative review instruction; adapt it to your own review policy.
    { role: "system", content: "Review the code below and return a corrected version." },
    { role: "user", content: draft },
  ],
});
Start serverless and pay per token. When a step becomes steady production traffic, move it to a dedicated endpoint on the same key.
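The loop from Review & fast edits back to Generate can be sketched as below. This is one possible shape, not FlexAI's implementation. The names `runPipeline`, `Complete`, and `PipelineModels`, the "LGTM" approval convention, and the round limit are all assumptions made for illustration. The model call is injected as a function so each step's model is plain configuration and the loop itself never changes.

```typescript
// Hypothetical helper types; none of these names come from the OpenAI SDK or FlexAI.
type Message = { role: "system" | "user" | "assistant"; content: string };
type Complete = (model: string, messages: Message[]) => Promise<string>;

interface PipelineModels {
  generate: string;
  review: string;
}

// Assumed convention: the reviewer replies with exactly "LGTM" when it finds no problems.
const APPROVAL = "LGTM";

async function runPipeline(
  complete: Complete,
  models: PipelineModels,
  task: string,
  maxRounds = 3,
): Promise<{ code: string; rounds: number; approved: boolean }> {
  // First draft from the generator.
  let code = await complete(models.generate, [{ role: "user", content: task }]);

  for (let round = 1; round <= maxRounds; round++) {
    const verdict = await complete(models.review, [
      {
        role: "system",
        content: `Review the code. Reply with ${APPROVAL} if it is correct; otherwise list the problems.`,
      },
      { role: "user", content: code },
    ]);
    if (verdict.trim() === APPROVAL) {
      return { code, rounds: round, approved: true };
    }
    // Feed the review findings back to the generator for another pass.
    code = await complete(models.generate, [
      { role: "user", content: task },
      { role: "assistant", content: code },
      { role: "user", content: `Fix these review findings:\n${verdict}` },
    ]);
  }
  // Round limit reached: return the latest draft, which has not been re-reviewed.
  return { code, rounds: maxRounds, approved: false };
}
```

To wire this to the client above, `complete` could wrap `client.chat.completions.create({ model, messages })` and return `choices[0].message.content ?? ""`. Swapping a step's model then means changing one string in the `PipelineModels` object.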
Build your coding agents on FlexAI
Every model in the pipeline behind one OpenAI-compatible key, with source-linked pricing.
$10/month in free credits for your first 3 months