Use case

Coding agents. One key, many models

Agents that generate, review, and repair code across your repo.

Why open models fit

Coding agents run constantly, so per-token cost decides whether they are viable. Open models keep that cost low, and one key serves both the generator and a fast reviewer.

The pipeline

One OpenAI-compatible key runs the whole pipeline. Swap any step's model without touching your loop.

1 Serving
Generate: Qwen3 Coder 30B A3B
Code-specialized chat model with a 262K-token context for whole-repo edits.
262K ctx · $0.07 / $0.26 per M tokens
2 Dedicated
Review & fast edits: Qwen3.6-35B-A3B-FP8
Code-tuned MoE with 3B active parameters: fast review passes over the same 262K context as the generator.

Review & fast edits loops back to Generate: most runs iterate a few times before the code exits clean.
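The loop described above can be sketched roughly as follows. This is a minimal illustration, not FlexAI's implementation: the `Complete` function type, the `generateAndReview` helper, the "LGTM" reply convention, and the iteration cap are all assumptions introduced here. Only the two model IDs come from this page.

```typescript
// Hypothetical signature: sends one prompt to one model and resolves to the reply text.
type Complete = (model: string, prompt: string) => Promise<string>;

interface LoopResult {
  code: string;
  iterations: number; // number of review passes that ran
  clean: boolean;     // true if the reviewer signed off before the cap
}

const GENERATOR = "Qwen3-Coder-30B-A3B-Instruct-FP8";
const REVIEWER = "Qwen3.6-35B-A3B-FP8";

// Assumed convention: the reviewer replies with exactly "LGTM" when it has no findings.
const CLEAN_MARKER = "LGTM";

async function generateAndReview(
  complete: Complete,
  task: string,
  maxIterations = 4, // assumed cap so the loop always terminates
): Promise<LoopResult> {
  let code = await complete(GENERATOR, task);

  for (let i = 1; i <= maxIterations; i++) {
    const review = await complete(
      REVIEWER,
      `Review this code. Reply "${CLEAN_MARKER}" if it needs no changes.\n\n${code}`,
    );
    if (review.trim() === CLEAN_MARKER) {
      return { code, iterations: i, clean: true };
    }
    // Feed the reviewer's findings back to the generator for a repair pass.
    code = await complete(
      GENERATOR,
      `Task: ${task}\n\nCurrent code:\n${code}\n\nReviewer findings:\n${review}\n\nReturn the corrected code.`,
    );
  }
  return { code, iterations: maxIterations, clean: false };
}
```

Passing `complete` in as a parameter keeps the loop independent of any one client, so the same control flow can be exercised with a stub before it is pointed at a live endpoint.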

Call the whole pipeline

Point the OpenAI SDK at FlexAI once. Every step names its own model on the same key: no per-model clients, endpoints, or keys to juggle.

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://tokens.flex.ai/v1",
  apiKey: process.env.FLEXAI_API_KEY,
});

const prompt = "Describe what you want the agent to do.";

// 1. Generate: same key, swap the model anytime
const generate = await client.chat.completions.create({
  model: "Qwen3-Coder-30B-A3B-Instruct-FP8",
  messages: [{ role: "user", content: prompt }],
});

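// 2. Review & fast edits: same key, swap the model anytime
// message.content can be null, so fall back to an empty string.
const generatedCode = generate.choices[0].message.content ?? "";
const reviewFastEdits = await client.chat.completions.create({
  model: "Qwen3.6-35B-A3B-FP8",
  messages: [
    // Tell the reviewer what to do with the generated code.
    { role: "system", content: "Review the code below and reply with any fixes." },
    { role: "user", content: generatedCode },
  ],
});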
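One way to keep model choice out of the loop is to describe each step as data. The sketch below is an assumption-laden illustration: the `Step` shape, the `ChatCall` type, and `runPipeline` are hypothetical names introduced here, not part of the OpenAI SDK or FlexAI. Only the model IDs come from the snippet above.

```typescript
// Hypothetical signature: sends one prompt to one model and resolves to the reply text.
type ChatCall = (model: string, prompt: string) => Promise<string>;

interface Step {
  name: string;
  model: string;
  // Builds this step's prompt from the previous step's output.
  buildPrompt: (input: string) => string;
}

const steps: Step[] = [
  {
    name: "generate",
    model: "Qwen3-Coder-30B-A3B-Instruct-FP8",
    buildPrompt: (task) => task,
  },
  {
    name: "review",
    model: "Qwen3.6-35B-A3B-FP8",
    buildPrompt: (code) => `Review this code and list any defects:\n\n${code}`,
  },
];

// Runs the steps in order, feeding each output into the next step's prompt.
async function runPipeline(
  call: ChatCall,
  pipeline: Step[],
  input: string,
): Promise<Record<string, string>> {
  const outputs: Record<string, string> = {};
  let current = input;
  for (const step of pipeline) {
    current = await call(step.model, step.buildPrompt(current));
    outputs[step.name] = current;
  }
  return outputs;
}

// Example wiring, assuming the `client` created earlier on this page:
// const call: ChatCall = async (model, prompt) => {
//   const res = await client.chat.completions.create({
//     model,
//     messages: [{ role: "user", content: prompt }],
//   });
//   return res.choices[0].message.content ?? "";
// };
// const outputs = await runPipeline(call, steps, "Describe what you want the agent to do.");
```

With this shape, swapping a step's model is a one-string change in `steps`, and `runPipeline` stays untouched.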

Start serverless and pay per token. When a step becomes steady production traffic, move it to a dedicated endpoint on the same key.
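To get a feel for pay-per-token costs, a rough estimate can be computed from the rates listed for the Generate step. This sketch assumes that "$0.07 / $0.26 per M" means USD per million input tokens and per million output tokens respectively, which this page does not state explicitly. The `estimateCostUSD` helper and the token volumes are hypothetical and for illustration only.

```typescript
// Assumed reading of "$0.07 / $0.26 per M": USD per million input / output tokens.
interface Rates {
  inputPerMillion: number;
  outputPerMillion: number;
}

const GENERATOR_RATES: Rates = { inputPerMillion: 0.07, outputPerMillion: 0.26 };

// Estimates spend in USD for a given number of input and output tokens.
function estimateCostUSD(inputTokens: number, outputTokens: number, rates: Rates): number {
  return (
    (inputTokens / 1e6) * rates.inputPerMillion +
    (outputTokens / 1e6) * rates.outputPerMillion
  );
}

// Hypothetical daily volume, not a measured workload:
// 2M input tokens and 500K output tokens.
const dailyCost = estimateCostUSD(2_000_000, 500_000, GENERATOR_RATES);
console.log(`~$${dailyCost.toFixed(2)} per day`); // prints "~$0.27 per day"
```

The same function works for any step once its rates are known; the review step's rates are not listed on this page, so they are left out here.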

Build your coding agents on FlexAI

Every model in the pipeline behind one OpenAI-compatible key, priced at the market rate.

Get an API key · Talk to us

$10/month in free credits for your first 3 months

See how much you could save