Use case
Research agents. One key, many models
Agents that retrieve, reason over, and synthesize large source sets.
Why open models fit
Research agents are retrieval-heavy and long-context: an open embedding model retrieves, a reasoning model works through the evidence, and a long-context model synthesizes, all on the same key.
The pipeline
One OpenAI-compatible key runs the whole pipeline. Swap any step's model without touching your loop.
- 2 · Serving · Reason · DeepSeek V3.2
A reasoning-grade chat model with a 163K-token context.
160K ctx · $0.225 / $0.225 per M
- 3 · Serving · Summarize · DeepSeek V4 Flash
A 1M-token context for synthesizing long source sets.
1.0M ctx · $0.082 / $0.164 per M
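To make the per-token pricing concrete, here is a minimal sketch that keeps each step's model and price in one table and estimates the cost of a run. It assumes each price pair in the list above reads as input / output USD per million tokens. The names `steps` and `estimateCost` and the token counts are hypothetical, chosen only for illustration.

```typescript
// Prices copied from the list above. Reading each pair as
// "input / output USD per million tokens" is an assumption.
type StepName = "reason" | "summarize";

interface StepConfig {
  model: string;      // model id passed to the API
  inputPerM: number;  // USD per 1M input tokens (assumed meaning)
  outputPerM: number; // USD per 1M output tokens (assumed meaning)
}

// Swapping a step's model means editing one entry here, not the loop.
const steps: Record<StepName, StepConfig> = {
  reason: { model: "DeepSeek-V3.2", inputPerM: 0.225, outputPerM: 0.225 },
  summarize: { model: "DeepSeek-V4-Flash", inputPerM: 0.082, outputPerM: 0.164 },
};

// Estimated USD cost of one call, given its token counts.
function estimateCost(step: StepName, inputTokens: number, outputTokens: number): number {
  const { inputPerM, outputPerM } = steps[step];
  return (inputTokens * inputPerM + outputTokens * outputPerM) / 1_000_000;
}

// Hypothetical run: 100K tokens in and 2K out for reasoning,
// then 500K tokens in and 4K out for the summary.
const total =
  estimateCost("reason", 100_000, 2_000) +
  estimateCost("summarize", 500_000, 4_000);

console.log(total.toFixed(4)); // prints "0.0646"
```

Under those assumptions the whole run costs about six and a half cents, and most of it comes from the long summarize input.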
Call the whole pipeline
Point the OpenAI SDK at FlexAI once. Every step names its own model on the same key: no per-model clients, endpoints, or keys to juggle.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://tokens.flex.ai/v1",
  apiKey: process.env.FLEXAI_API_KEY,
});

const chunks = ["...your source text..."];

// search() → your vector store lookup (pgvector, Pinecone, …).
// It is not defined in this snippet: supply a function that takes an
// embedding and returns the text for the reasoning step.

// 1. Retrieve: same key, swap the model anytime
const retrieve = await client.embeddings.create({
  model: "bge-m3",
  input: chunks,
});

// 2. Reason: same key, swap the model anytime
const reason = await client.chat.completions.create({
  model: "DeepSeek-V3.2",
  messages: [{ role: "user", content: await search(retrieve.data[0].embedding) }],
});

// 3. Summarize: same key, swap the model anytime
// message.content can be null, so fall back to an empty string.
const summarize = await client.chat.completions.create({
  model: "DeepSeek-V4-Flash",
  messages: [{ role: "user", content: reason.choices[0].message.content ?? "" }],
});

Start serverless and pay per token. When a step becomes steady production traffic, move it to a dedicated endpoint on the same key.
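The pipeline above leaves `search()` to your vector store. For local experiments, a small in-memory stand-in might look like the sketch below. It ranks stored chunks by cosine similarity and returns the best matches as one string. `StoredChunk`, `cosine`, and `makeSearch` are hypothetical names, and the two-dimensional vectors are toy data, not real embeddings.

```typescript
// Hypothetical in-memory stand-in for search(). A real pipeline would
// query a vector store such as pgvector or Pinecone instead.
interface StoredChunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / Math.sqrt(normA * normB);
}

// Builds a search function over a fixed set of chunks. It returns the
// topK most similar chunk texts joined into one string for the prompt.
function makeSearch(store: StoredChunk[], topK = 3) {
  return (query: number[]): string =>
    store
      .map((chunk) => ({ text: chunk.text, score: cosine(query, chunk.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK)
      .map((hit) => hit.text)
      .join("\n\n");
}

// Toy 2-dimensional vectors, for illustration only.
const search = makeSearch(
  [
    { text: "alpha", embedding: [1, 0] },
    { text: "beta", embedding: [0, 1] },
  ],
  1,
);

console.log(search([0.9, 0.1])); // prints "alpha"
```

This version is synchronous, but `await search(...)` still works on it, so the call site stays unchanged when you later swap in an asynchronous vector store client.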
Build your research agents on FlexAI
Every model in the pipeline behind one OpenAI-compatible key, with source-linked pricing.
$10/month in free credits for your first 3 months