Use case
Private copilots. One key, many models.
In-house assistants grounded in your own documents and data.
Why open models fit
Run retrieval, document reading, and grounded answers on open models, then graduate to dedicated endpoints or AI Factory on the same API when governance demands it.
The pipeline
One OpenAI-compatible key runs the whole pipeline. Swap any step's model without touching your loop.
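A minimal sketch, in TypeScript, of keeping each step's model in one routing table so the loop itself never changes. The table shape and the names `Route`, `withModel`, `withEndpoint`, and `runPipeline` are hypothetical. The model IDs and serverless base URL are the ones used in the code on this page. Treating a dedicated endpoint as just another base URL on the same key is an assumption, not documented FlexAI behavior.

```typescript
// Hypothetical per-step routing table. Model IDs and the serverless URL
// match the page's code; any other base URL is a placeholder.
type StepName = "retrieve" | "readDocuments" | "answer";

interface Route {
  model: string;   // model ID sent in the request body
  baseURL: string; // OpenAI-compatible endpoint that serves it
}

type Routes = Record<StepName, Route>;

const SERVERLESS_URL = "https://tokens.flex.ai/v1";

const routes: Routes = {
  retrieve: { model: "bge-m3", baseURL: SERVERLESS_URL },
  readDocuments: { model: "PaddleOCR-VL", baseURL: SERVERLESS_URL },
  answer: { model: "Llama-3.3-70B-Instruct-FP8", baseURL: SERVERLESS_URL },
};

// Swap one step's model; every other step keeps its route.
function withModel(table: Routes, step: StepName, model: string): Routes {
  return { ...table, [step]: { ...table[step], model } };
}

// Point one step at a different base URL.
// Assumption: a dedicated endpoint is reachable as another OpenAI-compatible base URL.
function withEndpoint(table: Routes, step: StepName, baseURL: string): Routes {
  return { ...table, [step]: { ...table[step], baseURL } };
}

// The loop only reads from the table, so it never changes when a route does.
// `call` stands in for the real SDK request made for each step.
function runPipeline(table: Routes, call: (step: StepName, route: Route) => void): void {
  const order: StepName[] = ["retrieve", "readDocuments", "answer"];
  for (const step of order) {
    call(step, table[step]);
  }
}
```

In a real loop, `call` would send `route.model` through an OpenAI client created with `route.baseURL` and the same `FLEXAI_API_KEY`. Keeping client construction inside `call` is what lets a route change without touching the loop.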
- 2. Read documents: PaddleOCR-VL 1.5 (Serving)
  A vision OCR model that reads scanned and image-based files.
  16K ctx · $0.126 / $0.72 per 1M tokens
- 3. Answer: Llama 3.3 70B Instruct (Serving)
  A chat model with a 128K-token (131,072-token) context window for grounded answers.
  128K ctx · $0.09 / $0.288 per 1M tokens
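The per-step figures above are enough to sanity-check a request before sending it. This minimal sketch makes two assumptions. First, the first price is USD per million input tokens and the second is per million output tokens. Second, "K" means 1,024 tokens (Llama 3.3's 128K window is 131,072 tokens). `StepSpec`, `estimateCostUSD`, and `fitsContext` are hypothetical helpers, not part of any FlexAI SDK.

```typescript
// Figures copied from the step list. Assumptions: first price = USD per 1M
// input tokens, second = USD per 1M output tokens; "K" = 1,024 tokens.
interface StepSpec {
  model: string;
  contextTokens: number;
  inputPerM: number;  // assumed input price, USD per 1M tokens
  outputPerM: number; // assumed output price, USD per 1M tokens
}

const ocrStep: StepSpec = {
  model: "PaddleOCR-VL 1.5",
  contextTokens: 16 * 1024,
  inputPerM: 0.126,
  outputPerM: 0.72,
};

const answerStep: StepSpec = {
  model: "Llama 3.3 70B Instruct",
  contextTokens: 128 * 1024,
  inputPerM: 0.09,
  outputPerM: 0.288,
};

// Cost of one request: (tokens / 1M) × price, for input and output separately.
function estimateCostUSD(spec: StepSpec, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1_000_000) * spec.inputPerM + (outputTokens / 1_000_000) * spec.outputPerM;
}

// The prompt plus the completion budget must fit inside the context window.
function fitsContext(spec: StepSpec, inputTokens: number, maxOutputTokens: number): boolean {
  return inputTokens + maxOutputTokens <= spec.contextTokens;
}
```

Under those assumptions, a 20,000-token grounded prompt with a 500-token answer costs about $0.0019 on the Llama step (0.02 × $0.09 + 0.0005 × $0.288). The same request would not fit the OCR step's 16K window.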
Call the whole pipeline
Point the OpenAI SDK at FlexAI once. Every step names its own model on the same key: no per-model clients, endpoints, or keys to juggle.
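import OpenAI from "openai";
// One client for every step: same base URL, same key.
const client = new OpenAI({
  baseURL: "https://tokens.flex.ai/v1",
  apiKey: process.env.FLEXAI_API_KEY,
});
const question = "...your user's question...";
// search() → your vector store lookup (pgvector, Pinecone, …).
// Placeholder: assumed to return the URL of the best-matching scanned page.
async function search(embedding: number[]): Promise<string> {
  throw new Error(`Connect search() to your vector store (${embedding.length}-dim query)`);
}
// 1. Retrieve: same key, swap the model anytime.
// Embed the question (not the source chunks) and look up the nearest page.
const retrieve = await client.embeddings.create({ model: "bge-m3", input: question });
const pageUrl = await search(retrieve.data[0].embedding);
// 2. Read documents: same key, swap the model anytime.
// Assumes the endpoint accepts OpenAI-style image_url parts; the prompt is illustrative.
const readDocuments = await client.chat.completions.create({
  model: "PaddleOCR-VL",
  messages: [{
    role: "user",
    content: [
      { type: "text", text: "Extract the text from this page." },
      { type: "image_url", image_url: { url: pageUrl } },
    ],
  }],
});
const pageText = readDocuments.choices[0].message.content ?? "";
// 3. Answer: same key, swap the model anytime.
// Ground the chat model in the OCR output plus the original question.
const answer = await client.chat.completions.create({
  model: "Llama-3.3-70B-Instruct-FP8",
  messages: [
    { role: "system", content: "Answer using only the provided document text." },
    { role: "user", content: `Document:\n${pageText}\n\nQuestion: ${question}` },
  ],
});
console.log(answer.choices[0].message.content);

Start serverless and pay per token. When a step becomes steady production traffic, move it to a dedicated endpoint on the same key.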
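The `search()` placeholder in the code above is left to your vector store. For prototyping before pgvector or Pinecone is wired up, a minimal in-memory stand-in could rank stored embeddings by cosine similarity. This is a sketch: `StoredEntry`, `cosineSimilarity`, and `topK` are hypothetical names. `ref` holds whatever your `search()` should hand to the next step, such as a chunk of text or a page URL.

```typescript
// Hypothetical in-memory stand-in for a vector store lookup.
// Each entry pairs an embedding with whatever reference search() returns.
interface StoredEntry {
  ref: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| · |b|); returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force nearest neighbours: score every entry, keep the k best.
function topK(query: number[], store: StoredEntry[], k: number): StoredEntry[] {
  return store
    .map((entry) => ({ entry, score: cosineSimilarity(query, entry.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.entry);
}
```

To fill the store, call `client.embeddings.create({ model: "bge-m3", input: texts })` with an array input. It returns one `data[i].embedding` per input, each tagged with its `index`. Brute force scans every entry on each query, which suits small prototype corpora. For larger ones, an indexed store such as pgvector is the better fit.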
Build your private copilots on FlexAI
Every model in the pipeline behind one OpenAI-compatible key, with source-linked pricing.
$10/month in free credits for your first 3 months