Use case

Support agents. One key, many models.

Agents that triage, answer, and resolve customer conversations.

Why open models fit

Support conversations span voice, screenshots, and text. Open models cover every stage (transcription, document OCR, retrieval, reply, and speech) behind one low-latency key.

The pipeline

One OpenAI-compatible key runs the whole pipeline. Swap any step's model without touching your loop.
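
One way to keep a model swap down to a single line is to hold the model IDs in one map and let each step look its model up by name. The sketch below is an assumption about how you might organise your own code, not part of any FlexAI API: `PIPELINE_MODELS` and `withModel` are hypothetical names, and the model IDs are the ones used in the pipeline call on this page.

```typescript
// Hypothetical helper: one place to name the model behind each pipeline step.
type Step = "transcribe" | "readAttachments" | "retrieve" | "respond" | "speak";

const PIPELINE_MODELS: Record<Step, string> = {
  transcribe: "whisper-large-v3-turbo",
  readAttachments: "PaddleOCR-VL",
  retrieve: "bge-m3",
  respond: "Meta-Llama-3.1-8B-Instruct-FP8",
  speak: "Kokoro-82M",
};

// Returns a copy of the map with one step pointed at a different model,
// leaving the original map and every other step untouched.
function withModel(
  models: Record<Step, string>,
  step: Step,
  model: string,
): Record<Step, string> {
  return { ...models, [step]: model };
}

// Example: try a different chat model for replies only.
// "your-other-chat-model" is a placeholder, not a real model ID.
const experiment = withModel(PIPELINE_MODELS, "respond", "your-other-chat-model");
```

Each request would then pass `model: experiment.respond` (or the matching step) instead of a hard-coded string, so the loop itself never changes.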

1. Transcribe: Whisper Large V3 Turbo (serving)
   Speech-to-text for inbound calls and voice notes.
   $0.00067 / min

2. Read attachments: PaddleOCR-VL 1.5 (serving)
   Vision OCR to read screenshots and documents customers attach.
   128K ctx · $0.14 / $0.8 per M

3. Retrieve: BGE-M3 (serving)
   Embeddings to pull the right help-center context.
   8K ctx · $0.01 per M

4. Respond: Llama 3.1 8B Instruct (serving)
   A small, fast chat model for low-latency replies.
   128K ctx · $0.02 / $0.03 per M

5. Speak: Kokoro-82M (serving)
   Text-to-speech for spoken responses.
   $0.62 / M chars
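
To get a feel for what these prices mean per conversation, the arithmetic can be sketched as below. This is a rough estimate under stated assumptions: the two-number prices are read as input / output per million tokens (the list does not say so explicitly), and the usage figures in `exampleUsage` are invented for illustration rather than measured. `estimateTicketCost` and the field names are hypothetical.

```typescript
// Prices copied from the list above, in US dollars.
// Assumption: "$a / $b per M" means input / output price per million tokens.
const PRICES = {
  transcribePerMinute: 0.00067,
  ocrInputPerM: 0.14,
  ocrOutputPerM: 0.8,
  embedPerM: 0.01,
  chatInputPerM: 0.02,
  chatOutputPerM: 0.03,
  speechPerMChars: 0.62,
};

interface TicketUsage {
  audioMinutes: number;
  ocrInputTokens: number;
  ocrOutputTokens: number;
  embedTokens: number;
  chatInputTokens: number;
  chatOutputTokens: number;
  speechChars: number;
}

// Cost of `count` units at a price quoted per million units.
const perMillion = (count: number, pricePerM: number): number =>
  (count / 1e6) * pricePerM;

// Sums the cost of all five pipeline steps for one conversation.
function estimateTicketCost(u: TicketUsage): number {
  return (
    u.audioMinutes * PRICES.transcribePerMinute +
    perMillion(u.ocrInputTokens, PRICES.ocrInputPerM) +
    perMillion(u.ocrOutputTokens, PRICES.ocrOutputPerM) +
    perMillion(u.embedTokens, PRICES.embedPerM) +
    perMillion(u.chatInputTokens, PRICES.chatInputPerM) +
    perMillion(u.chatOutputTokens, PRICES.chatOutputPerM) +
    perMillion(u.speechChars, PRICES.speechPerMChars)
  );
}

// Illustrative usage only: a 3-minute call with one screenshot.
const exampleUsage: TicketUsage = {
  audioMinutes: 3,
  ocrInputTokens: 1000,
  ocrOutputTokens: 500,
  embedTokens: 200,
  chatInputTokens: 2000,
  chatOutputTokens: 300,
  speechChars: 1000,
};

const exampleCost = estimateTicketCost(exampleUsage);
```

With these made-up numbers the total comes to about $0.0032 per conversation, most of it from transcription. Plug in your own traffic figures for a realistic estimate.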

Call the whole pipeline

Point the OpenAI SDK at FlexAI once. Every step names its own model on the same key: no per-model clients, endpoints, or keys to juggle.

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://tokens.flex.ai/v1",
  apiKey: process.env.FLEXAI_API_KEY,
});

// Pass the fetch Response itself: the SDK accepts it as an upload, while a bare Blob carries no file name
const audio = await fetch("/call.mp3");
// search() → your vector store lookup (pgvector, Pinecone, …)

// 1. Transcribe: same key, swap the model anytime
const transcribe = await client.audio.transcriptions.create({
  model: "whisper-large-v3-turbo",
  file: audio,
});

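// 2. Read attachments: same key, swap the model anytime
// attachmentUrl → placeholder for a screenshot or document the customer attached;
// assumes the endpoint accepts OpenAI-style image_url content parts
const readAttachments = await client.chat.completions.create({
  model: "PaddleOCR-VL",
  messages: [{
    role: "user",
    content: [{ type: "image_url", image_url: { url: attachmentUrl } }],
  }],
});

// 3. Retrieve: same key, swap the model anytime
// Embed what the customer said together with what the attachment shows
const question = `${transcribe.text}\n${readAttachments.choices[0].message.content}`;
const retrieve = await client.embeddings.create({
  model: "bge-m3",
  input: question,
});

// 4. Respond: same key, swap the model anytime
// Send the retrieved context alongside the customer's question
const context = await search(retrieve.data[0].embedding);
const respond = await client.chat.completions.create({
  model: "Meta-Llama-3.1-8B-Instruct-FP8",
  messages: [
    { role: "system", content: `Answer using this help-center context:\n${context}` },
    { role: "user", content: question },
  ],
});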

// 5. Speak: same key, swap the model anytime
const speak = await client.audio.speech.create({
  model: "Kokoro-82M",
  voice: "alloy",
  input: respond.choices[0].message.content,
});
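
The `search()` call in step 4 is left to your vector store. For local testing, a minimal in-memory stand-in might look like the sketch below. It assumes `search()` takes the query embedding and resolves to a single context string, which matches how the call is used above but is otherwise a guess. `HelpArticle`, `cosineSimilarity`, and `makeSearch` are hypothetical names, and the 3-dimensional vectors are toy data, not real embeddings.

```typescript
// Hypothetical in-memory stand-in for search(); use pgvector, Pinecone, etc. in production.
interface HelpArticle {
  text: string;
  embedding: number[]; // precomputed with the same embedding model as the query
}

// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Builds a search() over a fixed article list. The returned function ranks
// articles by similarity and joins the topK closest texts into one string.
function makeSearch(articles: HelpArticle[], topK = 2) {
  return async function search(queryEmbedding: number[]): Promise<string> {
    return articles
      .map((article) => ({
        article,
        score: cosineSimilarity(queryEmbedding, article.embedding),
      }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK)
      .map(({ article }) => article.text)
      .join("\n\n");
  };
}

// Toy vectors for illustration only.
const search = makeSearch(
  [
    { text: "How to reset your password", embedding: [1, 0, 0] },
    { text: "How to update billing details", embedding: [0, 1, 0] },
    { text: "How to export your data", embedding: [0, 0, 1] },
  ],
  1,
);
```

A linear scan like this is fine for a few hundred articles. Beyond that, an indexed vector store is the more sensible choice.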

Start serverless and pay per token. When a step becomes steady production traffic, move it to a dedicated endpoint on the same key.

Build your support agents on FlexAI

Every model in the pipeline sits behind one OpenAI-compatible key, priced at the market rate.

$10/month in free credits for your first 3 months