
    Use case

    Support agents. One key, many models

    Agents that triage, answer, and resolve customer conversations.

    Why open models fit

    Support conversations span voice, screenshots, and text. Open models cover every step (transcription, document OCR, retrieval, reply generation, and speech), all behind one key with low-latency serving.

    The pipeline

    One OpenAI-compatible key runs the whole pipeline. Swap any step's model without touching your loop.
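One way to keep the loop untouched when you swap a model is to read every step's model ID from a single map. This is an illustration, not a FlexAI API. The `Step` type, `defaultModels`, and `withOverrides` are hypothetical names. The model IDs are copied from the pipeline example on this page, so confirm the exact identifiers against FlexAI's model list.

```typescript
// Hypothetical step → model map. The IDs mirror this page's example and may
// not match FlexAI's exact model identifiers.
type Step = "transcribe" | "readAttachments" | "retrieve" | "respond" | "speak";

const defaultModels: Record<Step, string> = {
  transcribe: "whisper-large-v3-turbo",
  readAttachments: "PaddleOCR-VL",
  retrieve: "bge-m3",
  respond: "gpt-oss-20b",
  speak: "Kokoro-82M",
};

// Returns a new map with some steps overridden; the base map is not mutated.
// The pipeline only ever reads models[step], so a swap never touches the loop.
function withOverrides(
  base: Record<Step, string>,
  overrides: Partial<Record<Step, string>>,
): Record<Step, string> {
  return { ...base, ...overrides };
}

// "my-larger-chat-model" is a hypothetical model ID used for illustration.
const models = withOverrides(defaultModels, { respond: "my-larger-chat-model" });
console.log(models.respond); // prints "my-larger-chat-model"
```

Each API call then passes its step's entry, for example `model: models.respond`, so swapping a model becomes a one-line change to the overrides.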

    1. Transcribe: Whisper Large V3 Turbo (Serving)

      Speech-to-text for inbound calls and voice notes.

      $0.0006 / min

    2. Read attachments: PaddleOCR-VL 1.5 (Serving)

      Vision OCR to read screenshots and documents customers attach.

      16K context · $0.126 input / $0.72 output per M tokens

    3. Retrieve: BGE-M3 (Serving)

      Embeddings to pull the right help-center context.

      8K context · $0.009 per M tokens

    4. Respond: GPT-OSS 20B (Serving)

      A small, fast chat model for low-latency replies.

      128K context · $0.027 input / $0.117 output per M tokens

    5. Speak: Kokoro-82M (Serving)

      Text-to-speech for spoken responses.

      $0.558 per M characters
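The listed prices are enough to estimate what one conversation costs end to end. This sketch is arithmetic only and rests on two assumptions: each "$A / $B per M" pair is read as input / output USD per million tokens, and the usage figures (a 5-minute call, one screenshot, a short spoken reply) are made up for illustration. `PRICES`, `ConversationUsage`, and `estimateCost` are hypothetical names.

```typescript
// Prices as listed above. "$A / $B per M" is assumed to mean
// input / output USD per million tokens.
const PRICES = {
  whisperPerMin: 0.0006, // USD per audio minute
  ocrInPerM: 0.126,
  ocrOutPerM: 0.72,
  embedPerM: 0.009,
  chatInPerM: 0.027,
  chatOutPerM: 0.117,
  ttsPerMChars: 0.558, // USD per million characters
} as const;

interface ConversationUsage {
  callMinutes: number;
  ocrInTokens: number;
  ocrOutTokens: number;
  embedTokens: number;
  chatInTokens: number;
  chatOutTokens: number;
  speechChars: number;
}

const M = 1_000_000;

// Estimated USD cost of one conversation passing through all five steps.
function estimateCost(u: ConversationUsage): number {
  return (
    u.callMinutes * PRICES.whisperPerMin +
    (u.ocrInTokens * PRICES.ocrInPerM + u.ocrOutTokens * PRICES.ocrOutPerM) / M +
    (u.embedTokens * PRICES.embedPerM) / M +
    (u.chatInTokens * PRICES.chatInPerM + u.chatOutTokens * PRICES.chatOutPerM) / M +
    (u.speechChars * PRICES.ttsPerMChars) / M
  );
}

// Hypothetical usage: a 5-minute call with one screenshot attached.
const total = estimateCost({
  callMinutes: 5,
  ocrInTokens: 1_500,
  ocrOutTokens: 500,
  embedTokens: 800,
  chatInTokens: 3_000,
  chatOutTokens: 400,
  speechChars: 1_200,
});
console.log(`$${total.toFixed(4)} per conversation`); // prints "$0.0044 per conversation"
```

With these made-up numbers, transcription (5 min × $0.0006 = $0.003) accounts for roughly 70% of the total. In this mix, call length matters more than token counts.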

    Call the whole pipeline

    Point the OpenAI SDK at FlexAI once. Every step names its own model on the same key: no per-model clients, endpoints, or keys to juggle.

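    import { createReadStream, writeFileSync } from "node:fs";
    import OpenAI from "openai";
    
    const client = new OpenAI({
      baseURL: "https://tokens.flex.ai/v1",
      apiKey: process.env.FLEXAI_API_KEY,
    });
    
    // search() → your vector store lookup (pgvector, Pinecone, …); returns help-center text
    
    // 1. Transcribe the call: same key, swap the model anytime
    const transcript = await client.audio.transcriptions.create({
      model: "whisper-large-v3-turbo",
      file: createReadStream("call.mp3"), // local recording of the inbound call
    });
    
    // 2. Read attachments: send the customer's screenshot to the vision OCR model
    const screenshotUrl = "https://example.com/screenshot.png"; // placeholder attachment URL
    const ocr = await client.chat.completions.create({
      model: "PaddleOCR-VL",
      messages: [{
        role: "user",
        content: [
          { type: "text", text: "OCR:" }, // task prompt; check the model card for supported prompts
          { type: "image_url", image_url: { url: screenshotUrl } },
        ],
      }],
    });
    const attachmentText = ocr.choices[0].message.content ?? "";
    
    // 3. Retrieve: embed what the customer said and showed, then look up context
    const question = `${transcript.text}\n\n${attachmentText}`;
    const embedding = await client.embeddings.create({
      model: "bge-m3",
      input: question,
    });
    const context = await search(embedding.data[0].embedding);
    
    // 4. Respond: answer the question, grounded in the retrieved context
    const reply = await client.chat.completions.create({
      model: "gpt-oss-20b",
      messages: [
        { role: "system", content: `Answer using this help-center context:\n${context}` },
        { role: "user", content: question },
      ],
    });
    const replyText = reply.choices[0].message.content ?? "";
    
    // 5. Speak: turn the reply into audio and save it
    const speech = await client.audio.speech.create({
      model: "Kokoro-82M",
      voice: "alloy",
      input: replyText,
    });
    writeFileSync("reply.mp3", Buffer.from(await speech.arrayBuffer()));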
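The pipeline leaves `search()` to your vector store. For a prototype with a handful of help-center articles, a minimal in-memory stand-in that ranks pre-computed embeddings by cosine similarity is enough to run the loop end to end. This sketch rests on a few assumptions. `HelpDoc`, `cosine`, and `createInMemorySearch` are hypothetical names. The article embeddings are assumed to come from the same `bge-m3` model as the query. The toy 3-dimensional vectors only stand in for real embeddings.

```typescript
// A help-center article with an embedding computed ahead of time
// (assumed: produced by the same embedding model as the query).
interface HelpDoc {
  text: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; 0 if either is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Builds a search() function: scores every doc against the query embedding,
// keeps the top k, and joins their texts into one context string.
function createInMemorySearch(docs: HelpDoc[], k = 3) {
  return async (query: number[]): Promise<string> =>
    docs
      .map((d) => ({ text: d.text, score: cosine(query, d.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((d) => d.text)
      .join("\n\n");
}

// Toy embeddings for illustration only; real BGE-M3 vectors are much longer.
const search = createInMemorySearch(
  [
    { text: "Password reset article", embedding: [1, 0, 0] },
    { text: "Refund policy article", embedding: [0, 1, 0] },
    { text: "Invoice export article", embedding: [0, 0.6, 0.8] },
  ],
  1,
);

search([0.9, 0.1, 0]).then((ctx) => console.log(ctx)); // prints "Password reset article"
```

A linear scan costs O(n · d) per query. That is fine for hundreds of articles. Beyond that, a vector store with an approximate nearest-neighbor index, such as the pgvector or Pinecone options named in the pipeline comment, is the better fit.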

    Start serverless and pay only for what you use (per token, per audio minute, or per character). When a step settles into steady production traffic, move it to a dedicated endpoint on the same key.

    Build your support agents on FlexAI

    Every model in the pipeline behind one OpenAI-compatible key, with source-linked pricing.

    $10/month in free credits for your first 3 months