
    Use case

    Private copilots. One key, many models

    In-house assistants grounded in your own documents and data.

    Why open models fit

    Open models cover retrieval, document reading, and grounded answers. When governance demands it, the same API graduates to dedicated endpoints or AI Factory.

    The pipeline

    One OpenAI-compatible key runs the whole pipeline. Swap any step's model without touching your loop.

    1. Retrieve: BGE-M3 (Serving)

      Embeddings over your internal documents.

      8K ctx · $0.009 per M

    2. Read documents: PaddleOCR-VL 1.5 (Serving)

      Vision OCR to read scanned and image-based files.

      16K ctx · $0.126 / $0.72 per M

    3. Answer: Llama 3.3 70B Instruct (Serving)

      A chat model with a 128K-token context for grounded answers.

      128K ctx · $0.09 / $0.288 per M
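
The per-step prices above make a rough cost estimate straightforward. The sketch below assumes each price pair is input / output in USD per million tokens (the page does not label the two figures) and that the single embedding price applies to input tokens only. The `PRICES` table, the `stepCost` helper, and the token counts are illustrative names and numbers, not part of FlexAI's API.

```typescript
// Prices copied from the pipeline steps above, in USD per million tokens.
// Assumption: a pair "a / b" means input / output; embeddings bill input only.
type Price = { inputPerM: number; outputPerM: number };

const PRICES: Record<string, Price> = {
  retrieve: { inputPerM: 0.009, outputPerM: 0 },
  readDocuments: { inputPerM: 0.126, outputPerM: 0.72 },
  answer: { inputPerM: 0.09, outputPerM: 0.288 },
};

// Cost in USD for one step, given its input and output token counts.
function stepCost(step: string, inputTokens: number, outputTokens = 0): number {
  const price = PRICES[step];
  if (!price) throw new Error(`unknown step: ${step}`);
  return (inputTokens * price.inputPerM + outputTokens * price.outputPerM) / 1_000_000;
}

// Hypothetical workload: 1M tokens embedded, 200K in / 100K out for OCR,
// and 500K in / 50K out for answers.
const total =
  stepCost("retrieve", 1_000_000) +
  stepCost("readDocuments", 200_000, 100_000) +
  stepCost("answer", 500_000, 50_000);

console.log(`estimated cost: $${total.toFixed(4)}`);
```

Keeping prices in one table means a model swap only changes a table entry, not the arithmetic.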

    Call the whole pipeline

    Point the OpenAI SDK at FlexAI once. Every step names its own model on the same key: no per-model clients, endpoints, or keys to juggle.

    import OpenAI from "openai";
    
    const client = new OpenAI({
      baseURL: "https://tokens.flex.ai/v1",
      apiKey: process.env.FLEXAI_API_KEY,
    });
    
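    const chunks = ["...your source text..."];

    // search() → your vector store lookup (pgvector, Pinecone, …).
    // Placeholder so this sample runs as written: it ignores the embedding
    // and returns the first chunk. Replace it with a real query.
    async function search(embedding) {
      return chunks[0];
    }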
    
    // 1. Retrieve: same key, swap the model anytime
    const retrieve = await client.embeddings.create({
      model: "bge-m3",
      input: chunks,
    });
    
    // 2. Read documents: same key, swap the model anytime
    const readDocuments = await client.chat.completions.create({
      model: "PaddleOCR-VL",
      messages: [{ role: "user", content: await search(retrieve.data[0].embedding) }],
    });
    
    // 3. Answer: same key, swap the model anytime
    const answer = await client.chat.completions.create({
      model: "Llama-3.3-70B-Instruct-FP8",
      messages: [{ role: "user", content: readDocuments.choices[0].message.content }],
    });
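
The `search()` call in the sample above stands for your vector store lookup. For local experiments, a minimal in-memory stand-in might look like the sketch below. `cosineSimilarity`, `topK`, and the `Doc` shape are hypothetical names chosen for illustration, and the toy two-dimensional vectors stand in for real embeddings. A production setup would use a vector store such as pgvector or Pinecone instead.

```typescript
// A stored chunk and its embedding vector (for example, from client.embeddings.create).
type Doc = { text: string; embedding: number[] };

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as no similarity
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k docs most similar to the query embedding, best match first.
function topK(query: number[], docs: Doc[], k = 3): Doc[] {
  return docs
    .map((doc) => ({ doc, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.doc);
}

// Toy data: three chunks with hand-written 2-D "embeddings".
const docs: Doc[] = [
  { text: "vacation policy", embedding: [1, 0] },
  { text: "expense policy", embedding: [0, 1] },
  { text: "travel policy", embedding: [0.7, 0.7] },
];
const hits = topK([0.9, 0.1], docs, 2);
console.log(hits.map((d) => d.text));
```

Joining the returned `text` fields gives the context to place in the answer step's prompt.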

    Start serverless and pay per token. When a step becomes steady production traffic, move it to a dedicated endpoint on the same key.

    Build your private copilots on FlexAI

    Every model in the pipeline behind one OpenAI-compatible key, with source-linked pricing.

    $10/month in free credits for your first 3 months