    Use case

    Research agents. One key, many models

    Agents that retrieve, reason over, and synthesize large source sets.

    Why open models fit

    Research agents are retrieval-heavy and long-context: an open embedding model retrieves, a reasoning model works through the evidence, and a long-context model synthesizes, all on the same key.

    The pipeline

    One OpenAI-compatible key runs the whole pipeline. Swap any step's model without touching your loop.
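
    One way to keep model choices out of the loop is a small step-to-model map. The sketch below is an illustration, not part of FlexAI's API: `PipelineModels`, `defaultModels`, and `withModel` are hypothetical names, `some-other-model` is a placeholder, and the model identifiers are the ones used in the code sample on this page.

```typescript
// Hypothetical helper: keeps each step's model name in one place.
type Step = "retrieve" | "reason" | "summarize";
type PipelineModels = Record<Step, string>;

// Model identifiers as they appear in the pipeline snippet on this page.
const defaultModels: PipelineModels = {
  retrieve: "bge-m3",
  reason: "DeepSeek-V3.2",
  summarize: "DeepSeek-V4-Flash",
};

// Returns a copy of the map with one step's model replaced.
// The original map is left untouched, so the calling loop never changes.
function withModel(models: PipelineModels, step: Step, model: string): PipelineModels {
  return { ...models, [step]: model };
}

// Example: try a different (placeholder) reasoning model.
const experiment = withModel(defaultModels, "reason", "some-other-model");
console.log(experiment.reason); // prints "some-other-model"
```

    Each request would then read its model from the map, for example `model: experiment.reason`, so a swap is a one-line change.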

    1. Retrieve: BGE-M3 (Serving)

      Multilingual embeddings for retrieval over your corpus.

      8K ctx · $0.009 per M

    2. Reason: DeepSeek V3.2 (Serving)

      A reasoning-grade chat model with a 163K-token context.

      160K ctx · $0.225 / $0.225 per M

    3. Summarize: DeepSeek V4 Flash (Serving)

      A 1M-token context for synthesizing long source sets.

      1.0M ctx · $0.082 / $0.164 per M
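
    To make the per-million-token prices concrete, here is a rough cost estimate for one run. It assumes that the two figures listed for each chat model are input and output prices, in that order (the page does not label them), and the token counts are made-up examples. `PRICES` and `estimateCost` are illustrative names.

```typescript
type InOut = { input: number; output: number };

// Prices in USD per million tokens, copied from the list above.
// Assumption: for the chat models the pair is input / output.
const PRICES: Record<string, InOut> = {
  retrieve: { input: 0.009, output: 0 },
  reason: { input: 0.225, output: 0.225 },
  summarize: { input: 0.082, output: 0.164 },
};

// Cost in USD for one step, given its input and output token counts.
function estimateCost(price: InOut, tokens: InOut): number {
  return (tokens.input * price.input + tokens.output * price.output) / 1_000_000;
}

// Made-up token counts for a single research run.
const total =
  estimateCost(PRICES.retrieve, { input: 2_000_000, output: 0 }) +
  estimateCost(PRICES.reason, { input: 100_000, output: 4_000 }) +
  estimateCost(PRICES.summarize, { input: 500_000, output: 8_000 });

console.log(total.toFixed(4)); // prints "0.0837"
```

    Under these assumptions the whole run costs about eight cents, with the long-context synthesis step as the largest share.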

    Call the whole pipeline

    Point the OpenAI SDK at FlexAI once. Every step names its own model on the same key: no per-model clients, endpoints, or keys to juggle.

    import OpenAI from "openai";

    // One client for every step: same base URL, same key.
    const client = new OpenAI({
      baseURL: "https://tokens.flex.ai/v1",
      apiKey: process.env.FLEXAI_API_KEY,
    });

    const chunks = ["...your source text..."];

    // search() is a placeholder for your vector store lookup (pgvector, Pinecone, …).
    // It takes an embedding and should return the matching passages as one string.
    async function search(embedding: number[]): Promise<string> {
      throw new Error("search() is not implemented: connect your vector store here");
    }

    // 1. Retrieve: same key, swap the model anytime
    const retrieve = await client.embeddings.create({
      model: "bge-m3",
      input: chunks,
    });

    // 2. Reason: same key, swap the model anytime
    const reason = await client.chat.completions.create({
      model: "DeepSeek-V3.2",
      messages: [{ role: "user", content: await search(retrieve.data[0].embedding) }],
    });

    // 3. Summarize: same key, swap the model anytime
    // message.content can be null, so fall back to an empty string.
    const summarize = await client.chat.completions.create({
      model: "DeepSeek-V4-Flash",
      messages: [{ role: "user", content: reason.choices[0].message.content ?? "" }],
    });
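
    The `search()` call above is left to your vector store. As a stand-in for experimenting, a minimal in-memory version could rank stored chunks by cosine similarity. This is a sketch under assumptions: `cosine`, `topK`, and the `Indexed` shape are hypothetical names, the vectors are tiny hand-written examples rather than real BGE-M3 embeddings, and a real deployment would likely use a vector database instead.

```typescript
// A stored chunk together with its embedding vector.
type Indexed = { text: string; embedding: number[] };

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against zero vectors to avoid dividing by zero.
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k chunks most similar to the query, joined into one string
// that can be passed as message content.
function topK(query: number[], index: Indexed[], k: number): string {
  return [...index]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k)
    .map((item) => item.text)
    .join("\n\n");
}

// Toy 2-dimensional vectors, for illustration only.
const index: Indexed[] = [
  { text: "chunk about pricing", embedding: [1, 0] },
  { text: "chunk about context length", embedding: [0, 1] },
];
const hit = topK([0.9, 0.1], index, 1);
console.log(hit); // prints "chunk about pricing"
```

    In practice the index would hold one embedding per chunk from the retrieve step, and the query vector would come from embedding the research question with the same model.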

    Start serverless and pay per token. When a step's traffic settles into steady production volume, move it to a dedicated endpoint on the same key.

    Build your research agents on FlexAI

    Every model in the pipeline behind one OpenAI-compatible key, with source-linked pricing.

    $10/month in free credits for your first 3 months