
    Getting started

    Switch to FlexAI from OpenAI or Claude

    Your code already works, so point it at FlexAI and keep going: 20+ open models behind the same OpenAI-compatible surface, at competitive performance and pricing.

    Keep your agent loop: prompts, RAG, tools, sessions, evals, approvals. FlexAI replaces the model layer first, and can run the whole workload when you're ready.

    From the OpenAI SDK

    Change two values. Everything else stays.

    from openai import OpenAI
    import os
    
    client = OpenAI(
        base_url="https://tokens.flex.ai/v1",
        api_key=os.environ["FLEXAI_API_KEY"],
    )
    
    resp = client.chat.completions.create(
        model="Meta-Llama-3.1-8B-Instruct-FP8",
        messages=[{"role": "user", "content": "Hello from FlexAI"}],
    )
    print(resp.choices[0].message.content)

    What carries over unchanged: chat completions, streaming, tool calls, structured output, and vision inputs (on vision-capable models). Beyond base_url and api_key, the only change is the model name; see the model mapping below.

    From the Claude API

    The Anthropic Messages API uses a different request shape from OpenAI's, so the move is a small refactor rather than a config change. The mapping:

    - The system prompt moves from the top-level system parameter into a system-role message.
    - Content blocks flatten to plain message strings, unless you send images (vision).
    - Tool definitions move from Anthropic's input_schema to the parameters field of the OpenAI function-calling schema.
    - max_tokens is still required; pass it as before.
    - The reply text moves from resp.content[0].text to resp.choices[0].message.content.

    If your framework (Vercel AI SDK, LangChain) has an OpenAI-compatible provider, switch to it, point it at the FlexAI endpoint, and you are done.
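    The tool-definition change is mechanical: the JSON Schema under Anthropic's input_schema becomes parameters inside an OpenAI {"type": "function", "function": {...}} entry. A minimal sketch; anthropic_tool_to_openai and the get_weather tool are hypothetical examples, not part of either SDK.

```python
def anthropic_tool_to_openai(tool: dict) -> dict:
    """Convert one Anthropic tool definition to the OpenAI function-calling shape."""
    function = {
        "name": tool["name"],
        # The JSON Schema is reused as-is; only the key it lives under changes.
        "parameters": tool["input_schema"],
    }
    if "description" in tool:
        function["description"] = tool["description"]
    return {"type": "function", "function": function}


# Hypothetical example tool, written in Anthropic's format.
weather_tool = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

openai_tools = [anthropic_tool_to_openai(weather_tool)]
print(openai_tools[0]["function"]["parameters"]["required"])  # → ['city']
```

    Pass the resulting list as tools= to client.chat.completions.create. Calls then come back on resp.choices[0].message.tool_calls, with arguments as a JSON string rather than Anthropic's already-parsed input dict.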

    Before (Anthropic)

    import anthropic
    
    client = anthropic.Anthropic()
    
    resp = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="You are a helpful assistant.",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.content[0].text)

    After (FlexAI)

    from openai import OpenAI
    import os
    
    client = OpenAI(
        base_url="https://tokens.flex.ai/v1",
        api_key=os.environ["FLEXAI_API_KEY"],
    )
    
    resp = client.chat.completions.create(
        model="Meta-Llama-3.1-8B-Instruct-FP8",
        max_tokens=1024,
        messages=[
            # system prompt becomes a system-role message
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello"},
        ],
    )
    print(resp.choices[0].message.content)
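    The Before/After pair above covers the simplest request. For prompts that already use content-block lists, a small converter keeps the refactor in one place. This is a minimal sketch that handles text only: the function names are hypothetical, joining multiple text blocks with a newline is a design choice, and image, tool_use, and tool_result blocks are rejected rather than silently dropped, since they map to OpenAI image_url parts, tool_calls, and role="tool" messages respectively.

```python
def _text_of(blocks):
    """Join the text of Anthropic text blocks; reject any other block type."""
    if any(block.get("type") != "text" for block in blocks):
        # Images, tool_use and tool_result blocks need their own mapping.
        raise ValueError("non-text content blocks need a dedicated mapping")
    return "\n".join(block["text"] for block in blocks)


def anthropic_to_openai_messages(system, messages):
    """Build an OpenAI chat.completions messages list from Anthropic inputs."""
    converted = []
    if system:
        # Anthropic accepts system as a string or as a list of text blocks.
        text = system if isinstance(system, str) else _text_of(system)
        converted.append({"role": "system", "content": text})
    for msg in messages:
        content = msg["content"]
        if not isinstance(content, str):
            content = _text_of(content)
        converted.append({"role": msg["role"], "content": content})
    return converted


messages = anthropic_to_openai_messages(
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": [{"type": "text", "text": "Hello"}]}],
)
print(messages)
# → [{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Hello'}]
```

    The result drops straight into messages= in the After example; max_tokens and the model name are passed separately, as shown there.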

    Which model should I pick?

    A starting point. Browse the full catalog for the rest.

    Choosing models for agents: coding agents need strong reasoning and coding models; RAG agents depend on context window and retrieval quality; interactive agents need a balance of latency and cost.
    Coming from                          | Try on FlexAI         | Why
    gpt-4o-mini (small, fast chat)       | Llama 3.1 8B Instruct | Small and fast for high-volume classification, summarization, and RAG.
    claude-haiku class (fast, low-cost)  | Qwen3.5 9B            | A small, fast model for high-volume, latency-sensitive calls.
    claude-sonnet class (balanced)       | Qwen3.6-35B-A3B       | A capable, balanced model for everyday production work.
    gpt-4 / gpt-4o class (frontier chat) | DeepSeek V3.2         | A large open flagship for the hardest reasoning tasks.
    text-embedding-3 / ada (embeddings)  | BGE-M3                | Multilingual retrieval embeddings.
    DALL·E / image generation            | FLUX.1 [schnell]      | Fast open image generation.


    Verify your first call

    A successful call returns the standard OpenAI response shape. A 402 means your key's budget is exhausted; a 429 means you have hit a rate limit. The status page shows per-model availability.
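    Those two codes call for different handling. The sketch below assumes errors expose an HTTP status_code attribute, as the OpenAI Python SDK's APIStatusError (and subclasses such as RateLimitError) do. It fails fast on 402, because retrying cannot refill a budget, and backs off exponentially with jitter on 429. call_with_retry and BudgetExhaustedError are hypothetical names, and the retry count and delays are illustrative, not FlexAI guidance.

```python
import random
import time


class BudgetExhaustedError(RuntimeError):
    """Raised on HTTP 402: the key's budget is spent, so retrying will not help."""


def call_with_retry(fn, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); retry HTTP 429 with exponential backoff, fail fast on 402."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status == 402:
                raise BudgetExhaustedError("FlexAI key budget exhausted (402)") from exc
            if status != 429 or attempt == max_retries:
                raise  # not retryable, or out of attempts: re-raise the original
            # Backoff doubles each attempt (1s, 2s, 4s, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))


# Offline demo: a fake error carrying status_code, failing twice with 429.
class FakeStatusError(Exception):
    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeStatusError(429)
    return "ok"


print(call_with_retry(flaky, sleep=lambda seconds: None))  # → ok
```

    Wrap a real request as call_with_retry(lambda: client.chat.completions.create(...)). The OpenAI client already retries 429s on its own (its max_retries option defaults to 2), so construct it with max_retries=0 if you want this helper to own the retry policy.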

    What you keep

    Your app loop, prompts, RAG, tools, sessions, eval harness, and approval logic. FlexAI replaces the model layer, not your application. When you want FlexAI to run whole agent workloads, see the Agent SDK.

    Switch in two values

    Point the OpenAI SDK at tokens.flex.ai, swap the model, and keep shipping.

    $10/month in free credits for your first 3 months