Model family

Gemma on FlexAI. Every variant. One key.

Gemma is Google's open model line on FlexAI. Two variants run serverless on the OpenAI-compatible API, and two more are available as dedicated endpoints, spanning chat and multimodal workloads. One API key serves every variant.

Get an API key · All models

Variants

Every served variant in the family, with live serverless pricing.

Serverless · pay per token

Model	Context	Price	Status
Gemma 4 31B IT	256K	$0.09 / $0.34 per M tokens	Serving
Gemma 4 26B A4B	256K	$0.06 / $0.30 per M tokens	Serving
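
As a rough illustration of how per-token pricing adds up, the sketch below estimates the cost of a workload from the rates in the table. It assumes the two figures in the Price column are input and output rates in USD per million tokens, which this page does not state explicitly. `estimate_cost` is a hypothetical helper written for this example, not part of any FlexAI tooling, and the token counts are made up.

```shell
# Hypothetical helper: estimate_cost INPUT_TOKENS OUTPUT_TOKENS INPUT_PRICE OUTPUT_PRICE
# Assumption: prices are USD per million tokens, listed as input / output.
estimate_cost() {
  awk -v it="$1" -v ot="$2" -v ip="$3" -v op="$4" \
    'BEGIN { printf "%.4f\n", (it * ip + ot * op) / 1000000 }'
}

# Illustrative workload: 2M input tokens, 500K output tokens.
estimate_cost 2000000 500000 0.09 0.34   # Gemma 4 31B IT rates  -> 0.3500
estimate_cost 2000000 500000 0.06 0.30   # Gemma 4 26B A4B rates -> 0.2700
```

Under those assumptions, the same workload costs about $0.35 on the flagship and about $0.27 on the smaller variant.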

Dedicated endpoints · reserved GPUs

Model	Context	Price	Status
Gemma 3 27B IT	128K	Dedicated	Dedicated
Gemma 3n E4B Instruct	32K	Dedicated	Dedicated

Which variant for what

Pick by the role you're filling. Same key for all of them.

Flagship

Gemma 4 31B IT

Gemma 4 31B IT is the largest variant served serverless. Reach for it first.

Fast & economical

Gemma 4 26B A4B

Gemma 4 26B A4B is the smallest serverless variant. Lowest latency and cost.

Gemma 4 31B IT runs the describe and QA steps in the multimodal generation use case.

Call the flagship

OpenAI-compatible. Swap the model id for any variant above.

curl https://tokens.flex.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLEXAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-31b-it",
    "messages": [{"role": "user", "content": "Hello from FlexAI"}]
  }'
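
To make swapping the model id less error-prone, the request can be wrapped in a small shell function that takes the id as an argument. This is a minimal sketch: `build_payload` and `chat` are hypothetical helper names, and only the `gemma-4-31b-it` id appears on this page, so the ids for other variants must be looked up rather than guessed. The payload builder does no JSON escaping and assumes the prompt contains no double quotes or backslashes.

```shell
# Hypothetical helper: build_payload MODEL_ID PROMPT
# Prints a chat-completions request body. No JSON escaping is performed,
# so the prompt must not contain double quotes or backslashes.
build_payload() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}\n' "$1" "$2"
}

# Hypothetical helper: chat MODEL_ID PROMPT
# Sends the request to the endpoint shown above; needs FLEXAI_API_KEY set.
chat() {
  curl -sS https://tokens.flex.ai/v1/chat/completions \
    -H "Authorization: Bearer $FLEXAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$1" "$2")"
}

# Default to the flagship id from this page; override MODEL for another variant.
MODEL="${MODEL:-gemma-4-31b-it}"

# Preview the request body without making a network call.
build_payload "$MODEL" "Hello from FlexAI"
```

Calling `chat "$MODEL" "Hello from FlexAI"` then sends the same request as the curl command above, with the model id taken from the `MODEL` variable.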

Where it runs

Use cases that put Gemma to work in a pipeline.

Multimodal generation

Agents that generate images and reason over what they produce.

See the pipeline

Run Gemma on one API key

Every Gemma variant, serverless and dedicated, behind one OpenAI-compatible key.

Get an API key · Talk to us

$10/month in free credits for your first 3 months

See how much you could save