Run LLM inference on Pinata with an OpenAI-compatible API
Pinata Inference is a hosted, OpenAI-compatible LLM endpoint. Enable it on your account, then use it two ways:
API — point any OpenAI-compatible client at Pinata and call chat completions directly, no agent required.
Agents app — the same hosted models show up alongside your other providers in an agent’s Models list (connect the Pinata provider in the Secrets Vault).
Either way, requests run against the models Pinata hosts, and usage is metered and drawn down from your credit balance.
Inference is billed through credits. Before your first request, make sure your workspace has a credit balance and (optionally) auto top-up enabled so requests don’t fail on an empty balance.
Enabling generates an Ed25519 key pair: the private key is returned once in this response and also stored encrypted in your secrets as PINATA_LLM_KEY; the public key is kept for request validation.
The privateKey is your inference credential — it’s what authenticates chat completions, not your Pinata JWT. Save it now; it isn’t returned again.
Prefer the dashboard? Connecting the Pinata provider in the Secrets Vault does the same thing, and is also how Pinata-hosted models become selectable inside your agents.
Disabling revokes the inference key and removes the associated secrets. New requests to the chat completions endpoint are rejected and Pinata-hosted models are no longer selectable in agents — any running agents using the managed key are automatically flipped to the free tier fallback so they don’t break. Your credit balance is untouched — disabling does not refund or expire credits — and you can re-enable at any time (which mints a newprivateKey).
Disabling takes effect immediately. Any in-flight requests may be cut off, and anything pointing at the endpoint will start getting errors until you re-enable.