2026-05-19HexSaga

Debugging Common AI API 401, 429, and 500 Errors

A practical troubleshooting guide for AI APIs, OpenAI-compatible gateways, and relay providers, focused on separating authentication, quota, rate limits, model configuration, proxy issues, and upstream failures.

AI / Debugging AI API 401 429 500 Troubleshooting

Debugging Common AI API 401, 429, and 500 Errors

When an AI API call fails, it is tempting to say “the model is down” or “the relay is unstable.” Sometimes that is true. Often the real cause is authentication, quota, rate limits, endpoint mismatch, proxy behavior, or request shape.

Before changing models, keys, or providers, answer one question: which layer failed?

Identity: key, project, organization, permission.
Quota: balance, quota, subscription, billing cap.
Rate limit: RPM, TPM, concurrency, queue limits, gateway limits.
Request: endpoint, model id, parameters, context length.
Network: DNS, proxy, TLS, timeout.
Upstream: model service, region, third-party provider.

If you have not checked the configuration basics yet, start with /en/posts/ai-tool-config-checklist.

Collect the Smallest Useful Evidence

Do not stop at a screenshot that says “request failed.” Keep at least:

Time of the request.
Tool or SDK version.
base_url domain and path.
Endpoint, such as /v1/responses, /v1/chat/completions, or /v1/messages.
Model id.
HTTP status code.
Error type, code, and message from the response body.
Request id, trace id, or gateway log id.
Whether a relay, company gateway, or proxy is involved.

Redact before sharing. Do not expose full API keys, full prompts, cookies, sessions, JWTs, or production user data.

Then reproduce with a minimal request:

Ask for one hello sentence. No tools, no long context, no files, no concurrency.

If the minimal request fails, focus on authentication, endpoint, network, and quota. If the minimal request succeeds but the real task fails, look at context length, tool calls, output format, concurrency, and model capability.

401: Treat It as Identity First

401 usually means the request was not authenticated correctly.

Common causes:

The key is wrong, copied with extra whitespace, or truncated.
The tool did not read the key you thought it read.
The key belongs to another platform.
The key was revoked, expired, or attached to a disabled project.
The account has multiple organizations, projects, or workspaces, and this key is not attached to the target one.
A relay key and an official provider key were mixed up.
The Authorization: Bearer ... header was not sent correctly.

A practical order:

Confirm the key source the tool actually uses.
Send a minimal request with that same key.
Confirm base_url belongs to the platform that issued the key.
Confirm the endpoint matches the platform protocol.
Check the key, project, and permission status in the provider dashboard.

If you call through a relay, separate relay-side 401 from upstream 401. A relay-side 401 may mean your relay key is invalid. An upstream 401 may mean the relay’s upstream channel is invalid. Those are different owners and different fixes.

A useful signal: if the same relay key returns 401 for every model, suspect relay identity. If only one upstream model fails, suspect that upstream channel or permission.

403: Not 401, Still Often Permission

This article focuses on 401, 429, and 500, but 403 appears often enough to mention. 403 usually means “you are identified, but not allowed to do this.”

Common causes:

The key lacks access to the target model.
The model requires additional enablement.
The project or organization does not satisfy billing, region, or safety requirements.
Gateway policy blocks that endpoint.
A company proxy or WAF rejects the request.

Do not solve 403 by blindly rotating keys. First confirm whether this key should be allowed to call this model, endpoint, and region.

429: Separate Quota From Rate Limit

429 is easy to misread. It can mean “too many requests,” but it can also mean insufficient balance, project quota exhaustion, relay-level throttling, or gateway-level throttling.

Common types:

Insufficient balance: no money, plan exhausted, project quota reached.
Request rate limit: too many requests per minute.
Token rate limit: too many input and output tokens per minute.
Concurrency limit: too many simultaneous requests.
Queue limit: batch or agent jobs are queued too heavily.
Gateway limit: relay provider, API gateway, proxy, or WAF limit.

Read the response body for clues. Different providers use different wording, but look for quota, rate limit, tokens per minute, requests per minute, insufficient balance, or too many requests.

The fix depends on the type:

Insufficient quota: add balance, switch project, use cheaper models, cap output.
RPM limit: slow down requests and add backoff.
TPM limit: reduce context, reduce concurrency, cap output tokens.
Concurrency limit: add a queue and avoid launching too many agents.
Gateway limit: inspect gateway logs and relay dashboard, not only upstream status.

For coding agents, 429 often comes from large context plus concurrency. One agent can read many files, call tools, retry, and generate long outputs. That can hit token-rate limits quickly. For task splitting and model choice, see /en/posts/choose-model-for-coding-agent.

500: Do Not Immediately Blame the Model

5xx errors mean a server-side failure, but “server-side” may refer to several layers:

Local proxy.
Company API gateway.
Relay provider.
Upstream model provider.
A regional model service.
An endpoint compatibility adapter.

Start with the response body and headers. If there is a request id or trace id, keep it. Without that id, support and gateway operators have far less to search for.

Common causes:

Endpoint and model are not compatible.
The gateway transformed the request body incorrectly.
Long context or input format triggered a server error.
Tool schema is too complex or unsupported.
Upstream provider has a temporary incident.
Relay channel switching failed.
Proxy timeout was wrapped as a 500.

Debug in this order:

Does the minimal request return 500?
Does a smaller model on the same platform return 500?
Does another endpoint return 500?
Does the request fail without the relay or gateway?
Does it recover after reducing context length and disabling tools?
Do status pages or gateway logs show an incident?

If only a complex request fails, shrink the body. Remove tools, long context, images, files, and long output limits. Finding “the request fails only when this part is added” is much more useful than saying “it returned 500.”

Identify Where the Error Came From

The same status code means different things depending on the source.

Useful signals:

Whether response headers identify the upstream provider.
Whether the error body looks like OpenAI, Anthropic, or the gateway’s own format.
Whether the relay dashboard has a record for this request.
Whether the upstream dashboard has a record.
Whether the request id is a gateway id or an upstream id.
Whether direct upstream access reproduces the error.

If the relay dashboard has no record, the request may never have reached the relay. Check local network, base_url, proxy, and DNS.

If the relay has a record but the upstream does not, suspect relay authentication, routing, channel health, request adaptation, or relay-to-upstream network.

If the upstream also has a record, the issue is more likely upstream rejection, rate limiting, model capability, or the request body itself.

Retry Carefully

Not every error should be retried.

401: usually do not retry automatically; fix the key.
403: do not blindly retry; check permission.
429: retry with backoff, but also reduce concurrency or token volume.
500/502/503/504: retry a few times and keep request ids.
Timeout: retry, but consider whether the first request may have already been billed or caused side effects.

Be especially careful with agents. One “retry” may reread files, re-run tools, and regenerate long output. Unbounded retries can turn a small outage into a quota problem.

A Short Status-Code Table

Status	Suspect first	First action
401	Key, project, auth header	Confirm the actual key source and matching base URL
403	Permission, model access, policy	Confirm the key can call this model and endpoint
429	Quota, RPM, TPM, concurrency	Determine whether it is balance or rate limiting
500	Gateway, upstream, request body	Reproduce with a minimal request and keep request id
502/503/504	Upstream or proxy chain	Check gateway logs, status pages, and retry results

The goal is not to memorize status codes. The goal is to locate the failing layer first: identity, quota, rate limit, request, network, or upstream. Once that layer is clear, changing keys, models, or providers becomes a deliberate fix instead of random motion.