Weekly LLM API Reliability Report — May 26, 2026

This week's real-time monitoring shows three providers at 98.8% uptime or better, while Gemini's figure was pulled down to 70.27% by a free-tier quota limit. Here's what happened.

Executive Summary

Provider	Uptime (7d)	Avg Latency	Incidents	Confidence
Claude	99.94%	1,024ms	1	High
ChatGPT	99.82%	573ms	2	High
xAI (Grok)	98.80%	1,358ms	3	High
Gemini	70.27%	915ms	2*	Medium**

*Gemini 2.5 Pro hit daily quota limits (1,000 requests/day). Gemini 2.5 Flash recovered quickly.
**Sampling: Gemini is checked every 2–3 min (vs. every 30s for the others) to stay within its rate limits, so its figures rest on fewer data points and vary more.

Uptime Breakdown

🟢 Claude (Anthropic) — 99.94%

Status: Operational
Best performer this week. One minor degradation event (< 1 min) on May 26 morning, auto-recovered.

What happened: Haiku 3.5 had a transient issue at 02:00 UTC. There were no user reports, so we have high confidence it was a momentary glitch.

🟢 ChatGPT (OpenAI) — 99.82%

Status: Operational
Solid week. Two brief incidents: GPT-4 Turbo and GPT-4o Mini both had < 1 min hiccups.

What happened:

May 26, 02:23 UTC: GPT-4 Turbo degraded for 12 seconds
May 26, 00:40 UTC: GPT-4o Mini degraded for 19 seconds

Both resolved within 20 seconds. The pattern suggests brief load spikes, not underlying issues.

🟡 xAI (Grok) — 98.80%

Status: Operational
Reliable, but slower than Claude/ChatGPT. Three incidents, all brief.

What happened:

May 25, 22:54 UTC: Grok 3 timeout (30s) — 25-second event
May 25, 22:38 UTC: Grok 3 Mini degraded (< 1 min)
May 25, 22:20 UTC: Grok 3 Mini timeout — 19 seconds

All transient. Grok's latency is higher (1,358ms avg) but predictable.

🔴 Gemini (Google) — 70.27%

Status: Degraded (Rate-Limited)
⚠️ Important context: Gemini 2.5 Pro hit Google's daily quota (1,000 requests/day).

What happened:

May 26, 01:40 UTC: Pro model exceeded daily limit. Status: 429 (rate-limited)
Duration: 2+ hours (ongoing as of this report)
Root cause: Google's free tier API has a hard 1,000-request-per-day limit
Fix: Will reset at UTC midnight, or upgrade to paid tier
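
To estimate how long a client would have to wait out the quota, here is a minimal Python sketch that computes the time remaining until the next UTC midnight. The function name is hypothetical, and the midnight-UTC reset is taken from this report; it is worth confirming the actual reset time against Google's quota documentation before relying on it.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_utc_midnight(now: datetime) -> float:
    """Seconds from `now` until the next 00:00 UTC (assumed quota reset)."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    now_utc = now.astimezone(timezone.utc)
    # Jump one day ahead, then truncate to the start of that day.
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now_utc).total_seconds()


# Time of the 429 reported above: May 26, 01:40 UTC.
incident = datetime(2026, 5, 26, 1, 40, tzinfo=timezone.utc)
wait_hours = seconds_until_utc_midnight(incident) / 3600
print(f"{wait_hours:.1f} hours until the assumed reset")  # 22.3 hours
```

A wait this long is usually impractical for a live request, so a client would more likely switch to another model or provider than sleep until the reset.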

Gemini 2.5 Flash: Operating normally (100% uptime). It has a separate quota.

Why the uptime looks low:

We check Gemini every 2–3 minutes (vs 30s for others) due to rate-limiting
Fewer data points = higher variance
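The 70% reflects rate-limit periods, not actual API failures: under our methodology a 429 response counts as Degraded rather than Operational, so every rate-limited check lowers uptime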
Real availability: Likely 99%+ when quota is available
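
The gap between the two numbers can be illustrated with a short Python sketch that computes uptime twice: once counting rate-limited checks against the provider, and once excluding them. The function name and the check counts are hypothetical, chosen only to show the arithmetic; they are not this week's measured data.

```python
def uptime_views(ok: int, rate_limited: int, failed: int) -> tuple[float, float]:
    """Return (raw uptime %, uptime % with rate-limited checks excluded)."""
    total = ok + rate_limited + failed
    answered = ok + failed  # checks where quota was available
    if total == 0 or answered == 0:
        raise ValueError("not enough checks to compute uptime")
    raw = 100.0 * ok / total
    excluding_rate_limits = 100.0 * ok / answered
    return raw, excluding_rate_limits


# Hypothetical counts: 700 successful, 297 rate-limited (429), 3 server errors.
raw, adjusted = uptime_views(ok=700, rate_limited=297, failed=3)
print(f"raw: {raw:.2f}%")                 # raw: 70.00%
print(f"excluding 429s: {adjusted:.2f}%") # excluding 429s: 99.57%
```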

Latency Comparison

xAI (Grok):     1,358ms ████████████████
Claude:         1,024ms █████████████
Gemini:           915ms ███████████
ChatGPT:          573ms ███████

Observations:

ChatGPT is the fastest (plausible for the provider handling the largest volume)
Claude is roughly 450ms slower than ChatGPT on average (1,024ms vs 573ms) but consistent
Grok has the highest latency (possibly a result of newer infrastructure)
Gemini latency is solid when not rate-limited

What This Means for You

For Production Apps

Priority ranking by reliability:

Claude — Best uptime, acceptable latency, most stable
ChatGPT — Proven reliability, fastest, occasional transients
Grok — Reliable but slower, consider for non-latency-critical paths
Gemini — Avoid for production until you upgrade beyond free tier
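
One way to act on a ranking like this is a fallback chain that tries providers in priority order and moves on when a call fails. The following is a minimal Python sketch under stated assumptions: `complete_with_fallback`, `AllProvidersFailed`, and the stand-in provider functions are hypothetical names for illustration, not any vendor's SDK.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain raised an error."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (name, response) of the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in provider functions; real ones would wrap each vendor's client.
def flaky_claude(prompt: str) -> str:
    raise TimeoutError("simulated timeout")


def working_chatgpt(prompt: str) -> str:
    return "OK"


result = complete_with_fallback(
    "Respond with exactly: OK",
    [("claude", flaky_claude), ("chatgpt", working_chatgpt)],
)
print(result)  # ('chatgpt', 'OK')
```

Catching every exception keeps the sketch short. A production version would likely distinguish retryable errors (timeouts, 429, 5xx) from permanent ones such as invalid credentials.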

For Cost Optimization

Gemini free tier = Daily quota wall (1,000 req/day); move to the paid tier for higher limits
Claude = Premium but most reliable
ChatGPT = Volume pricing, occasional brief spikes but always recovers
Grok = Budget option, higher latency but stable

For Monitoring

Set up alerts for:

ChatGPT: Consecutive errors (tends to spike then recover)
Claude: Anything unusual (normally rock-solid)
Grok: Latency thresholds (high baseline)
Gemini: Rate-limit errors (quota management issue, not API issue)
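
The alert types above could be sketched in Python roughly as follows. The `Check` record, the function names, and the default thresholds (3 consecutive errors, 2,000ms) are assumptions for illustration; tune them per provider, for example a higher latency threshold for Grok's baseline.

```python
from dataclasses import dataclass


@dataclass
class Check:
    status_code: int
    latency_ms: float


def consecutive_errors(checks: list[Check]) -> int:
    """Length of the run of non-200 checks at the end of the history."""
    run = 0
    for check in reversed(checks):
        if check.status_code == 200:
            break
        run += 1
    return run


def alerts(
    checks: list[Check],
    max_consecutive_errors: int = 3,       # assumed threshold
    latency_threshold_ms: float = 2000.0,  # assumed threshold
) -> list[str]:
    """Return alert labels for the most recent state of one provider."""
    if not checks:
        return []
    messages = []
    if consecutive_errors(checks) >= max_consecutive_errors:
        messages.append("consecutive errors")
    latest = checks[-1]
    if latest.status_code == 429:
        messages.append("rate limited")
    if latest.status_code == 200 and latest.latency_ms > latency_threshold_ms:
        messages.append("high latency")
    return messages


history = [Check(200, 500.0), Check(500, 0.0), Check(503, 0.0), Check(429, 0.0)]
print(alerts(history))  # ['consecutive errors', 'rate limited']
```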

Methodology & Confidence

How we measure:

Real API calls every 30 seconds to each model (every 2–3 minutes for Gemini, to stay within its rate limits)
Health check: "Respond with exactly: OK" (minimal cost)
Status codes: 200 = Operational, 429 = Degraded, 5xx = Outage
Uptime % = Operational checks / Total checks
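
As a rough illustration of the rules above, this Python sketch maps status codes to the three states and applies the uptime formula. The function names and the sample data are hypothetical, and the handling of codes outside 200, 429, and 5xx is an assumption, since the methodology does not specify it.

```python
from collections import Counter


def classify(status_code: int) -> str:
    """Map an HTTP status code to one of the report's states."""
    if status_code == 200:
        return "operational"
    if status_code == 429:
        return "degraded"
    if 500 <= status_code <= 599:
        return "outage"
    # Assumption: the methodology does not cover other codes.
    return "unknown"


def uptime_percent(status_codes: list[int]) -> float:
    """Uptime % = operational checks / total checks, as a percentage."""
    if not status_codes:
        raise ValueError("no checks recorded")
    counts = Counter(classify(code) for code in status_codes)
    return 100.0 * counts["operational"] / len(status_codes)


# Hypothetical sample: 8 successful checks, 1 rate-limited, 1 server error.
sample = [200] * 8 + [429, 503]
print(f"{uptime_percent(sample):.2f}%")  # 80.00%
```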

Confidence levels:

High (ChatGPT, Claude, Grok): 2,880+ data points/day (30s sampling)
Medium (Gemini): 480–720 data points/day (2–3 min sampling due to rate limits)

Gemini's lower confidence is due to sparse sampling, not API unreliability.
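
The data-point counts follow directly from the sampling interval, and the effect of sparser sampling can be approximated with the standard error of a proportion. This is a minimal Python sketch: the function names are hypothetical, and the standard-error formula assumes independent checks, which consecutive health checks only roughly satisfy.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def checks_per_day(interval_seconds: int) -> int:
    """Number of health checks per model per day at a fixed interval."""
    return SECONDS_PER_DAY // interval_seconds


def standard_error(uptime_fraction: float, n_checks: int) -> float:
    """Standard error of an observed proportion, assuming independent checks."""
    return math.sqrt(uptime_fraction * (1.0 - uptime_fraction) / n_checks)


print(checks_per_day(30))   # 2880
print(checks_per_day(120))  # 720
print(checks_per_day(180))  # 480

# Same underlying uptime, different sample sizes: fewer checks, wider error.
for n in (checks_per_day(30), checks_per_day(180)):
    print(n, standard_error(0.99, n))
```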

Next Week

Watch for:

Gemini quota reset — Will it re-stabilize post-midnight UTC?
Grok latency — Is 1,358ms the baseline, or a temporary spike?
ChatGPT transients — Pattern of brief spikes suggests load management

Track real-time status at IsItDown.ai

Published by the Is It Down AI Team. Questions? Open an issue or reach out.