OpenAI describes how it scaled a single-primary Azure PostgreSQL instance plus roughly 50 global read replicas to handle millions of mostly-read queries per second for 800M ChatGPT/API users. The team aggressively offloads and shards write-heavy workloads to systems like Cosmos DB, optimizes expensive queries and ORM-generated SQL, isolates noisy workloads, uses PgBouncer connection pooling and caching (with cache locking) to shield Postgres, manages schema changes carefully, and is working with Azure on cascading replication, achieving low double-digit-millisecond p99 latency and 99.999% availability while postponing a full sharding of Postgres.
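The "cache locking" mentioned alongside caching usually means stampede protection: when a hot key expires, only one request refills it from Postgres while the rest wait on the cache. A minimal sketch of that idea, assuming Redis as the cache layer; the key names and the `fetch_from_postgres` callback are hypothetical illustrations, not OpenAI's actual code:

```python
import time
import redis

r = redis.Redis()

def get_with_lock(key, fetch_from_postgres, ttl=300, lock_ttl=10):
    """Read-through cache with a lock to prevent cache stampedes."""
    value = r.get(key)
    if value is not None:
        return value  # cache hit: Postgres is never touched

    lock_key = f"lock:{key}"
    # SET with nx=True succeeds only if no one else holds the lock,
    # so exactly one caller wins the right to query Postgres.
    if r.set(lock_key, "1", nx=True, ex=lock_ttl):
        try:
            value = fetch_from_postgres(key)  # single trip to the database
            r.set(key, value, ex=ttl)
            return value
        finally:
            r.delete(lock_key)

    # Lost the race: briefly poll the cache until the winner repopulates it,
    # instead of piling more identical queries onto Postgres.
    deadline = time.monotonic() + lock_ttl
    while time.monotonic() < deadline:
        time.sleep(0.05)
        value = r.get(key)
        if value is not None:
            return value

    # Lock holder failed or timed out; fall back to the database.
    return fetch_from_postgres(key)
```

The design trade-off is latency for protection: losers of the lock race wait tens of milliseconds rather than issuing duplicate queries, which is what keeps a single expired hot key from turning into thousands of simultaneous reads against the primary or its replicas.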