PostgreSQL: debugging a slow query and optimizing it

It's Friday at 6:30 PM. Your query takes 4 seconds to respond in prod. The client sent you a screenshot of their loading screen. Your phone is ringing. There's a cold beer waiting for you. That's the context in which you're going to debug a slow PostgreSQL query.

Good news: 80% of PostgreSQL performance problems have an identifiable cause in under 10 minutes with the right tools. This guide follows the logical order of a real investigation — not an exhaustive PG feature catalog, but a method that works.

EXPLAIN vs EXPLAIN ANALYZE: one predicts, the other executes

The classic first mistake: using EXPLAIN alone and wondering why the numbers don't match what you observe in prod.

EXPLAIN displays the execution plan PostgreSQL intends to use, based on the statistics it keeps about your tables. It doesn't execute the query. Costs are estimates, rows are predictions.
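
For comparison, here is a minimal sketch of plain EXPLAIN. The plan shape and numbers are illustrative, assuming an index on user_id exists:

-- Plan only — the query is never executed, so this is safe anywhere
EXPLAIN
    SELECT id, total FROM orders WHERE user_id = 42;

-- Typical output (illustrative):
-- Index Scan using idx_orders_user_id on orders  (cost=0.43..8.45 rows=1 width=12)
--   Index Cond: (user_id = 42)
-- No "actual time", no "loops": those only appear with ANALYZE.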

EXPLAIN ANALYZE actually executes the query and enriches the plan with measured values: real time, actual rows processed, number of loops. That's what you want. Warning: it actually runs the query — don't run it on a DELETE or UPDATE without a BEGIN/ROLLBACK.

-- Safe for write queries
BEGIN;
EXPLAIN ANALYZE
    UPDATE orders SET status = 'processed' WHERE created_at < NOW() - INTERVAL '30 days';
ROLLBACK;

-- For a SELECT, no special precautions needed
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
    SELECT u.email, COUNT(o.id) AS order_count
    FROM users u
    JOIN orders o ON o.user_id = u.id
    WHERE u.created_at >= '2025-01-01'
    GROUP BY u.email
    ORDER BY order_count DESC
    LIMIT 20;

The BUFFERS option is valuable: it shows how many blocks came from PostgreSQL's cache (shared hit) versus from outside it (read — actual disk or the OS page cache). A query with lots of reads on a table that should be cached is a strong signal.
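
Here is the kind of line BUFFERS attaches to each plan node (numbers illustrative):

-- Excerpt of a plan node with BUFFERS (blocks are 8 kB by default)
--   Buffers: shared hit=1220 read=48213
-- hit  = blocks found in shared_buffers (RAM)
-- read = blocks fetched from outside PostgreSQL's cache (disk or OS page cache)
-- 48,213 reads × 8 kB ≈ 377 MB pulled from outside the cache for a single query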

What to look for in the output

Three things to spot immediately:

  • Seq Scan on a large table: PostgreSQL scans every row one by one. On a 50,000-row table, this is often acceptable. On 10 million, it's not.
  • Large gap between rows estimated and rows actual: if PostgreSQL expected to filter 50 rows and processed 80,000, its statistics are stale or the query is poorly written.
  • Node with disproportionate time: the plan is a tree. The slowest node is the culprit. The displayed time is cumulative — subtract the time of child nodes to isolate a node's own cost.

-- Example of problematic EXPLAIN ANALYZE output
Seq Scan on orders  (cost=0.00..48520.00 rows=2000 width=120)
                    (actual time=0.042..3841.223 rows=1847291 loops=1)
  Filter: (status = 'pending' AND created_at > '2024-01-01')
  Rows Removed by Filter: 152709
Planning Time: 1.2 ms
Execution Time: 4102.8 ms

-- What we see:
-- 1. Seq Scan on "orders" → no index used
-- 2. rows estimated = 2,000, rows actual = 1,847,291 → catastrophic statistics
-- 3. 4 seconds → matches exactly what the client is seeing

The usual suspects

1. Missing index on a filtered column

The most common case. An orders table with 2 million rows, a query filtering on user_id, no index. PostgreSQL does a full Seq Scan. The fix takes 30 seconds.

-- The slow query
SELECT id, total, status
FROM orders
WHERE user_id = 42
ORDER BY created_at DESC;

-- EXPLAIN ANALYZE shows:
-- Seq Scan on orders  (actual time=0.031..2341.5 rows=47 loops=1)
-- Filter: (user_id = 42)
-- Rows Removed by Filter: 1999953

-- The fix
CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders(user_id);
-- CONCURRENTLY: builds the index without blocking writes on the table
-- (note: it cannot run inside a transaction block and takes longer than a plain build)

-- After creation, the same query:
-- Index Scan using idx_orders_user_id on orders
-- (actual time=0.041..0.189 rows=47 loops=1)
-- 2.3 seconds → 0.2 ms

2. Index exists but isn't used

A trickier situation: the index exists, but PostgreSQL doesn't use it. Three main reasons.

Function in the WHERE clause: the moment you apply a function to an indexed column, a plain index on that column becomes unusable — PostgreSQL can't scan the index on the transformed value.

-- Index on email, but unused
CREATE INDEX idx_users_email ON users(email);

-- The index does nothing here
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

-- Expression index — solves the problem
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Or rewrite the query if data is already lowercase
SELECT * FROM users WHERE email = 'alice@example.com';

LIKE with a leading wildcard: LIKE '%martin' can't use a B-tree index, because it would require scanning all values to find those that end with "martin". LIKE 'martin%', on the other hand, can.

-- Guaranteed Seq Scan, even with an index on last_name
SELECT * FROM customers WHERE last_name LIKE '%martin';

-- Uses the B-tree index
SELECT * FROM customers WHERE last_name LIKE 'martin%';

-- For "contains" searches, use pg_trgm + GIN
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_customers_last_name_trgm
    ON customers USING GIN(last_name gin_trgm_ops);

-- Now this LIKE uses the index
SELECT * FROM customers WHERE last_name LIKE '%martin%';

Poor cardinality: if a column has very few distinct values (e.g. status with 3 possible values on 2 million rows), PostgreSQL may estimate that a Seq Scan is faster than an Index Scan followed by 600,000 random lookups. It's often right. The solution: a partial index.

-- 2 million orders, 95% have status = 'completed', 5% status = 'pending'
-- A simple index on status is not very useful for 'completed'
-- But very useful for 'pending' (100,000 rows out of 2 million)

-- Partial index: only indexes pending rows
CREATE INDEX idx_orders_pending
    ON orders(created_at)
    WHERE status = 'pending';

-- This query now uses the partial index
SELECT id, user_id, total
FROM orders
WHERE status = 'pending'
  AND created_at > NOW() - INTERVAL '7 days'
ORDER BY created_at DESC;

3. The N+1 problem via ORM

Not specific to PostgreSQL, but PostgreSQL suffers from it as much as any other database. The ORM loads 100 articles, then for each article makes a separate query to load the author. Result: 101 queries instead of one.

-- What the ORM generates (N+1)
SELECT * FROM articles LIMIT 100;
SELECT * FROM users WHERE id = 1;
SELECT * FROM users WHERE id = 2;
-- ... 98 more times

-- What it should do
SELECT a.id, a.title, a.body, a.published_at,
       u.id AS author_id, u.name AS author_name, u.avatar_url
FROM articles a
JOIN users u ON u.id = a.author_id
ORDER BY a.published_at DESC
LIMIT 100;

Detection happens in the logs or via pg_stat_statements (see below): dozens of identical queries with just one parameter changing, executed in rapid succession. The fix depends on the stack: in Go, write the SQL yourself with a single JOIN; in Laravel or Django, use eager loading (with() / select_related()).
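
With pg_stat_statements enabled (covered below), an N+1 pattern stands out: one normalized query with a huge call count and a tiny mean time. A query along these lines surfaces it (numbers in the comment are illustrative):

-- N+1 suspects: cheap queries called an enormous number of times
SELECT LEFT(query, 80) AS query_short,
       calls,
       ROUND(mean_exec_time::numeric, 2) AS mean_ms,
       ROUND(total_exec_time::numeric, 2) AS total_ms
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;
-- e.g. a "SELECT * FROM users WHERE id = $1" with 500,000 calls at 0.3 ms each
-- costs more in aggregate than one slow 4-second query.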

4. Stale statistics

PostgreSQL maintains statistics on the data distribution of each table, collected by ANALYZE (which autovacuum runs periodically). If these statistics are outdated — after a large import or a bulk DELETE — the planner makes bad decisions. Symptom: a large gap between rows estimated and rows actual in EXPLAIN ANALYZE.

-- Update statistics without blocking writes
ANALYZE orders;

-- After a bulk load or data purge
VACUUM ANALYZE orders;

-- Check statistics freshness
SELECT schemaname, relname, last_analyze, last_autoanalyze, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'orders';

5. Too many rows returned

Sometimes the query is correctly indexed but returns 50,000 rows when you're displaying 20. Add LIMIT and paginate properly. For sorted lists on large volumes, offset-based pagination itself becomes a problem beyond a few thousand pages: OFFSET 100000 LIMIT 20 forces PG to scan 100,020 rows to return 20.

-- Offset pagination — slow on large pages
SELECT id, title, published_at
FROM articles
ORDER BY published_at DESC
LIMIT 20 OFFSET 10000;

-- Cursor pagination (keyset pagination) — fast regardless of page number
SELECT id, title, published_at
FROM articles
WHERE published_at < '2024-06-15 10:23:00'  -- last value from the previous page
ORDER BY published_at DESC
LIMIT 20;
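
One detail the keyset query relies on: it is only fast if the sort column is indexed, and duplicate timestamps need a deterministic tiebreaker. A sketch, with illustrative index name and cursor values:

-- The cursor lookup needs this index to jump straight to the boundary value.
-- A plain ascending index works: PostgreSQL scans it backward for DESC ordering.
CREATE INDEX idx_articles_published_id ON articles(published_at, id);

-- With possible duplicate timestamps, include id in the cursor
SELECT id, title, published_at
FROM articles
WHERE (published_at, id) < ('2024-06-15 10:23:00', 18452)  -- values from the previous page
ORDER BY published_at DESC, id DESC
LIMIT 20;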

Choosing the right index type

PostgreSQL offers several index types. In practice, you use three of them.

B-tree (default)

Covers 95% of cases: equality, inequality, sorting, range queries. It's the default when you write CREATE INDEX without specifying a type. Effective on high-cardinality columns (identifiers, emails, dates, amounts).

CREATE INDEX idx_orders_created_at ON orders(created_at);
CREATE INDEX idx_users_email ON users(email);

-- Composite index: column order matters.
-- Useful for queries that filter on user_id AND sort on created_at.
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);

GIN — Full-text search and JSONB

For searches in text (tsvector), arrays, or JSONB columns. A B-tree index on a JSONB field doesn't help for queries like WHERE metadata @> '{"role": "admin"}'.

-- GIN index for JSONB queries
CREATE INDEX idx_users_metadata ON users USING GIN(metadata);

-- Now efficient
SELECT * FROM users WHERE metadata @> '{"role": "admin", "active": true}';

-- GIN index for full-text search
ALTER TABLE articles ADD COLUMN search_vector tsvector;
UPDATE articles SET search_vector =
    to_tsvector('english', COALESCE(title, '') || ' ' || COALESCE(body, ''));
-- Keep it in sync on writes: a trigger, or a GENERATED ALWAYS AS (...) STORED column on PG 12+

CREATE INDEX idx_articles_search ON articles USING GIN(search_vector);

SELECT title FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'postgresql optimization index');

BRIN — Sequential timestamps

BRIN (Block Range Index) is tiny in size and highly efficient for columns whose values correlate with physical insertion order: timestamps, sequential IDs. The principle: instead of indexing each value, it stores min/max values per range of blocks (128 pages by default). On a log table of several hundred GB, it can drastically reduce query time with an index of just a few MB.

-- Log table with 500 million rows.
-- A B-tree index on created_at would weigh ~15 GB.
-- A BRIN index weighs a few MB.
CREATE INDEX idx_logs_created_at_brin ON access_logs USING BRIN(created_at);

-- Efficient for time range queries
SELECT COUNT(*), path
FROM access_logs
WHERE created_at >= '2026-02-01' AND created_at < '2026-03-01'
GROUP BY path
ORDER BY count DESC;

pg_stat_statements: finding expensive queries in production

Up to now we knew which query to analyze. In production, you often don't. pg_stat_statements accumulates statistics on all executed queries: total time, call count, mean time. This is where you find queries that run 10,000 times per hour and take 200ms each — less visible than a 4-second query, but far more impactful in aggregate.

-- Enable the extension (requires a PostgreSQL restart)
-- In postgresql.conf: shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- The 10 most expensive queries by total time
SELECT
    LEFT(query, 100) AS query_short,
    calls,
    ROUND(total_exec_time::numeric, 2) AS total_ms,
    ROUND(mean_exec_time::numeric, 2) AS mean_ms,
    ROUND(stddev_exec_time::numeric, 2) AS stddev_ms,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Queries with the highest standard deviation (time instability)
-- Possible symptom: different plan depending on parameters, or random cache miss
SELECT
    LEFT(query, 100) AS query_short,
    calls,
    ROUND(mean_exec_time::numeric, 2) AS mean_ms,
    ROUND(stddev_exec_time::numeric, 2) AS stddev_ms
FROM pg_stat_statements
WHERE calls > 100
ORDER BY stddev_exec_time DESC
LIMIT 10;

-- Reset stats (useful after a deployment or optimization)
SELECT pg_stat_statements_reset();

pg_stat_statements doesn't interrupt service and its overhead is negligible for most workloads. Unless you have a measured reason not to, enable it in production on all your PostgreSQL clusters.

Quick tips

Force PostgreSQL to show the plan with an index

Sometimes PostgreSQL chooses a Seq Scan when an index exists, and you want to see what the plan would look like with the index. SET enable_seqscan = off makes sequential scans prohibitively expensive for the planner in the current session (it can still pick one when no alternative exists). Useful for diagnosis, not as a permanent fix.

-- In your session only — no global change
SET enable_seqscan = off;

EXPLAIN ANALYZE
    SELECT * FROM orders WHERE user_id = 42;

-- Restore to normal
SET enable_seqscan = on;

-- Interpretation:
-- If the plan with index is faster → create the index or run ANALYZE (stale stats).
-- If the plan with index is slower → PostgreSQL was right to do the Seq Scan.
--   Perhaps the query returns too many rows for an index to help.

work_mem for sorts and hash joins

When you see Sort Method: external merge Disk in EXPLAIN ANALYZE, PostgreSQL ran out of memory and had to write to disk for sorting. Increasing work_mem for the session can fix the problem immediately.

-- Default work_mem = 4MB — insufficient for large aggregations
SET work_mem = '64MB';

EXPLAIN ANALYZE
    SELECT user_id, SUM(total) AS revenue
    FROM orders
    WHERE created_at >= '2025-01-01'
    GROUP BY user_id
    ORDER BY revenue DESC;

-- With sufficient work_mem:
-- Sort Method: quicksort  Memory: 51200kB  ← all in RAM, fast

-- Without sufficient work_mem:
-- Sort Method: external merge  Disk: 48MB  ← writes to disk, slow

-- Warning: don't increase work_mem globally without calculation.
-- It applies PER sort operation AND per concurrent connection.
-- 100 connections × 5 sorts × 64MB = potentially 32 GB consumed.
-- Reserve it for sessions that need it, or adjust max_connections.

VACUUM ANALYZE after a bulk load

-- After an import or bulk DELETE
VACUUM ANALYZE orders;

-- VACUUM reclaims space from dead tuples (after UPDATE/DELETE)
-- ANALYZE updates planner statistics
-- Both together: essential after any large data modification

-- Check table bloat
SELECT n_dead_tup, n_live_tup,
       ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
FROM pg_stat_user_tables
WHERE relname = 'orders';
-- If dead_pct > 10-20%, a VACUUM is needed

The method in summary

When a PostgreSQL query is slow, here's the investigation order that resolves the vast majority of cases:

  1. EXPLAIN (ANALYZE, BUFFERS) on the offending query. Identify the slowest node. Look for Seq Scan on a large table, gap between rows estimated / actual.
  2. Missing index? Create it with CONCURRENTLY in production. Verify the query is written to benefit from it — no function on the indexed column, no LIKE '%...'.
  3. Stale statistics? VACUUM ANALYZE on the affected table.
  4. In production with no specific query to target? pg_stat_statements to identify offenders by total time or call count.

80% of PostgreSQL performance problems are solved with EXPLAIN ANALYZE and a well-placed index. The remaining 20% involve configuration tuning (shared_buffers, work_mem, max_connections), rewriting complex queries, or table partitioning — but that's a topic for another article.

The beer can wait another 10 minutes. The client can't.
