Why PostgreSQL Queries Use Slow Plans: Understanding Planner Statistics

Apr 17, 2026

I added an index on the status column. It should have made my query faster, but PostgreSQL kept doing a sequential scan. I was confused—why would the planner ignore my index?

The answer was in the statistics. PostgreSQL’s query planner doesn’t look at your data directly; it looks at statistics about your data. When those statistics are wrong or missing, the planner makes bad decisions.

The Problem: Index Exists, But Not Used

Here’s the scenario that drove me crazy:

-- I created this index
CREATE INDEX idx_orders_status ON orders(status);

-- But my query still used a sequential scan
EXPLAIN SELECT * FROM orders WHERE status = 'completed';
-- Result: Seq Scan on orders  (cost=0.00..150000.00 rows=5000000)

Why? The planner estimated 5 million rows would match. With that many rows, a sequential scan IS faster than an index scan—it would have to jump back and forth between index and table 5 million times.

But the actual count was only 50,000 rows. An index scan would have been perfect.

How Planner Statistics Work

PostgreSQL maintains statistics about each column to estimate query costs. These statistics are the planner’s only view of your data.

┌─────────────────────────────────────────────────────────────┐
│                    Column Statistics                         │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Most Common Values (MCV)                                    │
│  ├── List of frequently occurring values                    │
│  ├── Their frequencies (how often each appears)             │
│  └── Used for: equality conditions (WHERE col = 'value')    │
│                                                              │
│  Histogram                                                   │
│  ├── Distribution of values across buckets                  │
│  ├── Used for: range conditions (WHERE col > 100)           │
│  └── Helps estimate selectivity of comparisons              │
│                                                              │
│  Correlation                                                 │
│  ├── How well physical order matches value order            │
│  ├── Values from -1.0 to 1.0                                 │
│  └── Used for: deciding between index scan vs seq scan      │
│                                                              │
│  n_distinct                                                  │
│  ├── Estimated number of distinct values                     │
│  └── Used for: join planning, GROUP BY estimation            │
│                                                              │
└─────────────────────────────────────────────────────────────┘

These statistics are gathered by the ANALYZE command, which samples a portion of your table—not all rows.

Checking Your Statistics

The pg_stats view shows what the planner knows about your columns:

SELECT
    attname as column_name,
    n_distinct,
    most_common_vals as mcv,
    most_common_freqs as mcv_freqs,
    histogram_bounds,
    correlation
FROM pg_stats
WHERE tablename = 'orders'
AND attname = 'status';

Key columns to understand:

column_name  | status
n_distinct   | 5                      -- About 5 different status values
mcv          | {pending,completed}    -- Most common values
mcv_freqs    | {0.35,0.30}            -- 35% pending, 30% completed
histogram    | {cancelled,processing}-- Distribution boundaries
correlation  | 0.12                    -- Low correlation with physical order

When I ran this on my orders table, I saw the problem: mcv_freqs showed completed at 0.50 (50%), but my actual data had only 5% completed orders. The statistics were stale.

The Statistics Problem

Statistics can be wrong for several reasons:

1. Outdated Statistics

After bulk operations, statistics don’t automatically update:

-- This doesn't update statistics automatically
COPY orders FROM '/bulk_data.csv';

-- The planner still thinks the table has old row counts
-- You must run ANALYZE
ANALYZE orders;

2. Sampling Limitations

ANALYZE samples rows, it doesn’t count all of them:

Table with 1,000,000 rows
    │
    ▼
default_statistics_target = 100 (default)
    │
    ▼
ANALYZE samples ~30,000 rows (300 * target)
    │
    ▼
Estimates based on this sample

For skewed distributions, this can miss important patterns.

3. Single-Column Statistics Miss Correlations

The planner’s default statistics are per-column. They don’t capture relationships between columns:

-- Problem: Two columns that are correlated
SELECT * FROM orders
WHERE status = 'completed'    -- 30% of rows
AND region = 'west';          -- 40% of rows

-- Planner estimates: 30% * 40% = 12% of rows (assumes independence)
-- Reality: Most completed orders ARE in west region = 28% of rows

The planner assumed independence and underestimated. It chose a nested loop join when a hash join would be better.

The Solution: Extended Statistics

PostgreSQL 10+ lets you create statistics that capture column correlations:

-- Create extended statistics for correlated columns
CREATE STATISTICS orders_stats (dependencies, mcv)
ON status, region
FROM orders;

-- Now ANALYZE will gather multi-column statistics
ANALYZE orders;

-- Check what was collected
SELECT
    stxname,
    stxkind,
    stxndistinct,
    stxdependencies
FROM pg_statistics_ext
WHERE stxrelid = 'orders'::regclass;

The stxdependencies column shows the correlation strength between columns:

status => region: 0.85    -- Strong dependency! 85% of the time,
                          -- knowing status tells us about region

Now the planner knows that status = 'completed' usually means region = 'west'.

A Real Debugging Session

I had a query that suddenly became slow after a data migration:

SELECT * FROM orders
WHERE status = 'pending'
AND priority = 'high'
ORDER BY created_at
LIMIT 50;

Step 1: Check the plan

EXPLAIN ANALYZE SELECT * FROM orders
WHERE status = 'pending' AND priority = 'high'
ORDER BY created_at LIMIT 50;

Step 2: Look at row estimates vs actual

Index Scan using idx_orders_created on orders
    (cost=0.43..8500.00 rows=50) (actual time=0.05..4500.00 rows=50000)
    Filter: ((status = 'pending') AND (priority = 'high'))
    Rows Removed by Filter: 500000

The planner thought 50 rows would match. It got 50,000. Wrong plan chosen.

Step 3: Check statistics

SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats WHERE tablename = 'orders';

-- Result: Statistics showed old data distribution
-- My migration changed the data, but not the stats

Step 4: Fix with ANALYZE and extended statistics

-- Update basic statistics
ANALYZE orders;

-- Create extended statistics for correlated columns
CREATE STATISTICS orders_status_priority (mcv)
ON status, priority FROM orders;

ANALYZE orders;

Step 5: Verify the fix

Index Scan using idx_orders_status_priority_created on orders
    (cost=0.43..150.00 rows=50) (actual time=0.05..2.00 rows=50)

From 4.5 seconds to 2 milliseconds.

When to Update Statistics

┌─────────────────────────────────────────┐
│        Significant data change?           │
└────────────────┬────────────────────────┘
                 │
        ┌────────▼────────┐
        │        │         │
       YES       NO        │
        │        │         │
        ▼        ▼         │
  Run ANALYZE   Check      │
  immediately   plan        │
        │        │          │
        │        ▼          │
        │  Row estimates    │
        │  way off?         │
        │        │          │
        │   ┌────▼────┐     │
        │  YES       NO     │
        │   │         │     │
        │   ▼         ▼     │
        │ ANALYZE   OK      │
        └──┴────────────────┘

Run ANALYZE after:

Bulk INSERT, UPDATE, or DELETE
COPY operations
Data migrations
Significant changes in data distribution

Statistics Configuration

You can tune how detailed statistics get:

-- Increase statistics detail for a specific column
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;

-- Default is 100, higher = more accurate but slower ANALYZE
-- Range is 0 to 10000

-- For the whole database
SET default_statistics_target = 200;

-- Larger sample = better stats but longer ANALYZE time

Common Mistakes to Avoid

1. Assuming ANALYZE happens automatically

Autovacuum does run ANALYZE, but only after certain thresholds. After bulk operations, run it manually.

2. Ignoring correlation between columns

If you query on multiple columns together, create extended statistics.

3. Not checking pg_stats

Before assuming an index should be used, check what the planner knows:

-- If planner thinks your filter matches 90% of rows,
-- it will choose a seq scan even with an index
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats WHERE tablename = 'your_table';

4. Assuming statistics are exact

ANALYZE samples. For small tables with skewed data, estimates can be off. Consider increasing STATISTICS target.

Quick Diagnostic Checklist

When a query uses the wrong plan:

Run EXPLAIN ANALYZE to see estimated vs actual rows
Check pg_stats for the columns in your WHERE clause
Look for large discrepancies between estimates and reality
Run ANALYZE table_name if statistics might be stale
Consider extended statistics if filtering on multiple correlated columns
Verify your index is actually selective (matches small % of rows)

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!