How to Read PostgreSQL EXPLAIN Output: A Practical Guide to Query Plan Analysis
The Problem: Developers Misread EXPLAIN Output
Most developers run EXPLAIN SELECT ... and see a tree of nodes with numbers, but don’t know which numbers matter. The result: they miss the real bottlenecks or misinterpret cost estimates as execution time.
EXPLAIN Parameters That Matter
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, COSTS, SETTINGS) SELECT ...;| Parameter | What It Shows | Why It Matters |
|---|---|---|
| ANALYZE | Actual execution time | Ground truth vs estimates |
| BUFFERS | Cache hits vs disk reads | I/O bottleneck indicator |
| VERBOSE | Output column details | Debug column references |
| SETTINGS | Configuration used | Spot work_mem issues |
Key Nodes to Identify
Seq Scan (Sequential Scan)
I always check for “Rows Removed by Filter” in Seq Scan nodes. If more than 50% of rows are removed, I’m wasting disk reads. The solution is adding an index on the filter column.
Index Scan
Index Scans are efficient for selective queries. I verify the index matches my WHERE clause order.
Hash Join vs Merge Join
- Hash Join: builds hash table from smaller input
- Merge Join: needs sorted inputs, efficient for large datasets
- I watch for “Hash Cond” and memory usage
Sort Operations
I check for “Sort Method: external merge Disk: XkB” - this indicates disk spill from insufficient work_mem. The solution: increase work_mem or add index to avoid sorting.
Reading the Numbers
Cost (estimated)
The cost=X..Y notation shows startup cost to total cost. This is NOT actual time - just the planner’s estimate. I compare cost ratio across nodes, not absolute values.
Rows (estimated vs actual)
I compare rows=X (estimated) vs actual rows=Y. A large gap (>20%) indicates outdated statistics. The solution: run ANALYZE tablename;
Actual Time
The actual time=X..Y ms shows real execution duration in milliseconds. I focus on total time at the root node. Per-node time reveals which step dominates.
Loops
The loops=N shows how many times a node executed. I multiply actual time by loops for true cost. Nested loops with high loop count is a performance killer.
Common Misinterpretations
Mistake: Cost = Execution Time
Cost is the planner’s unitless estimate. Actual time is milliseconds measured. I always use ANALYZE to see real performance.
Mistake: Low Cost = Good Plan
The planner might pick a low-cost plan with bad estimates. I compare “rows” (estimated) vs “actual rows”. Bad statistics mislead the planner.
Mistake: Ignore BUFFERS
The shared_blks_read shows disk reads. High reads mean cache misses and I/O pressure. BUFFERS reveals hidden I/O costs.
Problem Query Analysis
EXPLAIN (ANALYZE, BUFFERS)SELECT o.order_id, c.name, SUM(oi.quantity * oi.unit_price) AS totalFROM orders oJOIN customers c ON o.customer_id = c.customer_idJOIN order_items oi ON o.order_id = oi.order_idWHERE o.order_date BETWEEN '2023-01-01' AND '2023-12-31' AND c.country = 'US'GROUP BY o.order_id, c.nameHAVING SUM(oi.quantity * oi.unit_price) > 1000ORDER BY total DESCLIMIT 100;Key findings from the plan:
Sort Method: external merge Disk: 10240kB -- Memory overflow!Seq Scan on orders: Rows Removed by Filter: 987654 -- 91.8% wastedSeq Scan on customers: Rows Removed by Filter: 345678 -- 79.8% wastedI saw two problems immediately: memory overflowed to disk, and both tables had massive rows removed by filter.
Solutions identified from plan:
-- Add indexes on filter columnsCREATE INDEX idx_orders_date ON orders (order_date);CREATE INDEX idx_customers_country ON customers (country);
-- Increase work_mem for sort operationsSET work_mem = '256MB';After applying these fixes, the query ran in 280ms instead of 3.5s.
Summary
In this post, I showed how to read PostgreSQL EXPLAIN output by focusing on actual time (not cost), rows removed by filter (wasted reads), disk spills (memory issues), and loops (nested execution frequency). The key point is to use ANALYZE and BUFFERS to see real performance, not planner estimates.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments