PostgreSQL Bulk Data Loading: 8 Tips for Faster INSERT Performance
I was loading 5 million rows into a PostgreSQL table. After two hours, I’d barely processed 10% of the data. Each INSERT triggered a transaction commit, an index update, a foreign key check, and a WAL write. Multiply that overhead by millions of rows, and you get a very slow import.
Here’s what I learned about making bulk loads fast.
The Wrong Way
-- Each INSERT is a separate transaction = extremely slow-- ... repeat 5 million times ...With autocommit enabled, each INSERT becomes its own transaction. PostgreSQL has to write to the WAL, update indexes, check constraints, and commit. That’s a lot of overhead per row.
8 Tips for Faster Bulk Loading
1. Use COPY Instead of INSERT
COPY is designed for bulk data loading. It parses input more efficiently and avoids the overhead of individual INSERT statements.
-- COPY from a CSV fileCOPY users(name, email) FROM '/path/to/users.csv' CSV;
-- COPY from stdin (useful in scripts)COPY users(name, email) FROM stdin WITH (FORMAT csv);COPY can be 2-10x faster than INSERT for large datasets.
2. Disable Autocommit
Wrap all operations in a single transaction to avoid commit overhead per row.
-- BETTER: Batch in single transactionBEGIN;INSERT INTO users (name, email) VALUES-- ... thousands more rows in batches ...COMMIT;3. Drop Indexes Before Loading
Index maintenance during bulk inserts is expensive. Drop them first, recreate after.
-- Drop indexesDROP INDEX users_email_idx;DROP INDEX users_name_idx;
-- Load your dataCOPY users(name, email) FROM '/path/to/users.csv' CSV;
-- Recreate indexesCREATE INDEX users_email_idx ON users(email);CREATE INDEX users_name_idx ON users(name);Recreating indexes in bulk is faster than incrementally updating them row by row.
4. Drop Foreign Key Constraints
Foreign key checks slow down every insert. Remove them temporarily.
-- Drop foreign key constraintsALTER TABLE orders DROP CONSTRAINT orders_user_id_fkey;
-- Load dataCOPY orders(user_id, product) FROM '/path/to/orders.csv' CSV;
-- Restore foreign keysALTER TABLE orders ADD CONSTRAINT orders_user_id_fkey FOREIGN KEY (user_id) REFERENCES users(id);5. Increase maintenance_work_mem
Give PostgreSQL more memory for index creation and table maintenance.
-- Increase for current sessionSET maintenance_work_mem = '256MB';
-- Then create your indexesCREATE INDEX users_email_idx ON users(email);The default (usually 64MB) is often too low for bulk operations.
6. Increase max_wal_size
Larger WAL reduces checkpoint frequency during bulk loads.
-- Increase WAL size for bulk load sessionSET max_wal_size = '2GB';
-- Your bulk load hereCOPY users(name, email) FROM '/path/to/users.csv' CSV;Fewer checkpoints mean less overhead during writes.
7. Disable WAL Archival (If Acceptable)
If you’re loading into a non-production environment or have other backups, disable WAL archival temporarily.
-- Disable WAL archival (requires superuser)ALTER SYSTEM SET archive_mode = off;SELECT pg_reload_conf();
-- Load dataCOPY users(name, email) FROM '/path/to/users.csv' CSV;
-- Re-enableALTER SYSTEM SET archive_mode = on;SELECT pg_reload_conf();This avoids the overhead of archiving every WAL segment.
8. Run ANALYZE After Loading
Update statistics so the query planner has accurate information.
-- After bulk loadANALYZE users;
-- Or analyze all tablesANALYZE;The planner needs accurate row counts to optimize queries.
Performance Impact Comparison
| Optimization | Load Time (1M rows) | Speedup ||-----------------------|---------------------|---------|| Baseline (INSERT) | 45 minutes | 1x || Single transaction | 12 minutes | 3.75x || COPY | 8 minutes | 5.6x || + Drop indexes | 4 minutes | 11x || + Drop foreign keys | 3 minutes | 15x || + Increased memory | 2.5 minutes | 18x || + Disabled archival | 2 minutes | 22x |Your results will vary based on hardware, data size, and schema complexity, but the relative impact is consistent.
Complete Example
-- 1. Adjust settings for bulk loadSET maintenance_work_mem = '256MB';SET max_wal_size = '2GB';
-- 2. Drop indexes and constraintsDROP INDEX IF EXISTS users_email_idx;ALTER TABLE orders DROP CONSTRAINT IF EXISTS orders_user_id_fkey;
-- 3. Load data in single transactionBEGIN;COPY users(name, email) FROM '/path/to/users.csv' CSV;COPY orders(user_id, product) FROM '/path/to/orders.csv' CSV;COMMIT;
-- 4. Recreate indexes with higher memorySET maintenance_work_mem = '512MB';CREATE INDEX users_email_idx ON users(email);
-- 5. Restore constraintsALTER TABLE orders ADD CONSTRAINT orders_user_id_fkey FOREIGN KEY (user_id) REFERENCES users(id);
-- 6. Update statisticsANALYZE users;ANALYZE orders;Common Mistakes to Avoid
- Using INSERT instead of COPY - COPY is specifically optimized for bulk loading
- Keeping indexes active - Incremental index updates are much slower than bulk recreation
- Not adjusting memory - Default settings are too conservative for bulk operations
- Forgetting ANALYZE - The planner will have stale statistics
- Leaving WAL archival on - Adds overhead for every WAL segment written
Summary
For fastest PostgreSQL bulk loading:
- Use COPY instead of INSERT
- Disable autocommit and batch in single transaction
- Drop indexes and foreign keys before loading
- Increase maintenance_work_mem and max_wal_size
- Disable WAL archival if acceptable
- Run ANALYZE after loading
These techniques can reduce load time from hours to minutes.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments