Skip to content

PostgreSQL Bulk Data Loading: 8 Tips for Faster INSERT Performance

I was loading 5 million rows into a PostgreSQL table. After two hours, I’d barely processed 10% of the data. Each INSERT triggered a transaction commit, an index update, a foreign key check, and a WAL write. Multiply that overhead by millions of rows, and you get a very slow import.

Here’s what I learned about making bulk loads fast.

The Wrong Way

slow-inserts.sql
-- Each INSERT is a separate transaction = extremely slow
INSERT INTO users (name, email) VALUES ('user1', '[email protected]');
INSERT INTO users (name, email) VALUES ('user2', '[email protected]');
INSERT INTO users (name, email) VALUES ('user3', '[email protected]');
-- ... repeat 5 million times ...

With autocommit enabled, each INSERT becomes its own transaction. PostgreSQL has to write to the WAL, update indexes, check constraints, and commit. That’s a lot of overhead per row.

8 Tips for Faster Bulk Loading

1. Use COPY Instead of INSERT

COPY is designed for bulk data loading. It parses input more efficiently and avoids the overhead of individual INSERT statements.

use-copy.sql
-- COPY from a CSV file
COPY users(name, email) FROM '/path/to/users.csv' CSV;
-- COPY from stdin (useful in scripts)
COPY users(name, email) FROM stdin WITH (FORMAT csv);

COPY can be 2-10x faster than INSERT for large datasets.

2. Disable Autocommit

Wrap all operations in a single transaction to avoid commit overhead per row.

single-transaction.sql
-- BETTER: Batch in single transaction
BEGIN;
INSERT INTO users (name, email) VALUES
('user1', '[email protected]'),
('user2', '[email protected]'),
('user3', '[email protected]');
-- ... thousands more rows in batches ...
COMMIT;

3. Drop Indexes Before Loading

Index maintenance during bulk inserts is expensive. Drop them first, recreate after.

drop-indexes.sql
-- Drop indexes
DROP INDEX users_email_idx;
DROP INDEX users_name_idx;
-- Load your data
COPY users(name, email) FROM '/path/to/users.csv' CSV;
-- Recreate indexes
CREATE INDEX users_email_idx ON users(email);
CREATE INDEX users_name_idx ON users(name);

Recreating indexes in bulk is faster than incrementally updating them row by row.

4. Drop Foreign Key Constraints

Foreign key checks slow down every insert. Remove them temporarily.

drop-fks.sql
-- Drop foreign key constraints
ALTER TABLE orders DROP CONSTRAINT orders_user_id_fkey;
-- Load data
COPY orders(user_id, product) FROM '/path/to/orders.csv' CSV;
-- Restore foreign keys
ALTER TABLE orders ADD CONSTRAINT orders_user_id_fkey
FOREIGN KEY (user_id) REFERENCES users(id);

5. Increase maintenance_work_mem

Give PostgreSQL more memory for index creation and table maintenance.

memory-settings.sql
-- Increase for current session
SET maintenance_work_mem = '256MB';
-- Then create your indexes
CREATE INDEX users_email_idx ON users(email);

The default (usually 64MB) is often too low for bulk operations.

6. Increase max_wal_size

Larger WAL reduces checkpoint frequency during bulk loads.

wal-settings.sql
-- Increase WAL size for bulk load session
SET max_wal_size = '2GB';
-- Your bulk load here
COPY users(name, email) FROM '/path/to/users.csv' CSV;

Fewer checkpoints mean less overhead during writes.

7. Disable WAL Archival (If Acceptable)

If you’re loading into a non-production environment or have other backups, disable WAL archival temporarily.

disable-archival.sql
-- Disable WAL archival (requires superuser)
ALTER SYSTEM SET archive_mode = off;
SELECT pg_reload_conf();
-- Load data
COPY users(name, email) FROM '/path/to/users.csv' CSV;
-- Re-enable
ALTER SYSTEM SET archive_mode = on;
SELECT pg_reload_conf();

This avoids the overhead of archiving every WAL segment.

8. Run ANALYZE After Loading

Update statistics so the query planner has accurate information.

analyze.sql
-- After bulk load
ANALYZE users;
-- Or analyze all tables
ANALYZE;

The planner needs accurate row counts to optimize queries.

Performance Impact Comparison

Optimization Impact
| Optimization | Load Time (1M rows) | Speedup |
|-----------------------|---------------------|---------|
| Baseline (INSERT) | 45 minutes | 1x |
| Single transaction | 12 minutes | 3.75x |
| COPY | 8 minutes | 5.6x |
| + Drop indexes | 4 minutes | 11x |
| + Drop foreign keys | 3 minutes | 15x |
| + Increased memory | 2.5 minutes | 18x |
| + Disabled archival | 2 minutes | 22x |

Your results will vary based on hardware, data size, and schema complexity, but the relative impact is consistent.

Complete Example

full-bulk-load.sql
-- 1. Adjust settings for bulk load
SET maintenance_work_mem = '256MB';
SET max_wal_size = '2GB';
-- 2. Drop indexes and constraints
DROP INDEX IF EXISTS users_email_idx;
ALTER TABLE orders DROP CONSTRAINT IF EXISTS orders_user_id_fkey;
-- 3. Load data in single transaction
BEGIN;
COPY users(name, email) FROM '/path/to/users.csv' CSV;
COPY orders(user_id, product) FROM '/path/to/orders.csv' CSV;
COMMIT;
-- 4. Recreate indexes with higher memory
SET maintenance_work_mem = '512MB';
CREATE INDEX users_email_idx ON users(email);
-- 5. Restore constraints
ALTER TABLE orders ADD CONSTRAINT orders_user_id_fkey
FOREIGN KEY (user_id) REFERENCES users(id);
-- 6. Update statistics
ANALYZE users;
ANALYZE orders;

Common Mistakes to Avoid

  • Using INSERT instead of COPY - COPY is specifically optimized for bulk loading
  • Keeping indexes active - Incremental index updates are much slower than bulk recreation
  • Not adjusting memory - Default settings are too conservative for bulk operations
  • Forgetting ANALYZE - The planner will have stale statistics
  • Leaving WAL archival on - Adds overhead for every WAL segment written

Summary

For fastest PostgreSQL bulk loading:

  1. Use COPY instead of INSERT
  2. Disable autocommit and batch in single transaction
  3. Drop indexes and foreign keys before loading
  4. Increase maintenance_work_mem and max_wal_size
  5. Disable WAL archival if acceptable
  6. Run ANALYZE after loading

These techniques can reduce load time from hours to minutes.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments