When Should I Use JSONB in PostgreSQL Instead of Traditional Columns?
I threw everything into a JSONB field because it was “flexible.” Six months later, my queries were crawling, storage had bloated by 40%, and I couldn’t figure out why my “simple” filter took 15 seconds.
The Problem
PostgreSQL’s JSONB type is seductive. No schema migrations, no ALTER TABLE statements, just dump your JSON and move on. But this convenience comes with costs that hit hard at scale:
- Query performance: JSONB queries require parsing the binary format
- Storage bloat: Key names stored in every row
- No constraints: Type safety gone, errors caught only at runtime
- Tooling friction: ORMs struggle, IDEs can’t autocomplete
The core question: How do I decide what goes in columns versus what goes in JSONB?
My Trial-and-Error Journey
Attempt 1: Everything in JSONB (Bad Idea)
-- My "flexible" schemaCREATE TABLE orders ( id SERIAL PRIMARY KEY, data JSONB NOT NULL -- customer_id, total, status, items all here);This felt elegant. No schema constraints! I could add fields anytime!
Reality check:
-- Find pending ordersSELECT * FROM orders WHERE data->>'status' = 'pending';-- Execution time: 3.2 seconds on 50K rows
-- Sum totals by customerSELECT data->>'customer_id', SUM((data->>'total')::decimal)FROM orders GROUP BY data->>'customer_id';-- Execution time: 12 secondsThe queries were slow, and worse: I had no foreign key enforcement on customer_id. Invalid customer IDs silently entered my database.
Attempt 2: GIN Indexes (Partial Fix)
CREATE INDEX idx_orders_data ON orders USING GIN (data);
-- Now the query:SELECT * FROM orders WHERE data @> '{"status": "pending"}';-- Execution time: 0.8 secondsBetter, but still slower than a column index would be. And I still couldn’t enforce constraints.
Attempt 3: Hybrid Approach (The Right Answer)
-- Core searchable data in columnsCREATE TABLE orders ( id SERIAL PRIMARY KEY, customer_id INTEGER NOT NULL REFERENCES customers(id), status VARCHAR(20) NOT NULL DEFAULT 'pending', total DECIMAL(10, 2) NOT NULL, items JSONB NOT NULL, -- Flexible line items here created_at TIMESTAMP DEFAULT NOW());
CREATE INDEX idx_orders_status ON orders(status);CREATE INDEX idx_orders_customer ON orders(customer_id);
-- Now the queries:SELECT * FROM orders WHERE status = 'pending';-- Execution time: 0.02 seconds (100x faster)
SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id;-- Execution time: 0.1 seconds (120x faster)The Decision Framework
Here’s a mental model I now use:
┌─────────────────────┐ │ Does this data │ │ need to be │ │ SEARCHED? │ └─────────────────────┘ │ ┌───────────────┴───────────────┐ │ │ YES NO │ │ ▼ ▼ ┌─────────────────┐ ┌─────────────────────┐ │ Use COLUMN │ │ Is schema truly │ │ with INDEX │ │ flexible per row? │ └─────────────────┘ └─────────────────────┘ │ ┌───────────────┴───────────────┐ │ │ YES NO │ │ ▼ ▼ ┌─────────────────┐ ┌─────────────────┐ │ Use JSONB │ │ Still use │ │ (with GIN │ │ COLUMN │ │ if queried) │ │ (consistency) │ └─────────────────┘ └─────────────────┘Use Traditional Columns When:
- Data needs searching/filtering/sorting - WHERE, ORDER BY, GROUP BY
- Data participates in JOINs - Foreign keys to other tables
- Data integrity required - NOT NULL, CHECK constraints, foreign keys
- Same fields in many rows - Avoid key name repetition bloat
- Analytics/reporting - Aggregations, statistical queries
Use JSONB When:
- Schema truly flexible - Fields vary per row (user preferences, surveys)
- Rarely queried as whole objects - Store and retrieve, rarely filter
- Caching external data - API response caches
- Configuration/settings - Application configs, feature flags
- Prototyping - Schema not finalized yet
The Storage Bloat Problem
I didn’t realize this until I hit 1 million rows:
Column approach:┌────────────────────────────────┐│ id | email | created_at │ Schema metadata: stored ONCE│ 1 | bob@email... | 2024-01-01 ││ 2 | alice@em... | 2024-01-02 │└────────────────────────────────┘
JSONB approach:┌─────────────────────────────────────────────────────────────┐│ id | data ││ 1 | {"email": "bob@email...", "created_at": "2024-01-01"} │ Key names repeated│ 2 | {"email": "alice@em...", "created_at": "2024-01-02"} │ in EVERY row!└─────────────────────────────────────────────────────────────┘With 1M rows storing {"user_preference_theme": "dark"}:
- Column: ~4MB (just values)
- JSONB: ~26MB (values + repeated 23-char key name)
That’s 22MB wasted on key name repetition alone.
Real-World Examples
Good JSONB Use: User Preferences
CREATE TABLE user_settings ( user_id INTEGER PRIMARY KEY REFERENCES users(id), preferences JSONB DEFAULT '{"theme": "light", "notifications": true}'::jsonb);
-- Perfect: flexible, rarely queried, user-definedUPDATE user_settingsSET preferences = jsonb_set(preferences, '{theme}', '"dark"')WHERE user_id = 42;Good JSONB Use: API Cache
CREATE TABLE api_cache ( cache_key VARCHAR(255) PRIMARY KEY, response JSONB NOT NULL, expires_at TIMESTAMP NOT NULL);
-- Perfect: no schema needed, just cache and retrieveINSERT INTO api_cache VALUES ('github:user:123', '{"login": "octocat", ...}', NOW() + INTERVAL '1 hour');Bad JSONB Use: Core Business Data
-- DON'T DO THISCREATE TABLE products ( id SERIAL PRIMARY KEY, data JSONB -- contains: sku, name, price, category_id);
-- Why bad:-- 1. sku needs unique constraint -> impossible in JSONB-- 2. category_id needs foreign key -> impossible in JSONB-- 3. price needs numeric type -> JSONB stores as text-- 4. Query by category -> full JSONB scan
-- DO THIS instead:CREATE TABLE products ( id SERIAL PRIMARY KEY, sku VARCHAR(50) UNIQUE NOT NULL, name VARCHAR(255) NOT NULL, price DECIMAL(10, 2) NOT NULL, category_id INTEGER REFERENCES categories(id), attributes JSONB DEFAULT '{}'::jsonb -- flexible extras here);Scale Considerations
From real-world experience:
| Scale | JSONB Performance | Recommendation |
|---|---|---|
| < 100K rows | Acceptable | JSONB OK for flexibility |
| 100K - 1M rows | Noticeable slowdown | Extract hot paths to columns |
| > 1M rows | Significant impact | Minimize JSONB, strategic use only |
At 500K+ rows, even GIN-indexed JSONB queries become noticeably slower than column queries. Plan your schema before you hit that scale.
The Golden Rule
Use JSONB for flexibility, columns for queryability.
More specifically:
Core business data → Columns (with indexes and constraints)User-defined fields → JSONB (preferences, custom attributes)API response caches → JSONB (whole object storage)Configuration → JSONB (settings, feature flags)Prototyping → JSONB (temporary, refactor later)When a JSONB field becomes a “hot path” in your queries (queried frequently), extract it to a column. This isn’t premature optimization - it’s schema evolution.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments