Skip to content

When Should I Use JSONB in PostgreSQL Instead of Traditional Columns?

I threw everything into a JSONB field because it was “flexible.” Six months later, my queries were crawling, storage had bloated by 40%, and I couldn’t figure out why my “simple” filter took 15 seconds.

The Problem

PostgreSQL’s JSONB type is seductive. No schema migrations, no ALTER TABLE statements, just dump your JSON and move on. But this convenience comes with costs that hit hard at scale:

  • Query performance: JSONB queries require parsing the binary format
  • Storage bloat: Key names stored in every row
  • No constraints: Type safety gone, errors caught only at runtime
  • Tooling friction: ORMs struggle, IDEs can’t autocomplete

The core question: How do I decide what goes in columns versus what goes in JSONB?

My Trial-and-Error Journey

Attempt 1: Everything in JSONB (Bad Idea)

-- My "flexible" schema
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
data JSONB NOT NULL -- customer_id, total, status, items all here
);

This felt elegant. No schema constraints! I could add fields anytime!

Reality check:

-- Find pending orders
SELECT * FROM orders WHERE data->>'status' = 'pending';
-- Execution time: 3.2 seconds on 50K rows
-- Sum totals by customer
SELECT data->>'customer_id', SUM((data->>'total')::decimal)
FROM orders GROUP BY data->>'customer_id';
-- Execution time: 12 seconds

The queries were slow, and worse: I had no foreign key enforcement on customer_id. Invalid customer IDs silently entered my database.

Attempt 2: GIN Indexes (Partial Fix)

CREATE INDEX idx_orders_data ON orders USING GIN (data);
-- Now the query:
SELECT * FROM orders WHERE data @> '{"status": "pending"}';
-- Execution time: 0.8 seconds

Better, but still slower than a column index would be. And I still couldn’t enforce constraints.

Attempt 3: Hybrid Approach (The Right Answer)

-- Core searchable data in columns
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER NOT NULL REFERENCES customers(id),
status VARCHAR(20) NOT NULL DEFAULT 'pending',
total DECIMAL(10, 2) NOT NULL,
items JSONB NOT NULL, -- Flexible line items here
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_customer ON orders(customer_id);
-- Now the queries:
SELECT * FROM orders WHERE status = 'pending';
-- Execution time: 0.02 seconds (100x faster)
SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id;
-- Execution time: 0.1 seconds (120x faster)

The Decision Framework

Here’s a mental model I now use:

┌─────────────────────┐
│ Does this data │
│ need to be │
│ SEARCHED? │
└─────────────────────┘
┌───────────────┴───────────────┐
│ │
YES NO
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────────┐
│ Use COLUMN │ │ Is schema truly │
│ with INDEX │ │ flexible per row? │
└─────────────────┘ └─────────────────────┘
┌───────────────┴───────────────┐
│ │
YES NO
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Use JSONB │ │ Still use │
│ (with GIN │ │ COLUMN │
│ if queried) │ │ (consistency) │
└─────────────────┘ └─────────────────┘

Use Traditional Columns When:

  1. Data needs searching/filtering/sorting - WHERE, ORDER BY, GROUP BY
  2. Data participates in JOINs - Foreign keys to other tables
  3. Data integrity required - NOT NULL, CHECK constraints, foreign keys
  4. Same fields in many rows - Avoid key name repetition bloat
  5. Analytics/reporting - Aggregations, statistical queries

Use JSONB When:

  1. Schema truly flexible - Fields vary per row (user preferences, surveys)
  2. Rarely queried as whole objects - Store and retrieve, rarely filter
  3. Caching external data - API response caches
  4. Configuration/settings - Application configs, feature flags
  5. Prototyping - Schema not finalized yet

The Storage Bloat Problem

I didn’t realize this until I hit 1 million rows:

Column approach:
┌────────────────────────────────┐
│ id | email | created_at │ Schema metadata: stored ONCE
│ 1 | bob@email... | 2024-01-01 │
│ 2 | alice@em... | 2024-01-02 │
└────────────────────────────────┘
JSONB approach:
┌─────────────────────────────────────────────────────────────┐
│ id | data │
│ 1 | {"email": "bob@email...", "created_at": "2024-01-01"} │ Key names repeated
│ 2 | {"email": "alice@em...", "created_at": "2024-01-02"} │ in EVERY row!
└─────────────────────────────────────────────────────────────┘

With 1M rows storing {"user_preference_theme": "dark"}:

  • Column: ~4MB (just values)
  • JSONB: ~26MB (values + repeated 23-char key name)

That’s 22MB wasted on key name repetition alone.

Real-World Examples

Good JSONB Use: User Preferences

CREATE TABLE user_settings (
user_id INTEGER PRIMARY KEY REFERENCES users(id),
preferences JSONB DEFAULT '{"theme": "light", "notifications": true}'::jsonb
);
-- Perfect: flexible, rarely queried, user-defined
UPDATE user_settings
SET preferences = jsonb_set(preferences, '{theme}', '"dark"')
WHERE user_id = 42;

Good JSONB Use: API Cache

CREATE TABLE api_cache (
cache_key VARCHAR(255) PRIMARY KEY,
response JSONB NOT NULL,
expires_at TIMESTAMP NOT NULL
);
-- Perfect: no schema needed, just cache and retrieve
INSERT INTO api_cache VALUES ('github:user:123', '{"login": "octocat", ...}', NOW() + INTERVAL '1 hour');

Bad JSONB Use: Core Business Data

-- DON'T DO THIS
CREATE TABLE products (
id SERIAL PRIMARY KEY,
data JSONB -- contains: sku, name, price, category_id
);
-- Why bad:
-- 1. sku needs unique constraint -> impossible in JSONB
-- 2. category_id needs foreign key -> impossible in JSONB
-- 3. price needs numeric type -> JSONB stores as text
-- 4. Query by category -> full JSONB scan
-- DO THIS instead:
CREATE TABLE products (
id SERIAL PRIMARY KEY,
sku VARCHAR(50) UNIQUE NOT NULL,
name VARCHAR(255) NOT NULL,
price DECIMAL(10, 2) NOT NULL,
category_id INTEGER REFERENCES categories(id),
attributes JSONB DEFAULT '{}'::jsonb -- flexible extras here
);

Scale Considerations

From real-world experience:

ScaleJSONB PerformanceRecommendation
< 100K rowsAcceptableJSONB OK for flexibility
100K - 1M rowsNoticeable slowdownExtract hot paths to columns
> 1M rowsSignificant impactMinimize JSONB, strategic use only

At 500K+ rows, even GIN-indexed JSONB queries become noticeably slower than column queries. Plan your schema before you hit that scale.

The Golden Rule

Use JSONB for flexibility, columns for queryability.

More specifically:

Core business data → Columns (with indexes and constraints)
User-defined fields → JSONB (preferences, custom attributes)
API response caches → JSONB (whole object storage)
Configuration → JSONB (settings, feature flags)
Prototyping → JSONB (temporary, refactor later)

When a JSONB field becomes a “hot path” in your queries (queried frequently), extract it to a column. This isn’t premature optimization - it’s schema evolution.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments