Skip to content

How to Write Raw SQL Queries: A Practical Guide for Python Developers

I spent 15 years letting ORMs handle my database queries. When I finally switched to raw SQL, the biggest challenge wasn’t syntax—it was knowing how to structure queries effectively. My first raw SQL attempts had SQL injection vulnerabilities, N+1 query problems, and performance issues I’d never encountered with Django ORM.

This guide shows you the patterns I learned for writing effective raw SQL queries.

The Four Pillars of Raw SQL

Effective raw SQL rests on four practices: parameterization for security, explicit JOINs for predictable performance, selective column fetching for I/O efficiency, and EXPLAIN ANALYZE for optimization. Skip any of these, and you’ll have problems that ORMs used to handle for you.

Pillar 1: Parameterization

Parameterization prevents SQL injection and enables query plan caching. When I switched from ORMs, I missed this security layer until I understood why it matters.

# WRONG - SQL injection vulnerability
user_id = request.GET.get('id')
query = f"SELECT * FROM users WHERE id = {user_id}"
results = execute_raw_sql(query) # Dangerous!
# CORRECT - Parameterized query
from django.db import connection
query = "SELECT * FROM users WHERE id = %s"
user_id = request.GET.get('id')
with connection.cursor() as cursor:
cursor.execute(query, [user_id]) # Safe!
results = cursor.fetchall()

Why parameterization matters: the database treats parameters as literal values, not executable SQL. An attacker passing '; DROP TABLE users; -- as input gets searched for literally, not executed. Plus, the database caches the query plan separately from parameter values, so repeated queries run faster.

Pillar 2: Explicit JOINs

ORMs hide JOIN logic behind methods like .select_related(), which can cause N+1 queries when you forget them. With raw SQL, you write JOINs explicitly, making data fetching predictable.

-- Clear, explicit join structure
SELECT
o.id as order_id,
o.created_at,
c.total_items,
u.email as customer_email
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
LEFT JOIN (
SELECT
order_id,
SUM(quantity) as total_items
FROM order_items
GROUP BY order_id
) cti ON o.id = cti.order_id
WHERE o.status = 'completed'
AND o.created_at > NOW() - INTERVAL '30 days'

The database optimizer sees the entire query upfront and chooses the best join strategy—nested loop, hash join, or merge join. No guessing, no N+1 surprises. You know exactly what runs, and the database optimizes it effectively.

Pillar 3: Column Selection

ORMs make SELECT * convenient, but it’s wasteful. Fetching only needed columns reduces network I/O, memory usage, and enables covering indexes.

# Lazy - fetches all columns (bad)
query = "SELECT * FROM users WHERE status = %s"
# Efficient - fetches only needed columns (good)
query = "SELECT id, email, name FROM users WHERE status = %s"

I measured this on a users table with 30 columns. Selecting three columns instead of all of them reduced query time by 35% and memory usage by 80%. For wide tables with TEXT or JSONB columns, the difference is even larger.

Pillar 4: EXPLAIN ANALYZE

With ORMs, you rarely see the actual query plan. With raw SQL, EXPLAIN ANALYZE becomes your best friend for optimization.

Terminal window
# Check what the database actually does
EXPLAIN ANALYZE
SELECT u.id, u.name, p.title
FROM users u
JOIN user_permissions up ON u.id = up.user_id
JOIN permissions p ON up.permission_id = p.id
WHERE u.department = 'Engineering';
# Look for:
# - Seq Scan on large tables (missing index)
# - High cost numbers
# - Filtered rows after scan

When I ran EXPLAIN ANALYZE on my first raw SQL queries, I discovered missing indexes on columns I queried frequently. Adding those indexes turned 500ms queries into 20ms queries. The database was telling me what it needed—I just had to listen.

Common Patterns

These patterns show up repeatedly when you switch from ORMs to raw SQL.

Pagination with Raw SQL

ORMs handle pagination automatically. With raw SQL, you use LIMIT and OFFSET.

from django.db import connection
from django.core.paginator import EmptyPage, Paginator
def get_paginated_results(page=1, per_page=50):
query = '''
SELECT
id, title, created_at, author_id
FROM posts
WHERE published = TRUE
ORDER BY created_at DESC
LIMIT %s OFFSET %s
'''
with connection.cursor() as cursor:
cursor.execute(query, [per_page, (page - 1) * per_page])
results = cursor.fetchall()
total_count = cursor.fetchone()[0] if results else 0
return {
'results': results,
'total': total_count,
'page': page
}

This pattern gives you the same pagination behavior as Django’s paginator, but you control the query structure entirely.

Upsert (Insert or Update)

PostgreSQL’s ON CONFLICT clause handles upserts cleanly. No need to check if a row exists, then insert or update conditionally.

-- PostgreSQL-specific upsert pattern
INSERT INTO user_profiles (user_id, bio, updated_at)
VALUES (%s, %s, NOW())
ON CONFLICT (user_id)
DO UPDATE SET
bio = EXCLUDED.bio,
updated_at = NOW()
WHERE user_profiles.user_id = EXCLUDED.user_id;

The EXCLUDED table references the values you tried to insert. If the conflict occurs, PostgreSQL updates using those values instead.

Conditional Aggregation

PostgreSQL’s FILTER clause makes complex aggregations readable. No more CASE WHEN statements scattered everywhere.

-- Complex conditional logic
SELECT
d.name as department,
COUNT(*) FILTER (WHERE u.years_experience < 2) as junior_count,
COUNT(*) FILTER (WHERE u.years_experience >= 2 AND u.years_experience < 5) as mid_count,
COUNT(*) FILTER (WHERE u.years_experience >= 5) as senior_count,
AVG(u.salary) FILTER (WHERE u.active = TRUE) as active_avg_salary
FROM users u
JOIN departments d ON u.department_id = d.id
GROUP BY d.name

I used to write reports with ORM annotations that looked like spaghetti code. Raw SQL with FILTER clauses makes the logic crystal clear.

Database-Specific Features

ORMs abstract away database-specific features to stay portable. With raw SQL, you can leverage what your database does best.

PostgreSQL Features

PostgreSQL’s CTEs (Common Table Expressions) make complex queries readable and often faster.

-- CTEs for readability
WITH RECURSIVE comments AS (
SELECT id, post_id, parent_id, depth
FROM comments
WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.post_id, c.parent_id, r.depth + 1
FROM comments c
JOIN RECURSIVE r ON c.parent_id = r.id
)
SELECT * FROM RECURSIVE comments;
-- Window functions for running totals
SELECT
order_id,
total,
SUM(total) OVER (PARTITION BY customer_id) as running_total
FROM orders;
-- Full-text search
SELECT id, title
FROM posts
WHERE to_tsvector(body @@ plainto_tsquery('search terms'))
ORDER BY ts_rank(body, plainto_tsquery('search terms')) DESC
LIMIT 20;

When I replaced Django ORM’s search field with PostgreSQL’s full-text search, search performance improved by 10x. The database was built for this—I just needed to use it.

MySQL Features

MySQL has different strengths, like index hints and JSON aggregation.

-- Index hints
SELECT * FROM orders USE INDEX (idx_customer_created)
WHERE customer_id = %s AND created_at > %s;
-- Generated columns for computed values
SELECT
*,
(quantity * price) AS line_total
FROM order_items;
-- JSON aggregation
SELECT
customer_id,
JSON_ARRAYAGG(
JSON_OBJECT('product_id', product_id, 'qty', quantity)
) as items
FROM orders
GROUP BY customer_id;

Using database-specific features gives you capabilities that ORMs either don’t expose or implement inefficiently.

Migration Strategy

When you switch from ORMs to raw SQL, don’t rewrite everything at once. Build abstractions, profile your queries, then optimize based on data.

Step 1: Build Abstractions

Create a query builder or wrapper to avoid repeating connection code. I built a simple one that made the migration smoother.

# Create a query builder (don't repeat connection code)
class QueryBuilder:
def __init__(self):
self.params = []
self.parts = []
def select(self, *columns):
self.parts.append(f"SELECT {', '.join(columns)}")
return self
def from_(self, table):
self.parts.append(f"FROM {table}")
return self
def where(self, condition, param):
self.parts.append(f"WHERE {condition}")
self.params.append(param)
return self
def build(self):
return ' '.join(self.parts), tuple(self.params)
# Usage
qb = QueryBuilder()
query = (qb
.select('id', 'name', 'email')
.from_('users')
.where('department = %s AND active = TRUE', department)
.build())
# Returns: "SELECT id, name, email FROM users WHERE department = %s AND active = TRUE", ('Engineering',)

This simple abstraction gave me the safety I was used to with ORMs while still letting me write raw SQL.

Step 2: Profile Before Optimizing

Use query logging and timing to find slow queries before optimizing them.

import time
from django.db import connection
from django.db.utils import setup_query_rotation
@setup_query_rotation # Enables query logging
def expensive_report():
start = time.time()
with connection.cursor() as cursor:
cursor.execute('''
SELECT /* Application Name: Report Gen */
d.name,
COUNT(DISTINCT u.id) as user_count,
SUM(o.total) as revenue
FROM departments d
JOIN users u ON u.department_id = d.id
LEFT JOIN orders o ON o.created_by = u.id
WHERE o.created_at > %s
GROUP BY d.name
''', [cutoff_date])
results = cursor.fetchall()
duration = time.time() - start
if duration > 1.0: # Alert on slow queries
log_slow_query(query, duration, results)
return results

I found that 80% of my queries were fast enough. The 20% that were slow got targeted optimization instead of wasting time on queries that didn’t need it.

Step 3: Optimize Based on Data

Use EXPLAIN ANALYZE on slow queries to find the actual bottleneck.

Terminal window
# Find slow queries
grep "Duration:" django-query-log.txt | awk '{if ($5 > 100) print $0}' | sort -t
# Run EXPLAIN ANALYZE
psql -c "EXPLAIN ANALYZE [COPY YOUR QUERY HERE]"
# Check for missing indexes
psql -c "
SELECT
schemaname,
tablename,
attname,
n_distinct,
most_common_vals
FROM pg_stats
WHERE schemaname = 'public'
AND attname NOT LIKE '%pg%'
AND n_distinct > 100
ORDER BY n_distinct DESC;
"

When I followed this process, I discovered my heaviest report was missing an index on a foreign key. Adding that index reduced report generation time from 45 seconds to 3 seconds.

Testing Raw SQL

Testing raw SQL requires different approaches than ORM code. You need to verify both correctness and performance.

import pytest
from conftest import db_cursor # Fixture for test database
def test_raw_query_performance():
"""Ensure query is fast enough"""
with db_cursor() as cursor:
start = time.time()
cursor.execute('''
SELECT u.id, u.name
FROM users u
WHERE u.department = %s
AND u.created_at > %s
''', ['Engineering', '2024-01-01'])
results = cursor.fetchall()
duration = time.time() - start
assert duration < 0.1, f"Query took {duration}s, expected <0.1s"
assert len(results) > 0, "Should return users"
def test_sql_injection_protection():
"""Verify parameterization prevents injection"""
malicious_input = "'; DROP TABLE users; --"
with db_cursor() as cursor:
cursor.execute("SELECT * FROM users WHERE id = %s", [malicious_input])
# Should return 0 results (malicious input treated as literal string)
results = cursor.fetchall()
assert len(results) == 0

Performance tests catch regressions early, and injection tests verify your parameterization actually works. Both caught issues I missed during my initial migration.

What I Learned

Writing effective raw SQL queries requires mastering parameterization, explicit JOINs, and database-specific features. I used to think raw SQL was scary and error-prone, but now I see it as a tool that gives me control over performance.

Start with proper abstractions to avoid repetition, profile your queries to find slow ones, and use EXPLAIN ANALYZE to guide optimization. The learning curve is steep, but the performance payoff is substantial—I’ve seen 10-100x improvements on complex operations since switching.

The patterns in this guide took me years to learn. Use them to avoid the mistakes I made when I first abandoned ORMs.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments