How to Process Large Email Batches with AI: Context Window Limits Explained

The Problem

I had 15,000 emails in my inbox and wanted AI to help clean them up. My first attempt failed spectacularly - I fed everything to Claude and hit the context window limit. The AI simply couldn’t process that much data at once.

LLMs have finite context windows (200K tokens for Claude). When you’re processing thousands of emails, you need a batching strategy. Here’s what I learned about making it work.

Why Batching Matters

Each email takes roughly 500-2000 tokens depending on length. With Claude’s 200K token limit:

Safe batch calculation:
- Context window: 200,000 tokens
- Average email: 1,500 tokens
- Room for prompt/response: 50,000 tokens
- Safe batch size: ~100 emails

Try to process more, and you’ll get truncated results or errors.
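The arithmetic above can be written as a small helper. This is only a sketch: `safe_batch_size` and its parameter names are my own, and the defaults are the rough estimates from the list above, not measured values.

```python
def safe_batch_size(context_window=200_000, avg_email_tokens=1_500,
                    reserved_tokens=50_000):
    """Return how many average-sized emails fit in a single request.

    All three inputs are estimates; adjust them for your model and prompt.
    """
    usable = context_window - reserved_tokens
    if usable <= 0 or avg_email_tokens <= 0:
        raise ValueError("token budget and average email size must be positive")
    # Integer division: a partial email does not fit
    return usable // avg_email_tokens


print(safe_batch_size())  # 100
```

Because the result depends on an average, a batch of unusually long emails can still overflow, so treat the number as an upper bound.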

Batch Processing Strategies

Strategy 1: Linear Batch Processing O(n)

The simplest approach - process everything in fixed-size batches.

linear_batch.py
def process_inbox_linear(emails, batch_size=100):
    results = []
    for i in range(0, len(emails), batch_size):
        batch = emails[i:i + batch_size]
        # Send to AI for classification
        batch_results = classify_batch(batch)
        results.extend(batch_results)
    return results

ASCII diagram of the flow:

+---------+    +---------+    +---------+    +---------+
| Batch 1 | -> | Batch 2 | -> | Batch 3 | -> | Batch N |
| 100 msgs|    | 100 msgs|    | 100 msgs|    | 100 msgs|
+---------+    +---------+    +---------+    +---------+
     |              |              |              |
     v              v              v              v
[Classify]     [Classify]     [Classify]     [Classify]
     |              |              |              |
     +--------------+--------------+--------------+
                           |
                           v
                  [Aggregated Results]

- Pros: Simple, predictable
- Cons: Processes everything, including irrelevant emails
- Best for: Full inbox analysis, initial classification

Strategy 2: Filter-First Processing O(n) + filter

Reduce the dataset before batching. Gmail API lets you filter by sender, date, label, etc.

filter_first.py
def process_with_filters(service, filter_criteria):
    # Step 1: Use Gmail API to reduce dataset
    query = build_gmail_query(filter_criteria)
    response = service.users().messages().list(
        userId='me',
        q=query  # e.g., "from:newsletter@* older_than:1y"
    ).execute()
    # 'messages' is absent when nothing matches. Results are paginated,
    # so follow 'nextPageToken' if you need more than the first page.
    message_refs = response.get('messages', [])
    # Step 2: Process only matching emails
    emails = fetch_emails_batch(service, message_refs)
    return process_in_batches(emails)

Flow diagram:

[Gmail Query Filter]
         |
         v
+------------------+
| Reduced Dataset  |  (e.g., newsletters > 1 year old)
| ~500 emails      |
+------------------+
         |
         v
[Batch Processing: 5 batches of 100]
         |
         v
[Action: Archive/Delete]

- Pros: Reduces total processing
- Cons: Requires knowing filter criteria upfront
- Best for: Targeted cleanup (e.g., “old newsletters”)
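The `build_gmail_query` helper is not shown in the article. Here is one way it might look, assuming the criteria arrive as a dict. The dict keys (`sender`, `older_than`, `label`, `has_attachment`) are hypothetical names chosen for this sketch; the search operators they map to (`from:`, `older_than:`, `label:`, `has:attachment`) are standard Gmail search syntax.

```python
def build_gmail_query(criteria):
    """Turn a dict of filter criteria into a Gmail search string.

    The accepted keys are an assumption of this sketch, not a Gmail API shape.
    """
    parts = []
    if criteria.get("sender"):
        parts.append(f"from:{criteria['sender']}")
    if criteria.get("older_than"):
        # Gmail accepts values such as "1y", "6m" or "30d"
        parts.append(f"older_than:{criteria['older_than']}")
    if criteria.get("label"):
        parts.append(f"label:{criteria['label']}")
    if criteria.get("has_attachment"):
        parts.append("has:attachment")
    # Terms separated by spaces are combined with an implicit AND
    return " ".join(parts)


query = build_gmail_query({"sender": "news@example.com", "older_than": "1y"})
print(query)  # from:news@example.com older_than:1y
```

It is worth pasting the generated string into the Gmail search box first to confirm it matches what you expect before acting on the results.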

Strategy 3: Sender-Based Aggregation O(n) + group

Group by sender first, then batch-delete by category.

sender_group.py
def process_by_sender(emails):
    # Step 1: Group emails by sender (one pass)
    senders = {}
    for email in emails:
        sender = extract_sender(email)
        senders.setdefault(sender, []).append(email)
    # Step 2: Classify senders in batches
    sender_categories = classify_senders(list(senders.keys()))
    # Step 3: Apply bulk actions by category
    for category, sender_list in sender_categories.items():
        if category == 'spam':
            delete_emails_from_senders(sender_list)

ASCII flow:

[All Emails]
          |
          v
[Extract Unique Senders]   <-- One pass, O(n)
          |
          v
+-------------------+
| Senders to Claude |      <-- Batch classify ~100 senders
+-------------------+
          |
          v
[Categories: spam, newsletters, promotions, important]
          |
          v
[Bulk Delete by Category]

- Pros: Efficient for spam/marketing cleanup
- Cons: May delete legitimate emails that come from a sender classified as spam
- Best for: Bulk sender-based cleanup
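The grouping step depends on `extract_sender`, which the article leaves undefined. A minimal sketch using the standard library's `email.utils.parseaddr` follows. It assumes each email is a dict with a `from` key holding the raw From header; that shape is an assumption for illustration.

```python
from collections import defaultdict
from email.utils import parseaddr


def extract_sender(email):
    """Return the lowercased address from the From header.

    Assumes `email` is a dict with a 'from' key (hypothetical shape).
    """
    _display_name, address = parseaddr(email.get("from", ""))
    return address.lower()


def group_by_sender(emails):
    """Group emails by normalized sender address in one pass."""
    groups = defaultdict(list)
    for email in emails:
        groups[extract_sender(email)].append(email)
    return dict(groups)


inbox = [
    {"from": "Acme News <News@Example.com>"},
    {"from": "news@example.com"},
    {"from": "Bob <bob@example.org>"},
]
print(sorted(group_by_sender(inbox)))  # ['bob@example.org', 'news@example.com']
```

Normalizing case and stripping the display name keeps the same sender from being counted twice, which also reduces the number of senders sent for classification.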

Strategy 4: Hierarchical Processing

Two-phase approach: quick metadata scan, then detailed content processing.

hierarchical.py
def hierarchical_process(emails):
    # Phase 1: Quick metadata scan (more emails per batch)
    metadata_results = []
    for batch in chunk(emails, 200):  # Larger batches for metadata
        results = analyze_metadata_only(batch)
        metadata_results.extend(results)
    # Phase 2: Full content for flagged emails only
    flagged = [r for r in metadata_results if r.needs_review]
    detailed_results = []
    for batch in chunk(flagged, 50):  # Smaller batches for full content
        results = analyze_full_content(batch)
        detailed_results.extend(results)
    return detailed_results

Flow visualization:

Phase 1: Metadata Scan (Fast)
+------------------+       +------------------+
| Batch 1: 200     |       | Batch 2: 200     |
| [sender,subject, |       | [sender,subject, |
|  date,label]     |       |  date,label]     |
+------------------+       +------------------+
         |                          |
         v                          v
[Flag: needs_review?]      [Flag: needs_review?]
         |                          |
         +-------------+------------+
                       |
                       v
             [Flagged: ~50 emails]

Phase 2: Full Content Analysis (Deep)
+------------------+
| Full Content     |
| of 50 flagged    |
+------------------+
         |
         v
  [Final Decision]

- Pros: Minimizes expensive full-content processing
- Cons: Two-phase approach, more complex
- Best for: Large inboxes with mixed content types
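The hierarchical code calls a `chunk` helper that is never defined. A minimal version, assuming the input is a list (or any sliceable sequence), could be:

```python
def chunk(items, size):
    """Yield successive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


print(list(chunk([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```

The last chunk may be shorter than `size`, so downstream code should not assume every batch is full.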

Choosing Your Strategy

Here’s a quick decision guide:

                +------------------+
                | How many emails? |
                +--------+---------+
                         |
          +--------------+--------------+
          |                             |
       < 1,000                       > 1,000
          |                             |
          v                             v
 +------------------+          +------------------+
 | Linear Batch     |          | Do you know what |
 | Strategy 1       |          | you’re looking   |
 +------------------+          | for?             |
                               +--------+---------+
                                        |
                         +--------------+--------------+
                         |                             |
                        YES                            NO
                         |                             |
                         v                             v
                +------------------+          +------------------+
                | Filter-First     |          | Sender-Based     |
                | Strategy 2       |          | Strategy 3       |
                +------------------+          +------------------+
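The decision guide can also be expressed as a small function. This is a sketch: the function name and the returned labels are my own, and since the diagram does not say what happens at exactly 1,000 emails, this version assumes that case goes to the second question.

```python
def choose_strategy(email_count, knows_criteria):
    """Pick a batching strategy following the decision guide.

    Assumption: exactly 1,000 emails is treated like the "> 1,000" branch.
    """
    if email_count < 1_000:
        return "linear"          # Strategy 1
    if knows_criteria:
        return "filter_first"    # Strategy 2
    return "sender_based"        # Strategy 3


print(choose_strategy(15_000, knows_criteria=False))  # sender_based
```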

Common Mistakes to Avoid

Batch size too large: You’ll hit context limits and get truncated results.

# WRONG: Will overflow context
batch = emails[:500] # Too many!
result = claude.classify(batch)
# CORRECT: Stay within limits
batch = emails[:100] # Safe
result = claude.classify(batch)
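A fixed count of 100 only holds if emails are close to the average size. An alternative is to pack batches by estimated token count instead. The sketch below assumes emails are plain strings and uses a rough heuristic of about 4 characters per token; that ratio is an approximation, not a real tokenizer, and the function names are hypothetical.

```python
def estimate_tokens(text):
    """Rough estimate: about 4 characters per token (heuristic only)."""
    return max(1, len(text) // 4)


def batch_by_token_budget(emails, budget=150_000):
    """Greedily pack emails into batches that stay within `budget` tokens.

    An email that alone exceeds the budget still gets its own batch,
    so callers may want to truncate or skip such emails beforehand.
    """
    batches, current, used = [], [], 0
    for body in emails:
        cost = estimate_tokens(body)
        # Start a new batch when adding this email would exceed the budget
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(body)
        used += cost
    if current:
        batches.append(current)
    return batches


emails = ["a" * 400] * 5  # five emails of about 100 tokens each
print([len(b) for b in batch_by_token_budget(emails, budget=250)])  # [2, 2, 1]
```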

No aggregation between batches: Each batch starts fresh, duplicating analysis.

# WRONG: No context between batches
for batch in chunk(emails, 100):
    classify(batch)  # AI has no memory of previous batches
# CORRECT: Aggregate state
all_results = []
for batch in chunk(emails, 100):
    # Keep previous_context small (e.g., a summary); passing every
    # earlier result will eventually overflow the context window too
    results = classify(batch, previous_context=all_results)
    all_results.extend(results)
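Passing every earlier result as `previous_context` grows with each batch and can eventually exhaust the window on its own. One option is to pass a condensed summary instead. The sketch below assumes each result is a dict with `sender` and `category` keys; that shape, the function name, and the `max_senders` cap are all assumptions for illustration.

```python
from collections import Counter


def summarize_context(results, max_senders=20):
    """Condense earlier results into a small, bounded summary.

    Assumes each result is a dict with 'sender' and 'category' keys.
    """
    category_counts = Counter(r["category"] for r in results)
    sender_counts = Counter(r["sender"] for r in results)
    # If a sender was labeled more than once, the latest label wins
    latest_label = {r["sender"]: r["category"] for r in results}
    # Keep only the most frequent senders so the summary stays small
    top_senders = [s for s, _count in sender_counts.most_common(max_senders)]
    return {
        "category_counts": dict(category_counts),
        "known_senders": {s: latest_label[s] for s in top_senders},
    }


results = [
    {"sender": "a@x.com", "category": "spam"},
    {"sender": "a@x.com", "category": "spam"},
    {"sender": "b@x.com", "category": "important"},
]
print(summarize_context(results, max_senders=1))
# {'category_counts': {'spam': 2, 'important': 1}, 'known_senders': {'a@x.com': 'spam'}}
```

The summary size depends on `max_senders` and the number of categories, not on how many emails have been processed so far.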

Processing full content when metadata suffices: Wastes tokens on irrelevant emails.

# WRONG: Expensive for large inboxes
full_content = fetch_all_email_bodies(emails)
# CORRECT: Filter first
metadata = fetch_metadata_only(emails)
candidates = filter_by_metadata(metadata)
full_content = fetch_bodies(candidates) # Much smaller

Working with Gmail API

Gmail API has its own batching limits. You can group up to 100 API calls per batch request:

gmail_batch.py
def batch_delete_emails(service, message_ids):
    # Create the batch from the service object so it targets the Gmail
    # batch endpoint; a bare BatchHttpRequest() uses a legacy global endpoint
    batch = service.new_batch_http_request()
    for msg_id in message_ids:
        batch.add(
            service.users().messages().delete(
                userId='me',
                id=msg_id
            )
        )
    batch.execute()

# Match API batch size to AI batch size
for batch_ids in chunk(message_ids, 100):
    batch_delete_emails(service, batch_ids)
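Individual calls inside a batch can fail independently, so it helps to record which ones did. The sketch below assumes `service` is a Gmail client built with `googleapiclient`, and uses the per-request callback, which receives `(request_id, response, exception)`. The function name is hypothetical. Note that `messages().delete` removes messages permanently; `messages().trash` is the reversible alternative.

```python
def batch_delete_with_report(service, message_ids):
    """Delete messages in one batch request and return the IDs that failed.

    Assumes `service` is a Gmail API client from googleapiclient.
    """
    failed = []

    def on_response(request_id, response, exception):
        # Invoked once per sub-request; exception is None on success
        if exception is not None:
            failed.append(request_id)

    batch = service.new_batch_http_request(callback=on_response)
    for msg_id in message_ids:
        batch.add(
            service.users().messages().delete(userId="me", id=msg_id),
            request_id=msg_id,  # lets the callback report which message failed
        )
    batch.execute()
    return failed
```

The returned list can be retried later or logged, rather than assuming the whole batch succeeded.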

Summary

Batch processing with AI requires three things:

  1. Know your limits - Context window size determines batch size
  2. Choose the right strategy - Linear, filter-first, sender-based, or hierarchical
  3. Aggregate state - Maintain context between batches

The Big O thinking from computer science applies here: a complete analysis is an O(n) linear scan over every email, while filtering or grouping by sender first shrinks the n that actually reaches the model. Strategic batching then keeps each request within AI constraints.

Start small - test with 100 emails first. Once your batching logic works, scale up. The context window isn’t going anywhere, so design your system to respect it from day one.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.