Context Quality vs Quantity: How Much Should You Index for AI Agents?

Mar 17, 2026

Problem

My AI agent’s retrieval was broken. I’d ask it about a decision we made last week, and it would surface irrelevant old threads from months ago. When I asked about a project status, it would return memes and random chatter instead of the actual discussion.

Here’s what my indexing setup looked like:

index:
  slack:
    channels: "*"
  telegram:
    groups: "*"
  discord:
    servers: "*"
  gmail:
    labels: "*"

I indexed everything because I thought “more data means smarter AI.” The result was the opposite. My agent was drowning in noise.

Query: "What did we decide about the API redesign?"
Results:
  1. [2025-08-15] #random: "Anyone want lunch?"
  2. [2025-10-22] #fun: "Check out this meme!"
  3. [2025-12-05] #general: "Office closed Friday"
  4. [2026-01-10] #engineering: "API redesign meeting notes"  <-- BURIED

The relevant result was there, but it was buried under months of irrelevant content.

What happened?

I found a Reddit discussion where someone described the exact same problem. They had bulk-indexed a large communication archive and reported:

“Retrieval started surfacing irrelevant old threads and the agent’s reasoning degraded—too much noise crowding out the signal.”

The issue was clear: I was feeding my AI garbage and expecting gold. When roughly 90% of the indexed content is noise, most of what retrieval returns will be noise too.

Comprehensive Indexing:
+------------------------------------------+
|  ########################################  | 90% noise
|  #                                      #  |
|  #   Signal (10%)                       #  |
|  #   - Project decisions                #  |
|  #   - Technical discussions            #  |
|  #                                      #  |
|  #   Noise (90%)                        #  |
|  #   - Memes, random chat               #  |
|  #   - Bot messages, forwards           #  |
|  #   - Old archived projects            #  |
|  #                                      #  |
+------------------------------------------+

Selective Indexing:
+------------------------------------------+
|                    ########################| 30% noise
|                    #                      #
|   Signal (70%)    #    Noise (30%)       #
|   - Project work  #    - Some off-topic  #
|   - Decisions     #    - Occasional chat #
|   - Tech talk     #                      #
|                    ########################
+------------------------------------------+

How to solve it?

I switched to selective indexing. Instead of “index everything,” I started with “index what matters.”

Step 1: Identify high-signal sources

I listed what actually helped me make decisions:

High Signal (Index These):
- #engineering: Project discussions, bug reports, PR reviews
- #product: Feature requests, roadmap discussions
- #project-alpha: Team coordination, action items
- Linear comments: Design decisions, blocked items
- Work emails: Client communications, contracts

Low Signal (Skip These):
- #random, #fun, #memes: Social chatter
- #announcements: Bot-posted messages
- Gaming Discord servers: Not work-related
- Promotional emails: Newsletters, marketing

Step 2: Create whitelist config

I rewrote my config with explicit whitelists:

index:
  slack:
    channels:
      - "#engineering"
      - "#product"
      - "#project-alpha"
      - "#team-standup"
    exclude_patterns:
      - "*-random"
      - "*-social"
      - "*-test"

  telegram:
    groups:
      - "Work Team"
      - "Project Coordination"
    exclude:
      - "Family Group"
      - "News Channel"

  discord:
    servers:
      - "Work Server"
    channels:
      - "general"
      - "development"
    exclude:
      - "memes"
      - "off-topic"
      - "voice-chat"

  gmail:
    labels:
      - "work"
      - "clients"
    exclude:
      - "promotions"
      - "social"
      - "updates"
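How an indexer applies such a config depends on the tool, so the following is only a minimal sketch of one plausible interpretation: exclusions are checked first, then the allow list. The function name is_indexed and the wildcard allow entry "#project-*" are assumptions for illustration and are not part of my actual setup.

```python
from fnmatch import fnmatchcase

def is_indexed(name: str, allow: list[str], exclude_patterns: list[str]) -> bool:
    """Return True if a channel name passes the allow list and no exclusion."""
    # Exclusions win, so "*-test" can still drop a channel that a
    # wildcard allow rule would otherwise let in.
    if any(fnmatchcase(name, pattern) for pattern in exclude_patterns):
        return False
    return any(fnmatchcase(name, pattern) for pattern in allow)

# Hypothetical values modelled on the Slack section of the config.
ALLOW = ["#engineering", "#product", "#project-*"]
EXCLUDE_PATTERNS = ["*-random", "*-social", "*-test"]

for channel in ["#engineering", "#project-alpha", "#project-test", "#random"]:
    print(channel, is_indexed(channel, ALLOW, EXCLUDE_PATTERNS))
```

Note that "#random" is rejected here because it is not on the allow list, not because of "*-random": that pattern only matches names ending in "-random".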

Step 3: Add content-based filtering

Whitelists weren’t enough. Some channels had mixed content. I added a content filter:

import re

def should_index(message: dict) -> bool:
    """Determine if message is agent-relevant."""

    # Skip automated messages
    if message.get('bot_id'):
        return False

    # Skip short/low-content messages ("or ''" guards against a None content field)
    content = message.get('content') or ''
    if len(content) < 20:
        return False

    # Skip certain message types
    if message.get('subtype') in ['channel_join', 'channel_leave', 'file_shared']:
        return False

    # Lowercase once so every keyword check below is case-insensitive
    text = content.lower()

    # Check for action indicators
    action_keywords = ['todo', 'action item', 'deadline', 'decided', 'agreed', 'blocked']
    if any(kw in text for kw in action_keywords):
        return True

    # Check for technical discussion (keywords must be lowercase to match text)
    tech_patterns = ['bug', 'feature', 'deploy', 'merge', 'api', 'fix', 'issue']
    if any(p in text for p in tech_patterns):
        return True

    # Match "PR"/"PRs" as a whole word; a plain 'pr' substring would hit "project"
    if re.search(r'\bprs?\b', text):
        return True

    # Check if it's a question or answer
    if '?' in content or text.startswith(('yes', 'no', 'done', 'completed')):
        return True

    return False  # Default to skip
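Before trusting a filter like this, it helps to see how much it drops and what the dropped messages look like. Here is a minimal sketch of such an audit. The names audit_filter and long_enough, along with the sample messages, are made up for illustration; any predicate with the same shape as should_index can be passed in.

```python
from typing import Callable

def audit_filter(messages: list[dict], predicate: Callable[[dict], bool]) -> dict:
    """Summarise how many messages a filter keeps and sample what it drops."""
    kept, dropped = [], []
    for message in messages:
        (kept if predicate(message) else dropped).append(message)
    total = len(messages)
    return {
        "kept": len(kept),
        "dropped": len(dropped),
        "keep_rate": len(kept) / total if total else 0.0,
        # A few dropped messages to eyeball for false negatives.
        "dropped_sample": [m.get("content", "") for m in dropped[:5]],
    }

def long_enough(message: dict) -> bool:
    """Stand-in predicate (assumption): keep messages of 20+ characters."""
    return len(message.get("content", "")) >= 20

sample = [
    {"content": "lol"},
    {"content": "We decided to ship the API redesign next sprint."},
    {"content": "Deploy is blocked on the failing migration test."},
    {"content": "ok"},
]
report = audit_filter(sample, long_enough)
print(report)
```

If the dropped sample contains messages you would have wanted the agent to see, the filter is too aggressive for that channel.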

Step 4: Add relevance scoring

Even with filtering, I needed to rank results. I built a scoring system:

from datetime import datetime

# Note: search_all_channels() and count_relevant_keywords() are helpers
# that are not shown in this post.

HIGH_SIGNAL_CHANNELS = ['#engineering', '#product', '#project-alpha']

def get_context(query: str, max_items: int = 20):
    """Retrieve context with quality scoring."""

    results = search_all_channels(query)

    # Score each result
    scored_results = []
    for result in results:
        score = 0

        # Recency bonus (newer = more relevant)
        age_days = (datetime.now() - result.timestamp).days
        if age_days < 7:
            score += 10
        elif age_days < 30:
            score += 5
        elif age_days < 90:
            score += 2

        # Channel quality weight
        if result.channel in HIGH_SIGNAL_CHANNELS:
            score += 5

        # Keyword density
        score += count_relevant_keywords(result.content, query)

        # Thread depth (deeper = more discussion)
        score += min(result.thread_depth, 5)

        scored_results.append((result, score))

    # Return top-scored items
    return sorted(scored_results, key=lambda x: x[1], reverse=True)[:max_items]
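The scoring code calls count_relevant_keywords without showing it. One possible shape for that helper is sketched below, assuming a simple count of distinct query terms that also appear in the content. The stopword list and the tokenisation are assumptions, and the sketch counts term overlap rather than true keyword density.

```python
import re

# Small illustrative stopword list (assumption); a real one would be longer.
STOPWORDS = {"a", "an", "the", "we", "did", "what", "about", "of", "to", "is", "on", "in"}

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def count_relevant_keywords(content: str, query: str) -> int:
    """Count distinct non-stopword query terms that occur in the content."""
    query_terms = tokenize(query) - STOPWORDS
    return len(query_terms & tokenize(content))

query = "What did we decide about the API redesign?"
print(count_relevant_keywords("API redesign meeting notes", query))
print(count_relevant_keywords("Anyone want lunch?", query))
```

Exact term matching means "decided" does not match "decide"; stemming would close that gap at the cost of another dependency.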

After implementing this, my retrieval results improved dramatically:

Query: "What did we decide about the API redesign?"
Results:
  1. [2026-03-15] #engineering: "API redesign: decided on REST over GraphQL" (score: 25)
  2. [2026-03-14] #product: "API redesign requirements discussion" (score: 22)
  3. [2026-03-10] #project-alpha: "API versioning strategy agreed" (score: 18)
  4. [2026-03-08] #engineering: "API performance benchmarks review" (score: 15)

The reason

Selective indexing works because it aligns with how context windows function.

Context window is not storage

When I load context into an AI agent, I’m not “storing” it—I’m giving the model information to reason with. Every token in the context competes for attention. More irrelevant tokens means less attention on what matters.

With Noisy Context:
+--------------------------------------------------+
|  ###############                                 |
|  # Irrelevant  #   Signal gets drowned out      |
|  # content     #   by noise. Model can't         |
|  # consuming   #   distinguish what matters.     |
|  # most        #                                  |
|  # attention   #   Result: Degraded reasoning    |
|  ###############                                 |
+--------------------------------------------------+

With Clean Context:
+--------------------------------------------------+
|              #######################             |
|              # Signal content      #             |
|              # gets full attention #             |
|              # Model reasons well  #             |
|              #######################             |
|                                                  |
|              Result: Better responses            |
+--------------------------------------------------+
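One way to act on this is to give the context a fixed token budget and fill it with the highest-scoring snippets only. The sketch below assumes snippets arrive as (text, score) pairs and approximates token counts by whitespace-separated words. Both pack_context and estimate_tokens are hypothetical names, and a real setup would use the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption): one token per whitespace-separated word."""
    return len(text.split())

def pack_context(scored_items: list[tuple[str, float]], max_tokens: int) -> list[str]:
    """Greedily pick the highest-scoring snippets that fit in the budget."""
    selected = []
    used = 0
    for text, _score in sorted(scored_items, key=lambda item: item[1], reverse=True):
        cost = estimate_tokens(text)
        if used + cost > max_tokens:
            continue  # Skip snippets that would overflow the budget
        selected.append(text)
        used += cost
    return selected

items = [("a b c", 5.0), ("d e f g h", 9.0), ("i j", 1.0)]
print(pack_context(items, max_tokens=7))
```

Greedy packing is not optimal in the knapsack sense, but it keeps the best-ranked material and leaves the rest of the window free.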

RAG recall vs precision trade-off

In traditional search, you balance recall (finding everything) vs precision (finding only relevant results). For AI agents, I learned to optimize for precision:

Comprehensive indexing:
- High recall: Finds everything
- Low precision: Too much noise
- Bad for AI: Degrades reasoning

Selective indexing:
- Lower recall: Might miss some edge cases
- High precision: Most results relevant
- Good for AI: Improves reasoning
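To make the trade-off concrete, precision and recall can be computed for a single query from the retrieved IDs and a hand-labelled set of relevant IDs. The message IDs below are invented for illustration and are not measurements from my setup.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical labels: two messages actually answer the query.
relevant = {"api-notes", "api-versioning"}

noisy = ["lunch", "meme", "office-closed", "api-notes"]
selective = ["api-notes"]

print(precision_recall(noisy, relevant))
print(precision_recall(selective, relevant))
```

In this toy case both result sets find the same relevant message, so recall is equal, but the selective set spends none of the context on noise.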

The feedback loop matters

I added tracking to measure retrieval quality:

from collections import defaultdict
from datetime import datetime

class ContextQualityTracker:
    """Track and improve context quality over time."""

    def __init__(self):
        self.feedback = []  # list of dicts: query, results, accepted, timestamp

    def log_result(self, query: str, results: list, accepted: bool):
        self.feedback.append({
            'query': query,
            'results': results,
            'accepted': accepted,
            'timestamp': datetime.now()
        })

    def get_low_signal_sources(self):
        """Identify sources that produce low-quality results."""
        source_scores = defaultdict(lambda: {'hits': 0, 'accepts': 0})
        for entry in self.feedback:
            for result in entry['results']:
                source = result['platform'] + ':' + result['channel']
                source_scores[source]['hits'] += 1
                if entry['accepted']:
                    source_scores[source]['accepts'] += 1

        # Return sources with <50% acceptance
        return [
            source for source, stats in source_scores.items()
            if stats['accepts'] / stats['hits'] < 0.5
        ]

This helped me continuously refine my source list.
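A source that has appeared in only one or two queries can look bad by chance. A possible refinement, sketched here as a standalone function over the same feedback entries, is to require a minimum number of hits before flagging a source. The min_hits and threshold parameters and their defaults are assumptions.

```python
from collections import defaultdict

def low_signal_sources(feedback: list[dict], min_hits: int = 5,
                       threshold: float = 0.5) -> list[str]:
    """Flag sources with enough hits and an acceptance rate below threshold."""
    stats = defaultdict(lambda: {"hits": 0, "accepts": 0})
    for entry in feedback:
        for result in entry["results"]:
            source = f"{result['platform']}:{result['channel']}"
            stats[source]["hits"] += 1
            if entry["accepted"]:
                stats[source]["accepts"] += 1
    return sorted(
        source for source, s in stats.items()
        if s["hits"] >= min_hits and s["accepts"] / s["hits"] < threshold
    )

# Illustrative feedback: #general seen 5 times (1 accepted), #random seen once.
general = {"platform": "slack", "channel": "#general"}
random_chat = {"platform": "slack", "channel": "#random"}
feedback = (
    [{"results": [general], "accepted": False}] * 4
    + [{"results": [general], "accepted": True}]
    + [{"results": [random_chat], "accepted": False}]
)
print(low_signal_sources(feedback))
```

With the default min_hits, "#random" is not flagged yet because one rejected result is too little evidence.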

Metrics to watch

After implementing selective indexing, I tracked these metrics:

Metric                 | Target       | Warning Sign
-----------------------+--------------+------------------------------
Query result relevance | >80% useful  | AI returns irrelevant threads
Context window usage   | <50% filled  | AI ignores most context
User correction rate   | <20% queries | User constantly re-asking
Retrieval speed        | <500ms       | Slow queries degrade UX
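These targets can be turned into an automated check. The sketch below assumes the four metrics are already measured and expressed as fractions (0 to 1) and milliseconds. The RetrievalMetrics class, its field names, and the warning labels are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RetrievalMetrics:
    relevance_rate: float   # share of results judged useful (0-1)
    context_fill: float     # share of the context window in use (0-1)
    correction_rate: float  # share of queries the user had to re-ask (0-1)
    latency_ms: float       # retrieval latency in milliseconds

def find_warnings(m: RetrievalMetrics) -> list[str]:
    """Return a label for every metric that misses its target."""
    warnings = []
    if m.relevance_rate <= 0.80:
        warnings.append("relevance")
    if m.context_fill >= 0.50:
        warnings.append("context_fill")
    if m.correction_rate >= 0.20:
        warnings.append("corrections")
    if m.latency_ms >= 500:
        warnings.append("latency")
    return warnings

print(find_warnings(RetrievalMetrics(0.9, 0.3, 0.1, 200)))
print(find_warnings(RetrievalMetrics(0.6, 0.3, 0.25, 200)))
```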

Common mistakes

I made several mistakes along the way:

Mistake 1: Indexing everything because “storage is cheap”

Storage is cheap, but AI attention isn’t. Every irrelevant token in context degrades reasoning.

Mistake 2: No feedback loop

Without tracking, I couldn’t tell if my filters were working. I needed explicit user acceptance tracking.

Mistake 3: One-size-fits-all approach

Different use cases need different contexts. Project planning context is not the same as code review context.
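One way to express this is a per-use-case source profile that narrows results before scoring. The sketch below is an assumption about how that could look: the profile names and channel assignments are invented, and results are assumed to be dicts with a "channel" key.

```python
# Hypothetical profiles: which channels feed which kind of task.
CONTEXT_PROFILES = {
    "project_planning": ["#product", "#project-alpha", "#team-standup"],
    "code_review": ["#engineering"],
}
DEFAULT_SOURCES = ["#engineering", "#product", "#project-alpha"]

def sources_for(use_case: str) -> list[str]:
    """Look up the channels for a use case, falling back to the default set."""
    return CONTEXT_PROFILES.get(use_case, DEFAULT_SOURCES)

def filter_by_use_case(results: list[dict], use_case: str) -> list[dict]:
    """Keep only results whose channel belongs to the use case's profile."""
    allowed = set(sources_for(use_case))
    return [r for r in results if r["channel"] in allowed]

results = [
    {"channel": "#engineering", "content": "PR review: pagination fix"},
    {"channel": "#product", "content": "Roadmap discussion for Q2"},
]
print(filter_by_use_case(results, "code_review"))
```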

Mistake 4: Forgetting about old but relevant content

Some old messages are still relevant (decisions, architecture choices). I added manual “pin” tags for these.
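A pin can be wired into scoring by exempting pinned messages from the recency penalty. The sketch below reuses the recency tiers from the scoring step and assumes a boolean pinned flag on each message. The function name recency_score and the choice to give pins the full bonus are assumptions.

```python
from datetime import datetime

def recency_score(timestamp: datetime, pinned: bool, now: datetime) -> int:
    """Recency bonus with the same tiers as the scoring step; pins never age."""
    if pinned:
        return 10  # Treat pinned decisions as if they were fresh
    age_days = (now - timestamp).days
    if age_days < 7:
        return 10
    if age_days < 30:
        return 5
    if age_days < 90:
        return 2
    return 0

now = datetime(2026, 3, 17)
old_decision = datetime(2025, 8, 15)
print(recency_score(old_decision, pinned=True, now=now))
print(recency_score(old_decision, pinned=False, now=now))
```

Passing now explicitly, instead of calling datetime.now() inside the function, keeps the behaviour easy to test.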

Summary

In this post, I explained why selective indexing beats comprehensive coverage for AI agent memory. The key point is that context quality matters more than context quantity—a smaller, high-quality context beats a larger, noisy one.

The practical steps:

1. Identify high-signal sources (work channels, decisions, technical discussions)
2. Create explicit whitelists instead of “index everything”
3. Add content-based filtering for mixed channels
4. Implement relevance scoring with recency and source weighting
5. Track retrieval quality with user acceptance feedback

When I switched from indexing everything to indexing selectively, my AI agent went from confused to helpful. The signal finally broke through the noise.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can reach me by email.
