How World Monitor Aggregates 435+ News Feeds with AI-Powered Analysis

Mar 17, 2026

The Problem with Manual News Monitoring

I used to spend hours each morning opening dozens of tabs: BBC for world news, Al Jazeera for Middle East coverage, Reuters for wire reports, Defense One for military analysis, and on and on. By the time I’d scanned everything, half my morning was gone and I still felt like I was missing important stories.

The real problem wasn’t the time—it was the signal-to-noise ratio. Every source had its own biases, its own editorial priorities, and its own blind spots. I needed a way to aggregate multiple sources, deduplicate related stories, and focus on what actually mattered.

World Monitor solves this by aggregating 435+ curated RSS feeds across 15 categories, clustering similar stories together, and applying AI-powered analysis to surface what’s important. Here’s how the news aggregation system works.

The Feed Architecture

The first decision I made was to curate sources rather than crawl everything. Quality over quantity. The feeds are organized into 15 categories, including:

World / Geopolitical    → BBC, Reuters, AP, Guardian, NPR, Politico
Middle East / MENA      → Al Jazeera, BBC ME, Guardian ME, Al Arabiya
Africa                  → BBC Africa, News24, Google News aggregation
Latin America           → BBC Latin America, Guardian Americas
Asia-Pacific            → BBC Asia, South China Morning Post
Energy & Resources      → Google News (oil/gas, nuclear, mining)
Technology              → Hacker News, Ars Technica, The Verge, MIT Tech Review
AI / ML                 → ArXiv, VentureBeat AI, MIT Tech Review
Finance                 → CNBC, MarketWatch, Financial Times, Yahoo Finance
Government              → White House, State Dept, Pentagon, Treasury, Fed
Intel Feed              → Defense One, Breaking Defense, Bellingcat
Think Tanks             → Foreign Policy, Atlantic Council, CSIS, RAND
Crisis Watch            → International Crisis Group, IAEA, WHO, UNHCR
Regional Sources        → Xinhua, TASS, Kyiv Independent, Moscow Times

Each category serves a specific intelligence purpose. The Intel Feed category, for example, pulls from defense-focused sources that wouldn’t appear in mainstream news. The Think Tanks category captures analysis that takes days or weeks to produce, not hours.
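
One way to represent a curated list like this is a small typed registry. The sketch below is only an assumption about the shape: the `FeedSource` type, the `FEEDS` array, the `groupByCategory` helper, and the example.com URLs are placeholders for illustration, not World Monitor's actual configuration.

```typescript
// Hypothetical shape for a curated feed registry. Names and URLs are
// placeholders for illustration, not World Monitor's real configuration.
type FeedCategory = "World / Geopolitical" | "Intel Feed" | "Think Tanks";

interface FeedSource {
  name: string;
  category: FeedCategory;
  url: string; // RSS endpoint (placeholder values below)
}

const FEEDS: FeedSource[] = [
  { name: "BBC", category: "World / Geopolitical", url: "https://example.com/bbc.xml" },
  { name: "Reuters", category: "World / Geopolitical", url: "https://example.com/reuters.xml" },
  { name: "Defense One", category: "Intel Feed", url: "https://example.com/defense-one.xml" },
  { name: "RAND", category: "Think Tanks", url: "https://example.com/rand.xml" },
];

// Group sources by category so each panel can look up its own feeds.
function groupByCategory(feeds: FeedSource[]): Map<FeedCategory, FeedSource[]> {
  const grouped = new Map<FeedCategory, FeedSource[]>();
  feeds.forEach((feed) => {
    const bucket = grouped.get(feed.category) ?? [];
    bucket.push(feed);
    grouped.set(feed.category, bucket);
  });
  return grouped;
}
```

Keeping the category on each source, rather than nesting sources under categories, makes it easy to add or move a feed by editing a single entry.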

Source Filtering: The Feature I Didn’t Know I Needed

After using the system for a few weeks, I realized I was ignoring certain sources. Some were too sensational. Others had paywalls. A few just didn’t align with my focus areas.

I added a source filtering system that lets me toggle individual sources on or off:

┌─────────────────────────────────────────────────────────────┐
│  SOURCES                                    [Select All]   │
│                                             [Select None]  │
├─────────────────────────────────────────────────────────────┤
│  Search: [________________________]                         │
├─────────────────────────────────────────────────────────────┤
│  [x] BBC News                                                │
│  [x] Reuters                                                 │
│  [ ] Daily Mail                    ← disabled               │
│  [x] Al Jazeera                                             │
│  [x] Defense One                                            │
│  ...                                                        │
├─────────────────────────────────────────────────────────────┤
│  45/77 sources enabled                                       │
└─────────────────────────────────────────────────────────────┘

The key implementation detail: disabled sources are filtered at fetch time, not display time. This means I’m not wasting bandwidth or processing power on sources I don’t want. Settings persist to localStorage, so my preferences survive page refreshes.

When every source in a panel is disabled, the panel shows an explanatory message instead of an empty list. This makes it clear that the panel is filtered rather than broken.
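
Here is a minimal sketch of fetch-time filtering, assuming the disabled sources are stored as a JSON array of names. The `KeyValueStore` interface, the `disabledSources` key, and all function names are hypothetical. The browser's `localStorage` satisfies the interface, which also keeps the logic testable outside a browser.

```typescript
// Minimal storage contract; the browser's localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const DISABLED_KEY = "disabledSources"; // hypothetical storage key

// Read the set of disabled source names, tolerating missing or corrupt data.
function loadDisabled(store: KeyValueStore): Set<string> {
  const disabled = new Set<string>();
  try {
    const parsed: unknown = JSON.parse(store.getItem(DISABLED_KEY) ?? "[]");
    if (Array.isArray(parsed)) {
      parsed.forEach((name) => {
        if (typeof name === "string") disabled.add(name);
      });
    }
  } catch {
    // Corrupt JSON: fall back to "nothing disabled".
  }
  return disabled;
}

// Persist the disabled set as a JSON array.
function saveDisabled(store: KeyValueStore, disabled: Set<string>): void {
  const names: string[] = [];
  disabled.forEach((name) => names.push(name));
  store.setItem(DISABLED_KEY, JSON.stringify(names));
}

// Apply the filter before any request is made, so disabled feeds cost nothing.
function sourcesToFetch(allSources: string[], store: KeyValueStore): string[] {
  const disabled = loadDisabled(store);
  return allSources.filter((name) => !disabled.has(name));
}

// True when a panel has sources but every one of them is disabled.
function allDisabled(panelSources: string[], store: KeyValueStore): boolean {
  return panelSources.length > 0 && sourcesToFetch(panelSources, store).length === 0;
}
```

In the browser, `sourcesToFetch(names, localStorage)` would run before the fetch loop, and `allDisabled` would decide whether a panel renders its "all sources disabled" message.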

The Clustering Algorithm: Deduplication That Actually Works

The biggest problem with aggregating 435+ feeds is duplication. The same story appears on BBC, Reuters, AP, and the Guardian within hours. I needed a way to group related articles without merging unrelated ones.

I implemented Jaccard similarity clustering with a 0.6 threshold:

Raw Headlines from 435+ Feeds
            │
            ▼
┌─────────────────────────────────────────────────────────────┐
│  Preprocessing                                               │
│  - Lowercase normalization                                   │
│  - Remove common stop words                                  │
│  - Tokenize into word sets                                   │
└─────────────────────────────────────────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────────────────────┐
│  Jaccard Similarity Calculation                              │
│  J(A,B) = |A ∩ B| / |A ∪ B|                                 │
│  Threshold: 0.6 (tuned for news headlines)                  │
└─────────────────────────────────────────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────────────────────┐
│  Cluster Formation                                           │
│  - Group headlines with similarity > 0.6                     │
│  - Select canonical headline (earliest or most complete)    │
│  - Store cluster metadata for UI display                     │
└─────────────────────────────────────────────────────────────┘

The 0.6 threshold was the result of trial and error. Lower values created false clusters (unrelated stories grouped together). Higher values missed legitimate duplicates (the same story with slightly different headlines).
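
The pipeline above can be sketched in a few functions. This is a simplified illustration rather than the code in the project. The stop-word list is a tiny placeholder, tokenization is ASCII-only, and the greedy "compare against the first headline of each cluster" strategy is an assumption, since the post only specifies the similarity measure and the threshold.

```typescript
// Small illustrative stop-word list; the real list is not given in the post.
const STOP_WORDS = new Set<string>(["a", "an", "the", "in", "on", "of", "to", "for", "and", "as", "at", "by"]);

// Lowercase, split on non-alphanumerics (ASCII only), drop stop words.
function tokenize(headline: string): Set<string> {
  const tokens = new Set<string>();
  headline
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .forEach((word) => {
      if (word.length > 0 && !STOP_WORDS.has(word)) tokens.add(word);
    });
  return tokens;
}

// J(A,B) = |A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty.
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((token) => {
    if (b.has(token)) shared += 1;
  });
  const unionSize = a.size + b.size - shared;
  return unionSize === 0 ? 0 : shared / unionSize;
}

// Greedy single pass: attach each headline to the first cluster whose
// canonical (earliest) headline is similar enough, else start a new cluster.
function clusterHeadlines(headlines: string[], threshold = 0.6): string[][] {
  const clusters: { tokens: Set<string>; items: string[] }[] = [];
  headlines.forEach((headline) => {
    const tokens = tokenize(headline);
    const match = clusters.find((c) => jaccard(c.tokens, tokens) > threshold);
    if (match) match.items.push(headline);
    else clusters.push({ tokens, items: [headline] });
  });
  return clusters.map((c) => c.items);
}
```

For example, "Fed raises interest rates by quarter point" and "Fed raises rates by quarter point amid inflation" share 5 of 8 distinct tokens after stop-word removal, giving a similarity of 0.625, just above the threshold.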

The clustering runs in analysis.worker.ts as a Web Worker, keeping the main thread responsive. Cross-domain correlation detection identifies when the same story appears across different categories—for example, a tech story about AI that also has financial implications.
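
The post does not spell out how cross-domain correlation is detected. One simple interpretation, sketched below under that assumption, is to flag any cluster whose articles come from two or more feed categories. The `ClusteredItem` type and both function names are hypothetical.

```typescript
interface ClusteredItem {
  headline: string;
  category: string;
}

// Return the distinct categories a cluster touches, in first-seen order.
function categoriesOf(cluster: ClusteredItem[]): string[] {
  const seen: string[] = [];
  cluster.forEach((item) => {
    if (seen.indexOf(item.category) === -1) seen.push(item.category);
  });
  return seen;
}

// A cluster is "cross-domain" when its items come from two or more categories.
function isCrossDomain(cluster: ClusteredItem[]): boolean {
  return categoriesOf(cluster).length >= 2;
}
```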

Why Clustering Matters for AI Processing

Here’s something I didn’t anticipate: clustering dramatically reduces AI costs. Before clustering, I was sending hundreds of duplicate headlines to the LLM for summarization. After clustering, the prompt size typically dropped by 20-40%, and by more when a single story dominated the feeds. As an illustration of a heavily duplicated batch:

Before Clustering:
  500 headlines × ~50 tokens = 25,000 tokens per summary

After Clustering:
  150 unique clusters × ~50 tokens = 7,500 tokens per summary

Savings: 70% reduction in prompt tokens for this batch

This isn’t just about cost—it’s about quality. When the LLM sees the same story five times, it tends to over-weight that story in the summary. Clustering ensures each story gets equal consideration.
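
The token arithmetic is easy to reproduce. The helper below is a rough sketch: it assumes a flat average of about 50 tokens per headline, as in the example, and the function names are made up for illustration.

```typescript
// Rough prompt-size estimate: item count times average tokens per headline.
function estimateTokens(itemCount: number, tokensPerItem = 50): number {
  return itemCount * tokensPerItem;
}

// Fraction of prompt tokens saved by sending clusters instead of raw headlines.
function tokenSavings(rawHeadlines: number, clusters: number, tokensPerItem = 50): number {
  const before = estimateTokens(rawHeadlines, tokensPerItem);
  const after = estimateTokens(clusters, tokensPerItem);
  return before === 0 ? 0 : (before - after) / before;
}
```

With the example figures, `tokenSavings(500, 150)` gives 0.7. For the same 500 headlines, a saving in the 20-40% range corresponds to between 400 and 300 clusters.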

Custom Monitors: Personalized Keyword Alerts

The source filtering handles which sources I trust. But I also needed a way to track specific topics across all sources. I implemented custom monitors:

Monitor: "nvidia, gpu, chip shortage"
    │
    ├── Assigned unique color (auto-generated)
    │
    ├── Scans all incoming headlines for matches
    │
    ├── Highlights matching articles in Monitor panel
    │
    └── Matching articles in clusters inherit monitor color

I have monitors set up for:

Specific companies I’m tracking
Geographic regions I’m focused on
Technical topics I’m researching
People I’m following

The monitors persist via localStorage, so they survive page refreshes. Each monitor gets a unique color, making it easy to spot relevant articles at a glance.
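
A monitor can be modeled as a keyword list plus a color. The sketch below assumes case-insensitive substring matching and derives the color from a simple hash of the keywords. Both choices are illustrative assumptions, since the post does not describe the matching rules or how colors are generated. `KeywordMonitor`, `colorFor`, `createMonitor`, and `matchesMonitor` are hypothetical names.

```typescript
interface KeywordMonitor {
  keywords: string[];
  color: string; // CSS color used to highlight matches
}

// Derive a stable hue from the monitor text. How World Monitor really picks
// colors is not described; this hash is an illustrative assumption.
function colorFor(text: string): string {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = (hash * 31 + text.charCodeAt(i)) % 360;
  }
  return `hsl(${hash}, 70%, 50%)`;
}

// Parse "nvidia, gpu, chip shortage" into a monitor with lowercase keywords.
function createMonitor(spec: string): KeywordMonitor {
  const keywords = spec
    .split(",")
    .map((k) => k.trim().toLowerCase())
    .filter((k) => k.length > 0);
  return { keywords, color: colorFor(keywords.join(",")) };
}

// Case-insensitive substring match against any keyword.
function matchesMonitor(monitor: KeywordMonitor, headline: string): boolean {
  const text = headline.toLowerCase();
  return monitor.keywords.some((k) => text.indexOf(k) !== -1);
}
```

Substring matching is forgiving ("gpu" also matches "GPUs"), but very short keywords can produce false hits, so matching on word boundaries would be a stricter alternative.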

Live News Streams: Television in the Browser

Sometimes I want background news while I’m working. I embedded YouTube live streams with channel switching:

Bloomberg    → Business & financial news
Sky News     → UK & international
Euronews     → European perspective
DW News      → German international
France 24    → French global news
Al Arabiya   → Middle East (Arabic)
Al Jazeera   → Middle East & international

The implementation uses the YouTube IFrame Player API rather than raw iframes. This gives me programmatic control over playback:

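// Persistent player: `player` is a single YT.Player instance, reused so that
// mute/play/channel changes never reload the iframe
player.mute();
player.setVolume(50);
player.loadVideoById(channelVideoId);

// Idle detection: pause when the tab is hidden or after 5 minutes without
// input (which events count as activity is an assumption here)
const IDLE_LIMIT_MS = 5 * 60 * 1000;
let idleTimer;
const resetIdleTimer = () => {
  clearTimeout(idleTimer);
  idleTimer = setTimeout(() => player.pauseVideo(), IDLE_LIMIT_MS);
};
['mousemove', 'keydown', 'scroll'].forEach((type) =>
  document.addEventListener(type, resetIdleTimer, { passive: true })
);
resetIdleTimer();

document.addEventListener('visibilitychange', () => {
  if (document.hidden) player.pauseVideo();
});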

The player persists across channel changes, avoiding the jarring reload that comes with raw iframe embeds. When the tab is hidden or I’ve been idle for 5 minutes, playback pauses automatically.

Activity Tracking: What’s New vs. What I’ve Seen

With hundreds of headlines flowing through the system, I needed a way to track what I’d already viewed. I implemented a three-tier activity system:

┌─────────────────────────────────────────────────────────────┐
│  NEW Badge          │ 2 minutes  │ Bright badge on new items│
├─────────────────────────────────────────────────────────────┤
│  Glow Highlight     │ 30 seconds │ Animation draws attention │
├─────────────────────────────────────────────────────────────┤
│  Panel Badge        │ Until viewed │ Count in collapsed panels│
└─────────────────────────────────────────────────────────────┘

The “seen” detection uses IntersectionObserver:

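// Pending "seen" timers, keyed by observed element
const seenTimers = new Map();

const observer = new IntersectionObserver((entries) => {
  entries.forEach((entry) => {
    const el = entry.target;
    if (entry.intersectionRatio > 0.5) {
      // Item is >50% visible: mark as seen if it stays that way for 500ms
      if (!seenTimers.has(el)) {
        seenTimers.set(el, setTimeout(() => {
          seenTimers.delete(el);
          markAsSeen(el); // app-specific helper that clears the item's badges
        }, 500));
      }
    } else {
      // Dropped below 50% before the timer fired: cancel the pending mark
      clearTimeout(seenTimers.get(el));
      seenTimers.delete(el);
    }
  });
}, { threshold: 0.5 });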

An item needs to be more than 50% visible for more than 500ms to be marked as seen. This prevents accidental marks from scrolling past quickly. Each panel maintains independent activity state.
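
The three tiers reduce to two time windows and a seen flag. Here is a minimal sketch, assuming the NEW badge and the glow depend only on an item's age while the panel badge depends only on whether the item has been viewed. The `TrackedItem` type and the function names are hypothetical.

```typescript
const NEW_BADGE_MS = 2 * 60 * 1000; // NEW badge lasts 2 minutes
const GLOW_MS = 30 * 1000;          // glow highlight lasts 30 seconds

interface TrackedItem {
  id: string;
  arrivedAt: number; // epoch milliseconds when the item was first fetched
  seen: boolean;     // set once the IntersectionObserver marks it as seen
}

// Which per-item indicators should be visible at time `now`.
function itemIndicators(item: TrackedItem, now: number): { isNew: boolean; glowing: boolean } {
  const age = now - item.arrivedAt;
  return { isNew: age < NEW_BADGE_MS, glowing: age < GLOW_MS };
}

// Count shown on a collapsed panel: items not yet viewed, regardless of age.
function panelBadgeCount(items: TrackedItem[]): number {
  return items.filter((item) => !item.seen).length;
}
```

Passing `now` in as a parameter, instead of calling `Date.now()` inside, keeps the functions pure. Holding a separate item list per panel keeps each panel's state independent.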

Regional Intelligence Panels

The feed categories map to intelligence panels. The regional and energy panels look like this:

┌─────────────────────────────────────────────────────────────┐
│  Middle East    │ MENA region                               │
│                 │ Israel-Gaza, Iran, Gulf states, Red Sea   │
├─────────────────────────────────────────────────────────────┤
│  Africa         │ Sub-Saharan focus                         │
│                 │ Sahel instability, coups, insurgencies   │
├─────────────────────────────────────────────────────────────┤
│  Latin America  │ Central & South America                   │
│                 │ Venezuela, drug trafficking               │
├─────────────────────────────────────────────────────────────┤
│  Asia-Pacific   │ East & Southeast Asia                     │
│                 │ China-Taiwan, Korean peninsula            │
├─────────────────────────────────────────────────────────────┤
│  Energy         │ Global energy markets                     │
│                 │ Oil markets, nuclear, mining              │
└─────────────────────────────────────────────────────────────┘

Each panel shows headlines from its assigned sources, with clustering applied within the panel. I can expand a panel to see more detail or collapse it to just see the count badge.

Data Export: Taking Intelligence Offline

Sometimes I need to analyze the data outside the browser. The export feature generates CSV and JSON snapshots:

CSV Export:
  - Headline, source, timestamp, category, threat level
  - Importable into Excel, Google Sheets, Python pandas

JSON Export:
  - Full article metadata
  - Cluster relationships
  - AI-generated summaries
  - Entity extractions

There’s also a historical playback feature that loads snapshots from the past 7 days. The system automatically cleans up snapshots older than 7 days to manage storage.
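
The CSV export and the 7-day cleanup are both small functions. This is a sketch rather than the project's code: the `ExportRow` field names, the column header spelling, and the `takenAt` timestamp on snapshots are assumptions based on the columns listed above.

```typescript
interface ExportRow {
  headline: string;
  source: string;
  timestamp: string; // ISO 8601
  category: string;
  threatLevel: string;
}

// Quote a field when it contains a comma, quote, or line break (RFC 4180 style).
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Serialize rows with a header line, using CRLF line endings.
function toCsv(rows: ExportRow[]): string {
  const header = "headline,source,timestamp,category,threat_level";
  const lines = rows.map((r) =>
    [r.headline, r.source, r.timestamp, r.category, r.threatLevel].map(csvField).join(",")
  );
  return [header].concat(lines).join("\r\n");
}

const RETENTION_MS = 7 * 24 * 60 * 60 * 1000; // 7-day snapshot window

// Keep only snapshots taken within the last 7 days of `now`.
function pruneSnapshots<T extends { takenAt: number }>(snapshots: T[], now: number): T[] {
  return snapshots.filter((s) => now - s.takenAt <= RETENTION_MS);
}
```

Quoting matters because headlines routinely contain commas and quotation marks. Without it, spreadsheet imports would split a single headline across several columns.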

What I Learned Building This

After running World Monitor for several months, a few insights stand out:

Curation beats crawling. 435 well-chosen sources provide better signal than 10,000 random feeds. I spent more time curating sources than I expected, but the quality improvement is worth it.
Clustering is essential for AI. The 20-40% token reduction from deduplication isn’t just cost savings—it improves summary quality by preventing over-weighting of duplicate stories.
Source filtering needs to be granular. The ability to disable individual sources at fetch time, not display time, matters for both bandwidth and processing efficiency.
Activity tracking reduces anxiety. Knowing what I’ve seen versus what’s new eliminates the “did I already read this?” mental overhead.
Regional panels enable focus. I can monitor the Middle East closely while keeping an eye on other regions. The panel structure matches how I actually think about global events.

In This Post

In this post, I showed how World Monitor aggregates 435+ RSS feeds into actionable intelligence. The key components are: curated source selection across 15 categories, Jaccard similarity clustering for deduplication, granular source filtering at fetch time, custom keyword monitors for personalized tracking, and activity tracking to manage information overload. The clustering algorithm reduces AI token usage by 20-40% while improving summary quality.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 World Monitor
👨‍💻 Jaccard Similarity
👨‍💻 RSS Specification

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!