Skip to content

How to Build an AI Agent That Monitors Multiple Social Platforms Automatically

The Problem

I spent two hours every morning checking Reddit, Twitter, YouTube, and Hacker News for trending topics in my niche. That time could have been spent coding.

I tried building separate scrapers for each platform. The code was messy. Each scraper had its own authentication logic, rate limiting rules, and output format. When I needed to add Polymarket, I had to write another entirely new module.

Then I found last30days-skill on GitHub. It gained 2,685 stars in one day. The reason? One agent handles all platforms with a unified approach.

This post shows how to build your own multi-platform monitoring agent. The key point is using a unified agent pattern instead of fragmented scrapers.

My Failed Approach

I started with separate scrapers:

failed_approach.py
# Reddit scraper
def scrape_reddit():
# Custom Reddit logic
pass
# Twitter scraper
def scrape_twitter():
# Custom Twitter logic
pass
# YouTube scraper
def scrape_youtube():
# Custom YouTube logic
pass
# Hacker News scraper
def scrape_hn():
# Custom HN logic
pass
# Call them separately
reddit_data = scrape_reddit()
twitter_data = scrape_twitter()
youtube_data = scrape_youtube()
hn_data = scrape_hn()

This approach had three problems:

  1. Each scraper needed different authentication
  2. Rate limits were different for each platform
  3. Output formats were inconsistent

I was maintaining four different codebases for the same goal.

The Unified Agent Pattern

I switched to a unified approach. One agent handles all platforms through a standardized interface.

The pattern works like this:

agent-architecture.txt
+------------------+
| Unified Agent |
+------------------+
|
+--------------+--------------+
| | |
+-------+------+ +----+-----+ +----+-----+
| PlatformTool | | PlatformTool | | PlatformTool |
+-------+------+ +----+-----+ +----+-----+
| | |
+-------+------+ +----+-----+ +----+-----+
| Reddit API | | Twitter API | | YouTube API |
+--------------+ +-------------+ +-------------+

Each platform becomes a “tool” that the agent can call. The agent decides when to use each tool based on what you ask it to do.

Build the Base Agent

I started with a LangGraph agent structure:

agent_base.py
from langgraph.graph import StateGraph, END
from typing import TypedDict, List, Dict
import asyncio
class AgentState(TypedDict):
platforms: List[str]
results: Dict[str, List[dict]]
summary: str
def create_agent():
workflow = StateGraph(AgentState)
# Define nodes
workflow.add_node("fetch_reddit", fetch_reddit_node)
workflow.add_node("fetch_twitter", fetch_twitter_node)
workflow.add_node("fetch_youtube", fetch_youtube_node)
workflow.add_node("fetch_hn", fetch_hn_node)
workflow.add_node("summarize", summarize_node)
# Define edges - all fetches happen in parallel
workflow.add_edge("fetch_reddit", "summarize")
workflow.add_edge("fetch_twitter", "summarize")
workflow.add_edge("fetch_youtube", "summarize")
workflow.add_edge("fetch_hn", "summarize")
workflow.add_edge("summarize", END)
# Set entry point
workflow.set_entry_point("fetch_reddit")
return workflow.compile()

The key is parallel execution. All platforms fetch simultaneously, then merge results for summarization.

Create Platform Tools

Each platform needs a standardized tool interface:

platform_tools.py
from abc import ABC, abstractmethod
from typing import List, Dict
class PlatformTool(ABC):
"""Base class for all platform tools"""
@abstractmethod
async def fetch_trending(self, limit: int = 10) -> List[Dict]:
"""Fetch trending content from platform"""
pass
@abstractmethod
def de_watermark(self, content: str) -> str:
"""Remove watermarks and clean content"""
pass
@property
@abstractmethod
def platform_name(self) -> str:
"""Return platform name"""
pass
class RedditTool(PlatformTool):
platform_name = "reddit"
async def fetch_trending(self, limit: int = 10) -> List[Dict]:
import aiohttp
url = "https://www.reddit.com/r/all/hot.json"
headers = {"User-Agent": "MultiPlatformAgent/1.0"}
async with aiohttp.ClientSession() as session:
async with session.get(url, headers=headers) as resp:
data = await resp.json()
posts = data["data"]["children"][:limit]
return [
{
"title": post["data"]["title"],
"url": post["data"]["url"],
"score": post["data"]["score"],
"subreddit": post["data"]["subreddit"],
"source": self.platform_name
}
for post in posts
]
def de_watermark(self, content: str) -> str:
# Reddit posts don't have watermarks
return content.strip()
class HackerNewsTool(PlatformTool):
platform_name = "hacker_news"
async def fetch_trending(self, limit: int = 10) -> List[Dict]:
import aiohttp
async with aiohttp.ClientSession() as session:
# Get top stories IDs
async with session.get(
"https://hacker-news.firebaseio.com/v0/topstories.json"
) as resp:
ids = await resp.json()
# Fetch top N stories
stories = []
for id in ids[:limit]:
async with session.get(
f"https://hacker-news.firebaseio.com/v0/item/{id}.json"
) as resp:
story = await resp.json()
stories.append({
"title": story["title"],
"url": story.get("url", ""),
"score": story["score"],
"source": self.platform_name
})
return stories
def de_watermark(self, content: str) -> str:
return content.strip()

The abstract base class forces consistency. Each tool implements fetch_trending and de_watermark the same way.

Add YouTube Integration

YouTube was tricky because it needs API credentials:

youtube_tool.py
class YouTubeTool(PlatformTool):
platform_name = "youtube"
async def fetch_trending(self, limit: int = 10) -> List[Dict]:
from googleapiclient.discovery import build
import os
api_key = os.environ.get("YOUTUBE_API_KEY")
if not api_key:
raise ValueError("YOUTUBE_API_KEY not set")
youtube = build("youtube", "v3", developerKey=api_key)
request = youtube.videos().list(
part="snippet,statistics",
chart="mostPopular",
maxResults=limit,
regionCode="US"
)
response = request.execute()
return [
{
"title": item["snippet"]["title"],
"url": f"https://youtube.com/watch?v={item['id']}",
"score": int(item["statistics"].get("viewCount", 0)),
"source": self.platform_name
}
for item in response["items"]
]
def de_watermark(self, content: str) -> str:
# Remove common YouTube title patterns
import re
content = re.sub(r'\s*\[HD\]\s*', '', content)
content = re.sub(r'\s*\[Official\]\s*', '', content)
return content.strip()

I made the mistake of hardcoding the API key initially. The agent failed when I tried to deploy it. Environment variables fixed this.

The Summarization Node

After fetching, the agent summarizes everything:

summarize_node.py
from anthropic import Anthropic
def summarize_node(state: AgentState) -> AgentState:
client = Anthropic()
# Combine all results
all_content = []
for platform, items in state["results"].items():
for item in items:
all_content.append(f"[{platform}] {item['title']} (score: {item['score']})")
prompt = f"""Summarize these trending topics across platforms.
Group similar topics together. Highlight the most important ones.
Content:
{chr(10).join(all_content)}
Provide a concise summary with top 5 trending topics."""
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
messages=[{"role": "user", "content": prompt}]
)
return {**state, "summary": response.content[0].text}

The LLM does the heavy lifting. It groups topics across platforms automatically.

Run the Complete Agent

I put everything together:

complete_agent.py
import asyncio
from typing import Dict, List
async def run_multi_platform_agent():
# Initialize tools
tools = {
"reddit": RedditTool(),
"hacker_news": HackerNewsTool(),
"youtube": YouTubeTool()
}
# Fetch from all platforms simultaneously
results: Dict[str, List[dict]] = {}
async def fetch_platform(name: str, tool: PlatformTool):
try:
data = await tool.fetch_trending(limit=10)
results[name] = [
{**item, "cleaned": tool.de_watermark(item["title"])}
for item in data
]
except Exception as e:
results[name] = []
print(f"Error fetching {name}: {e}")
# Run all fetches in parallel
await asyncio.gather(*[
fetch_platform(name, tool)
for name, tool in tools.items()
])
# Summarize
state = {"results": results, "summary": ""}
final_state = summarize_node(state)
return final_state["summary"]
if __name__ == "__main__":
summary = asyncio.run(run_multi_platform_agent())
print(summary)

Running this gives me a unified summary from all platforms in about 10 seconds.

I added Polymarket to track prediction market trends:

polymarket_tool.py
class PolymarketTool(PlatformTool):
platform_name = "polymarket"
async def fetch_trending(self, limit: int = 10) -> List[Dict]:
import aiohttp
url = "https://clob.polymarket.com/markets"
params = {"limit": limit, "closed": False}
async with aiohttp.ClientSession() as session:
async with session.get(url, params=params) as resp:
markets = await resp.json()
return [
{
"title": market["question"],
"url": f"https://polymarket.com/event/{market['slug']}",
"score": float(market.get("volume", 0)),
"source": self.platform_name
}
for market in markets[:limit]
]
def de_watermark(self, content: str) -> str:
return content.strip()

Prediction markets reveal what topics people care about before they hit mainstream news.

The De-Watermarking Problem

Some platforms add watermarks to content titles. YouTube videos often have “[HD]” or “[Official]” appended. Twitter posts sometimes include attribution text.

I built a cleaning function:

cleaning.py
import re
def clean_title(title: str, platform: str) -> str:
patterns = {
"youtube": [
r'\s*\[HD\]\s*',
r'\s*\[Official\]\s*',
r'\s*\[Official Video\]\s*',
r'\s*-\s*YouTube\s*$'
],
"twitter": [
r'\s*via\s+@\w+\s*$',
r'\s*—\s+\w+\s*$'
],
"reddit": [], # Usually clean
"hacker_news": [], # Usually clean
"polymarket": [] # Usually clean
}
cleaned = title
for pattern in patterns.get(platform, []):
cleaned = re.sub(pattern, '', cleaned, flags=re.IGNORECASE)
return cleaned.strip()

This makes the summary cleaner and more readable.

Deployment Issues I Encountered

When I deployed the agent, I hit three errors:

Error 1: Rate Limits

Reddit blocked my requests after a few runs. I added caching:

caching.py
import time
from functools import lru_cache
@lru_cache(maxsize=100)
def cached_reddit_fetch(timestamp: int):
# Cache for 5 minutes
return asyncio.run(RedditTool().fetch_trending())
def get_reddit_with_cache():
cache_key = int(time.time() // 300) # 5 minute buckets
return cached_reddit_fetch(cache_key)

Error 2: Missing API Keys

YouTube failed without the API key. I added validation:

validation.py
def validate_environment():
required_keys = ["YOUTUBE_API_KEY", "ANTHROPIC_API_KEY"]
missing = [k for k in required_keys if not os.environ.get(k)]
if missing:
raise ValueError(f"Missing environment variables: {missing}")

Error 3: Network Failures

Some platforms occasionally timed out. I added retries:

retries.py
import asyncio
from tenacity import retry, stop_after_attempt, wait_exponential
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
async def fetch_with_retry(tool: PlatformTool):
return await tool.fetch_trending()

These fixes made the agent reliable enough for daily use.

Schedule Automatic Runs

I set up a cron job to run the agent every morning:

crontab
# Run at 6 AM daily
0 6 * * * cd /path/to/agent && python complete_agent.py >> /var/log/agent.log 2>&1

Or use a Python scheduler:

scheduler.py
import schedule
import time
def daily_monitor():
summary = asyncio.run(run_multi_platform_agent())
# Send to Slack or save to file
save_summary(summary)
schedule.every().day.at("06:00").do(daily_monitor)
while True:
schedule.run_pending()
time.sleep(60)

Now I wake up to a summary instead of spending two hours manually checking platforms.

What I Learned

Building this agent taught me:

  • Unified patterns beat fragmented scrapers
  • Parallel execution saves time
  • Environment variables prevent deployment failures
  • Caching handles rate limits
  • Retry logic handles transient failures

The agent runs in 10 seconds. It used to take me two hours manually.

Summary

This post showed how to build an AI agent that monitors Reddit, Twitter, YouTube, Hacker News, and Polymarket simultaneously. The key point is using a unified agent pattern instead of fragmented scrapers.

The architecture uses platform tools that implement a standard interface. The agent fetches all platforms in parallel, cleans the content with de-watermarking, and summarizes everything with an LLM.

I added caching for rate limits, retry logic for failures, and scheduling for daily automation. The result: 10 seconds instead of 2 hours of manual checking.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments