How Can You Reduce Token Usage When Web Scraping with AI Agents?

Mar 19, 2026

Problem

I built an AI agent to scrape product information from 50 e-commerce product pages. My first version worked, but it cost $47 in tokens for a single run.

When I looked at where the tokens went, I found the problem:

Token Breakdown (Naive Approach):
- Raw HTML content: 847,000 tokens (89%)
- Agent reasoning: 52,000 tokens (5%)
- Planning and extraction: 53,000 tokens (6%)
Total: 952,000 tokens (~$47 at GPT-4 rates)

89% of my token budget went to raw HTML. The AI was processing navigation menus, footers, scripts, and ads just to extract product names and prices.

I needed a better approach.

The Naive Approach (Expensive)

My original architecture sent entire web pages to the AI:

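import json

import httpx  # Stand-in for the unspecified fetch helper
from openai import AsyncOpenAI

client = AsyncOpenAI()  # Reads OPENAI_API_KEY from the environment

async def scrape_product(url: str) -> dict:
    # Fetch raw HTML
    async with httpx.AsyncClient(follow_redirects=True) as http:
        response = await http.get(url)
    html = response.text

    # Send entire page to AI
    prompt = f"""
    Extract product information from this HTML:

    {html}

    Return JSON with: name, price, description, availability
    """

    result = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )

    # Assumes the reply is bare JSON; json.loads raises ValueError otherwise
    return json.loads(result.choices[0].message.content)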

This worked but was expensive for several reasons:

- Large context: raw HTML includes scripts, styles, and navigation
- Duplicate processing: the same menus and footers are re-processed on every page of a site
- No caching: every request starts from scratch
- Token-heavy prompts: the entire page is sent each time
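
To get a feel for how much of a page is non-content, here is a rough measurement using only Python's standard library. It is a sketch, not what Firecrawl does internally: the `SKIP_TAGS` set, the class name, and the sample page are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags whose contents rarely carry product data (an assumption; adjust per site).
SKIP_TAGS = {"script", "style", "nav", "footer", "aside"}

class VisibleTextExtractor(HTMLParser):
    """Collects text that is not nested inside any SKIP_TAGS element."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        # An unclosed skipped tag would hide everything after it;
        # acceptable for a rough measurement, not for real extraction.
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def content_ratio(html: str) -> tuple[str, float]:
    """Return the kept text and its share of the original character count."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.chunks)
    ratio = (len(text) / len(html)) if html else 0.0
    return text, ratio

# Hypothetical miniature product page.
sample_html = """<html><head><style>body{margin:0}</style>
<script>window.analytics = {};</script></head>
<body><nav><a href="/">Home</a><a href="/sale">Sale</a></nav>
<main><h1>Trail Running Shoe</h1><p>$89.99</p><p>In stock</p></main>
<footer>Copyright Example Store</footer></body></html>"""

text, ratio = content_ratio(sample_html)
print(text)
print(f"kept {ratio:.0%} of the original characters")
```

Character counts are only a proxy for tokens, but the ratio gives a quick sense of how much of each prompt is markup and boilerplate.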

The Hybrid Approach (Cheap)

I redesigned the system to separate concerns:

- Firecrawl handles extraction, cleaning, and structuring
- AI Agent handles planning, reasoning, and validation

Here’s the new architecture:

Hybrid Architecture:

URL → Firecrawl (clean HTML, extract structure)
    → Structured Data (markdown, minimal)
    → AI Agent (plan extraction strategy)
    → Extracted Fields

The implementation:

import asyncio
import json

from firecrawl import FirecrawlApp
from openai import AsyncOpenAI

class HybridWebScraper:
    def __init__(self, firecrawl_api_key: str, openai_api_key: str):
        self.firecrawl = FirecrawlApp(api_key=firecrawl_api_key)
        self.openai = AsyncOpenAI(api_key=openai_api_key)

    async def scrape_product(self, url: str) -> dict:
        # Step 1: Firecrawl extracts and cleans
        # Returns structured markdown, not raw HTML.
        # scrape_url is synchronous, so run it in a worker thread.
        # Option names and call style vary between firecrawl-py versions;
        # check the SDK docs for the one you have installed.
        firecrawl_result = await asyncio.to_thread(
            self.firecrawl.scrape_url,
            url,
            params={
                'formats': ['markdown'],  # Only markdown is used below
                'onlyMainContent': True,  # Skip nav, footer, ads
                'excludeTags': ['nav', 'footer', 'aside', 'script', 'style']
            }
        )

        markdown = firecrawl_result['markdown']

        # Step 2: AI plans extraction strategy
        plan = await self.plan_extraction(markdown[:5000])  # Use first 5K chars

        # Step 3: AI extracts based on plan
        product = await self.extract_fields(markdown, plan)

        return product

    async def plan_extraction(self, sample_content: str) -> dict:
        """AI plans how to extract data - small context"""
        prompt = f"""
        Analyze this product page sample and plan extraction:

        {sample_content}

        Return JSON with extraction strategy:
        {{
            "price_pattern": "regex or selector hint",
            "name_location": "likely location in content",
            "description_section": "marker to find description"
        }}
        """

        result = await self.openai.chat.completions.create(
            model="gpt-4o-mini",  # Cheaper model for planning
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )

        return json.loads(result.choices[0].message.content)

    async def extract_fields(self, content: str, plan: dict) -> dict:
        """Extract fields using plan - targeted extraction"""
        prompt = f"""
        Extract product info using this strategy:
        {json.dumps(plan, indent=2)}

        Content:
        {content[:10000]}

        Return JSON:
        {{ "name": "", "price": "", "description": "", "availability": "" }}
        """

        result = await self.openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )

        return json.loads(result.choices[0].message.content)
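
The architecture assigns validation to the AI agent, but the extraction step returns whatever JSON the model produced. A cheap local check can catch obvious problems before spending more tokens. The sketch below is one possible shape for it: the field names come from the extraction prompt, while `validate_product`, `parse_price`, and the price pattern are assumptions for illustration.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional

# Field names taken from the extraction prompt's JSON template.
REQUIRED_FIELDS = ("name", "price", "description", "availability")

# Matches the first number in strings such as "$1,299.00" or "89.99 USD".
# Assumption: "," is a thousands separator and "." is the decimal point.
_PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")

def parse_price(raw: str) -> Optional[Decimal]:
    """Return the first number found in raw as a Decimal, or None."""
    match = _PRICE_RE.search(raw)
    if match is None:
        return None
    try:
        return Decimal(match.group().replace(",", ""))
    except InvalidOperation:
        return None

def validate_product(product: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks usable."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = product.get(field)
        # The prompt asks for string values, so anything else is flagged.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")

    price = product.get("price")
    if isinstance(price, str) and price.strip() and parse_price(price) is None:
        problems.append(f"unparseable price: {price!r}")
    return problems

good = {"name": "Shoe", "price": "$1,299.00",
        "description": "Trail shoe", "availability": "In stock"}
bad = {"name": "Shoe", "price": "Call for price",
       "description": "", "availability": "In stock"}

print(validate_product(good))
print(validate_product(bad))
```

Records that fail the check could be retried with a larger content window or flagged for review, rather than silently stored.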

Cost Comparison

I ran both approaches on the same 50 URLs:

Naive Approach:
- Raw HTML per page: ~17,000 tokens average
- Total per page, including reasoning and extraction: ~19,000 tokens
- Total: 952,000 tokens
- Cost: $47.00

Hybrid Approach:
- Firecrawl processing: ~$0.02 per page = $1.00
- Cleaned markdown per page: ~3,000 tokens average
- AI planning: ~500 tokens per page
- AI extraction: ~2,500 tokens per page
- Total AI tokens: 150,000 tokens
- AI cost: $7.50 (about $50 per million tokens, the same effective rate as the naive run)
- Total cost: $8.50

Savings: 82%
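
The comparison can be reproduced with a few lines of arithmetic. This is a sketch that only restates the figures above; `RunCost` and `savings` are names introduced here for illustration, and the naive token total is the 952,000 from the token breakdown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RunCost:
    ai_tokens: int
    ai_cost: float        # USD spent on model tokens
    scraping_cost: float  # USD spent on Firecrawl

    @property
    def total(self) -> float:
        return self.ai_cost + self.scraping_cost

def savings(baseline: RunCost, candidate: RunCost) -> float:
    """Fraction of the baseline's total cost that the candidate avoids."""
    return 1 - candidate.total / baseline.total

# Figures from the 50-URL comparison.
naive = RunCost(ai_tokens=952_000, ai_cost=47.00, scraping_cost=0.0)
hybrid = RunCost(
    ai_tokens=50 * (500 + 2_500),  # planning + extraction per page
    ai_cost=7.50,
    scraping_cost=50 * 0.02,       # ~$0.02 per page
)

print(f"token reduction: {1 - hybrid.ai_tokens / naive.ai_tokens:.0%}")
print(f"cost savings:    {savings(naive, hybrid):.0%}")
```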

The token reduction came from:

Token Reduction Breakdown:
- Removing scripts/styles: -40% tokens
- Removing nav/footer/ads: -25% tokens
- Converting HTML to markdown: -15% tokens
- Using smaller model for planning: -60% cost on that step
- Targeted extraction vs full parsing: -20% tokens

Advanced Optimization: Caching Plans

I noticed that pages on the same e-commerce site usually share a layout. Instead of planning extraction for each page, I cache one plan per domain:

import asyncio
from urllib.parse import urlparse

class CachedHybridScraper(HybridWebScraper):
    def __init__(self, firecrawl_api_key: str, openai_api_key: str):
        super().__init__(firecrawl_api_key, openai_api_key)
        self.plan_cache = {}  # Domain -> extraction plan

    async def scrape_with_cache(self, url: str) -> dict:
        domain = urlparse(url).netloc

        # One Firecrawl call per URL; the planning sample is cut from it
        content = await self.get_clean_content(url)

        # Check cache for extraction plan
        if domain not in self.plan_cache:
            # First time seeing this domain: plan from the first 2K chars
            self.plan_cache[domain] = await self.plan_extraction(content[:2000])

        # Extract using cached plan
        return await self.extract_fields(content, self.plan_cache[domain])

    async def get_clean_content(self, url: str) -> str:
        """Fetch cleaned markdown via Firecrawl"""
        # scrape_url is synchronous, so run it in a worker thread
        result = await asyncio.to_thread(
            self.firecrawl.scrape_url,
            url,
            params={'formats': ['markdown'], 'onlyMainContent': True}
        )
        return result['markdown']
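
One detail worth considering when caching by domain: the raw host string treats `www.example.com`, `example.com`, and `Example.com:443` as three different sites. A minimal sketch of a normalized cache key follows. `cache_key` is a hypothetical helper, and treating the `www.` and bare hosts as one template is an assumption that may not hold for every site.

```python
from urllib.parse import urlparse

def cache_key(url: str) -> str:
    """Normalize a URL to a plan-cache key: lowercase host, no port, no 'www.'."""
    # .hostname is already lowercased and excludes the port.
    # It is None for URLs without a scheme, which map to "" here.
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

print(cache_key("https://www.Example.com:443/p/1"))
print(cache_key("https://example.com/p/2"))
print(cache_key("https://shop.example.com/item/3"))
```

Subdomains are kept distinct on purpose, since a storefront and a blog on the same parent domain are unlikely to share a layout.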

With caching, I reduced costs further:

Cached Hybrid Approach (50 URLs, 5 unique sites):
- Firecrawl: $1.00 (same)
- AI planning: 5 plans x 500 tokens = 2,500 tokens
- AI extraction: 50 pages x 2,500 tokens = 125,000 tokens
- Total AI tokens: 127,500 tokens
- AI cost: $6.37
- Total cost: $7.37

Additional savings: 13%
Overall savings vs naive: 84%
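
The effect of plan caching on token counts can be modeled in a few lines. This sketch uses the per-page estimates above as defaults; `ai_tokens` is a name introduced here for illustration.

```python
def ai_tokens(pages: int, sites: int, plan_tokens: int = 500,
              extract_tokens: int = 2_500, cache_plans: bool = True) -> int:
    """AI tokens for a run: one plan per site if cached, else one per page."""
    plans = sites if cache_plans else pages
    return plans * plan_tokens + pages * extract_tokens

uncached = ai_tokens(pages=50, sites=5, cache_plans=False)
cached = ai_tokens(pages=50, sites=5)

print(f"uncached: {uncached:,} tokens")
print(f"cached:   {cached:,} tokens")
print(f"token reduction from caching: {1 - cached / uncached:.0%}")
```

Because planning is only 500 of the 3,000 tokens spent per page, caching plans can never remove more than about a sixth of the AI tokens under these estimates. The bulk of the savings still comes from cleaning the content.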

When to Use This Approach

The hybrid approach works best when:

Ideal Use Cases:
- Structured data extraction (products, articles, listings)
- Multiple pages from same sites (caching helps)
- High-volume scraping (cost savings multiply)
- Regular monitoring tasks (weekly price checks, etc.)

Less Suitable:
- One-off pages (overhead doesn't pay off)
- Unstructured content (creative writing, opinions)
- Sites that block scrapers (Firecrawl handles some, not all)

Complete Implementation

Here’s the full scraper, combining Firecrawl cleaning, plan caching, and extraction:

import asyncio
import json
from dataclasses import dataclass
from urllib.parse import urlparse

from firecrawl import FirecrawlApp
from openai import AsyncOpenAI

@dataclass
class ScraperConfig:
    firecrawl_api_key: str
    openai_api_key: str
    max_retries: int = 3
    cache_enabled: bool = True
    planning_model: str = "gpt-4o-mini"
    extraction_model: str = "gpt-4o-mini"

class ProductionWebScraper:
    def __init__(self, config: ScraperConfig):
        self.config = config
        self.firecrawl = FirecrawlApp(api_key=config.firecrawl_api_key)
        self.openai = AsyncOpenAI(api_key=config.openai_api_key)
        self.plan_cache: dict[str, dict] = {}

    async def scrape_batch(self, urls: list[str]) -> list[dict]:
        """Scrape multiple URLs sequentially with rate limiting"""
        results = []
        for i, url in enumerate(urls):
            try:
                result = await self.scrape_single(url)
                results.append({"url": url, "data": result, "success": True})
            except Exception as e:
                results.append({"url": url, "error": str(e), "success": False})

            # Rate limiting
            if i < len(urls) - 1:
                await asyncio.sleep(1)

        return results

    async def scrape_single(self, url: str) -> dict:
        """Scrape single URL with hybrid approach"""
        # Step 1: Get cleaned content via Firecrawl
        content = await self._fetch_clean_content(url)

        # Step 2: Get or create extraction plan
        plan = await self._get_plan(url, content[:2000])

        # Step 3: Extract data
        data = await self._extract(content, plan)

        return data

    async def _fetch_clean_content(self, url: str) -> str:
        """Fetch and clean content via Firecrawl, retrying failed requests"""
        attempts = max(1, self.config.max_retries)
        for attempt in range(1, attempts + 1):
            try:
                # scrape_url is synchronous, so run it in a worker thread.
                # Option names depend on the firecrawl-py version installed.
                result = await asyncio.to_thread(
                    self.firecrawl.scrape_url,
                    url,
                    params={
                        'formats': ['markdown'],
                        'onlyMainContent': True,
                        'excludeTags': ['nav', 'footer', 'aside', 'script', 'style', 'header']
                    }
                )
                return result['markdown']
            except Exception:
                if attempt == attempts:
                    raise
                await asyncio.sleep(2 ** attempt)  # Back off: 2s, 4s, ...

    async def _get_plan(self, url: str, sample: str) -> dict:
        """Get cached plan or create new one"""
        domain = urlparse(url).netloc

        if self.config.cache_enabled and domain in self.plan_cache:
            return self.plan_cache[domain]

        plan = await self._create_plan(sample)

        if self.config.cache_enabled:
            self.plan_cache[domain] = plan

        return plan

    async def _create_plan(self, sample: str) -> dict:
        """Create extraction plan using AI"""
        prompt = f"""
        Analyze this content sample and create an extraction plan:

        {sample}

        Return JSON with:
        - content_type: "product" | "article" | "listing" | "other"
        - primary_fields: list of fields to extract
        - field_hints: location hints for each field
        """

        result = await self.openai.chat.completions.create(
            model=self.config.planning_model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )

        return json.loads(result.choices[0].message.content)

    async def _extract(self, content: str, plan: dict) -> dict:
        """Extract fields using plan"""
        prompt = f"""
        Extract data using this plan:
        {json.dumps(plan, indent=2)}

        Content:
        {content[:15000]}

        Return extracted data as JSON.
        """

        result = await self.openai.chat.completions.create(
            model=self.config.extraction_model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )

        return json.loads(result.choices[0].message.content)
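
Since `scrape_batch` records failures instead of raising, it helps to summarize a run before using the data. The helper below is a sketch that assumes only the result shape built in `scrape_batch` (`url`, `success`, and either `data` or `error`). `summarize_batch` and the sample records are hypothetical.

```python
from collections import Counter

def summarize_batch(results: list[dict]) -> dict:
    """Summarize scrape_batch-style results: counts plus failures grouped by error."""
    failures = [r for r in results if not r.get("success")]
    return {
        "total": len(results),
        "succeeded": len(results) - len(failures),
        "failed_urls": [r["url"] for r in failures],
        "errors": dict(Counter(r.get("error", "unknown") for r in failures)),
    }

# Hypothetical output of: results = await scraper.scrape_batch(urls)
sample_results = [
    {"url": "https://example.com/p/1", "data": {"name": "Shoe"}, "success": True},
    {"url": "https://example.com/p/2", "error": "timeout", "success": False},
    {"url": "https://example.org/p/3", "error": "timeout", "success": False},
]

summary = summarize_batch(sample_results)
print(f"{summary['succeeded']} of {summary['total']} succeeded")
print(summary["errors"])
```

Grouping failures by error message makes it easier to tell a blocked domain apart from a transient network problem before deciding what to re-run.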

Summary

In this post, I showed how a hybrid approach cut my AI agent web scraping costs by 82-84% on a 50-URL test run. The key strategies:

- Use Firecrawl for cleaning: remove scripts, styles, and navigation before the AI sees the page
- Separate planning from execution: plan from a small sample, then extract from the cleaned content
- Cache extraction plans: pages on the same domain reuse one plan
- Use smaller models: gpt-4o-mini costs far less per token than gpt-4 and handled both planning and extraction here

The hybrid approach separates concerns: Firecrawl handles the heavy lifting of parsing and cleaning HTML, while AI handles the intelligent work of understanding structure and extracting data. This division of labor dramatically reduces token consumption while maintaining extraction quality.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

Here are the most important links from this article, along with some further resources on this topic:

👨‍💻 Firecrawl Documentation
👨‍💻 Token optimization strategies

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!