How to Build Custom AI Agents with CrewAI When SaaS Tools Don't Fit Your Workflow

Mar 19, 2026

I needed an AI agent that could scrape product data from competitor websites, clean the messy HTML output, and save it to both Notion for my team and Postgres for analytics. I tried three different SaaS platforms. Each one failed at a different step.

One couldn’t access the websites due to IP restrictions. Another couldn’t write to my specific Notion database schema. The third worked but cost $200/month for the volume I needed.

So I decided to build my own.

The Problem with Generic AI Agents

SaaS AI agents promise to automate everything. But when you try to use them for real work, you hit walls:

One-size-fits-all approach: They’re designed for average use cases, not your specific workflow
Data privacy concerns: Your data flows through third-party servers
Limited customization: You can’t modify core behavior or add custom tools
Vendor lock-in: Switching costs become prohibitive over time
Cost at scale: Per-seat or per-action pricing explodes when you automate real workloads

I needed an agent that understood my database schema, followed my team’s naming conventions, and handled edge cases specific to my industry. Generic solutions couldn’t do that.

Why I Chose CrewAI

I had two options: build everything from scratch using raw LLM APIs, or find a framework.

Building from scratch meant writing orchestration logic, error handling, retry mechanisms, tool abstractions, and state management. I estimated two weeks minimum.

Then I found a Reddit comment that changed my approach:

“I built a scraper last week that feeds cleaned data into Notion and Postgres using CrewAI. Set up multi-agent teams in like 10 lines of Python, and it worked through errors on its own. Total sleeper compared to the big noisy ones.”

CrewAI abstracts the infrastructure so you focus on defining what agents should do, not how they coordinate.

Core Concepts

CrewAI has four main building blocks:

┌─────────────────────────────────────────────────┐
│                     Crew                         │
│  (A team of agents working together)              │
│                                                   │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐          │
│  │ Agent 1 │──│ Agent 2 │──│ Agent 3 │          │
│  │(Scraper)│  │(Cleaner)│  │ (Writer)│          │
│  └────┬────┘  └────┬────┘  └────┬────┘          │
│       │            │            │                │
│       ▼            ▼            ▼                │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐          │
│  │  Task 1 │  │  Task 2 │  │  Task 3 │          │
│  └─────────┘  └─────────┘  └─────────┘          │
│                                                   │
│       Tools: Web scraping, Notion API, Postgres   │
└─────────────────────────────────────────────────┘

Agents: Autonomous units with roles, goals, and backstories
Tasks: Define what needs to be done with expected outputs
Crews: Teams of agents working together on related tasks
Tools: Functions agents can use (web scraping, database queries, API calls)

My First CrewAI Setup

I started with the basics. Install and create a simple two-agent team:

pip install crewai crewai-tools

from crewai import Agent, Task, Crew
from crewai_tools import ScrapeWebsiteTool

# Define agents with specific roles
scraper = Agent(
    role="Data Extractor",
    goal="Extract structured product data from e-commerce websites",
    backstory="""You are an expert web scraper who specializes in
    e-commerce data extraction. You understand HTML structure,
    handle rate limits gracefully, and always validate data quality.""",
    tools=[ScrapeWebsiteTool()],
    verbose=True
)

cleaner = Agent(
    role="Data Cleaner",
    goal="Clean and validate extracted data",
    backstory="""You ensure data quality by removing duplicates,
    fixing formatting issues, and validating against expected schemas.""",
    verbose=True
)

# Define tasks with clear expected outputs
scrape_task = Task(
    description="Extract product data from {url}",
    expected_output="JSON array with product name, price, description, and image URL",
    agent=scraper
)

clean_task = Task(
    description="Clean and validate the extracted product data",
    expected_output="Clean JSON array with validated fields",
    agent=cleaner
)

# Assemble and run
crew = Crew(
    agents=[scraper, cleaner],
    tasks=[scrape_task, clean_task]
)

result = crew.kickoff(inputs={"url": "https://example.com/products"})
print(result)

This worked, but I still needed to write the results to Notion and Postgres.

Adding Notion and Postgres Integration

I added a third agent to handle persistence:

from crewai import Agent, Task, Crew
from crewai_tools import ScrapeWebsiteTool, NL2SQLTool
import os

# Set up database connection
DATABASE_URL = os.environ.get("DATABASE_URL")

# Create tools
scrape_tool = ScrapeWebsiteTool()
db_tool = NL2SQLTool(db_uri=DATABASE_URL)

# Define all three agents
scraper = Agent(
    role="Data Extractor",
    goal="Extract structured data from websites",
    backstory="Expert web scraper with attention to detail",
    tools=[scrape_tool]
)

notion_writer = Agent(
    role="Notion Publisher",
    goal="Format and publish content to Notion database",
    backstory="Content organizer skilled in knowledge management",
    tools=[save_to_notion]  # Custom tool, defined under "Creating Custom Tools"
)

db_agent = Agent(
    role="Database Manager",
    goal="Store and query data in PostgreSQL",
    backstory="Database specialist ensuring data integrity",
    tools=[db_tool]
)

# Define the task chain
scrape_task = Task(
    description="Extract product data from {url}",
    expected_output="JSON with product details",
    agent=scraper
)

notion_task = Task(
    description="""Format the scraped data and save to Notion.
    Use the Product Database with these fields: Name, Price, URL, Image.""",
    expected_output="Notion page URL",
    agent=notion_writer,
    context=[scrape_task]  # This task depends on scrape_task output
)

db_task = Task(
    description="""Store the scraped data in the products table.
    Schema: id (serial), name (text), price (decimal), url (text),
    scraped_at (timestamp), image_url (text)""",
    expected_output="Number of rows inserted",
    agent=db_agent,
    context=[scrape_task]
)

# Run the crew
crew = Crew(
    agents=[scraper, notion_writer, db_agent],
    tasks=[scrape_task, notion_task, db_task]
)

result = crew.kickoff(inputs={"url": "https://example.com/products"})

The first version of this script failed. The agents couldn’t coordinate properly because I had left out the context parameter that links tasks together.

Once I added context=[scrape_task] to the dependent tasks, as in the listing above, the agents started passing data correctly.

Creating Custom Tools

The built-in tools cover common cases, but I needed a custom Notion tool:

import os

from crewai.tools import tool
from notion_client import Client

@tool("Save to Notion database")
def save_to_notion(database_id: str, data: dict) -> str:
    """
    Save data to a Notion database.

    Args:
        database_id: The Notion database ID
        data: Dictionary with field names and values

    Returns:
        URL of the created page
    """
    notion = Client(auth=os.environ.get("NOTION_API_KEY"))

    # Transform data to Notion format. A database has exactly one title
    # property (assumed here to be "Name"); other strings are sent as rich text.
    # Columns typed as URL in Notion need {"url": value} instead.
    properties = {}
    for key, value in data.items():
        if key == "Name":
            properties[key] = {"title": [{"text": {"content": str(value)}}]}
        elif isinstance(value, str):
            properties[key] = {"rich_text": [{"text": {"content": value}}]}
        elif isinstance(value, (int, float)):
            properties[key] = {"number": value}

    # Create the page
    page = notion.pages.create(
        parent={"database_id": database_id},
        properties=properties
    )

    return page["url"]
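The part of this tool most likely to go wrong is the mapping from plain values to Notion's property format, so it can help to pull that mapping into a pure function that runs without an API key. Here is a minimal sketch using only the standard library. The `PRODUCT_SCHEMA` mapping and the `to_notion_properties` helper are hypothetical names for illustration, and the property types (in particular treating Image as a URL column) are assumptions about the database, not something the Notion client provides.

```python
from typing import Any

# Hypothetical schema for illustration: property name -> Notion property type.
# Adjust it to match the column types of your own database.
PRODUCT_SCHEMA = {"Name": "title", "Price": "number", "URL": "url", "Image": "url"}


def to_notion_properties(data: dict[str, Any], schema: dict[str, str]) -> dict[str, Any]:
    """Build the `properties` payload for a Notion page from a flat dict."""
    properties: dict[str, Any] = {}
    for key, value in data.items():
        kind = schema.get(key)
        if kind == "title":
            properties[key] = {"title": [{"text": {"content": str(value)}}]}
        elif kind == "rich_text":
            properties[key] = {"rich_text": [{"text": {"content": str(value)}}]}
        elif kind == "number":
            properties[key] = {"number": float(value)}
        elif kind == "url":
            properties[key] = {"url": str(value)}
        else:
            # Fail loudly instead of silently dropping a field.
            raise ValueError(f"No Notion type configured for field {key!r}")
    return properties


payload = to_notion_properties(
    {"Name": "Widget", "Price": 19.99, "URL": "https://example.com/widget"},
    PRODUCT_SCHEMA,
)
print(payload)
```

Because the function never touches the network, the payload can be inspected or unit-tested before it is passed to `notion.pages.create`.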

What Surprised Me: Self-Healing Behavior

The most impressive part wasn’t the initial setup. It was how CrewAI handled errors.

When my scraper hit a rate limit on a competitor’s site, I expected the whole pipeline to crash. Instead, the agent:

Detected the 429 Too Many Requests error
Waited and retried with exponential backoff
Eventually succeeded after three attempts

I didn’t write any error handling code. The framework handled it.

[Agent: Data Extractor] Scraping https://example.com/products...
[Agent: Data Extractor] Error: 429 Too Many Requests
[Agent: Data Extractor] Waiting 5 seconds before retry...
[Agent: Data Extractor] Retrying scrape...
[Agent: Data Extractor] Error: 429 Too Many Requests
[Agent: Data Extractor] Waiting 15 seconds before retry...
[Agent: Data Extractor] Retrying scrape...
[Agent: Data Extractor] Successfully extracted 47 products
[Agent: Data Cleaner] Validating data quality...
[Agent: Notion Publisher] Creating Notion page...

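This retry behavior comes from the agent’s execution loop: a failed tool call is handed back to the agent as an error message, and the agent can try again until it runs out of iterations.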
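If you would rather not rely on the agent deciding to retry, the same wait-and-retry pattern can be written explicitly and called from inside a custom tool. The following is a minimal sketch in plain Python, not CrewAI's internal implementation. `RateLimitError` and `retry_with_backoff` are hypothetical names, and the defaults (5 second base delay, factor of 3) are chosen only to mirror the 5 and 15 second waits in the log above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error a fetch function raises on HTTP 429."""


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 5.0,
    factor: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, waiting base_delay * factor**attempt between rate-limited attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the caller see the error.
            sleep(base_delay * factor ** attempt)
    raise AssertionError("unreachable")


# Demo: a fake fetch that is rate limited twice, then succeeds.
# The delays are recorded instead of slept so the demo runs instantly.
attempts = {"count": 0}
recorded_delays: list[float] = []


def flaky_fetch() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "47 products"


outcome = retry_with_backoff(flaky_fetch, sleep=recorded_delays.append)
print(outcome, recorded_delays)
```

Passing `sleep` as a parameter keeps the function testable: production code uses the default `time.sleep`, while tests can substitute a recorder.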

Common Mistakes I Made

Mistake 1: Starting with too many agents

I initially created five agents: scraper, cleaner, validator, formatter, and publisher. The coordination overhead made the system slower and harder to debug. I consolidated to three agents and it worked better.

Mistake 2: Writing elaborate backstories

I spent an hour crafting detailed backstories for each agent. Turns out, short and focused backstories work just as well. The AI doesn’t need a novel—it needs clear goals.

Mistake 3: Leaving the expected_output field vague

I left expected_output vague initially. The agents produced inconsistent formats. Once I specified exact outputs like “JSON array with fields: name, price, url”, the quality improved dramatically.
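A precise expected_output is easier to enforce if something actually checks the result. Here is a small sketch of a validator for the "JSON array with fields: name, price, url" format, using only the standard library. `REQUIRED_FIELDS` and `validate_products` are hypothetical names, and the type rules (price must be a number, the other two strings) are my assumptions about what that format means.

```python
import json

# Assumed field types for the "name, price, url" output format.
REQUIRED_FIELDS = {"name": str, "price": (int, float), "url": str}


def validate_products(raw: str) -> list[dict]:
    """Parse agent output and check that it is a JSON array of well-formed products."""
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array at the top level")
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not a JSON object")
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in item:
                raise ValueError(f"item {index} is missing {field!r}")
            value = item[field]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected_type):
                raise ValueError(f"item {index} has a badly typed {field!r}")
    return items


sample = '[{"name": "Widget", "price": 19.99, "url": "https://example.com/widget"}]'
products = validate_products(sample)
print(len(products), products[0]["name"])
```

Running a check like this on the final task output turns a vague "the format looks off" into a specific error you can feed back into the task description.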

Mistake 4: Not testing tools individually

I built a custom Postgres tool and immediately threw it into a crew. It failed in confusing ways. I should have tested the tool standalone first:

# Test tool in isolation before using in crew
result = db_tool._run("SELECT * FROM products LIMIT 1")
print(result)  # Verify output format
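For a tool that writes data, the same idea works with a throwaway database. The sketch below exercises an insert helper against an in-memory SQLite database as a stand-in for Postgres. `insert_products` is a hypothetical helper, the table follows the products schema from the task description, and the `?` placeholders are SQLite's style (a Postgres driver such as psycopg uses `%s`).

```python
import sqlite3
from datetime import datetime, timezone


def insert_products(conn: sqlite3.Connection, rows: list[dict]) -> int:
    """Insert product rows with a parameterized query and return the row count."""
    scraped_at = datetime.now(timezone.utc).isoformat()
    params = [
        (row["name"], row["price"], row["url"], scraped_at, row.get("image_url"))
        for row in rows
    ]
    with conn:  # Commits on success, rolls back if an insert fails.
        conn.executemany(
            "INSERT INTO products (name, price, url, scraped_at, image_url) "
            "VALUES (?, ?, ?, ?, ?)",
            params,
        )
    return len(params)


# In-memory database: nothing touches disk or a real server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, name TEXT, price REAL, url TEXT, "
    "scraped_at TEXT, image_url TEXT)"
)

inserted = insert_products(
    conn,
    [
        {"name": "Widget", "price": 19.99, "url": "https://example.com/widget"},
        {"name": "Gadget", "price": 5.0, "url": "https://example.com/gadget",
         "image_url": "https://example.com/gadget.png"},
    ],
)
row_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(inserted, row_count)
```

Once the helper behaves as expected here, wrapping it in a tool and handing it to an agent leaves far fewer places for a failure to hide.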

Mistake 5: Ignoring token costs

My first crew burned through $50 in API credits in one day. I added verbose=True to debug—and forgot to turn it off in production. The extra logging tripled my token usage.
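A rough cost estimate per run makes this kind of surprise easier to catch early. The sketch below is plain arithmetic: the token counts come from wherever your setup reports them, and the prices are placeholders, not real rates for any provider. `estimate_cost` and `within_budget` are hypothetical helper names.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return the estimated cost of one run in the same currency as the prices."""
    input_cost = prompt_tokens / 1_000_000 * input_price_per_million
    output_cost = completion_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


def within_budget(spent_today: float, next_run_cost: float, daily_budget: float) -> bool:
    """Check whether one more run would stay inside the daily budget."""
    return spent_today + next_run_cost <= daily_budget


# Placeholder numbers for illustration only.
run_cost = estimate_cost(
    prompt_tokens=1_000_000,
    completion_tokens=500_000,
    input_price_per_million=2.0,
    output_price_per_million=8.0,
)
print(run_cost, within_budget(spent_today=45.0, next_run_cost=run_cost, daily_budget=50.0))
```

Checking `within_budget` before each scheduled run is a cheap guard against a misconfigured crew spending a full day's credits unnoticed.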

CrewAI vs LangGraph

After building with CrewAI, I tried the same workflow in LangGraph. Here’s what I found:

| Aspect           | CrewAI                    | LangGraph                    |
|------------------|---------------------------|------------------------------|
| Setup time       | ~1 hour                   | ~4 hours                     |
| Lines of code    | ~50                       | ~150                         |
| Error handling   | Built-in                  | Manual                       |
| Control level    | Declarative               | Explicit state management    |
| Best for         | Quick prototyping, teams  | Complex workflows, precision |

CrewAI is faster to start. LangGraph gives you more control when you need it.

For my scraping pipeline, CrewAI was the right choice. But when I needed conditional branching based on document types, I switched to LangGraph for that specific workflow.

Summary

In this post, I showed how to build a multi-agent system with CrewAI that scrapes websites, cleans data, and writes to both Notion and Postgres. The key point is that CrewAI handles orchestration and error recovery for you, so you can focus on defining agent roles and tasks.

The Reddit insight about “10 lines of Python” wasn’t an exaggeration. The core crew definition really is that simple. What takes time is building custom tools and tuning your specific workflow.

If you’re tired of SaaS agents that almost work but don’t quite fit your needs, building your own with CrewAI is a practical alternative. Start with one simple task, add complexity gradually, and let the framework handle the plumbing.

Final Words + More Resources

My intention with this article was to share my knowledge and experience in a way that helps others. If you want to get in touch, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 CrewAI GitHub
👨‍💻 CrewAI Documentation
👨‍💻 CrewAI Tools

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!