How to Use Lightpanda MCP Server for AI Agent Web Browsing

The Problem: AI Agents Can’t Browse

I wanted my AI agents to browse the web, but every solution felt heavyweight. Puppeteer requires a driver library. Playwright needs installation. Selenium is… well, Selenium. Then I found Lightpanda’s built-in MCP server.

When building AI workflows, I kept hitting the same wall. My agents needed to:

  • Read web pages
  • Extract data
  • Fill forms
  • Click buttons

But every browser automation tool required me to write orchestration code. The agent couldn’t directly control the browser - it needed me as a middleman.

Lightpanda's MCP server removes that middleman: it exposes browser capabilities as tools that AI agents can invoke directly.

What is MCP and Why It Matters

The Model Context Protocol (MCP) is a standard for AI models to interact with external tools. Think of it as a universal plugin system for AI agents.

Instead of this flow:

AI Agent -> My Code -> Puppeteer -> Browser

MCP enables this:

AI Agent -> MCP Tool -> Browser

The agent decides what to do. No orchestration code required.

Setting Up Lightpanda MCP Server

First, I needed to understand what tools were available. Looking at the source code in src/mcp/tools.zig, I found 10 tools exposed through the MCP server:

| Tool | Purpose |
| --- | --- |
| goto | Navigate to a URL |
| markdown | Extract page content as markdown |
| links | Get all links from the page |
| evaluate | Run JavaScript in page context |
| semantic_tree | Get simplified DOM structure |
| interactiveElements | Find clickable/fillable elements |
| structuredData | Extract JSON-LD and OpenGraph data |
| click | Click an element |
| fill | Fill text into input fields |
| scroll | Scroll the page |

These tools are designed for AI consumption, not human developers.
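A client can also ask the server for this list instead of reading the source. The sketch below builds the standard MCP handshake followed by a `tools/list` request. It assumes the server speaks MCP over the stdio transport (one JSON object per line) and accepts the protocol version shown; both are assumptions about Lightpanda, and the client name is made up. A real client would also wait for the `initialize` response before sending the rest.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids, starting at 1


def request(method, params=None):
    """Build a JSON-RPC 2.0 request (expects a response)."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def notification(method):
    """Build a JSON-RPC 2.0 notification (no id, no response)."""
    return {"jsonrpc": "2.0", "method": method}


def frame(message):
    """Serialize for the MCP stdio transport: one JSON object per line."""
    return json.dumps(message, separators=(",", ":")) + "\n"


handshake = [
    request("initialize", {
        "protocolVersion": "2025-06-18",  # assumption: server negotiates this
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},  # hypothetical
    }),
    notification("notifications/initialized"),
    request("tools/list"),
]

wire = "".join(frame(m) for m in handshake)
print(wire)
```

The `tools/list` response carries each tool's name, description, and input schema, which is what makes the tools self-describing to an agent.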

My First Attempt: Basic Navigation

I tried navigating to a URL and extracting content:

MCP tool call - navigate
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "goto",
    "arguments": {
      "url": "https://example.com"
    }
  }
}

The tool returned a success response. Then I called:

MCP tool call - extract markdown
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "markdown"
  }
}

I got clean, formatted markdown back. No HTML parsing. No boilerplate removal. Just content ready for an LLM to process.
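On the receiving side, an MCP `tools/call` response wraps its output in a `content` array with an `isError` flag. Here is a minimal sketch of pulling the markdown out of such a response. The sample response is hand-written for illustration, not captured from Lightpanda, and `tool_text` is a hypothetical helper name.

```python
def tool_text(response):
    """Return the joined text content of a tools/call response.

    Raises RuntimeError for JSON-RPC errors and for tool-level errors
    (result.isError == true).
    """
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "JSON-RPC error"))
    result = response["result"]
    text = "\n".join(
        item["text"]
        for item in result.get("content", [])
        if item.get("type") == "text"
    )
    if result.get("isError"):
        raise RuntimeError(text or "tool reported an error")
    return text


# Hand-written sample in the shape the MCP spec defines for tool results.
sample = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "content": [
            {"type": "text", "text": "# Example Domain\n\nThis domain is for use in examples."}
        ],
        "isError": False,
    },
}

page = tool_text(sample)
print(page)
```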

The Semantic Tree Advantage

What impressed me most was the semantic_tree tool. Traditional web scraping gives you raw HTML - a mess of divs, spans, and nested elements.

The semantic tree returns a simplified structure:

Semantic tree output example
{
  "type": "document",
  "children": [
    {
      "type": "heading",
      "level": 1,
      "text": "Main Title"
    },
    {
      "type": "paragraph",
      "text": "Some content here..."
    },
    {
      "type": "button",
      "text": "Submit",
      "nodeId": 42
    }
  ]
}

This is optimized for AI reasoning. The agent sees structure, not syntax.
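To show why this shape is easy to work with, here is a short sketch that walks a tree like the one above and picks out the nodes that carry a `nodeId`. It assumes the real output matches the example's field names (`type`, `text`, `children`, `nodeId`); the helper names are hypothetical.

```python
def iter_nodes(node):
    """Yield the node and all of its descendants, depth-first."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def actionable(tree):
    """Return (nodeId, type, text) for every node that has a nodeId."""
    return [
        (n["nodeId"], n["type"], n.get("text", ""))
        for n in iter_nodes(tree)
        if "nodeId" in n
    ]


def outline(tree):
    """Return one 'type: text' line per node that has text."""
    return [f'{n["type"]}: {n["text"]}' for n in iter_nodes(tree) if "text" in n]


# The example tree from above.
tree = {
    "type": "document",
    "children": [
        {"type": "heading", "level": 1, "text": "Main Title"},
        {"type": "paragraph", "text": "Some content here..."},
        {"type": "button", "text": "Submit", "nodeId": 42},
    ],
}

print(actionable(tree))
print(outline(tree))
```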

Finding Interactive Elements

I needed to fill a search form. First, I asked to find interactive elements:

MCP tool call - find elements
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "interactiveElements"
  }
}

The response listed all clickable and fillable elements with their backend node IDs. With those IDs, I could fill the search input and then click the submit button:

MCP tool call - fill form
{
  "jsonrpc": "2.0",
  "id": 4,
  "method": "tools/call",
  "params": {
    "name": "fill",
    "arguments": {
      "backendNodeId": 15,
      "value": "my search query"
    }
  }
}

MCP tool call - click button
{
  "jsonrpc": "2.0",
  "id": 5,
  "method": "tools/call",
  "params": {
    "name": "click",
    "arguments": {
      "backendNodeId": 22
    }
  }
}

The agent navigated the form without me writing a single line of browser automation code.
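In practice the agent's host generates these messages from the agent's tool choices. As a rough sketch of that step, the snippet below rebuilds the fill and click calls above from two node IDs and a query. The function names are hypothetical, and 15 and 22 are just the example IDs from this post; real IDs come from the `interactiveElements` response.

```python
from itertools import count


def tool_call(ids, name, **arguments):
    """Build a tools/call request; 'arguments' is omitted when empty."""
    msg = {
        "jsonrpc": "2.0",
        "id": next(ids),
        "method": "tools/call",
        "params": {"name": name},
    }
    if arguments:
        msg["params"]["arguments"] = arguments
    return msg


def search_form_calls(input_id, button_id, query, start_id=4):
    """Return the fill-then-click sequence for a simple search form."""
    ids = count(start_id)
    return [
        tool_call(ids, "fill", backendNodeId=input_id, value=query),
        tool_call(ids, "click", backendNodeId=button_id),
    ]


calls = search_form_calls(15, 22, "my search query")
for call in calls:
    print(call)
```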

MCP vs CDP: When to Use Each

Chrome DevTools Protocol (CDP) is powerful but low-level. You need to understand:

  • Domains (DOM, Page, Network, etc.)
  • Events and listeners
  • Session management

MCP abstracts this away. The tools are:

  • Self-documenting
  • High-level
  • Designed for agent workflows

Use MCP when:

  • Building AI agents that browse the web
  • You want declarative tool calls
  • The agent controls the flow

Use CDP when:

  • You need fine-grained control
  • Building developer tools
  • Writing custom browser automation
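To make the difference in abstraction level concrete, here is a small sketch that builds the same navigation as a CDP command and as an MCP tool call. `Page.navigate` is a standard CDP method; the session ID is a placeholder, and the helper names are hypothetical.

```python
import json


def cdp_navigate(msg_id, url, session_id=None):
    """CDP: a domain-qualified method, optionally routed to a session."""
    msg = {"id": msg_id, "method": "Page.navigate", "params": {"url": url}}
    if session_id is not None:
        msg["sessionId"] = session_id
    return msg


def mcp_goto(msg_id, url):
    """MCP: one generic method, with the tool chosen by name."""
    return {
        "jsonrpc": "2.0",
        "id": msg_id,
        "method": "tools/call",
        "params": {"name": "goto", "arguments": {"url": url}},
    }


cdp = cdp_navigate(1, "https://example.com", session_id="SESSION-ID-PLACEHOLDER")
mcp = mcp_goto(1, "https://example.com")
print(json.dumps(cdp))
print(json.dumps(mcp))
```

With CDP, the caller also has to attach to a target first and listen for page events to know when navigation has finished. With MCP, those details sit behind the tool.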

Real Use Case: Research Agent

I built an agent that researches topics across multiple websites. The workflow:

  1. goto a search engine
  2. fill the search query
  3. click search
  4. markdown to extract results
  5. goto each promising link
  6. structuredData to get metadata
  7. Synthesize findings

No Python. No Node.js. Just MCP tool calls from the agent.

Extracting Structured Data

The structuredData tool extracts JSON-LD, OpenGraph, and other metadata:

MCP tool call - structured data
{
  "jsonrpc": "2.0",
  "id": 6,
  "method": "tools/call",
  "params": {
    "name": "structuredData"
  }
}

This returns schema.org data, social media metadata, and other structured information that helps agents understand page content semantically.

Common Pitfalls

I made these mistakes:

  1. Not waiting for navigation - After goto, the page may still be loading. I learned to check for expected content before proceeding.

  2. Ignoring node IDs - The click and fill tools need node IDs (passed as backendNodeId) from interactiveElements or semantic_tree. I initially tried CSS selectors, which these tools don't accept.

  3. Over-extracting - markdown returns the content of the whole page. For research tasks, I found semantic_tree gives cleaner context for LLMs.
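The first pitfall is the easiest to guard against in whatever hosts the agent. Below is a minimal polling sketch: it repeatedly fetches the page text until an expected marker shows up. `fetch_text` stands in for "call the markdown tool and return its text" and is an assumption, as are the retry count and delay; the demo uses a fake sequence of snapshots instead of a real browser.

```python
import time


def wait_for_content(fetch_text, expected, attempts=5, delay=0.5):
    """Poll fetch_text() until 'expected' appears in the returned text.

    Returns the matching text, or raises TimeoutError after 'attempts' tries.
    """
    for attempt in range(attempts):
        text = fetch_text()
        if expected in text:
            return text
        if attempt < attempts - 1:
            time.sleep(delay)  # give the page time to finish loading
    raise TimeoutError(f"{expected!r} not found after {attempts} attempts")


# Fake page snapshots standing in for successive markdown tool results.
snapshots = iter(["", "Loading...", "# Results\n\n1. First hit"])
text = wait_for_content(lambda: next(snapshots), "# Results", delay=0)
print(text)
```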

Summary

In this post, I showed how Lightpanda’s MCP server bridges AI agents with web browsing capabilities. The key points are:

  • MCP tools are designed for AI consumption, so no orchestration code is needed
  • Semantic trees provide cleaner context than raw HTML
  • Use node IDs from interactiveElements for click and fill operations
  • Start with goto + markdown for simple extraction, add semantic_tree for complex reasoning

Lightpanda’s MCP server changed how I think about AI agent web browsing. Instead of writing orchestration code, I configure tools and let the agent decide.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!