How to Use Lightpanda MCP Server for AI Agent Web Browsing
The Problem: AI Agents Can’t Browse
I wanted my AI agents to browse the web, but every solution felt heavyweight. Puppeteer requires a driver library. Playwright needs installation. Selenium is… well, Selenium. Then I found Lightpanda’s built-in MCP server.
When building AI workflows, I kept hitting the same wall. My agents needed to:
- Read web pages
- Extract data
- Fill forms
- Click buttons
But every browser automation tool required me to write orchestration code. The agent couldn’t directly control the browser - it needed me as a middleman.
Then I discovered Lightpanda ships with an MCP server that exposes browser capabilities as tools AI agents can invoke directly.
What is MCP and Why It Matters
The Model Context Protocol (MCP) is a standard for AI models to interact with external tools. Think of it as a universal plugin system for AI agents.
Instead of this flow:
AI Agent -> My Code -> Puppeteer -> Browser
MCP enables this:
AI Agent -> MCP Tool -> Browser
The agent decides what to do. No orchestration code required.
Setting Up Lightpanda MCP Server
First, I needed to understand what tools were available. Looking at the source code in src/mcp/tools.zig, I found 10 tools exposed through the MCP server:
| Tool | Purpose |
|---|---|
| goto | Navigate to a URL |
| markdown | Extract page content as markdown |
| links | Get all links from the page |
| evaluate | Run JavaScript in page context |
| semantic_tree | Get simplified DOM structure |
| interactiveElements | Find clickable/fillable elements |
| structuredData | Extract JSON-LD and OpenGraph data |
| click | Click an element |
| fill | Fill text into input fields |
| scroll | Scroll the page |
These tools are designed for AI consumption, not human developers.
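Here is a minimal sketch of how a client could discover these tools at runtime instead of reading the source. MCP defines a `tools/list` method over JSON-RPC 2.0, and each entry in the result carries a name, a description and an input schema. The sample response below is hand-written for illustration, not captured from Lightpanda, and the transport that carries the messages is left out.

```python
import json


def list_tools_request(request_id):
    """Build a JSON-RPC 2.0 request for MCP's tools/list method."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def tool_names(response):
    """Return the tool names from a tools/list response."""
    return [tool["name"] for tool in response["result"]["tools"]]


# Hand-written sample response (illustrative, not real server output).
sample_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {"name": "goto", "description": "Navigate to a URL",
             "inputSchema": {"type": "object"}},
            {"name": "markdown", "description": "Extract page content as markdown",
             "inputSchema": {"type": "object"}},
        ]
    },
}

print(json.dumps(list_tools_request(1)))
print(tool_names(sample_response))
```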
My First Attempt: Basic Navigation
I tried navigating to a URL and extracting content:
{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "goto", "arguments": { "url": "https://example.com" } }}
The tool returned a success response. Then I called:
{ "jsonrpc": "2.0", "id": 2, "method": "tools/call", "params": { "name": "markdown" }}
I got clean, formatted markdown back. No HTML parsing. No boilerplate removal. Just content ready for an LLM to process.
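For anyone wiring this up by hand, the two calls above can be sketched in Python as follows. The helper names `tool_call` and `result_text` are my own, not part of Lightpanda. The result shape (a `content` list of typed items plus an optional `isError` flag) follows the MCP specification; that the markdown tool returns its output as a single text item is an assumption, and the sample response is hand-written.

```python
import json


def tool_call(request_id, name, arguments=None):
    """Build a JSON-RPC 2.0 tools/call request for the named tool."""
    params = {"name": name}
    if arguments:
        params["arguments"] = arguments
    return {"jsonrpc": "2.0", "id": request_id,
            "method": "tools/call", "params": params}


def result_text(response):
    """Join the text items of a tools/call result, raising on errors."""
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "unknown error"))
    result = response["result"]
    if result.get("isError"):
        raise RuntimeError("tool reported an error")
    return "\n".join(item["text"] for item in result.get("content", [])
                     if item.get("type") == "text")


goto = tool_call(1, "goto", {"url": "https://example.com"})
markdown = tool_call(2, "markdown")
print(json.dumps(goto))
print(json.dumps(markdown))

# Hand-written sample result (assumed shape, not real server output).
sample = {"jsonrpc": "2.0", "id": 2,
          "result": {"content": [{"type": "text", "text": "# Example Domain"}]}}
print(result_text(sample))
```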
The Semantic Tree Advantage
What impressed me most was the semantic_tree tool. Traditional web scraping gives you raw HTML - a mess of divs, spans, and nested elements.
The semantic tree returns a simplified structure:
{ "type": "document", "children": [ { "type": "heading", "level": 1, "text": "Main Title" }, { "type": "paragraph", "text": "Some content here..." }, { "type": "button", "text": "Submit", "nodeId": 42 } ]}
This is optimized for AI reasoning. The agent sees structure, not syntax.
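To show why this shape is easy to reason over, here is a small sketch that walks a tree like the one above and collects every node of a given type. It assumes the tree uses exactly the `type`, `children`, `text` and `nodeId` keys shown in the example; the real output may carry more fields.

```python
def find_nodes(node, wanted_type):
    """Yield every node of the given type, depth-first."""
    if node.get("type") == wanted_type:
        yield node
    for child in node.get("children", []):
        yield from find_nodes(child, wanted_type)


# The example tree from above, written as a Python dict.
tree = {
    "type": "document",
    "children": [
        {"type": "heading", "level": 1, "text": "Main Title"},
        {"type": "paragraph", "text": "Some content here..."},
        {"type": "button", "text": "Submit", "nodeId": 42},
    ],
}

# Collect (label, node ID) pairs for every button in the tree.
buttons = [(n["text"], n["nodeId"]) for n in find_nodes(tree, "button")]
print(buttons)
```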
Finding Interactive Elements
I needed to fill a search form. First, I asked to find interactive elements:
{ "jsonrpc": "2.0", "id": 3, "method": "tools/call", "params": { "name": "interactiveElements" }}
The response listed all clickable and fillable elements with their backend node IDs. Then I could:
{ "jsonrpc": "2.0", "id": 4, "method": "tools/call", "params": { "name": "fill", "arguments": { "backendNodeId": 15, "value": "my search query" } }}
{ "jsonrpc": "2.0", "id": 5, "method": "tools/call", "params": { "name": "click", "arguments": { "backendNodeId": 22 } }}
The agent navigated the form without me writing a single line of browser automation code.
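The step an agent performs between those calls is picking the right node ID out of the element list. A rough sketch of that lookup is below. I have not reproduced the real interactiveElements output here: the `tag` and `label` field names are assumptions made for illustration, and only `backendNodeId` comes from the calls above.

```python
def find_backend_node_id(elements, tag, label):
    """Return the backendNodeId of the first element matching tag and label."""
    for element in elements:
        if element["tag"] == tag and label.lower() in element["label"].lower():
            return element["backendNodeId"]
    raise LookupError(f"no {tag} element labelled {label!r}")


def tool_call(request_id, name, arguments):
    """Build a JSON-RPC 2.0 tools/call request."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/call",
            "params": {"name": name, "arguments": arguments}}


# Hypothetical element list: "tag" and "label" are assumed field names.
elements = [
    {"backendNodeId": 15, "tag": "input", "label": "Search"},
    {"backendNodeId": 22, "tag": "button", "label": "Go"},
]

search_box = find_backend_node_id(elements, "input", "search")
submit = find_backend_node_id(elements, "button", "go")

# Fill the search box, then click the submit button.
calls = [
    tool_call(4, "fill", {"backendNodeId": search_box, "value": "my search query"}),
    tool_call(5, "click", {"backendNodeId": submit}),
]
for call in calls:
    print(call)
```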
MCP vs CDP: When to Use Each
Chrome DevTools Protocol (CDP) is powerful but low-level. You need to understand:
- Domains (DOM, Page, Network, etc.)
- Events and listeners
- Session management
MCP abstracts this away. The tools are:
- Self-documenting
- High-level
- Designed for agent workflows
Use MCP when:
- Building AI agents that browse the web
- You want declarative tool calls
- The agent controls the flow
Use CDP when:
- You need fine-grained control
- Building developer tools
- Writing custom browser automation
Real Use Case: Research Agent
I built an agent that researches topics across multiple websites. The workflow:
- goto a search engine
- fill the search query
- click search
- markdown to extract results
- goto each promising link
- structuredData to get metadata
- Synthesize findings
No Python. No Node.js. Just MCP tool calls from the agent.
Extracting Structured Data
The structuredData tool extracts JSON-LD, OpenGraph, and other metadata:
{ "jsonrpc": "2.0", "id": 6, "method": "tools/call", "params": { "name": "structuredData" }}
This returns schema.org data, social media metadata, and other structured information that helps agents understand page content semantically.
Common Pitfalls
I made these mistakes:
- Not waiting for navigation - After goto, pages may still load. I learned to check for expected content before proceeding.
- Ignoring nodeId - The click and fill tools need node IDs from interactiveElements or semantic_tree. I initially tried using CSS selectors, which don’t work.
- Over-extracting - markdown returns everything. For research tasks, I found semantic_tree gives cleaner context for LLMs.
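The first pitfall, checking for expected content after goto, can be sketched as a simple polling loop. This is my own illustration rather than anything Lightpanda provides: `fetch_markdown` is a hypothetical callable standing in for a call to the markdown tool, and the timeout and interval values are arbitrary.

```python
import time


def wait_for_content(fetch_markdown, expected, timeout=5.0, interval=0.25):
    """Poll fetch_markdown() until `expected` appears or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        text = fetch_markdown()
        if expected in text:
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{expected!r} not found within {timeout}s")
        time.sleep(interval)


# Fake page that only shows its content on the third poll.
pages = iter(["", "Loading...", "# Results\nLightpanda"])
text = wait_for_content(lambda: next(pages), "Results", timeout=2.0, interval=0.01)
print(text.splitlines()[0])
```

The same loop works with semantic_tree output if the expected marker is easier to spot there.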
Summary
In this post, I showed how Lightpanda’s MCP server bridges AI agents with web browsing capabilities. The key points are:
- MCP tools are designed for AI consumption - No orchestration code needed
- Semantic trees provide cleaner context than raw HTML
- Use node IDs from interactiveElements for click and fill operations
- Start with goto + markdown for simple extraction, add semantic_tree for complex reasoning
Lightpanda’s MCP server changed how I think about AI agent web browsing. Instead of writing orchestration code, I configure tools and let the agent decide.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.