How to create a custom LangExtract provider plugin from scratch

Feb 13, 2026

Purpose

I wanted to use Claude with LangExtract, but there’s no built-in Claude provider. The official provider documentation runs to 580+ lines that mix concepts with implementation details, and the entry point configuration is buried in the middle. I couldn’t find a complete working example that goes from an empty directory to a published package.

So I figured out how to create a custom provider plugin by reading the source code and testing locally. This post shows the complete process.

Environment

Python 3.11
langextract 1.0.0+
anthropic 0.18.0+

The Provider System

LangExtract uses Python entry points to discover providers dynamically. A provider package declares an entry point in its pyproject.toml. Once the package is installed, LangExtract can find and load it without you importing anything yourself.

Here’s how it works:

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│ User Code   │ ──→ │ LangExtract  │ ──→ │ Discovery   │
│             │     │ Registry     │     │ (Entry Pt.) │
└─────────────┘     └──────────────┘     └─────────────┘
                                                  ↓
                                          ┌─────────────┐
                                          │ Your Plugin │
                                          └─────────────┘

The key parts are:

Entry point: Registers your provider in pyproject.toml
Registry pattern: Decorator that matches model IDs
BaseLanguageModel: Interface you implement
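To make the registry idea concrete, here is a toy, standalone version of the pattern. A decorator stores regex patterns together with a priority, and a lookup returns the class whose pattern matches the model ID. When several classes match, the one with the highest priority wins. That tie-breaking rule is my assumption, based on what the priority argument suggests. ToyRegistry, OpenAIStub, OlderClaudeStub and ClaudeStub are hypothetical names. This illustrates the idea and is not LangExtract's actual implementation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToyRegistry:
    """Hypothetical model-ID router; a simplified stand-in, not LangExtract's code."""

    # Each entry is (compiled patterns, priority, provider class).
    entries: list = field(default_factory=list)

    def register(self, *patterns: str, priority: int = 0):
        """Decorator that records regex patterns and a priority for a class."""
        def decorator(cls):
            compiled = [re.compile(p) for p in patterns]
            self.entries.append((compiled, priority, cls))
            return cls  # the class itself is returned unchanged
        return decorator

    def resolve(self, model_id: str) -> type:
        """Return the highest-priority class with a pattern matching model_id."""
        candidates = [
            (priority, cls)
            for compiled, priority, cls in self.entries
            if any(p.search(model_id) for p in compiled)
        ]
        if not candidates:
            raise ValueError(f"No provider registered for {model_id!r}")
        return max(candidates, key=lambda c: c[0])[1]


registry = ToyRegistry()


@registry.register(r"^gpt-4", priority=0)
class OpenAIStub:
    pass


@registry.register(r"^claude", priority=5)
class OlderClaudeStub:
    pass


@registry.register(r"^claude", r"^anthropic", priority=10)
class ClaudeStub:
    pass


print(registry.resolve("claude-3-5-sonnet-20241022").__name__)  # ClaudeStub
print(registry.resolve("anthropic-custom").__name__)            # ClaudeStub
print(registry.resolve("gpt-4o").__name__)                      # OpenAIStub
```

Both OlderClaudeStub and ClaudeStub match "claude-…". The one registered with priority 10 is returned, and this is the behavior the priority argument in Step 3 is meant to control.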

Step 1: Create Package Structure

I created this directory layout:

langextract-claude/
├── pyproject.toml          # Entry points + metadata
├── README.md
├── LICENSE
└── langextract_claude/
    ├── __init__.py         # Exports provider
    └── provider.py         # Main implementation

The structure is minimal. You only need pyproject.toml for metadata and entry points, plus the provider implementation.

Step 2: Configure pyproject.toml

Here’s the complete configuration file:

[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "langextract-claude"
version = "0.1.0"
description = "Claude provider for LangExtract"
dependencies = [
    "langextract>=1.0.0",
    "anthropic>=0.18.0"
]

# CRITICAL: This registers your provider
[project.entry-points."langextract.providers"]
claude = "langextract_claude:ClaudeLanguageModel"

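The most important part is the [project.entry-points."langextract.providers"] section. It tells LangExtract where to find the provider. When plugins are loaded, LangExtract imports ClaudeLanguageModel from langextract_claude, and that import runs the registration decorator shown in Step 3. The decorator’s regex patterns decide which model_id values (for example claude-3-5-sonnet-20241022) are routed to this class. The entry point name claude does not.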

I missed this at first. I thought just importing the module would work, but LangExtract needs the entry point to discover providers automatically.

Step 3: Implement Provider

Here’s the minimal provider implementation:

import os
from typing import Generator, List

import langextract as lx
from langextract.inference import ScoredOutput

@lx.providers.registry.register(
    r'^claude',                    # Matches: claude-3-5-sonnet
    r'^anthropic',                # Matches: anthropic-custom
    priority=10                     # Higher than built-ins
)
class ClaudeLanguageModel(lx.inference.BaseLanguageModel):
    """Claude AI provider for LangExtract."""

    def __init__(
        self,
        model_id: str,
        api_key: str | None = None,
        **kwargs
    ):
        super().__init__()
        self.model_id = model_id

        # API key: Check Claude-specific then generic
        self.api_key = api_key or os.environ.get(
            'ANTHROPIC_API_KEY',
            os.environ.get('LANGEXTRACT_API_KEY')
        )

        if not self.api_key:
            raise ValueError(
                "ANTHROPIC_API_KEY or LANGEXTRACT_API_KEY required"
            )

        # Initialize Anthropic client
        from anthropic import Anthropic
        self.client = Anthropic(api_key=self.api_key)

        # Claude-specific settings
        self.max_tokens = kwargs.get('max_tokens', 4096)
        self.temperature = kwargs.get('temperature', 0.0)

    def infer(
        self,
        batch_prompts: List[str],
        **kwargs
    ) -> Generator[List[ScoredOutput], None, None]:
        """Run inference on batch of prompts."""

        for prompt in batch_prompts:
            try:
                # Call Claude API
                response = self.client.messages.create(
                    model=self.model_id,
                    max_tokens=self.max_tokens,
                    temperature=self.temperature,
                    messages=[{"role": "user", "content": prompt}]
                )

                # Raw text of the first content block; LangExtract parses it later
                output_text = response.content[0].text

                # LangExtract expects ScoredOutput
                yield [ScoredOutput(
                    score=1.0,
                    output=output_text
                )]

            except Exception as e:
                # Wrap errors for LangExtract
                raise lx.InferenceError(
                    f"Claude API error: {e}"
                ) from e

The key parts:

@lx.providers.registry.register(): Registers patterns that match model IDs
infer() method: Must yield List[ScoredOutput] for each prompt
Error handling: Wrap exceptions in lx.InferenceError

The regex patterns r'^claude' and r'^anthropic' mean any model ID starting with “claude” or “anthropic” will use this provider.
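Before publishing, you can run the patterns against a few IDs with Python's re module to check their behavior. This sketch assumes the registry matches with re.search and default flags. I haven't confirmed which flags LangExtract uses, so treat the case-sensitivity row as something to verify. routes_to_claude is a hypothetical helper.

```python
import re

# The two patterns passed to the decorator in Step 3.
PATTERNS = [r"^claude", r"^anthropic"]


def routes_to_claude(model_id: str) -> bool:
    """True if any pattern matches (re.search with default, case-sensitive flags)."""
    return any(re.search(p, model_id) for p in PATTERNS)


for model_id in [
    "claude-3-5-sonnet-20241022",  # True: starts with "claude"
    "anthropic-custom",            # True: starts with "anthropic"
    "Claude-3-5-sonnet",           # False: uppercase C, no IGNORECASE flag
    "my-claude-model",             # False: ^ anchors the match at the start
]:
    print(f"{model_id}: {routes_to_claude(model_id)}")
```

A bare prefix such as ^claude also catches IDs like claudette-v1. If that matters, a narrower pattern such as r'^claude-' avoids it.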

Step 4: Export from __init__.py

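In langextract_claude/__init__.py, I export the provider class so that the entry point target langextract_claude:ClaudeLanguageModel resolves: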

from langextract_claude.provider import ClaudeLanguageModel

__all__ = ["ClaudeLanguageModel"]
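The export matters because of how an entry point value is resolved. The standard library's importlib.metadata.EntryPoint.load() imports the module named before the colon and then looks up the attribute after it. The sketch below demonstrates this with json:dumps as a stand-in target, so it runs without the plugin installed. The entry point names demo and bad are hypothetical.

```python
import json
from importlib.metadata import EntryPoint

# An entry point value has the form "module:attribute". load() imports the
# module, then looks the attribute up on it. For this plugin the value is
# "langextract_claude:ClaudeLanguageModel", so the attribute must exist on the
# package itself, which is what the __init__.py export provides.
ep = EntryPoint(name="demo", value="json:dumps", group="langextract.providers")
print(ep.load() is json.dumps)  # True

# If the package does not expose the attribute, load() raises AttributeError.
missing = EntryPoint(name="bad", value="json:NotExported", group="langextract.providers")
try:
    missing.load()
except AttributeError as exc:
    print("load() failed:", exc)
```

An alternative is to point the entry point directly at langextract_claude.provider:ClaudeLanguageModel. With the export in place, the shorter package-level path works too.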

Step 5: Test Locally

First, install in editable mode:

cd langextract-claude
pip install -e .

Then verify the provider is registered:

import langextract as lx
print('Registered providers:', lx.providers.registry.list_entries())

I got this output:

Registered providers: [('openai', 0), ('claude', 10), ('anthropic', 10)]

You can see that my claude provider shows up with priority 10, which is higher than the built-in openai provider (priority 0).
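Before spending API calls in Step 6, I find it useful to exercise the infer() loop offline by swapping the Anthropic client for a fake. The sketch below mirrors the Step 3 loop as a plain function, so it runs without langextract or anthropic installed. FakeScoredOutput, FakeMessages and infer are hypothetical stand-ins. With the real class, the equivalent would presumably be constructing ClaudeLanguageModel with a dummy api_key and then assigning a fake to its client attribute. That approach is an assumption about how you'd structure the test.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Iterator


@dataclass
class FakeScoredOutput:
    """Stand-in for langextract's ScoredOutput so this runs without langextract."""
    score: float
    output: str


class FakeMessages:
    """Mimics client.messages.create(...) and records each call's arguments."""

    def __init__(self, reply: str):
        self.reply = reply
        self.calls = []

    def create(self, **kwargs):
        self.calls.append(kwargs)
        # Same shape the provider reads: response.content[0].text
        return SimpleNamespace(content=[SimpleNamespace(text=self.reply)])


def infer(client, model_id: str, batch_prompts: list[str]) -> Iterator[list[FakeScoredOutput]]:
    """Mirror of the provider's loop: yields one list of outputs per prompt."""
    for prompt in batch_prompts:
        response = client.messages.create(
            model=model_id,
            max_tokens=4096,
            temperature=0.0,
            messages=[{"role": "user", "content": prompt}],
        )
        yield [FakeScoredOutput(score=1.0, output=response.content[0].text)]


client = SimpleNamespace(messages=FakeMessages('{"extractions": []}'))
results = list(infer(client, "claude-3-5-sonnet-20241022", ["prompt A", "prompt B"]))
print(len(results), results[0][0].output)  # 2 {"extractions": []}
print(client.messages.calls[1]["messages"][0]["content"])  # prompt B
```

The recorded calls let you assert on exactly what would have been sent to the API, such as the model ID and the prompt order.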

Step 6: Test Extraction

Here’s a test script:

import langextract as lx

result = lx.extract(
    text="Jane Doe, age 32, prescribed Lisinopril 10mg",
    prompt_description="Extract patient name, age, and medications",
    examples=[
        lx.data.ExampleData(
            text="John Smith, 45, takes Metformin 500mg",
            extractions=[
                lx.data.Extraction(
                    extraction_class="patient",
                    extraction_text="John Smith",
                    attributes={"age": "45"}
                ),
                lx.data.Extraction(
                    extraction_class="medication",
                    extraction_text="Metformin 500mg",
                    attributes={}
                )
            ]
        )
    ],
    model_id="claude-3-5-sonnet-20241022"
)

print("Extractions:", result.extractions)

When I ran this, it worked:

Extractions: [Extraction(extraction_class='patient', extraction_text='Jane Doe', attributes={'age': '32'}), Extraction(extraction_class='medication', extraction_text='Lisinopril 10mg', attributes={})]

Step 7: Build and Publish

To publish to PyPI:

pip install build twine
python -m build
ls dist/
# langextract_claude-0.1.0-py3-none-any.whl plus a .tar.gz source distribution

twine upload dist/*
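A PyPI upload can't be undone for a given version number, so before running twine upload it can help to confirm that the built wheel actually carries the entry point. Wheels store it in *.dist-info/entry_points.txt as an INI-style file, with the group name in square brackets. This is a minimal sketch using only zipfile from the standard library. wheel_entry_points is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def wheel_entry_points(wheel_path) -> str:
    """Return the text of the wheel's *.dist-info/entry_points.txt, or '' if absent."""
    with zipfile.ZipFile(wheel_path) as whl:
        for name in whl.namelist():
            if name.endswith(".dist-info/entry_points.txt"):
                return whl.read(name).decode("utf-8")
    return ""


if __name__ == "__main__":
    # Check every wheel produced by `python -m build`.
    for wheel in sorted(Path("dist").glob("*.whl")):
        text = wheel_entry_points(wheel)
        status = "ok" if "[langextract.providers]" in text else "MISSING provider entry point"
        print(f"{wheel.name}: {status}")
```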

Now users can install it:

pip install langextract-claude

Common Issues

I ran into some issues while testing:

Issue: Provider not discovered

First, I checked that the entry point name was lowercase (claude, not Claude). Then I confirmed that the package’s entry point metadata file was installed:

pip show -f langextract-claude | grep entry_points

Issue: Import error for dependencies

At first I had anthropic only in my development dependencies. I moved it to the [project] dependencies because users need it at runtime to use the provider.

Issue: Pattern doesn’t match

I inspected the registry from Python:

import langextract as lx
lx.providers.registry.list_entries()
# Should show your new patterns

Summary

In this post, I showed how to create a custom LangExtract provider plugin from scratch. The key points are configuring entry points in pyproject.toml and implementing the BaseLanguageModel.infer() method so that it yields a list of ScoredOutput objects for each prompt.

Once you understand the entry point system and registry pattern, you can add support for any LLM (Mistral, local models, custom APIs) to LangExtract.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!