How to resolve LangExtract Ollama timeout and connection errors

Feb 13, 2026

Problem

When I tried to use LangExtract with Ollama for local LLM inference, I got these errors:

Error 1 - Connection refused:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=11434):
Max retries exceeded with url: /api/generate (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8a4c3d9e80>:
Failed to establish a new connection: [Errno 111] Connection refused'))

Error 2 - Timeout with large models:

requests.exceptions.ReadTimeout: HTTPConnectionPool(host='localhost', port=11434):
Read timed out. (read timeout=120)

Error 3 - Invalid JSON output:

json.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

These errors occurred with different models and configurations, making it hard to debug.

Environment

Python 3.11
LangExtract 0.5.0
Ollama 0.1.24
macOS 14.5

What happened?

I wanted to use LangExtract with local Ollama models to extract structured data from text. I started with a basic setup:

import langextract as lx

result = lx.extract(
    text="Apple Inc. was founded by Steve Jobs in 1976.",
    prompt_description="Extract company entities",
    examples=[
        {
            "text": "Microsoft was founded by Bill Gates.",
            "entities": [{"company": "Microsoft", "founder": "Bill Gates"}]
        }
    ],
    model_id="gemma2:2b",
    model_url="http://localhost:11434"
)
print(result)

I ran this with Ollama running in one terminal:

# Terminal 1
ollama serve

And my Python script in another:

# Terminal 2
python extract.py

But I got the connection refused error. It turned out Ollama wasn’t actually running, so nothing was listening on port 11434. I fixed that by checking the service status first.

Then I tried with a larger model and got timeout errors:

result = lx.extract(
    text="Long text with many paragraphs...",
    prompt_description="Extract all entities",
    examples=[...],
    model_id="llama3.1:70b",  # 70 billion parameters
    model_url="http://localhost:11434",
    # timeout is 120 seconds by default
)

The script timed out after 120 seconds. Large models need more time.

Then I tried with gpt-oss:20b and got JSON parse errors. The model output wasn’t valid JSON.

How to solve it?

I tackled each error separately.

Solution 1: Test Ollama connection first

Before running LangExtract, I verify that Ollama is running:

import requests

def test_ollama_connection(url="http://localhost:11434"):
    """Return True if Ollama answers on url, printing the models it has."""
    try:
        response = requests.get(f"{url}/api/tags", timeout=5)
        if response.status_code == 200:
            print("✓ Ollama is running")
            models = response.json().get("models", [])
            print(f"Available models: {[m['name'] for m in models]}")
            return True
        # Something answered, but not with the expected status.
        print(f"✗ Unexpected status code: {response.status_code}")
    except requests.exceptions.ConnectionError:
        print("✗ Ollama is not running. Start it with: ollama serve")
    except Exception as e:
        print(f"✗ Error: {e}")
    return False

test_ollama_connection()

I run this before extraction to catch connection issues early.
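
When Ollama is started just before the script (for example by Docker), a single check can fail only because the server is still starting. Here is a minimal sketch of a polling wrapper around a check function that returns True or False, like test_ollama_connection above. The name wait_until_ready and its parameters are my own illustration, not part of LangExtract or Ollama:

```python
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool],
                     attempts: int = 5,
                     delay_seconds: float = 2.0) -> bool:
    """Call check() until it returns True or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            # Give the server a moment before asking again.
            time.sleep(delay_seconds)
    return False
```

Calling wait_until_ready(test_ollama_connection) would then retry the check a few times before giving up.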

Solution 2: Increase timeout for large models

I added the timeout parameter based on model size:

import langextract as lx

# Timeout guidelines based on my testing:
# 2B-8B models: 60-120 seconds
# 13B-30B models: 120-180 seconds
# 70B models: 300-600 seconds

result = lx.extract(
    text="your text here...",
    prompt_description="Extract entities",
    examples=[...],
    model_id="llama3.1:70b",
    model_url="http://localhost:11434",
    timeout=300,  # 5 minutes for 70B model
)

I created a simple helper function:

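import re

def get_timeout_for_model(model_id: str) -> int:
    """Return recommended timeout in seconds based on model size."""
    # Read the parameter count from tags such as "llama3.1:70b" or "gpt-oss:20b".
    match = re.search(r"(\d+(?:\.\d+)?)b\b", model_id.lower())
    size_b = float(match.group(1)) if match else 0.0
    if size_b >= 70:
        return 300
    if size_b >= 13:
        return 180
    return 120  # default for smaller or unrecognized models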
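
A fixed timeout per model size is still a guess, because the real duration also depends on the hardware and the length of the text. One alternative is to retry with a larger timeout after each timeout error. The sketch below assumes the extraction is wrapped in a function that takes the timeout as its only argument. run_with_growing_timeout and all of its parameters are hypothetical names for illustration:

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_growing_timeout(
    call: Callable[[int], T],
    first_timeout: int = 120,
    max_timeout: int = 600,
    timeout_errors: Tuple[Type[BaseException], ...] = (TimeoutError,),
) -> T:
    """Run call(timeout), doubling the timeout after each timeout error."""
    timeout = first_timeout
    while True:
        try:
            return call(timeout)
        except timeout_errors:
            if timeout >= max_timeout:
                raise  # Already at the ceiling, so give up.
            timeout = min(timeout * 2, max_timeout)
```

With the setup from this post, call would be a small function that runs lx.extract(..., timeout=t), and timeout_errors would be (requests.exceptions.ReadTimeout,), since that is the exception shown in Error 2.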

Solution 3: Use fenced output for non-schema models

Some models like gpt-oss:20b don’t follow JSON schema well. I use fence_output=True to wrap output in markdown code blocks:

result = lx.extract(
    text="your text here...",
    prompt_description="Extract entities",
    examples=[...],
    model_id="gpt-oss:20b",
    model_url="http://localhost:11434",
    timeout=180,
    fence_output=True,  # Ask for output inside a markdown code fence
    use_schema_constraints=False  # Don't enforce JSON schema
)

With this setting the model is asked to wrap its answer in a markdown code block, and LangExtract parses the content of that block. In my tests this was more reliable than parsing the raw output.
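
To show why a fence helps, here is a small sketch of how fenced output can be separated from the chatter around it. This is not LangExtract's actual parser, only an illustration of the idea. parse_fenced_json and FENCE_RE are names I made up:

```python
import json
import re

# Three backticks, built this way so the listing itself stays readable.
FENCE = "`" * 3
# Matches a fenced block, optionally tagged "json", and captures its body.
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*\n(.*?)" + FENCE, re.DOTALL)


def parse_fenced_json(raw: str):
    """Parse JSON from the first fenced block in raw, or from raw itself."""
    match = FENCE_RE.search(raw)
    payload = match.group(1) if match else raw
    return json.loads(payload.strip())
```

Text before and after the fence is ignored, so a reply that starts with a sentence such as "Sure! Here is the result:" still parses. A reply with no fence and no valid JSON raises json.JSONDecodeError, the same exception as in Error 3.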

Docker setup for consistent environment

I use Docker Compose to run Ollama consistently:

version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0

volumes:
  ollama_data:

Then start Ollama:

docker-compose up -d
docker exec -it <container-id> ollama pull llama3.1:70b
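
After the pull it is worth confirming that the model is actually available before starting a long extraction. The sketch below works on the JSON body returned by /api/tags, the same endpoint used in the connection test above. has_model is a hypothetical helper name. It relies on Ollama treating a model name without a tag as ":latest":

```python
def has_model(tags_payload: dict, model_id: str) -> bool:
    """Return True if model_id appears in an /api/tags response body."""
    # A name without an explicit tag refers to the ":latest" tag.
    wanted = model_id if ":" in model_id else f"{model_id}:latest"
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    return wanted in names
```

For example, has_model(requests.get("http://localhost:11434/api/tags", timeout=5).json(), "llama3.1:70b") should only return True once the pull has finished.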

The reason

As far as I can tell, each error has its own cause:

Connection refused: The Ollama service isn’t running. Nothing listens on the default port 11434 until the ollama serve command or the Docker container is active.

Timeout: The default timeout is 120 seconds. Large models like llama3.1:70b need 5+ minutes on consumer hardware, so the request times out before the model finishes inference.

Invalid JSON: Some models (like gpt-oss:20b) aren’t fine-tuned for structured output. They may add extra text, miss quotes, or produce malformed JSON. Using fence_output=True helps by asking for markdown-wrapped output instead.

Here’s a quick troubleshooting flow:

┌─────────────────────┐
│ LangExtract errors? │
└──────────┬──────────┘
           │
           ▼
    ┌──────────────┐
    │ Can you curl │
    │ localhost:   │
    │ 11434?       │
    └──────┬───────┘
           │
     No ───┴─── Yes
     │            │
     ▼            ▼
┌─────────┐  ┌────────────┐
│Start    │  │Which error?│
│Ollama   │  └──────┬─────┘
└─────────┘         │
           ┌────────┼────────┐
           ▼        ▼        ▼
       ┌───────┐┌───────┐┌───────┐
       │Timeout││Invalid││ Other │
       └───┬───┘└───┬───┘└───┬───┘
           │        │        │
           ▼        ▼        ▼
     Increase fence_output Check
     timeout      =True    logs
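
The same flow can be written as a small function that maps a caught exception to the suggested fix. This is only a sketch. suggest_fix is a name I made up, and it matches on class names so that it runs without importing requests. It assumes the requests exception classes are named as in the tracebacks above (ConnectionError, and ReadTimeout as a subclass of Timeout):

```python
def suggest_fix(exc: BaseException) -> str:
    """Map an exception to the matching branch of the troubleshooting flow."""
    # Collect the class names of the exception and all of its base classes.
    names = {cls.__name__ for cls in type(exc).__mro__}
    if "ConnectionError" in names:
        return "Start Ollama"
    if names & {"Timeout", "TimeoutError"}:
        return "Increase timeout"
    if "JSONDecodeError" in names:
        return "Use fence_output=True"
    return "Check logs"
```

Wrapping the lx.extract call in try/except and printing suggest_fix(e) gives a quick hint without having to read the whole traceback.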

Summary

In this post, I solved common LangExtract Ollama provider errors. The key points are:

Test Ollama connection before running extraction
Increase timeout for large models (70B needs 5+ minutes)
Use fence_output=True for models that don’t follow JSON schema well
Consider Docker for consistent Ollama environment

These fixes resolved the errors I encountered with gemma2:2b, llama3.1:70b, and gpt-oss:20b models.
