
How to resolve LangExtract Ollama timeout and connection errors

Problem

When I tried to use LangExtract with Ollama for local LLM inference, I got these errors:

Error 1 - Connection refused:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=11434):
Max retries exceeded with url: /api/generate (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8a4c3d9e80>:
Failed to establish a new connection: [Errno 111] Connection refused'))

Error 2 - Timeout with large models:

requests.exceptions.ReadTimeout: HTTPConnectionPool(host='localhost', port=11434):
Read timed out. (read timeout=120)

Error 3 - Invalid JSON output:

json.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

These errors occurred with different models and configurations, making it hard to debug.

Environment

  • Python 3.11
  • LangExtract 0.5.0
  • Ollama 0.1.24
  • macOS 14.5

What happened?

I wanted to use LangExtract with local Ollama models to extract structured data from text. I started with a basic setup:

extract.py
import langextract as lx

result = lx.extract(
    text="Apple Inc. was founded by Steve Jobs in 1976.",
    prompt_description="Extract company entities",
    examples=[
        {
            "text": "Microsoft was founded by Bill Gates.",
            "entities": [{"company": "Microsoft", "founder": "Bill Gates"}],
        }
    ],
    model_id="gemma2:2b",
    model_url="http://localhost:11434",
)
print(result)

I ran this with Ollama running in one terminal:

# Terminal 1
ollama serve

And my Python script in another:

# Terminal 2
python extract.py

But I got the connection refused error. It turned out the Ollama server had not actually started, so nothing was listening on port 11434. Since then I check the service status before running anything.

Then I tried with a larger model and got timeout errors:

large_model.py
import langextract as lx

result = lx.extract(
    text="Long text with many paragraphs...",
    prompt_description="Extract all entities",
    examples=[...],
    model_id="llama3.1:70b",  # 70 billion parameters
    model_url="http://localhost:11434",
    # timeout is 120 seconds by default
)

The script timed out after 120 seconds. Large models need more time.

Then I tried with gpt-oss:20b and got JSON parse errors. The model output wasn’t valid JSON.

How to solve it?

I tackled each error separately.

Solution 1: Test Ollama connection first

Before running LangExtract, I verify that Ollama is running and list the models it has available:

test_connection.py
import requests


def test_ollama_connection(url="http://localhost:11434"):
    """Return True if the Ollama API answers at url, otherwise False."""
    try:
        response = requests.get(f"{url}/api/tags", timeout=5)
        if response.status_code == 200:
            print("✓ Ollama is running")
            models = response.json().get("models", [])
            print(f"Available models: {[m['name'] for m in models]}")
            return True
        # The server answered, but not with the expected status
        print(f"✗ Unexpected status code: {response.status_code}")
    except requests.exceptions.ConnectionError:
        print("✗ Ollama is not running. Start it with: ollama serve")
    except Exception as e:
        print(f"✗ Error: {e}")
    return False


test_ollama_connection()

I run this before extraction to catch connection issues early.
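A related check is whether the model you plan to use has actually been pulled. The sketch below works on a payload shaped like the /api/tags response parsed above (a "models" list whose entries carry a "name"). The helper name is hypothetical, and treating a bare name as the ":latest" tag is an assumption.

```python
def is_model_available(tags_payload: dict, model_id: str) -> bool:
    """Return True if model_id appears in an /api/tags style payload."""
    names = {m.get("name", "") for m in tags_payload.get("models", [])}
    # Assumption: a model id without a tag refers to the ":latest" tag.
    wanted = model_id if ":" in model_id else f"{model_id}:latest"
    return wanted in names


# Sample payload shaped like the response parsed in test_connection.py
sample = {"models": [{"name": "gemma2:2b"}, {"name": "llama3.1:70b"}]}
print(is_model_available(sample, "gemma2:2b"))
print(is_model_available(sample, "gpt-oss:20b"))
```

If the second check fails for your model, pulling it with ollama pull before extraction avoids a confusing error later.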

Solution 2: Increase timeout for large models

I added the timeout parameter based on model size:

timeout_config.py
import langextract as lx

# Timeout guidelines based on my testing:
# 2B-8B models: 60-120 seconds
# 13B-30B models: 120-180 seconds
# 70B models: 300-600 seconds
result = lx.extract(
    text="your text here...",
    prompt_description="Extract entities",
    examples=[...],
    model_id="llama3.1:70b",
    model_url="http://localhost:11434",
    timeout=300,  # 5 minutes for 70B model
)

I created a simple helper function:

def get_timeout_for_model(model_id: str) -> int:
    """Return recommended timeout in seconds based on model size."""
    if "70b" in model_id.lower():
        return 300
    if "30b" in model_id.lower() or "13b" in model_id.lower():
        return 180
    return 120  # default for smaller models
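The helper matches a few literal sizes, so a tag such as gpt-oss:20b falls through to the default. A variant that reads the parameter count from the tag could look like the sketch below. The function name is hypothetical, the thresholds are the ones from the guidelines above, and it assumes the size appears in the tag as a number followed by "b".

```python
import re


def timeout_from_model_tag(model_id: str) -> int:
    """Pick a timeout in seconds from the parameter count in the model tag."""
    # Look only at the part after the last colon, e.g. "70b" in "llama3.1:70b".
    tag = model_id.lower().rsplit(":", 1)[-1]
    match = re.search(r"(\d+(?:\.\d+)?)b\b", tag)
    size_b = float(match.group(1)) if match else 0.0
    if size_b >= 70:
        return 300
    if size_b >= 13:
        return 180
    return 120  # small models and tags without a recognizable size


for name in ("gemma2:2b", "gpt-oss:20b", "llama3.1:70b"):
    print(name, timeout_from_model_tag(name))
```

Tags that do not state a size (for example a bare model name) get the 120 second default, so it is worth overriding the value by hand for those.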

Solution 3: Use fenced output for non-schema models

Some models like gpt-oss:20b don’t follow JSON schema well. I use fence_output=True to wrap output in markdown code blocks:

fenced_output.py
import langextract as lx

result = lx.extract(
    text="your text here...",
    prompt_description="Extract entities",
    examples=[...],
    model_id="gpt-oss:20b",
    model_url="http://localhost:11434",
    timeout=180,
    fence_output=True,  # Ask for output inside a markdown code fence
    use_schema_constraints=False,  # Don't enforce JSON schema
)

This asks the model to wrap its output in a markdown code block, which LangExtract can parse more reliably than free-form text.
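To make the idea of fenced output concrete, here is a minimal sketch of how a reply wrapped in a markdown code fence could be parsed. It is not LangExtract's own parser. The function name and the sample reply are hypothetical, and only the standard library is used.

```python
import json
import re

FENCE = "`" * 3  # three backticks
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def extract_fenced_json(reply: str):
    """Parse the first fenced block in reply as JSON; fall back to the raw text."""
    match = FENCED_BLOCK.search(reply)
    payload = match.group(1) if match else reply
    return json.loads(payload)


# A reply with extra prose around the fenced JSON
reply = (
    "Here is the result:\n"
    + FENCE + "json\n"
    + '{"company": "Apple Inc."}\n'
    + FENCE + "\nDone."
)
print(extract_fenced_json(reply))
```

Calling json.loads directly on a reply like this one fails with "Expecting value: line 1 column 1 (char 0)", the same message as Error 3, because the text starts with prose rather than JSON.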

Docker setup for consistent environment

I use Docker Compose to run Ollama consistently:

docker-compose.yml
version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
volumes:
  ollama_data:

Then start Ollama:

docker-compose up -d
docker exec -it <container-id> ollama pull llama3.1:70b
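Right after the container starts, the API may need a moment before it answers. A small polling helper can bridge that gap. This is a sketch using only the standard library: the function names, the retry count, and the delay are my own choices, and the probe is passed in as a parameter so it can be swapped out.

```python
import time
import urllib.error
import urllib.request


def ollama_is_up(url: str = "http://localhost:11434") -> bool:
    """Return True if GET {url}/api/tags answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{url}/api/tags", timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe=ollama_is_up, attempts: int = 10, delay: float = 2.0) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds after each failure."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Usage: call wait_until_ready() once after `docker-compose up -d`,
# and only run the extraction if it returns True.
```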

The reason

I think these errors occur for specific reasons:

Connection refused: The Ollama service isn’t running. Nothing listens on the default port 11434 unless ollama serve or the Docker container is active.

Timeout: Default timeout is 120 seconds. Large models like llama3.1:70b need 5+ minutes on consumer hardware. The request times out before the model finishes inference.

Invalid JSON: Some models (like gpt-oss:20b) aren’t fine-tuned for structured output. They may add extra text, miss quotes, or produce malformed JSON. Using fence_output=True helps by asking for markdown-wrapped output instead.
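For the timeout case, a rough estimate helps when picking a value: the request has to cover model loading plus generation time, with some margin. The sketch below is back-of-the-envelope arithmetic. The function name is hypothetical and the numbers in the example are made up for illustration, not measurements.

```python
import math


def estimate_timeout(output_tokens: int, tokens_per_second: float,
                     load_seconds: float = 0.0, safety_factor: float = 1.5) -> int:
    """Rough timeout: (load time + generation time) * safety margin, rounded up."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    generation_seconds = output_tokens / tokens_per_second
    return math.ceil((load_seconds + generation_seconds) * safety_factor)


# Hypothetical numbers: 500 output tokens at 2 tokens/s, plus 60 s to load the model.
# (60 + 250) * 1.5 = 465 seconds
print(estimate_timeout(500, 2.0, load_seconds=60))
```

With those illustrative numbers the estimate lands well above the 120 second default, which matches the behaviour described for the 70B model.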

Here’s a quick troubleshooting flow:

         ┌─────────────────────┐
         │ LangExtract errors? │
         └──────────┬──────────┘
             ┌──────────────┐
             │ Can you curl │
             │ localhost:   │
             │ 11434?       │
             └──────┬───────┘
     No ┌───────────┴───────────┐ Yes
        │                       │
        ▼                       ▼
   ┌─────────┐           ┌────────────┐
   │ Start   │           │Which error?│
   │ Ollama  │           └──────┬─────┘
   └─────────┘                  │
                    ┌───────────┼───────────┐
                    ▼           ▼           ▼
                ┌───────┐   ┌───────┐   ┌───────┐
                │Timeout│   │Invalid│   │ Other │
                └───┬───┘   └───┬───┘   └───┬───┘
                    │           │           │
                    ▼           ▼           ▼
                Increase  fence_output    Check
                 timeout      =True       logs
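The same flow can be written as a small function that maps an exception to the next step. This is a sketch, not part of LangExtract: the function name is hypothetical, and matching on class names is a shortcut I chose so that it covers both the built-in exceptions and the requests ones shown at the top of this post.

```python
import json


def suggest_fix(exc: Exception) -> str:
    """Map an exception to the next step in the troubleshooting flow."""
    # Collect the class names of the exception and all of its base classes.
    names = [cls.__name__ for cls in type(exc).__mro__]
    if "ConnectionError" in names:
        return "Start Ollama (ollama serve) and confirm port 11434 answers"
    if any("Timeout" in name for name in names):
        return "Increase timeout"
    if isinstance(exc, json.JSONDecodeError):
        return "Set fence_output=True"
    return "Check logs"


print(suggest_fix(ConnectionRefusedError()))
print(suggest_fix(TimeoutError()))
print(suggest_fix(json.JSONDecodeError("Expecting value", "", 0)))
```

Wrapping the extraction call in try/except and printing suggest_fix(e) gives a quick hint without having to read the full traceback.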

Summary

In this post, I solved common LangExtract Ollama provider errors. The key points are:

  1. Test Ollama connection before running extraction
  2. Increase timeout for large models (70B needs 5+ minutes)
  3. Use fence_output=True for models that don’t follow JSON schema well
  4. Consider Docker for consistent Ollama environment

These fixes resolved the errors I encountered with gemma2:2b, llama3.1:70b, and gpt-oss:20b models.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.
