
How to resolve LangExtract OpenAI setup errors when switching from Gemini

Problem

When I tried using LangExtract with OpenAI's GPT models after setting it up with Gemini, I ran into three errors in a row. First, the import failed:

```
Traceback (most recent call last):
  File "/path/to/test.py", line 5, in <module>
    import langextract as lx
  File "/path/to/langextract/__init__.py", line 10, in <module>
    from .extract import extract
  File "/path/to/langextract/extract.py", line 15, in <module>
    from openai import OpenAI
ModuleNotFoundError: No module named 'openai'
```

After installing the missing module, I got an authentication error:

```
openai.AuthenticationError: Error code: 401 - Incorrect API key provided
```

Finally, after fixing the API key, parsing the model output failed:

```
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```

Environment

  • Python 3.11
  • LangExtract 0.1.0
  • OpenAI GPT-4o
  • macOS 14.5

What happened?

I was using LangExtract with Gemini and it worked fine. Here’s my working Gemini setup:

```python title="gemini_example.py"
import langextract as lx
import os

# Set API key
os.environ['LANGEXTRACT_API_KEY'] = 'my-gemini-key'

result = lx.extract(
    text_or_documents="Patient prescribed Amoxicillin 500mg twice daily.",
    prompt_description="Extract medications, dosages, and frequency",
    examples=[
        lx.data.ExampleData(
            text="Take Aspirin 100mg orally once daily",
            extractions=[
                lx.data.Extraction(
                    extraction_class="medication",
                    extraction_text="Aspirin",
                    attributes={
                        "dosage": "100mg",
                        "route": "orally",
                        "frequency": "once daily"
                    }
                )
            ]
        )
    ],
    model_id="gemini-1.5-flash"
)
```

The key parts of this setup:

  • It uses the LANGEXTRACT_API_KEY environment variable
  • The default parameters work for Gemini
  • No special installation is needed

So I tried switching to OpenAI by swapping in my OpenAI key and changing the model_id:

```python
os.environ['LANGEXTRACT_API_KEY'] = 'sk-proj-...'  # OpenAI key
result = lx.extract(
    text_or_documents="Patient prescribed Amoxicillin 500mg twice daily.",
    prompt_description="Extract medications, dosages, and frequency",
    examples=[...],  # Same examples
    model_id="gpt-4o"  # Changed to OpenAI
)
```

But when I ran this, I got the ModuleNotFoundError: No module named 'openai'.

How to solve it?

Solution #1: Install OpenAI Extra

I checked the LangExtract documentation and found that OpenAI support is an optional dependency, so I installed it:

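```bash title="Terminal"
# Quoted so zsh (the macOS default shell) doesn't treat [openai] as a glob pattern
pip install "langextract[openai]"
```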

Or, for an editable install from a local clone of the repository:

```bash title="Terminal"
pip install -e ".[openai]"
```

This fixed the first error.
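Because the openai package only arrives with the extra, a quick way to confirm the install took effect, before LangExtract tries to import it, is to ask Python's import system whether the module can be found. A minimal sketch using the standard library's importlib.util.find_spec; the helper name is_installed is hypothetical, not part of LangExtract:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be imported in this environment.

    Hypothetical helper (not part of LangExtract). It only checks that the
    module is findable; it does not check the installed version.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("openai"):
        print("openai is installed; the [openai] extra looks OK")
    else:
        # Same fix as Solution #1; the quotes stop zsh from globbing the brackets.
        print('openai is missing; run: pip install "langextract[openai]"')
```

Run it with the same interpreter as your script: if pip and python point at different environments, the extra can look installed while the import still fails.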

Solution #2: Fix API Key Variable

After installing the OpenAI dependency, I ran the code again and got AuthenticationError: 401.

I checked the LangExtract source code and found that OpenAI uses a different environment variable, so I switched from LANGEXTRACT_API_KEY to OPENAI_API_KEY:

```python title="openai_example_fixed.py"
import langextract as lx
import os

# WRONG for OpenAI
# os.environ['LANGEXTRACT_API_KEY'] = 'sk-proj-...'

# CORRECT for OpenAI
os.environ['OPENAI_API_KEY'] = 'sk-proj-...'
```
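A 401 means a key was sent but rejected, so it can help to fail fast, before any API call, when the environment looks misconfigured. A minimal sketch; resolve_openai_key is a hypothetical helper, and the "AIza" check is only a heuristic based on the usual prefix of Google API keys (used for Gemini), not a guarantee:

```python
import os
from collections.abc import Mapping


def resolve_openai_key(env: Mapping[str, str] = os.environ) -> str:
    """Return OPENAI_API_KEY, or raise ValueError with a hint about common mix-ups.

    Hypothetical helper, not part of LangExtract.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        if env.get("LANGEXTRACT_API_KEY"):
            # The situation from this post: key stored under the Gemini-style variable.
            raise ValueError(
                "OPENAI_API_KEY is not set but LANGEXTRACT_API_KEY is; "
                "set OPENAI_API_KEY for OpenAI models."
            )
        raise ValueError("OPENAI_API_KEY is not set.")
    if key.startswith("AIza"):
        # Heuristic: Google API keys conventionally start with "AIza".
        raise ValueError("OPENAI_API_KEY looks like a Google/Gemini key.")
    return key
```

The returned value can be passed straight to lx.extract(..., api_key=...).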

Solution #3: Disable Schema Constraints

After fixing the API key, the code ran but returned either empty extractions or JSON parsing errors. When I inspected the raw LLM output, I found that OpenAI was returning unfenced JSON.

LangExtract has two important parameters for OpenAI:

```python
result = lx.extract(
    text_or_documents="Patient prescribed Amoxicillin 500mg twice daily.",
    prompt_description="Extract medications, dosages, and frequency",
    examples=[...],
    model_id="gpt-4o",
    api_key=os.environ['OPENAI_API_KEY'],
    use_schema_constraints=False,  # LangExtract doesn't implement schema constraints for OpenAI
    fence_output=True  # Wrap JSON in markdown code blocks
)
```

The key changes:

  • use_schema_constraints=False: LangExtract implements schema constraints (controlled generation) for Gemini but not for OpenAI models, so they have to be switched off
  • fence_output=True: Tells the LLM to wrap JSON in markdown code blocks for reliable parsing
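If one script has to switch between providers, the provider-specific settings can live in one place. A minimal sketch, assuming the model_id prefix is enough to tell the providers apart (OpenAI ids starting with "gpt-"); provider_kwargs is a hypothetical helper, and its values simply restate the two settings above:

```python
def provider_kwargs(model_id: str) -> dict[str, bool]:
    """Return the extra lx.extract() keyword arguments for a given model_id.

    Hypothetical helper: assumes OpenAI model ids start with "gpt-" and that
    every other model (e.g. Gemini) can keep LangExtract's defaults.
    """
    if model_id.startswith("gpt-"):
        # OpenAI: no schema constraints, ask for fenced JSON instead.
        return {"use_schema_constraints": False, "fence_output": True}
    return {}


# Usage (with LangExtract installed):
#   lx.extract(text_or_documents=..., model_id=m, **provider_kwargs(m))
```

OpenAI model ids that don't start with "gpt-" would need their own check.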

Now test again:

```python
print(f"Extracted: {result.extractions}")
# Output: Extracted: [Extraction(extraction_class='medication', extraction_text='Amoxicillin', attributes={'dosage': '500mg', 'frequency': 'twice daily'})]
```

This time the medication data was extracted successfully with OpenAI.
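To inspect or store the results, each extraction's class, text, and attributes can be copied into plain dictionaries. A minimal sketch that relies only on the extraction_class, extraction_text, and attributes fields visible in the output above; extractions_to_dicts is a hypothetical helper, and a SimpleNamespace stands in for a real Extraction so the sketch runs without an API call:

```python
import json
from collections.abc import Iterable
from types import SimpleNamespace
from typing import Any


def extractions_to_dicts(extractions: Iterable[Any]) -> list[dict[str, Any]]:
    """Copy the printed fields of each extraction into JSON-friendly dicts."""
    return [
        {
            "class": e.extraction_class,
            "text": e.extraction_text,
            "attributes": dict(e.attributes or {}),  # attributes may be None
        }
        for e in extractions
    ]


if __name__ == "__main__":
    # Stand-in with the same fields; with LangExtract, pass result.extractions.
    sample = SimpleNamespace(
        extraction_class="medication",
        extraction_text="Amoxicillin",
        attributes={"dosage": "500mg", "frequency": "twice daily"},
    )
    print(json.dumps(extractions_to_dicts([sample]), indent=2))
```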

The reason

I think the root cause of these errors is that LangExtract's defaults are tuned for Gemini, not OpenAI:

  1. Missing dependency: OpenAI support is opt-in via [openai] extra, not installed by default
  2. API key mismatch: the OpenAI SDK looks for OPENAI_API_KEY, not LANGEXTRACT_API_KEY
  3. Schema constraints: LangExtract enforces structured output through schema constraints for Gemini but does not implement them for OpenAI, so use_schema_constraints has to be set to False
  4. Output fencing: GPT models often output raw JSON without markdown code blocks, which breaks LangExtract’s parser
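Point 4 is about how the parser finds the JSON in the model's reply. The idea of "take the JSON from a fenced block if there is one, otherwise treat the whole reply as JSON" can be sketched with a regular expression. This is an illustration only, not LangExtract's actual resolver, and parse_model_output is a hypothetical name:

```python
import json
import re
from typing import Any

# Matches a ```json ... ``` (or bare ```) block and captures its body.
FENCE_RE = re.compile(r"```(?:json)?\s*\n(.*?)\n?```", re.DOTALL)


def parse_model_output(text: str) -> Any:
    """Parse JSON from the first fenced code block, or from the raw text if unfenced."""
    match = FENCE_RE.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload)


if __name__ == "__main__":
    print(parse_model_output('Here you go:\n```json\n{"entity": "value"}\n```\n'))
```

Pulling the JSON out of a fence tolerates chatty text before or after the block, which is one reason fenced output is easier to parse robustly.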

Here’s the API key priority matrix:

| Provider | Primary env var | Fallback env var | Notes |
|---|---|---|---|
| Gemini | GEMINI_API_KEY | LANGEXTRACT_API_KEY | Default |
| OpenAI | OPENAI_API_KEY | LANGEXTRACT_API_KEY | Requires [openai] install |
| Ollama | (none needed) | (none needed) | Local only |
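The "primary, then fallback" order in the table can be expressed as a small lookup. A minimal sketch that mirrors the table exactly; KEY_ENV_VARS and lookup_api_key are hypothetical names, and this is not LangExtract's own lookup code, which may differ between versions. Note that the table lists LANGEXTRACT_API_KEY as an OpenAI fallback, yet Solution #2 only worked once OPENAI_API_KEY was set, so treat that fallback as version-dependent:

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# (primary, fallback) environment variables per provider, copied from the table.
KEY_ENV_VARS: dict[str, tuple[str, ...]] = {
    "gemini": ("GEMINI_API_KEY", "LANGEXTRACT_API_KEY"),
    "openai": ("OPENAI_API_KEY", "LANGEXTRACT_API_KEY"),
    "ollama": (),  # local models, no key needed
}


def lookup_api_key(provider: str, env: Mapping[str, str] = os.environ) -> str | None:
    """Return the first non-empty key in priority order, or None if none is set."""
    for var in KEY_ENV_VARS[provider]:
        value = env.get(var)
        if value:
            return value
    return None
```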

The fence_output parameter is critical for OpenAI. Without it, GPT output is raw JSON:

```json
{"entity": "value"}
```

With fence_output=True, the JSON is wrapped in a markdown code block:

````markdown
```json
{"entity": "value"}
```
````

LangExtract's parser knows to extract JSON from the code block.
Complete Working Example
Here's a complete example with error handling and proper configuration:
```python title="openai_complete_example.py"
import langextract as lx
import os
from dotenv import load_dotenv  # pip install python-dotenv

# Load environment variables from a .env file
load_dotenv()

# Verify API key is set
api_key = os.environ.get('OPENAI_API_KEY')
if not api_key:
    raise ValueError(
        "OPENAI_API_KEY not set. "
        "OpenAI requires this specific env variable, not LANGEXTRACT_API_KEY."
    )

# Configure OpenAI-specific parameters
result = lx.extract(
    text_or_documents="Patient prescribed Amoxicillin 500mg twice daily.",
    prompt_description="Extract medications, dosages, and frequency",
    examples=[
        lx.data.ExampleData(
            text="Take Aspirin 100mg orally once daily",
            extractions=[
                lx.data.Extraction(
                    extraction_class="medication",
                    extraction_text="Aspirin",
                    attributes={
                        "dosage": "100mg",
                        "route": "orally",
                        "frequency": "once daily"
                    }
                )
            ]
        )
    ],
    model_id="gpt-4o",
    api_key=api_key,
    fence_output=True,  # Required for OpenAI
    use_schema_constraints=False  # Not implemented for OpenAI in LangExtract
)
print(f"Extracted: {result.extractions}")
```

Summary

In this post, I showed how to resolve LangExtract OpenAI setup errors when switching from Gemini. The key point is that OpenAI requires the [openai] extra, the OPENAI_API_KEY environment variable, and specific parameters (fence_output=True, use_schema_constraints=False) to work properly with LangExtract.

