
LangGraph Production Building Blocks: Session Management, Retry, Auth, and Scaling

A Reddit post caught my eye a few weeks ago. Someone asked: “I use SpringBoot for my backend — it gives me session management, retry, auth, and horizontal scaling out of the box. Does LangGraph provide these building blocks, or do I build them myself?”

Good question. I’ve been running LangGraph agents in production for about six months, and I hit the exact same wall. The docs show you how to build a graph. They don’t show you how to run it at scale.

Here’s what I found.

What LangGraph Does and Doesn’t Give You

LangGraph handles agent state well. It saves checkpoints, manages conversation threads, and resumes work. Everything else — auth, retries for external services, scaling — you supply yourself.

Think of LangGraph as the engine. You still need the chassis, wheels, and steering wheel.

Let me walk through each building block.

Session Management

This is LangGraph’s strongest offering.

Every conversation gets a thread_id. When the graph is compiled with a checkpointer, LangGraph saves the graph state (messages, other state values, and which node runs next) after each step. Give it the same thread_id again, and it loads the saved state.

checkpoint_saver.py
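from langgraph.checkpoint.postgres import PostgresSaver

# Pool settings do not belong in the URI; libpq rejects unknown query parameters.
DB_URI = "postgresql://user:pass@host:5432/langgraph"

# from_conn_string is a context manager that yields the saver.
with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables; needed on first use
    # `builder` is your StateGraph; the checkpointer is attached at compile time.
    graph = builder.compile(checkpointer=checkpointer)

    config = {"configurable": {"thread_id": "user-123-session-456"}}
    result = graph.invoke({"messages": [msg]}, config)
    # Same thread_id: LangGraph loads the saved state and continues.
    resumed = graph.invoke(
        {"messages": [{"role": "user", "content": "follow up"}]},
        config,
    )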

Same thread_id = same session. That’s it.

For production, use PostgreSQL or MongoDB as the checkpoint backend. SQLite works for local dev but falls apart under concurrency. One thing I learned the hard way: session data grows. A 50-turn conversation with tool outputs can blow past hundreds of kilobytes. Schedule periodic checkpoint cleanup, or summarize old turns before they accumulate.
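Here is a minimal sketch of the "summarize old turns" idea. It only shows the compaction logic on a plain list of message dicts. The names compact_history and naive_summary are mine, not LangGraph APIs, and in a real graph the summarizer would probably be an LLM call. How you write the compacted list back into graph state depends on the reducer your state uses, so treat that part as an exercise.

```python
from typing import Callable, List

Message = dict  # e.g. {"role": "user", "content": "..."}


def compact_history(
    messages: List[Message],
    keep_last: int,
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Replace everything except the last `keep_last` messages with one summary."""
    if keep_last <= 0:
        raise ValueError("keep_last must be positive")
    if len(messages) <= keep_last:
        return list(messages)  # nothing old enough to compact
    old, recent = messages[:-keep_last], messages[-keep_last:]
    summary = {"role": "system", "content": summarize(old)}
    return [summary, *recent]


def naive_summary(old: List[Message]) -> str:
    # Hypothetical placeholder; swap in an LLM-backed summarizer.
    return f"Summary of {len(old)} earlier messages."
```

Running this every N turns keeps the checkpoint size roughly flat instead of growing with the conversation.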

Retry Mechanisms

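Out of the box, LangGraph does not retry failed nodes. The exception propagates and the run stops. (Recent releases let you attach an optional RetryPolicy when you add a node, so check whether your version has it.) I build retry at two levels.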

Level 1: LLM call retry. LangChain’s chat models have max_retries and request_timeout. This handles transient API failures — network blips, rate limit 429s.

llm_retry.py
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    max_retries=3,
    request_timeout=60,
)

Level 2: Node-level retry. Wrap your node functions with tenacity or backoff.

node_retry.py
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def fetch_external_data(query: str) -> str:
    return external_api.get_data(query)

For graph-level retry, I use a dead-letter queue pattern. Failed graph invocations go into a DLQ table. A separate worker picks them up and retries. No built-in circuit breaker, either — I use pybreaker at the API gateway layer.
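The dead-letter queue pattern can be sketched roughly like this. I use an in-memory SQLite table here so the example runs anywhere; in production the table would live in your real database. The table name, the function names, and the `invoke(thread_id, payload)` callable are all assumptions for illustration. In practice `invoke` would wrap graph.invoke with the right config.

```python
import json
import sqlite3
from typing import Any, Callable, Optional

Invoke = Callable[[str, dict], Any]


def init_dlq(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS dead_letters (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               thread_id TEXT NOT NULL,
               payload TEXT NOT NULL,
               attempts INTEGER NOT NULL DEFAULT 1,
               last_error TEXT NOT NULL)"""
    )
    conn.commit()


def invoke_or_park(conn: sqlite3.Connection, invoke: Invoke,
                   thread_id: str, payload: dict) -> Optional[Any]:
    """Run the graph; on failure, park the request in the DLQ and return None."""
    try:
        return invoke(thread_id, payload)
    except Exception as exc:
        conn.execute(
            "INSERT INTO dead_letters (thread_id, payload, last_error) VALUES (?, ?, ?)",
            (thread_id, json.dumps(payload), repr(exc)),
        )
        conn.commit()
        return None


def drain_dlq(conn: sqlite3.Connection, invoke: Invoke, max_attempts: int = 3) -> int:
    """Retry parked requests that are under the attempt limit; return how many succeeded."""
    rows = conn.execute(
        "SELECT id, thread_id, payload FROM dead_letters WHERE attempts < ?",
        (max_attempts,),
    ).fetchall()
    recovered = 0
    for row_id, thread_id, payload in rows:
        try:
            invoke(thread_id, json.loads(payload))
        except Exception as exc:
            conn.execute(
                "UPDATE dead_letters SET attempts = attempts + 1, last_error = ? WHERE id = ?",
                (repr(exc), row_id),
            )
        else:
            conn.execute("DELETE FROM dead_letters WHERE id = ?", (row_id,))
            recovered += 1
    conn.commit()
    return recovered
```

Rows that hit max_attempts stay in the table, so a human can look at last_error instead of the failure vanishing silently.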

Authentication

LangGraph has no auth. Not even basic auth. It’s a pure library — it just runs the graph.

The pattern is straightforward: put auth at the web framework level.

Request → API Gateway (JWT validation) → FastAPI middleware → LangGraph worker
auth_middleware.py
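from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()
security = HTTPBearer()

async def verify_token(
    credentials: HTTPAuthorizationCredentials = Depends(security),
) -> str:
    # validate_jwt is your own helper; it is not part of FastAPI or LangGraph.
    if not validate_jwt(credentials.credentials):
        raise HTTPException(status_code=401, detail="Invalid or expired token")
    return credentials.credentials

@app.post("/graph/invoke")
async def invoke_graph(input_data: dict, token: str = Depends(verify_token)):
    # ainvoke keeps the event loop free while the graph runs.
    return await graph.ainvoke(input_data)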

LangGraph Cloud gives you API key-based auth for managed deployments. Self-hosted? You implement JWT, OAuth2, or API keys yourself.
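If you go the self-hosted JWT route, the token check itself looks roughly like the sketch below. It verifies an HS256 signature and the exp claim using only the standard library, mainly to show what the check involves. The function names, the shared-secret setup, and the rule that a token without exp is rejected are my assumptions. For real deployments I would reach for a maintained JWT library rather than hand-rolled code like this.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Issue an HS256 token (included only so the example can be exercised)."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    ]
    signing_input = ".".join(parts)
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def validate_jwt(token: str, secret: bytes, now: Optional[float] = None) -> bool:
    """Return True only for a well-formed, correctly signed, unexpired HS256 token."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError:
        return False  # wrong segment count, bad base64, or bad JSON
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return False
    if header.get("alg") != "HS256":
        return False  # never trust the token to pick its own algorithm
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return False
    exp = claims.get("exp")
    current = time.time() if now is None else now
    return isinstance(exp, (int, float)) and current < exp
```

Note that this version takes the secret as an argument, so a one-argument validate_jwt(token) helper would bind the secret from your own configuration.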

Scaling to Thousands of Users

LangGraph scales horizontally, but the design matters.

Workers are stateless. All state lives in the checkpoint store. Deploy identical workers behind a load balancer — no session affinity needed.

scaling_architecture.txt
       Load Balancer
             |
   +---------+---------+
   |         |         |
Worker1   Worker2   Worker3
   |         |         |
   +---- Shared DB ----+

Each worker uses thread_id to load the correct state from the shared database.
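To make the "no session affinity" point concrete, here is a toy simulation. SharedStore is an in-memory stand-in for the checkpoint database and Worker is a stand-in for a process running the graph. Both classes are hypothetical and contain no LangGraph code. The point is only that a worker holds nothing between requests.

```python
from itertools import cycle


class SharedStore:
    """In-memory stand-in for the shared checkpoint database."""

    def __init__(self) -> None:
        self._threads = {}

    def load(self, thread_id: str) -> list:
        return list(self._threads.get(thread_id, []))

    def save(self, thread_id: str, messages: list) -> None:
        self._threads[thread_id] = list(messages)


class Worker:
    """Keeps no per-session state; everything is loaded and saved per request."""

    def __init__(self, name: str, store: SharedStore) -> None:
        self.name = name
        self.store = store

    def handle(self, thread_id: str, user_message: str) -> list:
        messages = self.store.load(thread_id)
        messages.append(
            {"role": "user", "content": user_message, "served_by": self.name}
        )
        self.store.save(thread_id, messages)
        return messages


store = SharedStore()
# Round-robin routing, as a plain load balancer would do.
workers = cycle([Worker("worker1", store), Worker("worker2", store)])
next(workers).handle("thread-a", "hello")
history = next(workers).handle("thread-a", "follow up")
```

The second request lands on a different worker and still sees the first message, because the history came from the store and not from worker memory.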

The checkpoint store is the bottleneck. I learned this when my PostgreSQL connection pool ran dry at 50 concurrent sessions. Tune your pool size, add read replicas, and watch your query latency. If you need lower-latency checkpoint access, there is also a Redis-backed RedisSaver, which ships separately from the core library.
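A rough way to pick a pool size is to divide the database's connection budget across workers. The helper below is a rule of thumb I use as a starting point, not anything LangGraph provides. The function name and the default of 10 reserved connections (for admin access, migrations, and monitoring) are assumptions.

```python
def max_pool_size_per_worker(
    db_max_connections: int, workers: int, reserved: int = 10
) -> int:
    """Largest per-worker pool that keeps the fleet under the database limit."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    available = db_max_connections - reserved
    if available < workers:
        raise ValueError("not enough connections for even one per worker")
    # Floor division: the fleet total never exceeds `available`.
    return available // workers
```

With PostgreSQL's default max_connections of 100 and three workers, this gives 30 connections per worker. If that is still too few, a connection pooler in front of the database is usually the next step.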

LangGraph Cloud manages auto-scaling for you — but it limits customization. If you need control, self-host.

Summary

In this post, I broke down what you get and what you build yourself. LangGraph gives you session management through thread_id and checkpointing with PostgreSQL or MongoDB. Retry lives at the LLM layer and in your node functions. Auth belongs in your web framework. Scaling works if you keep workers stateless and point them all at one shared checkpoint store.

The Reddit OP was right to ask. LangGraph is the engine, not the car. You still need to build the rest.

Final Words + More Resources

My intention with this article was to share my knowledge and experience with others. If you want to get in touch, you can reach me by email.

