How to integrate memsearch with LangChain, LlamaIndex, and CrewAI

Mar 3, 2026

The Problem

I started building AI agents and hit the memory fragmentation problem. Each framework had its own memory solution:

LangChain with InMemoryStore
LlamaIndex with VectorStoreIndex
CrewAI with built-in opaque storage

When I wanted to switch frameworks, I had to rewrite all the memory logic. When I needed to debug what my agent “remembered,” I was stuck with opaque databases that required specific tools to inspect.

The real problem: framework-specific memory creates lock-in and makes it hard to understand what your agent actually knows.

Framework-Agnostic Memory

The solution I found is memsearch, a Python library that treats markdown files as the source of truth for memory. The vector database (Milvus) is just an index that can be rebuilt from those files at any time.

Here’s the core API:

from memsearch import MemSearch

mem = MemSearch(paths=["./memory"])
await mem.index()  # Index markdown files
results = await mem.search("query", top_k=3)  # Semantic search

Memory lives in markdown files you can read and edit:

## Team
- Alice: frontend lead
- Bob: backend lead
- Charlie: devops

## Technical Decisions
- Chose Redis for caching over Memcached
- Using PostgreSQL for primary database
- Deploying on AWS ECS

This approach gives you:

Human-readable memory
Git version control
Zero vendor lock-in
Same API across all frameworks
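Because the memory is plain markdown, you can inspect it with nothing but the standard library. As a minimal sketch (this is not memsearch's own chunker, and `split_sections` is a hypothetical helper), the snippet below groups the bullets of a memory file under their headings:

```python
def split_sections(markdown: str) -> dict[str, list[str]]:
    """Group bullet lines under the nearest preceding '## ' heading."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw_line in markdown.splitlines():
        line = raw_line.strip()
        if line.startswith("## "):
            # A new heading starts a new section.
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            # Bullets belong to the most recent heading.
            sections[current].append(line[2:].strip())
    return sections


memory = """\
## Team
- Alice: frontend lead
- Bob: backend lead

## Technical Decisions
- Chose Redis for caching over Memcached
"""

sections = split_sections(memory)
print(sections["Technical Decisions"])
```

The same file can be diffed, reviewed, and reverted in Git like any other text file.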

LangChain Integration

LangChain uses retrievers as the interface for memory. You create a BaseRetriever subclass that wraps memsearch:

import asyncio
from typing import Any, List, Optional
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever
from langchain_core.callbacks import CallbackManagerForRetrieverRun
from memsearch import MemSearch


class MemSearchRetriever(BaseRetriever):
    """LangChain BaseRetriever wrapper for memsearch."""

    # BaseRetriever is a pydantic model, so attributes must be declared as fields
    mem: Any = None
    top_k: int = 3

    def __init__(self, memory_paths: Optional[List[str]] = None, top_k: int = 3, **kwargs: Any):
        mem = MemSearch(paths=memory_paths or ["./memory"])
        super().__init__(mem=mem, top_k=top_k, **kwargs)

    def _get_relevant_documents(
        self,
        query: str,
        *,
        run_manager: CallbackManagerForRetrieverRun,
    ) -> List[Document]:
        """Retrieve documents synchronously."""
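        # asyncio.run() raises RuntimeError if an event loop is already running;
        # async callers go through _aget_relevant_documents instead.
        results = asyncio.run(self.mem.search(query, top_k=self.top_k))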
        return [
            Document(
                page_content=result["content"],
                metadata={
                    "score": result["score"],
                    "source": result.get("source", ""),
                    "chunk_id": result.get("chunk_id", ""),
                },
            )
            for result in results
        ]

    async def _aget_relevant_documents(
        self,
        query: str,
        *,
        run_manager: CallbackManagerForRetrieverRun,
    ) -> List[Document]:
        """Retrieve documents asynchronously."""
        results = await self.mem.search(query, top_k=self.top_k)
        return [
            Document(
                page_content=result["content"],
                metadata={
                    "score": result["score"],
                    "source": result.get("source", ""),
                    "chunk_id": result.get("chunk_id", ""),
                },
            )
            for result in results
        ]

Use it in a LangChain LCEL chain:

import asyncio
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from memsearch import MemSearch

async def langchain_example():
    # Initialize memsearch and index
    mem = MemSearch(paths=["./memory"])
    await mem.index()

    # Create retriever
    retriever = MemSearchRetriever(memory_paths=["./memory"], top_k=3)

    # Build LCEL chain
    llm = ChatOpenAI(model="gpt-4o-mini")
    prompt = ChatPromptTemplate.from_template(
        """Answer using the following retrieved memories:

Memories:
{context}

Question: {question}

Answer:"""
    )

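    chain = (
        {
            # Pass only the question string to the retriever, not the whole input dict
            "context": (lambda x: x["question"]) | retriever,
            "question": lambda x: x["question"],
        }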
        | prompt
        | llm
        | StrOutputParser()
    )

    # Query
    result = await chain.ainvoke({"question": "What decisions have we made about caching?"})
    print(result)


asyncio.run(langchain_example())

LangGraph Tool Integration

For LangGraph ReAct agents, wrap memsearch as a tool:

import asyncio
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
from memsearch import MemSearch

# Initialize memsearch
mem = MemSearch(paths=["./memory"])


@tool
async def search_memory(query: str, top_k: int = 3) -> str:
    """Search through persistent markdown memories.

    Use this tool when you need to recall previous information, decisions,
    or knowledge stored in the memory system. Returns relevant memories based
    on semantic similarity.

    Args:
        query: The search query to find relevant memories
        top_k: Number of top results to return (default: 3)
    """
    results = await mem.search(query, top_k=top_k)

    if not results:
        return "No relevant memories found."

    formatted = []
    for i, result in enumerate(results, 1):
        formatted.append(
            f"Memory {i} (Similarity: {result['score']:.3f}):\n"
            f"{result['content'][:200]}...\n"
        )

    return "\n".join(formatted)


async def langgraph_example():
    # Index memories
    await mem.index()

    # Create the LLM; create_react_agent binds the tools itself
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

    # Create ReAct agent
    # (older LangGraph releases call the prompt parameter state_modifier)
    agent = create_react_agent(
        model=llm,
        tools=[search_memory],
        prompt="You have access to a memory search tool. "
        "Always search memory before answering questions about past decisions or knowledge."
    )

    # Execute agent
    response = await agent.ainvoke(
        {"messages": [("user", "What caching solution did we choose?")]}
    )

    print(response["messages"][-1].content)


asyncio.run(langgraph_example())

The tool docstring tells the agent when and how to use memory search.
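The tool above always appends "..." to each memory, even when nothing was cut off. Here is a small sketch of a formatter that truncates only when needed. It assumes results are dicts with `content` and `score` keys, as used throughout this article, and `format_memories` is a hypothetical helper name:

```python
def format_memories(results: list[dict], max_chars: int = 200) -> str:
    """Render search hits as numbered text, truncating only long content."""
    if not results:
        return "No relevant memories found."

    blocks = []
    for i, result in enumerate(results, 1):
        content = result["content"]
        if len(content) > max_chars:
            # Mark the cut so the agent knows the memory continues.
            content = content[:max_chars].rstrip() + "..."
        blocks.append(f"Memory {i} (similarity: {result['score']:.3f}):\n{content}")
    return "\n\n".join(blocks)


hits = [{"content": "Chose Redis for caching over Memcached", "score": 0.9123}]
formatted = format_memories(hits)
print(formatted)
```

The same helper could be reused by any tool wrapper that has to return a string to the agent.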

LlamaIndex Integration

For LlamaIndex, subclass its own BaseRetriever, which is a separate class from LangChain's:

import asyncio
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import QueryBundle, NodeWithScore, TextNode
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.llms.openai import OpenAI
from memsearch import MemSearch


class MemSearchRetriever(BaseRetriever):
    """LlamaIndex BaseRetriever wrapper for memsearch."""

    def __init__(self, memory_paths: list = ["./memory"], top_k: int = 3):
        super().__init__()
        self.mem = MemSearch(paths=memory_paths)
        self.top_k = top_k

    def _to_nodes(self, results) -> list[NodeWithScore]:
        """Convert memsearch results to LlamaIndex NodeWithScore objects."""
        nodes_with_scores = []
        for result in results:
            node = TextNode(
                text=result["content"],
                metadata={
                    "score": result["score"],
                    "source": result.get("source", ""),
                    "chunk_id": result.get("chunk_id", ""),
                },
            )
            nodes_with_scores.append(
                NodeWithScore(node=node, score=result["score"])
            )
        return nodes_with_scores

    def _retrieve(self, query_bundle: QueryBundle) -> list[NodeWithScore]:
        """Retrieve nodes from sync code (fails if an event loop is already running)."""
        results = asyncio.run(
            self.mem.search(query_bundle.query_str, top_k=self.top_k)
        )
        return self._to_nodes(results)

    async def _aretrieve(self, query_bundle: QueryBundle) -> list[NodeWithScore]:
        """Retrieve nodes from async code such as query_engine.aquery()."""
        results = await self.mem.search(query_bundle.query_str, top_k=self.top_k)
        return self._to_nodes(results)


async def llamaindex_example():
    # Initialize memsearch and index
    mem = MemSearch(paths=["./memory"])
    await mem.index()

    # Create custom retriever
    retriever = MemSearchRetriever(memory_paths=["./memory"], top_k=3)

    # Create query engine with custom retriever
    llm = OpenAI(model="gpt-4o-mini")
    query_engine = RetrieverQueryEngine.from_args(
        retriever=retriever,
        llm=llm,
        verbose=True,
    )

    # Query through the async path; the sync query() would call asyncio.run()
    # inside the running event loop and raise RuntimeError
    response = await query_engine.aquery("What decisions have we made about system architecture?")
    print(response)


asyncio.run(llamaindex_example())

CrewAI Integration

CrewAI uses tool decorators or BaseTool classes:

import asyncio
from typing import Any, Optional, Type
from pydantic import BaseModel, Field
from crewai.tools import BaseTool
from crewai import Agent, Task, Crew, Process, LLM
from memsearch import MemSearch


class MemSearchInput(BaseModel):
    """Input schema for MemSearchTool."""
    query: str = Field(
        ..., description="The search query to find relevant memories"
    )
    top_k: int = Field(
        default=3, description="Number of top results to return"
    )


class MemSearchTool(BaseTool):
    """CrewAI tool wrapper for memsearch."""

    name: str = "MemSearch"
    description: str = (
        "Search through persistent markdown memories stored in memsearch. "
        "Use this tool when you need to recall previous information, "
        "decisions, or knowledge stored in the memory system. "
        "Returns relevant memories based on semantic similarity."
    )
    args_schema: Type[BaseModel] = MemSearchInput
    # BaseTool is a pydantic model, so the client must be a declared field
    mem: Any = None

    def __init__(self, memory_paths: Optional[list] = None, **kwargs):
        super().__init__(mem=MemSearch(paths=memory_paths or ["./memory"]), **kwargs)

    def _run(self, query: str, top_k: int = 3) -> str:
        """Execute memsearch synchronously."""
        try:
            # Run async search in sync context
            results = asyncio.run(self.mem.search(query, top_k=top_k))

            if not results:
                return "No relevant memories found."

            # Format results for CrewAI
            formatted_results = []
            for i, result in enumerate(results, 1):
                formatted_results.append(
                    f"Memory {i} (Similarity: {result['score']:.3f}):\n"
                    f"{result['content'][:200]}...\n"
                )

            return "\n".join(formatted_results)
        except Exception as e:
            return f"Error searching memory: {str(e)}"


async def crewai_example():
    # Initialize memsearch and index
    mem = MemSearch(paths=["./memory"])
    await mem.index()

    # Create memsearch tool
    memory_tool = MemSearchTool(memory_paths=["./memory"])

    # Create agent with memory tool
    llm = LLM(model="openai/gpt-4o-mini")
    agent = Agent(
        role="Memory-Enabled Assistant",
        goal="Provide answers with context from persistent memory",
        backstory=(
            "You have access to a memory system containing past "
            "decisions and knowledge. Always search memory before "
            "answering questions about history or context."
        ),
        tools=[memory_tool],
        llm=llm,
        verbose=True,
    )

    # Create task
    task = Task(
        description="Answer: {question}",
        expected_output="A comprehensive answer based on memory search results",
        agent=agent,
    )

    # Create and execute crew
    crew = Crew(
        agents=[agent],
        tasks=[task],
        process=Process.sequential,
        verbose=True,
    )

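    # Run the synchronous kickoff in a worker thread so the tool's asyncio.run()
    # is not called inside this already-running event loop
    result = await asyncio.to_thread(
        crew.kickoff, inputs={"question": "What caching solution did we choose?"}
    )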
    print(result)


asyncio.run(crewai_example())

Common Integration Pitfalls

I ran into these problems while integrating across frameworks:

Forgetting to index before search:

# WRONG: Search without indexing
mem = MemSearch(paths=["./memory"])
results = await mem.search("query")  # Returns empty

# CORRECT: Index first
mem = MemSearch(paths=["./memory"])
await mem.index()  # Build vector index
results = await mem.search("query")
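One way to make this mistake harder to repeat is a thin wrapper that indexes once, on the first search. The sketch below is an assumption-laden illustration: `IndexedMemory` is a hypothetical class, and `FakeMem` stands in for MemSearch so the example runs without a vector database. It relies only on the `index()` and `search()` coroutines shown earlier in this article.

```python
import asyncio


class IndexedMemory:
    """Wrap a memory object and run index() once before the first search."""

    def __init__(self, mem):
        self._mem = mem
        self._indexed = False
        self._lock = asyncio.Lock()

    async def search(self, query: str, top_k: int = 3):
        # The lock keeps concurrent first searches from indexing twice.
        async with self._lock:
            if not self._indexed:
                await self._mem.index()
                self._indexed = True
        return await self._mem.search(query, top_k=top_k)


class FakeMem:
    """Stand-in for MemSearch (hypothetical, for demonstration only)."""

    def __init__(self):
        self.index_calls = 0

    async def index(self):
        self.index_calls += 1

    async def search(self, query, top_k=3):
        return [{"content": f"hit for {query}", "score": 1.0}][:top_k]


async def demo() -> int:
    fake = FakeMem()
    mem = IndexedMemory(fake)
    await mem.search("caching")
    await mem.search("database")
    return fake.index_calls


index_calls = asyncio.run(demo())
print(index_calls)
```

This only covers the first index build; after editing the markdown files you would still call `index()` again yourself.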

Mixing async/sync in LangChain:

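# WRONG: Using await inside a sync method
class BadRetriever(BaseRetriever):
    def _get_relevant_documents(self, query, *, run_manager):
        return await self.mem.search(query)  # SyntaxError: 'await' outside async function

# CORRECT: Run async in sync context and return Document objects
# (asyncio.run raises RuntimeError if an event loop is already running,
# so also implement _aget_relevant_documents for async callers)
import asyncio
from langchain_core.documents import Document
class GoodRetriever(BaseRetriever):
    def _get_relevant_documents(self, query, *, run_manager):
        results = asyncio.run(self.mem.search(query))
        return [Document(page_content=r["content"]) for r in results]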
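A caveat with `asyncio.run` is that it refuses to start when an event loop is already running in the current thread, which is exactly the situation when a sync retriever or tool is called from inside an async program. The following is a minimal sketch of a helper that handles both cases; `run_sync` and `fake_search` are hypothetical names. Running the coroutine on a second loop assumes the underlying client is not tied to the original loop, which I have not verified for memsearch.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_sync(coro):
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread: asyncio.run is safe.
        return asyncio.run(coro)
    # A loop is already running: use a worker thread with its own loop.
    # Note that this blocks the calling loop until the coroutine finishes.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def fake_search(query: str) -> list[str]:
    """Stand-in for an async search call."""
    await asyncio.sleep(0)
    return [f"hit for {query}"]


# Case 1: called from plain synchronous code.
outside = run_sync(fake_search("redis"))


async def caller() -> list[str]:
    # Case 2: called while an event loop is already running.
    return run_sync(fake_search("postgres"))


inside = asyncio.run(caller())
print(outside, inside)
```

Where the framework offers an async hook, implementing it directly is still the cleaner option; the helper is a fallback for sync-only entry points.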

Wrong return type in LlamaIndex:

# WRONG: Returning dict instead of NodeWithScore
def _retrieve(self, query_bundle):
    return {"content": "...", "score": 0.9}

# CORRECT: Return NodeWithScore objects
from llama_index.core.schema import NodeWithScore, TextNode
def _retrieve(self, query_bundle):
    return [NodeWithScore(node=TextNode(text="..."), score=0.9)]

Not handling async in CrewAI tools:

# WRONG: await inside sync _run
class BadTool(BaseTool):
    def _run(self, query):
        return await self.mem.search(query)  # SyntaxError

# CORRECT: Use asyncio.run and return a string for the agent
import asyncio
class GoodTool(BaseTool):
    def _run(self, query):
        results = asyncio.run(self.mem.search(query))
        return "\n".join(r["content"] for r in results)

Provider mismatch:

# WRONG: Provider extra not installed
# $ pip install memsearch          (only the OpenAI provider is included)
mem = MemSearch(paths=["./memory"], embedding_provider="ollama")  # Error!

# CORRECT: Install the required extra first
# $ pip install "memsearch[ollama]"
mem = MemSearch(paths=["./memory"], embedding_provider="ollama")
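If you want this failure to surface early with a clearer message, you can check for the optional dependency before constructing the client. This is only a sketch: `missing_modules` and `require_modules` are hypothetical helpers, and the module names you pass in are your own assumption about what a given provider needs, not something taken from the memsearch documentation.

```python
import importlib.util


def missing_modules(modules: list[str]) -> list[str]:
    """Return the top-level module names that cannot be imported here."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def require_modules(provider: str, modules: list[str]) -> None:
    """Raise a readable error if a provider's modules are not installed."""
    missing = missing_modules(modules)
    if missing:
        raise RuntimeError(
            f"Provider {provider!r} needs missing modules: {', '.join(missing)}"
        )


# 'json' ships with Python; the second name is deliberately nonexistent.
print(missing_modules(["json", "surely_missing_module_xyz"]))
```

Calling `require_modules("ollama", ["ollama"])` at startup would then fail fast, under the assumption that the provider's client package is importable as `ollama`.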

Framework-Agnostic Agent

Here’s a complete agent that doesn’t depend on any agent framework. This version calls OpenAI, but the LLM client is the only provider-specific part:

import asyncio
from datetime import date
from pathlib import Path
from openai import OpenAI
from memsearch import MemSearch


class UniversalMemoryAgent:
    """Framework-agnostic agent with persistent memory.

    Works with OpenAI, Anthropic Claude, Ollama, or any LLM provider.
    Memory is managed by memsearch and stored as markdown files.
    """

    def __init__(
        self,
        memory_paths: list = ["./memory"],
        llm_provider: str = "openai",
        top_k: int = 3,
    ):
        self.memory_paths = memory_paths
        self.mem = MemSearch(paths=memory_paths)
        self.top_k = top_k
        self.llm = OpenAI()  # Could be Anthropic(), Ollama(), etc.

    def save_memory(self, content: str, category: str = "general") -> None:
        """Append a note to today's memory log."""
        memory_dir = Path(self.memory_paths[0]) if self.memory_paths else Path("./memory")
        memory_dir.mkdir(parents=True, exist_ok=True)

        today = date.today()
        file_path = memory_dir / f"{today}.md"

        with open(file_path, "a", encoding="utf-8") as f:
            f.write(f"\n## {category.title()}\n{content}\n")

    async def recall(self, query: str) -> list[dict]:
        """Recall relevant memories for query."""
        results = await self.mem.search(query, top_k=self.top_k)
        return results

    async def chat(self, user_input: str) -> str:
        """Process user input with memory recall and storage.

        Implements Recall-Think-Remember pattern:
        1. Recall - search past memories for relevant context
        2. Think - call LLM with memory context
        3. Remember - save this exchange and index it
        """
        # 1. Recall
        memories = await self.recall(user_input)
        context = "\n".join(
            f"- {m['content'][:200]}..." for m in memories
        )

        # 2. Think
        system_prompt = (
            f"You have these memories:\n{context}\n\n"
            "Use this context to inform your response. "
            "If memories are relevant, reference them. "
            "If not, answer based on your general knowledge."
        )

        resp = self.llm.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_input},
            ],
        )
        answer = resp.choices[0].message.content

        # 3. Remember (save_memory already writes the "## Conversation" heading)
        self.save_memory(f"### {user_input}\n{answer}", "conversation")
        await self.mem.index()

        return answer


async def main():
    # Initialize agent
    agent = UniversalMemoryAgent(memory_paths=["./memory"])

    # Seed some initial knowledge (save_memory adds the "## <Category>" heading)
    agent.save_memory(
        "- Alice: frontend lead\n- Bob: backend lead\n- Charlie: devops",
        "team"
    )
    agent.save_memory(
        "- Chose Redis for caching over Memcached\n- Using PostgreSQL for primary database\n- Deploying on AWS ECS",
        "technical decisions"
    )
    await agent.mem.index()

    # Chat with memory
    while True:
        user_input = input("\nYou: ")
        if user_input.lower() in ["quit", "exit"]:
            break

        response = await agent.chat(user_input)
        print(f"\nAgent: {response}")


asyncio.run(main())

Why This Works

The key insight is that framework-agnostic memory separates concerns:

Memory system: Stores and retrieves knowledge (memsearch)
Agent framework: Orchestrates tool use and reasoning (LangChain, LlamaIndex, CrewAI)
LLM provider: Generates responses (OpenAI, Anthropic, Ollama)

When each layer does one thing well, you can swap components without rewriting everything.
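To make the separation concrete, here is a small sketch in which the agent logic depends only on two narrow interfaces. Everything in it is illustrative: `Memory`, `LLM`, `KeywordMemory`, and `echo_llm` are hypothetical names, and the toy keyword matcher stands in for real semantic search so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable, Protocol


class Memory(Protocol):
    """Anything with an async search() can act as the memory layer."""

    async def search(self, query: str, top_k: int = 3) -> list[dict]: ...


# The LLM layer is just an async callable: (system_prompt, user_input) -> answer
LLM = Callable[[str, str], Awaitable[str]]


async def answer(memory: Memory, llm: LLM, question: str) -> str:
    """Recall memories, then hand them to whichever LLM was supplied."""
    hits = await memory.search(question, top_k=3)
    context = "\n".join(f"- {h['content']}" for h in hits)
    return await llm(f"You have these memories:\n{context}", question)


class KeywordMemory:
    """Toy memory: ranks notes by the number of words shared with the query."""

    def __init__(self, notes: list[str]):
        self.notes = notes

    async def search(self, query: str, top_k: int = 3) -> list[dict]:
        words = set(query.lower().split())
        scored = [
            {"content": note, "score": len(words & set(note.lower().split()))}
            for note in self.notes
        ]
        matches = [s for s in scored if s["score"] > 0]
        return sorted(matches, key=lambda s: s["score"], reverse=True)[:top_k]


async def echo_llm(system_prompt: str, user_input: str) -> str:
    """Toy LLM: returns the system prompt so the wiring is visible."""
    return system_prompt


notes = ["Chose Redis for caching over Memcached", "Deploying on AWS ECS"]
reply = asyncio.run(answer(KeywordMemory(notes), echo_llm, "Which caching solution?"))
print(reply)
```

Swapping `KeywordMemory` for a MemSearch instance, or `echo_llm` for a real client call, would not require touching `answer` at all.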

Summary

I showed how to integrate memsearch with LangChain, LlamaIndex, and CrewAI. The pattern is consistent: wrap memsearch’s simple API to match each framework’s expected interface.

Key takeaways:

Subclass each framework's own BaseRetriever for LangChain and LlamaIndex
Use the @tool decorator or a BaseTool subclass for CrewAI and LangGraph
Markdown files are the source of truth; the vector DB is just a derived index
The same MemSearch instance can be shared across multiple frameworks
Zero vendor lock-in means easy framework switching

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 memsearch - A Markdown-first memory system
👨‍💻 LangChain Documentation - Retrievers
👨‍💻 LangGraph Persistence
👨‍💻 LlamaIndex Documentation
👨‍💻 CrewAI Documentation

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!