Building a Knowledge Base Chatbot with AstrBot: RAG Implementation Guide

Mar 3, 2026

Problem

I wanted to build a chatbot that could answer questions based on my company’s internal documentation. But when I asked my LLM-powered bot about our product specs, it gave me generic answers or just made things up.

User: What's the warranty period for our Product-X?

Bot: I don't have specific information about Product-X. Generally, warranty
periods vary by manufacturer and product type. You should check the product
documentation or contact the manufacturer directly.

Our support team had all the answers in our documentation, but the chatbot couldn’t access them. I needed a way to make the LLM use our actual documents when responding.

Environment

AstrBot latest version
Python 3.10+
OpenAI API (for embeddings and LLM)
Document formats: PDF, TXT, MD

What is RAG and Why You Need It

The core problem is that LLMs don’t know your private documents. They’re trained on public data, and they can’t magically access your internal wikis, product manuals, or support docs.

Retrieval-Augmented Generation (RAG) solves this by:

Converting your documents into vector embeddings
Storing them in a searchable format
Finding relevant documents when a user asks a question
Feeding those documents to the LLM as context

Here’s the basic flow:

Documents → Chunking → Embeddings → Vector Store
                                      ↓
Query → Embedding → Similarity Search → Context + Query → LLM → Response

Without RAG, your chatbot is just guessing. With RAG, it’s answering from your actual knowledge base.

AstrBot’s Knowledge Base Architecture

AstrBot has a built-in knowledge base system that handles the entire RAG pipeline. Here’s what it provides:

Document Processing: Handles PDF, TXT, MD, images, and more
Embedding Generation: Uses OpenAI-compatible embedding models
Vector Storage: Built-in vector database for similarity search
Context Management: Compresses context to fit within token limits
Multi-Platform Support: Same knowledge base works on QQ, Telegram, WeChat Work, etc.

The architecture looks like this:

┌─────────────────────────────────────────────────────────┐
│                   Knowledge Base                         │
│  ┌──────────┐  ┌──────────┐  ┌──────────────────┐      │
│  │ Documents │→│ Chunking │→│ Embedding Model   │      │
│  └──────────┘  └──────────┘  └──────────────────┘      │
│                                     ↓                   │
│                           ┌──────────────────┐         │
│                           │  Vector Store    │         │
│                           └──────────────────┘         │
└─────────────────────────────────────────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────┐
│                    AstrBot Core                          │
│  Query → Retrieval → Context Compression → LLM → Response│
└─────────────────────────────────────────────────────────┘
                          ↓
        ┌────────────┬────────────┬────────────┐
        ↓            ↓            ↓            ↓
       QQ        Telegram     WeChat      WebUI

Setting Up the Knowledge Base

Step 1: Configure LLM Provider

First, I needed to configure my LLM provider in AstrBot’s WebUI. I used OpenAI, but AstrBot supports DeepSeek, Ollama, and other OpenAI-compatible providers.

The key is ensuring your provider supports both:

Chat completions (for responses)
Embeddings (for document vectors)

Step 2: Upload Documents

I started by uploading our product documentation through the WebUI:

Navigate to “Knowledge Base” in the sidebar
Click “Upload Documents”
Select files (PDF, TXT, MD supported)
Wait for processing

AstrBot automatically:

Parses the documents
Splits them into chunks
Generates embeddings
Stores in the vector database

Step 3: Configure Chunking Parameters

The default chunking settings weren’t ideal for my technical documentation. I adjusted them:

knowledge_base:
  enabled: true
  chunk_size: 512          # Characters per chunk
  chunk_overlap: 50        # Overlap between chunks
  embedding_model: "text-embedding-3-small"
  retrieval:
    top_k: 5               # Number of chunks to retrieve
    similarity_threshold: 0.7  # Minimum similarity score

I found that chunk_size: 512 works well for technical docs - small enough to be precise, large enough to capture complete concepts. The chunk_overlap: 50 helps maintain context across chunk boundaries.

Step 4: Configure RAG Retrieval

The retrieval settings control how AstrBot finds relevant documents:

rag:
  enabled: true
  context_compression:
    enabled: true
    max_tokens: 2000       # Max context tokens sent to LLM
  reranking:
    enabled: false         # Enable if you have a reranking model

Context compression is crucial for long documents. AstrBot will:

Retrieve the top-k most relevant chunks
Compress them to fit within max_tokens
Send compressed context to the LLM

Step 5: Set Up Persona

I configured the bot’s persona to use the knowledge base properly:

persona:
  name: "Support Bot"
  system_prompt: |
    You are a helpful customer support assistant.
    Answer questions using ONLY the provided knowledge base context.
    If the answer is not in the context, say "I don't have that information."
    Be concise and specific.
  use_knowledge_base: true

The key instruction is “Answer questions using ONLY the provided knowledge base context.” This prevents the LLM from hallucinating answers.

Testing the Knowledge Base

I tested with a real question from our support tickets:

User: What's the warranty period for Product-X?

Bot: According to the product documentation, Product-X comes with a 24-month
warranty from the date of purchase. The warranty covers manufacturing defects
but does not cover damage from misuse or unauthorized modifications.

The bot now pulls from our actual documentation instead of giving generic responses.

What Didn’t Work Initially

Problem 1: Retrieval Not Finding Relevant Documents

My first attempt returned irrelevant results. The issue was the similarity_threshold was too low (0.5).

I increased it to 0.7:

retrieval:
  similarity_threshold: 0.7

Now the bot only returns documents with higher relevance scores.

Problem 2: Context Window Overflow

When I uploaded a large product manual, the context exceeded the LLM’s token limit. I got truncated responses.

The fix was enabling context compression:

context_compression:
  enabled: true
  max_tokens: 2000

AstrBot now compresses retrieved chunks before sending to the LLM.

Problem 3: Chunk Size Too Large

With chunk_size: 1024, retrieval was imprecise. Queries like “warranty period” would return entire sections instead of specific paragraphs.

I reduced chunk size to 512 characters:

chunk_size: 512
chunk_overlap: 50

Smaller chunks mean more precise retrieval.

Advanced Techniques

Hybrid Search

AstrBot supports hybrid search combining semantic and keyword matching. This helps when users use specific terminology:

retrieval:
  hybrid_search:
    enabled: true
    semantic_weight: 0.7
    keyword_weight: 0.3

Semantic search catches conceptual matches, keyword search catches exact terms.

Multi-Document Retrieval

When I uploaded multiple product manuals, AstrBot handles cross-document queries:

User: Compare Product-X and Product-Y warranty terms.

Bot: Product-X has a 24-month warranty covering manufacturing defects.
Product-Y has an 18-month warranty with the same coverage. Product-X offers
6 additional months of coverage.

Dify Integration for Advanced RAG

For more complex RAG needs, AstrBot integrates with Dify:

provider: dify
api_endpoint: "https://your-dify-instance.com/v1"
api_key: "${DIFY_API_KEY}"
dataset_id: "your-dataset-id"

Dify offers advanced features like:

Segmentation strategies
Q&A mode extraction
Hybrid search with reranking

Deploying to Multiple Platforms

One of AstrBot’s strengths is deploying the same knowledge base to multiple messaging platforms.

I deployed our support bot to both QQ and Telegram:

         Knowledge Base (Single Source)
                    ↓
              AstrBot Core
                    ↓
        ┌───────────┼───────────┐
        ↓           ↓           ↓
    QQ Channel  Telegram    WeChat Work
    Support Bot  Support    Support Bot

Each platform uses the same knowledge base, so updates propagate automatically. I just needed to configure the platform adapters in the WebUI.

Programmatic Access

For custom integrations, I can query the knowledge base programmatically:

from astrbot.api.context import Context

async def query_knowledge(query: str) -> str:
    """Query the knowledge base and return answer."""
    context = Context()
    result = await context.knowledge_base.retrieve(query)
    return result['answer']

# Example usage
answer = await query_knowledge("What is the warranty period for Product-X?")
print(answer)

This is useful for integrating the knowledge base into custom applications or workflows.

Best Practices Learned

Document Structure

Well-structured documents improve retrieval:

Use clear headings and sections
Put key information in complete sentences
Avoid large tables or complex formatting
Add summaries at the beginning of long sections

Chunk Size Guidelines

Document Type	Recommended Chunk Size	Reason
Technical docs	400-600	Precise retrieval of specific info
FAQs	200-300	Each Q&A pair should be one chunk
Long manuals	600-800	Capture complete procedures
Policy documents	500-700	Balance between context and precision

Performance Tips

Monitor retrieval latency - Large knowledge bases may need indexing optimization
Cache frequent queries - AstrBot caches retrieval results by default
Batch document uploads - Upload multiple documents at once for efficient processing
Regular updates - Re-upload documents when they change significantly

Common Issues and Solutions

Issue: Bot Says “I Don’t Know” Too Often

Cause: Similarity threshold too high or documents not properly parsed.

Fix:

retrieval:
  similarity_threshold: 0.6  # Lower from 0.7

Also check that documents were parsed correctly in the WebUI.

Issue: Responses Are Generic

Cause: Knowledge base not enabled or persona not configured.

Fix: Ensure use_knowledge_base: true in persona config and verify the system prompt instructs the LLM to use the provided context.

Issue: Slow Retrieval

Cause: Large knowledge base or inefficient chunking.

Fix:

Reduce top_k value (e.g., from 10 to 5)
Enable context compression
Consider splitting very large documents

Summary

In this post, I demonstrated how to build a knowledge base chatbot with AstrBot using RAG. The key points are:

RAG connects your documents to the LLM, eliminating hallucinations
Proper chunking and retrieval settings are critical for accuracy
Context compression handles long documents
AstrBot’s multi-platform support lets you deploy one knowledge base everywhere

The most important configuration is the persona system prompt instructing the LLM to use only the provided knowledge base context. Without this, the LLM will still make up answers.

For more advanced RAG features like reranking and segmentation, consider integrating with Dify. But for most use cases, AstrBot’s native knowledge base is sufficient and simpler to set up.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Building a Knowledge Base Chatbot with AstrBot: RAG Implementation Guide

Problem

Environment

What is RAG and Why You Need It

AstrBot’s Knowledge Base Architecture

Setting Up the Knowledge Base

Step 1: Configure LLM Provider

Step 2: Upload Documents

Step 3: Configure Chunking Parameters

Step 4: Configure RAG Retrieval

Step 5: Set Up Persona

Testing the Knowledge Base

What Didn’t Work Initially

Problem 1: Retrieval Not Finding Relevant Documents

Problem 2: Context Window Overflow

Problem 3: Chunk Size Too Large

Advanced Techniques

Hybrid Search

Multi-Document Retrieval

Dify Integration for Advanced RAG

Deploying to Multiple Platforms

Programmatic Access

Best Practices Learned

Document Structure

Chunk Size Guidelines

Performance Tips

Common Issues and Solutions

Issue: Bot Says “I Don’t Know” Too Often

Issue: Responses Are Generic

Issue: Slow Retrieval

Summary

Final Words + More Resources

Comments