Skip to content

Building a Knowledge Base Chatbot with AstrBot: RAG Implementation Guide

Problem

I wanted to build a chatbot that could answer questions based on my company’s internal documentation. But when I asked my LLM-powered bot about our product specs, it gave me generic answers or just made things up.

chat-session
User: What's the warranty period for our Product-X?
Bot: I don't have specific information about Product-X. Generally, warranty
periods vary by manufacturer and product type. You should check the product
documentation or contact the manufacturer directly.

Our support team had all the answers in our documentation, but the chatbot couldn’t access them. I needed a way to make the LLM use our actual documents when responding.

Environment

  • AstrBot latest version
  • Python 3.10+
  • OpenAI API (for embeddings and LLM)
  • Document formats: PDF, TXT, MD

What is RAG and Why You Need It

The core problem is that LLMs don’t know your private documents. They’re trained on public data, and they can’t magically access your internal wikis, product manuals, or support docs.

Retrieval-Augmented Generation (RAG) solves this by:

  1. Converting your documents into vector embeddings
  2. Storing them in a searchable format
  3. Finding relevant documents when a user asks a question
  4. Feeding those documents to the LLM as context

Here’s the basic flow:

rag-flow-diagram
Documents → Chunking → Embeddings → Vector Store
Query → Embedding → Similarity Search → Context + Query → LLM → Response

Without RAG, your chatbot is just guessing. With RAG, it’s answering from your actual knowledge base.

AstrBot’s Knowledge Base Architecture

AstrBot has a built-in knowledge base system that handles the entire RAG pipeline. Here’s what it provides:

  • Document Processing: Handles PDF, TXT, MD, images, and more
  • Embedding Generation: Uses OpenAI-compatible embedding models
  • Vector Storage: Built-in vector database for similarity search
  • Context Management: Compresses context to fit within token limits
  • Multi-Platform Support: Same knowledge base works on QQ, Telegram, WeChat Work, etc.

The architecture looks like this:

astrbot-architecture
┌─────────────────────────────────────────────────────────┐
│ Knowledge Base │
│ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │
│ │ Documents │→│ Chunking │→│ Embedding Model │ │
│ └──────────┘ └──────────┘ └──────────────────┘ │
│ ↓ │
│ ┌──────────────────┐ │
│ │ Vector Store │ │
│ └──────────────────┘ │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ AstrBot Core │
│ Query → Retrieval → Context Compression → LLM → Response│
└─────────────────────────────────────────────────────────┘
┌────────────┬────────────┬────────────┐
↓ ↓ ↓ ↓
QQ Telegram WeChat WebUI

Setting Up the Knowledge Base

Step 1: Configure LLM Provider

First, I needed to configure my LLM provider in AstrBot’s WebUI. I used OpenAI, but AstrBot supports DeepSeek, Ollama, and other OpenAI-compatible providers.

The key is ensuring your provider supports both:

  • Chat completions (for responses)
  • Embeddings (for document vectors)

Step 2: Upload Documents

I started by uploading our product documentation through the WebUI:

  1. Navigate to “Knowledge Base” in the sidebar
  2. Click “Upload Documents”
  3. Select files (PDF, TXT, MD supported)
  4. Wait for processing

AstrBot automatically:

  • Parses the documents
  • Splits them into chunks
  • Generates embeddings
  • Stores in the vector database

Step 3: Configure Chunking Parameters

The default chunking settings weren’t ideal for my technical documentation. I adjusted them:

knowledge-base-config.yaml
knowledge_base:
enabled: true
chunk_size: 512 # Characters per chunk
chunk_overlap: 50 # Overlap between chunks
embedding_model: "text-embedding-3-small"
retrieval:
top_k: 5 # Number of chunks to retrieve
similarity_threshold: 0.7 # Minimum similarity score

I found that chunk_size: 512 works well for technical docs - small enough to be precise, large enough to capture complete concepts. The chunk_overlap: 50 helps maintain context across chunk boundaries.

Step 4: Configure RAG Retrieval

The retrieval settings control how AstrBot finds relevant documents:

rag-config.yaml
rag:
enabled: true
context_compression:
enabled: true
max_tokens: 2000 # Max context tokens sent to LLM
reranking:
enabled: false # Enable if you have a reranking model

Context compression is crucial for long documents. AstrBot will:

  1. Retrieve the top-k most relevant chunks
  2. Compress them to fit within max_tokens
  3. Send compressed context to the LLM

Step 5: Set Up Persona

I configured the bot’s persona to use the knowledge base properly:

persona-config.yaml
persona:
name: "Support Bot"
system_prompt: |
You are a helpful customer support assistant.
Answer questions using ONLY the provided knowledge base context.
If the answer is not in the context, say "I don't have that information."
Be concise and specific.
use_knowledge_base: true

The key instruction is “Answer questions using ONLY the provided knowledge base context.” This prevents the LLM from hallucinating answers.

Testing the Knowledge Base

I tested with a real question from our support tickets:

test-session
User: What's the warranty period for Product-X?
Bot: According to the product documentation, Product-X comes with a 24-month
warranty from the date of purchase. The warranty covers manufacturing defects
but does not cover damage from misuse or unauthorized modifications.

The bot now pulls from our actual documentation instead of giving generic responses.

What Didn’t Work Initially

Problem 1: Retrieval Not Finding Relevant Documents

My first attempt returned irrelevant results. The issue was the similarity_threshold was too low (0.5).

I increased it to 0.7:

similarity_threshold.yaml
retrieval:
similarity_threshold: 0.7

Now the bot only returns documents with higher relevance scores.

Problem 2: Context Window Overflow

When I uploaded a large product manual, the context exceeded the LLM’s token limit. I got truncated responses.

The fix was enabling context compression:

context_compression.yaml
context_compression:
enabled: true
max_tokens: 2000

AstrBot now compresses retrieved chunks before sending to the LLM.

Problem 3: Chunk Size Too Large

With chunk_size: 1024, retrieval was imprecise. Queries like “warranty period” would return entire sections instead of specific paragraphs.

I reduced chunk size to 512 characters:

chunk_size.yaml
chunk_size: 512
chunk_overlap: 50

Smaller chunks mean more precise retrieval.

Advanced Techniques

AstrBot supports hybrid search combining semantic and keyword matching. This helps when users use specific terminology:

hybrid_search.yaml
retrieval:
hybrid_search:
enabled: true
semantic_weight: 0.7
keyword_weight: 0.3

Semantic search catches conceptual matches, keyword search catches exact terms.

Multi-Document Retrieval

When I uploaded multiple product manuals, AstrBot handles cross-document queries:

multi-doc-session
User: Compare Product-X and Product-Y warranty terms.
Bot: Product-X has a 24-month warranty covering manufacturing defects.
Product-Y has an 18-month warranty with the same coverage. Product-X offers
6 additional months of coverage.

Dify Integration for Advanced RAG

For more complex RAG needs, AstrBot integrates with Dify:

dify-integration.yaml
provider: dify
api_endpoint: "https://your-dify-instance.com/v1"
api_key: "${DIFY_API_KEY}"
dataset_id: "your-dataset-id"

Dify offers advanced features like:

  • Segmentation strategies
  • Q&A mode extraction
  • Hybrid search with reranking

Deploying to Multiple Platforms

One of AstrBot’s strengths is deploying the same knowledge base to multiple messaging platforms.

I deployed our support bot to both QQ and Telegram:

multi-platform-deployment
Knowledge Base (Single Source)
AstrBot Core
┌───────────┼───────────┐
↓ ↓ ↓
QQ Channel Telegram WeChat Work
Support Bot Support Support Bot

Each platform uses the same knowledge base, so updates propagate automatically. I just needed to configure the platform adapters in the WebUI.

Programmatic Access

For custom integrations, I can query the knowledge base programmatically:

kb_query.py
from astrbot.api.context import Context
async def query_knowledge(query: str) -> str:
"""Query the knowledge base and return answer."""
context = Context()
result = await context.knowledge_base.retrieve(query)
return result['answer']
# Example usage
answer = await query_knowledge("What is the warranty period for Product-X?")
print(answer)

This is useful for integrating the knowledge base into custom applications or workflows.

Best Practices Learned

Document Structure

Well-structured documents improve retrieval:

  1. Use clear headings and sections
  2. Put key information in complete sentences
  3. Avoid large tables or complex formatting
  4. Add summaries at the beginning of long sections

Chunk Size Guidelines

Document TypeRecommended Chunk SizeReason
Technical docs400-600Precise retrieval of specific info
FAQs200-300Each Q&A pair should be one chunk
Long manuals600-800Capture complete procedures
Policy documents500-700Balance between context and precision

Performance Tips

  1. Monitor retrieval latency - Large knowledge bases may need indexing optimization
  2. Cache frequent queries - AstrBot caches retrieval results by default
  3. Batch document uploads - Upload multiple documents at once for efficient processing
  4. Regular updates - Re-upload documents when they change significantly

Common Issues and Solutions

Issue: Bot Says “I Don’t Know” Too Often

Cause: Similarity threshold too high or documents not properly parsed.

Fix:

fix_similarity.yaml
retrieval:
similarity_threshold: 0.6 # Lower from 0.7

Also check that documents were parsed correctly in the WebUI.

Issue: Responses Are Generic

Cause: Knowledge base not enabled or persona not configured.

Fix: Ensure use_knowledge_base: true in persona config and verify the system prompt instructs the LLM to use the provided context.

Issue: Slow Retrieval

Cause: Large knowledge base or inefficient chunking.

Fix:

  1. Reduce top_k value (e.g., from 10 to 5)
  2. Enable context compression
  3. Consider splitting very large documents

Summary

In this post, I demonstrated how to build a knowledge base chatbot with AstrBot using RAG. The key points are:

  1. RAG connects your documents to the LLM, eliminating hallucinations
  2. Proper chunking and retrieval settings are critical for accuracy
  3. Context compression handles long documents
  4. AstrBot’s multi-platform support lets you deploy one knowledge base everywhere

The most important configuration is the persona system prompt instructing the LLM to use only the provided knowledge base context. Without this, the LLM will still make up answers.

For more advanced RAG features like reranking and segmentation, consider integrating with Dify. But for most use cases, AstrBot’s native knowledge base is sufficient and simpler to set up.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments