Building a Knowledge Base Chatbot with AstrBot: RAG Implementation Guide
Problem
I wanted to build a chatbot that could answer questions based on my company’s internal documentation. But when I asked my LLM-powered bot about our product specs, it gave me generic answers or just made things up.
User: What's the warranty period for our Product-X?
Bot: I don't have specific information about Product-X. Generally, warrantyperiods vary by manufacturer and product type. You should check the productdocumentation or contact the manufacturer directly.Our support team had all the answers in our documentation, but the chatbot couldn’t access them. I needed a way to make the LLM use our actual documents when responding.
Environment
- AstrBot latest version
- Python 3.10+
- OpenAI API (for embeddings and LLM)
- Document formats: PDF, TXT, MD
What is RAG and Why You Need It
The core problem is that LLMs don’t know your private documents. They’re trained on public data, and they can’t magically access your internal wikis, product manuals, or support docs.
Retrieval-Augmented Generation (RAG) solves this by:
- Converting your documents into vector embeddings
- Storing them in a searchable format
- Finding relevant documents when a user asks a question
- Feeding those documents to the LLM as context
Here’s the basic flow:
Documents → Chunking → Embeddings → Vector Store ↓Query → Embedding → Similarity Search → Context + Query → LLM → ResponseWithout RAG, your chatbot is just guessing. With RAG, it’s answering from your actual knowledge base.
AstrBot’s Knowledge Base Architecture
AstrBot has a built-in knowledge base system that handles the entire RAG pipeline. Here’s what it provides:
- Document Processing: Handles PDF, TXT, MD, images, and more
- Embedding Generation: Uses OpenAI-compatible embedding models
- Vector Storage: Built-in vector database for similarity search
- Context Management: Compresses context to fit within token limits
- Multi-Platform Support: Same knowledge base works on QQ, Telegram, WeChat Work, etc.
The architecture looks like this:
┌─────────────────────────────────────────────────────────┐│ Knowledge Base ││ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ ││ │ Documents │→│ Chunking │→│ Embedding Model │ ││ └──────────┘ └──────────┘ └──────────────────┘ ││ ↓ ││ ┌──────────────────┐ ││ │ Vector Store │ ││ └──────────────────┘ │└─────────────────────────────────────────────────────────┘ ↓┌─────────────────────────────────────────────────────────┐│ AstrBot Core ││ Query → Retrieval → Context Compression → LLM → Response│└─────────────────────────────────────────────────────────┘ ↓ ┌────────────┬────────────┬────────────┐ ↓ ↓ ↓ ↓ QQ Telegram WeChat WebUISetting Up the Knowledge Base
Step 1: Configure LLM Provider
First, I needed to configure my LLM provider in AstrBot’s WebUI. I used OpenAI, but AstrBot supports DeepSeek, Ollama, and other OpenAI-compatible providers.
The key is ensuring your provider supports both:
- Chat completions (for responses)
- Embeddings (for document vectors)
Step 2: Upload Documents
I started by uploading our product documentation through the WebUI:
- Navigate to “Knowledge Base” in the sidebar
- Click “Upload Documents”
- Select files (PDF, TXT, MD supported)
- Wait for processing
AstrBot automatically:
- Parses the documents
- Splits them into chunks
- Generates embeddings
- Stores in the vector database
Step 3: Configure Chunking Parameters
The default chunking settings weren’t ideal for my technical documentation. I adjusted them:
knowledge_base: enabled: true chunk_size: 512 # Characters per chunk chunk_overlap: 50 # Overlap between chunks embedding_model: "text-embedding-3-small" retrieval: top_k: 5 # Number of chunks to retrieve similarity_threshold: 0.7 # Minimum similarity scoreI found that chunk_size: 512 works well for technical docs - small enough to be precise, large enough to capture complete concepts. The chunk_overlap: 50 helps maintain context across chunk boundaries.
Step 4: Configure RAG Retrieval
The retrieval settings control how AstrBot finds relevant documents:
rag: enabled: true context_compression: enabled: true max_tokens: 2000 # Max context tokens sent to LLM reranking: enabled: false # Enable if you have a reranking modelContext compression is crucial for long documents. AstrBot will:
- Retrieve the top-k most relevant chunks
- Compress them to fit within
max_tokens - Send compressed context to the LLM
Step 5: Set Up Persona
I configured the bot’s persona to use the knowledge base properly:
persona: name: "Support Bot" system_prompt: | You are a helpful customer support assistant. Answer questions using ONLY the provided knowledge base context. If the answer is not in the context, say "I don't have that information." Be concise and specific. use_knowledge_base: trueThe key instruction is “Answer questions using ONLY the provided knowledge base context.” This prevents the LLM from hallucinating answers.
Testing the Knowledge Base
I tested with a real question from our support tickets:
User: What's the warranty period for Product-X?
Bot: According to the product documentation, Product-X comes with a 24-monthwarranty from the date of purchase. The warranty covers manufacturing defectsbut does not cover damage from misuse or unauthorized modifications.The bot now pulls from our actual documentation instead of giving generic responses.
What Didn’t Work Initially
Problem 1: Retrieval Not Finding Relevant Documents
My first attempt returned irrelevant results. The issue was the similarity_threshold was too low (0.5).
I increased it to 0.7:
retrieval: similarity_threshold: 0.7Now the bot only returns documents with higher relevance scores.
Problem 2: Context Window Overflow
When I uploaded a large product manual, the context exceeded the LLM’s token limit. I got truncated responses.
The fix was enabling context compression:
context_compression: enabled: true max_tokens: 2000AstrBot now compresses retrieved chunks before sending to the LLM.
Problem 3: Chunk Size Too Large
With chunk_size: 1024, retrieval was imprecise. Queries like “warranty period” would return entire sections instead of specific paragraphs.
I reduced chunk size to 512 characters:
chunk_size: 512chunk_overlap: 50Smaller chunks mean more precise retrieval.
Advanced Techniques
Hybrid Search
AstrBot supports hybrid search combining semantic and keyword matching. This helps when users use specific terminology:
retrieval: hybrid_search: enabled: true semantic_weight: 0.7 keyword_weight: 0.3Semantic search catches conceptual matches, keyword search catches exact terms.
Multi-Document Retrieval
When I uploaded multiple product manuals, AstrBot handles cross-document queries:
User: Compare Product-X and Product-Y warranty terms.
Bot: Product-X has a 24-month warranty covering manufacturing defects.Product-Y has an 18-month warranty with the same coverage. Product-X offers6 additional months of coverage.Dify Integration for Advanced RAG
For more complex RAG needs, AstrBot integrates with Dify:
provider: difyapi_endpoint: "https://your-dify-instance.com/v1"api_key: "${DIFY_API_KEY}"dataset_id: "your-dataset-id"Dify offers advanced features like:
- Segmentation strategies
- Q&A mode extraction
- Hybrid search with reranking
Deploying to Multiple Platforms
One of AstrBot’s strengths is deploying the same knowledge base to multiple messaging platforms.
I deployed our support bot to both QQ and Telegram:
Knowledge Base (Single Source) ↓ AstrBot Core ↓ ┌───────────┼───────────┐ ↓ ↓ ↓ QQ Channel Telegram WeChat Work Support Bot Support Support BotEach platform uses the same knowledge base, so updates propagate automatically. I just needed to configure the platform adapters in the WebUI.
Programmatic Access
For custom integrations, I can query the knowledge base programmatically:
from astrbot.api.context import Context
async def query_knowledge(query: str) -> str: """Query the knowledge base and return answer.""" context = Context() result = await context.knowledge_base.retrieve(query) return result['answer']
# Example usageanswer = await query_knowledge("What is the warranty period for Product-X?")print(answer)This is useful for integrating the knowledge base into custom applications or workflows.
Best Practices Learned
Document Structure
Well-structured documents improve retrieval:
- Use clear headings and sections
- Put key information in complete sentences
- Avoid large tables or complex formatting
- Add summaries at the beginning of long sections
Chunk Size Guidelines
| Document Type | Recommended Chunk Size | Reason |
|---|---|---|
| Technical docs | 400-600 | Precise retrieval of specific info |
| FAQs | 200-300 | Each Q&A pair should be one chunk |
| Long manuals | 600-800 | Capture complete procedures |
| Policy documents | 500-700 | Balance between context and precision |
Performance Tips
- Monitor retrieval latency - Large knowledge bases may need indexing optimization
- Cache frequent queries - AstrBot caches retrieval results by default
- Batch document uploads - Upload multiple documents at once for efficient processing
- Regular updates - Re-upload documents when they change significantly
Common Issues and Solutions
Issue: Bot Says “I Don’t Know” Too Often
Cause: Similarity threshold too high or documents not properly parsed.
Fix:
retrieval: similarity_threshold: 0.6 # Lower from 0.7Also check that documents were parsed correctly in the WebUI.
Issue: Responses Are Generic
Cause: Knowledge base not enabled or persona not configured.
Fix: Ensure use_knowledge_base: true in persona config and verify the system prompt instructs the LLM to use the provided context.
Issue: Slow Retrieval
Cause: Large knowledge base or inefficient chunking.
Fix:
- Reduce
top_kvalue (e.g., from 10 to 5) - Enable context compression
- Consider splitting very large documents
Summary
In this post, I demonstrated how to build a knowledge base chatbot with AstrBot using RAG. The key points are:
- RAG connects your documents to the LLM, eliminating hallucinations
- Proper chunking and retrieval settings are critical for accuracy
- Context compression handles long documents
- AstrBot’s multi-platform support lets you deploy one knowledge base everywhere
The most important configuration is the persona system prompt instructing the LLM to use only the provided knowledge base context. Without this, the LLM will still make up answers.
For more advanced RAG features like reranking and segmentation, consider integrating with Dify. But for most use cases, AstrBot’s native knowledge base is sufficient and simpler to set up.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 AstrBot Documentation
- 👨💻 AstrBot GitHub Repository
- 👨💻 OpenAI Embeddings API
- 👨💻 LangChain RAG Documentation
- 👨💻 Dify Platform
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments