How to Set Up Persona and Context Compression in AstrBot: Complete Guide
Problem
When I deployed my AstrBot chatbot for customer support, I faced two problems:
- The bot responded with generic, robotic messages that didn’t match our brand voice
- Long conversations quickly hit token limits, causing context loss and high API costs
Here’s what a typical conversation looked like without proper persona configuration:
User: How do I reset my password?Bot: To reset your password, please follow these steps:1. Go to the login page2. Click "Forgot Password"3. Enter your email4. Check your inbox for reset instructions
Is there anything else I can help you with?And when conversations exceeded context limits:
User: What was the issue I mentioned earlier?Bot: I apologize, but I don't have access to our previous conversation context.Could you please remind me what issue you were referring to?Environment
- AstrBot 3.5.0
- Python 3.11
- OpenAI GPT-4o-mini (primary LLM)
- Platform: QQ + Telegram
What Happened?
I deployed AstrBot with default settings. The bot worked, but conversations felt impersonal and memory management was inefficient.
My initial configuration looked like this:
llm: provider: "openai" model: "gpt-4o-mini" api_key: "${OPENAI_API_KEY}"
platforms: - qq - telegram
# No persona or context settingsWithout persona configuration, the bot had no defined personality. Without context compression, every message in the conversation history was sent to the LLM, quickly consuming tokens.
I needed to:
- Define a persona that matches our support voice
- Enable context compression to manage token usage
Understanding Persona in AstrBot
A persona in AstrBot is defined through a system prompt that shapes how the bot responds. It’s not just about “being friendly” - it determines response style, expertise level, and behavioral constraints.
The persona gets injected as the first message in every LLM call:
User Message → System Prompt (Persona) + Conversation History → LLM ResponseWithout a persona, the LLM defaults to a generic assistant behavior, which explains the robotic responses I saw.
Setting Up a Persona
Basic Configuration
I created a persona configuration file:
persona: name: "Support Assistant" system_prompt: | You are a helpful technical support assistant for TechCorp. - Be concise but thorough - Use a friendly, professional tone - Acknowledge user frustration before providing solutions - Format responses with bullet points for clarity - If unsure, admit it and suggest escalation paths
tone: "professional" language: "user_preferred"After applying this configuration, the same password reset query produced:
User: How do I reset my password?Bot: I understand password issues can be frustrating! Here's how to reset it:
- Head to our login page at login.techcorp.com- Click "Forgot Password" below the login button- Enter your registered email address- Check your inbox (and spam folder) for the reset link
Need help with any step? I'm here to guide you through it.Much better - the tone matches our brand and the response is more helpful.
Platform-Specific Personas
I have users on both QQ (casual community) and Telegram (business partners). Different platforms need different personas:
platforms: qq: persona: "casual_friend" system_prompt: | Hey! I'm your TechCorp buddy on QQ. Feel free to ask me anything - I'll help you out! Keep it casual and fun.
telegram: persona: "professional_support" system_prompt: | Hello! I'm your TechCorp enterprise support assistant. How may I assist with your business needs today? Maintain professional tone for B2B communications.This creates a flow like:
┌─────────────────┐│ Base Persona ││ Template │└────────┬────────┘ │ ┌────┴────┐ │ │ ▼ ▼┌───────┐ ┌──────────┐│ QQ │ │ Telegram ││Casual │ │Business │└───────┘ └──────────┘Context Compression: The Token Problem
Even with a good persona, I hit another wall. Long conversations consumed excessive tokens.
Here’s what happens without context management:
Message 1: 500 tokens (user + response)Message 2: 800 tokens (includes Message 1 context)Message 3: 1200 tokens (includes Messages 1-2 context)Message 4: 1600 tokens (includes Messages 1-3 context)...Message 20: 8000+ tokens (full conversation history)Each message sends the entire conversation history to the LLM. At message 20, I’m paying for 8000+ tokens per request.
With GPT-4o-mini at $0.15/1M input tokens, a 50-message conversation with 10 users costs approximately:
50 messages × 5000 avg tokens × 10 users = 2.5M tokens2.5M tokens × $0.15/1M = $0.375 per conversation setFor 1000 daily conversations: $375/day in input tokens alone.
Enabling Context Compression
AstrBot provides built-in context compression through summarization. Instead of keeping full message history, it summarizes older messages when token thresholds are reached.
Configuration
context_compression: enabled: true strategy: "summarization"
triggers: - type: "tokens" value: 3000 # Compress when context exceeds 3000 tokens - type: "messages" value: 50 # Or when conversation exceeds 50 messages
keep: messages: 10 # Always keep last 10 messages tokens: 1000 # Reserve 1000 tokens for recent messages
summarization: model: "gpt-4o-mini" # Use cheaper model for summarization trim_tokens: 4000 # Target 4000 tokens after compression
placeholder: "[Previous conversation summarized]"How Compression Works
The compression flow:
Conversation History │ ▼Token Count Check │ ┌────┴────┐ │ │Under OverLimit Limit │ │ │ ▼ │ Summarize Old Messages │ │ │ ▼ │ Summary + Recent Messages │ │ └────┬────┘ │ ▼ Send to LLMWhen the trigger threshold is hit:
- Keep the last N messages intact
- Send older messages to summarization model
- Replace old messages with summary
- Continue conversation with compressed context
Example of Compressed Context
Original conversation (8000 tokens):
User: I can't log into my accountBot: Have you tried resetting your password?User: Yes, but the reset email isn't arrivingBot: Let me check your account status...User: I'm using a corporate email, could that be the issue?Bot: Corporate emails sometimes block our domain...[... 15 more messages of troubleshooting ...]User: So what was the solution you mentioned earlier?After compression (2500 tokens):
[Previous conversation summarized: User had login issues with corporate email.Initial password reset attempts failed. Investigated email filtering.Found that corporate firewall was blocking reset emails. IT teamneeded to whitelist @techcorp.com domain.]
User: So what was the solution you mentioned earlier?Bot: The solution is to have your IT team whitelist our domain...The summary preserves the key information: the problem, diagnosis, and solution path. But instead of 8000 tokens, it’s now 2500 tokens.
Cost Savings
With compression enabled:
Without Compression:- 50 messages avg 5000 tokens each = 250,000 tokens per conversation- 1000 conversations = 250M tokens = $37.50/day
With Compression:- 50 messages avg 2500 tokens each = 125,000 tokens per conversation- 1000 conversations = 125M tokens = $18.75/day
Savings: 50% reduction in token costsPlus, the summarization model cost is minimal:
1 summary per conversation × 1000 summaries = 1000 summarizationsEach summarization: ~2000 tokens input, ~500 tokens output1000 × 2500 tokens × $0.15/1M = $0.38/day
Net savings: $37.50 - $18.75 - $0.38 = $18.37/dayAnnual savings: ~$6,700Fine-Tuning Compression Settings
The key parameters to tune:
Threshold Selection
triggers: - type: "tokens" value: 3000 # 75% of 4000 token context windowWhy 75%? Leave room for the response generation. If you set it at 100%, you risk context overflow before compression triggers.
Message Retention
keep: messages: 10 # Keep recent context intact tokens: 1000 # Ensure recent messages fitMore retained messages = better conversation continuity, but higher token usage. I found 10-15 messages is a good balance for support conversations.
Summarization Model Choice
summarization: model: "gpt-4o-mini" # Fast and cheap # model: "gpt-4o" # Better quality, 10x more expensiveFor support conversations, GPT-4o-mini summaries are sufficient. The summary doesn’t need to be perfect - it just needs to preserve key facts.
Custom Summarization Prompts
For specialized domains, you can customize the summarization prompt:
summarization: custom_prompt: | Summarize this technical support conversation, preserving: - User's original problem statement - Technical details mentioned (error codes, versions, configs) - Troubleshooting steps attempted - Any unresolved issues or pending actions
Exclude conversational filler and greetings. Focus on facts that would help continue the support session.This ensures critical information isn’t lost in summarization.
Production Configuration Example
Here’s my production configuration after tuning:
# Persona Configurationpersona: name: "TechCorp Support" system_prompt: | You are a technical support assistant for TechCorp. Expertise: Account management, API integration, billing. Tone: Professional but approachable. Always verify user identity before account changes. Escalate to human support for: security issues, refunds >$500.
# Platform-specific overridesplatform_overrides: qq: tone: "casual" telegram: tone: "professional"
# Context Managementcontext_compression: enabled: true strategy: "summarization"
triggers: - type: "tokens" value: 3000 - type: "messages" value: 40
keep: messages: 12 tokens: 1200
summarization: model: "gpt-4o-mini" trim_tokens: 3500 custom_prompt: | Summarize this TechCorp support conversation, preserving: - User account details mentioned - Technical issues and error messages - Solutions provided and their outcomes - Pending actions or follow-ups needed
# Session Managementsessions: timeout: inactive_minutes: 30 absolute_hours: 8
storage: type: "sqlite" path: "data/sessions.db"
context_limits: max_tokens_per_session: 10000 max_messages_per_session: 100Troubleshooting Common Issues
Persona Not Applied
If the bot ignores persona settings:
- Check file path matches configuration
- Verify YAML syntax is valid
- Restart AstrBot after config changes
# Check if persona file is loadedastrbot --debug --show-config
# Look for persona section in outputContext Lost Too Aggressively
If the bot forgets recent context:
keep: messages: 20 # Increase from 10 tokens: 2000 # Increase from 1000Poor Quality Summaries
If summaries lose important information:
- Use a better summarization model (GPT-4o instead of GPT-4o-mini)
- Add a custom prompt with explicit preservation rules
- Increase the trim_tokens target to allow more detailed summaries
summarization: model: "gpt-4o" trim_tokens: 5000 # More room for detail custom_prompt: | Create a detailed summary preserving all technical specifics...Monitoring and Metrics
Track compression effectiveness:
logging: level: "INFO" context_compression: true
metrics: token_usage: true compression_ratio: trueWatch for:
- Compression frequency (too often = threshold too low)
- Average context size before/after compression
- User satisfaction with conversation continuity
Summary
In this post, I showed how to configure persona and context compression in AstrBot. The key points are:
- Persona defines bot personality through system prompts - essential for brand consistency
- Context compression reduces token costs by summarizing old messages instead of keeping full history
- Tune thresholds based on your use case - balance between cost savings and conversation quality
- Monitor compression effectiveness and adjust settings based on real usage patterns
With proper persona and context compression configuration, my AstrBot deployment maintains consistent personality while reducing token costs by approximately 50%. The configuration requires upfront tuning, but the cost savings and improved user experience are worth the effort.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments