Skip to content

How to Set Up Persona and Context Compression in AstrBot: Complete Guide

Problem

When I deployed my AstrBot chatbot for customer support, I faced two problems:

  1. The bot responded with generic, robotic messages that didn’t match our brand voice
  2. Long conversations quickly hit token limits, causing context loss and high API costs

Here’s what a typical conversation looked like without proper persona configuration:

Generic Bot Response
User: How do I reset my password?
Bot: To reset your password, please follow these steps:
1. Go to the login page
2. Click "Forgot Password"
3. Enter your email
4. Check your inbox for reset instructions
Is there anything else I can help you with?

And when conversations exceeded context limits:

Context Overflow Error
User: What was the issue I mentioned earlier?
Bot: I apologize, but I don't have access to our previous conversation context.
Could you please remind me what issue you were referring to?

Environment

  • AstrBot 3.5.0
  • Python 3.11
  • OpenAI GPT-4o-mini (primary LLM)
  • Platform: QQ + Telegram

What Happened?

I deployed AstrBot with default settings. The bot worked, but conversations felt impersonal and memory management was inefficient.

My initial configuration looked like this:

astrbot/config/default.yaml
llm:
provider: "openai"
model: "gpt-4o-mini"
api_key: "${OPENAI_API_KEY}"
platforms:
- qq
- telegram
# No persona or context settings

Without persona configuration, the bot had no defined personality. Without context compression, every message in the conversation history was sent to the LLM, quickly consuming tokens.

I needed to:

  1. Define a persona that matches our support voice
  2. Enable context compression to manage token usage

Understanding Persona in AstrBot

A persona in AstrBot is defined through a system prompt that shapes how the bot responds. It’s not just about “being friendly” - it determines response style, expertise level, and behavioral constraints.

The persona gets injected as the first message in every LLM call:

Persona Injection Flow
User Message → System Prompt (Persona) + Conversation History → LLM Response

Without a persona, the LLM defaults to a generic assistant behavior, which explains the robotic responses I saw.

Setting Up a Persona

Basic Configuration

I created a persona configuration file:

astrbot/config/persona.yaml
persona:
name: "Support Assistant"
system_prompt: |
You are a helpful technical support assistant for TechCorp.
- Be concise but thorough
- Use a friendly, professional tone
- Acknowledge user frustration before providing solutions
- Format responses with bullet points for clarity
- If unsure, admit it and suggest escalation paths
tone: "professional"
language: "user_preferred"

After applying this configuration, the same password reset query produced:

Persona-Aware Response
User: How do I reset my password?
Bot: I understand password issues can be frustrating! Here's how to reset it:
- Head to our login page at login.techcorp.com
- Click "Forgot Password" below the login button
- Enter your registered email address
- Check your inbox (and spam folder) for the reset link
Need help with any step? I'm here to guide you through it.

Much better - the tone matches our brand and the response is more helpful.

Platform-Specific Personas

I have users on both QQ (casual community) and Telegram (business partners). Different platforms need different personas:

astrbot/config/platform_personas.yaml
platforms:
qq:
persona: "casual_friend"
system_prompt: |
Hey! I'm your TechCorp buddy on QQ.
Feel free to ask me anything - I'll help you out!
Keep it casual and fun.
telegram:
persona: "professional_support"
system_prompt: |
Hello! I'm your TechCorp enterprise support assistant.
How may I assist with your business needs today?
Maintain professional tone for B2B communications.

This creates a flow like:

Multi-Platform Persona Flow
┌─────────────────┐
│ Base Persona │
│ Template │
└────────┬────────┘
┌────┴────┐
│ │
▼ ▼
┌───────┐ ┌──────────┐
│ QQ │ │ Telegram │
│Casual │ │Business │
└───────┘ └──────────┘

Context Compression: The Token Problem

Even with a good persona, I hit another wall. Long conversations consumed excessive tokens.

Here’s what happens without context management:

Token Growth Over Conversation
Message 1: 500 tokens (user + response)
Message 2: 800 tokens (includes Message 1 context)
Message 3: 1200 tokens (includes Messages 1-2 context)
Message 4: 1600 tokens (includes Messages 1-3 context)
...
Message 20: 8000+ tokens (full conversation history)

Each message sends the entire conversation history to the LLM. At message 20, I’m paying for 8000+ tokens per request.

With GPT-4o-mini at $0.15/1M input tokens, a 50-message conversation with 10 users costs approximately:

Cost Without Compression
50 messages × 5000 avg tokens × 10 users = 2.5M tokens
2.5M tokens × $0.15/1M = $0.375 per conversation set

For 1000 daily conversations: $375/day in input tokens alone.

Enabling Context Compression

AstrBot provides built-in context compression through summarization. Instead of keeping full message history, it summarizes older messages when token thresholds are reached.

Configuration

astrbot/config/context_compression.yaml
context_compression:
enabled: true
strategy: "summarization"
triggers:
- type: "tokens"
value: 3000 # Compress when context exceeds 3000 tokens
- type: "messages"
value: 50 # Or when conversation exceeds 50 messages
keep:
messages: 10 # Always keep last 10 messages
tokens: 1000 # Reserve 1000 tokens for recent messages
summarization:
model: "gpt-4o-mini" # Use cheaper model for summarization
trim_tokens: 4000 # Target 4000 tokens after compression
placeholder: "[Previous conversation summarized]"

How Compression Works

The compression flow:

Context Compression Flow
Conversation History
Token Count Check
┌────┴────┐
│ │
Under Over
Limit Limit
│ │
│ ▼
│ Summarize Old Messages
│ │
│ ▼
│ Summary + Recent Messages
│ │
└────┬────┘
Send to LLM

When the trigger threshold is hit:

  1. Keep the last N messages intact
  2. Send older messages to summarization model
  3. Replace old messages with summary
  4. Continue conversation with compressed context

Example of Compressed Context

Original conversation (8000 tokens):

Original Context
User: I can't log into my account
Bot: Have you tried resetting your password?
User: Yes, but the reset email isn't arriving
Bot: Let me check your account status...
User: I'm using a corporate email, could that be the issue?
Bot: Corporate emails sometimes block our domain...
[... 15 more messages of troubleshooting ...]
User: So what was the solution you mentioned earlier?

After compression (2500 tokens):

Compressed Context
[Previous conversation summarized: User had login issues with corporate email.
Initial password reset attempts failed. Investigated email filtering.
Found that corporate firewall was blocking reset emails. IT team
needed to whitelist @techcorp.com domain.]
User: So what was the solution you mentioned earlier?
Bot: The solution is to have your IT team whitelist our domain...

The summary preserves the key information: the problem, diagnosis, and solution path. But instead of 8000 tokens, it’s now 2500 tokens.

Cost Savings

With compression enabled:

Cost Comparison
Without Compression:
- 50 messages avg 5000 tokens each = 250,000 tokens per conversation
- 1000 conversations = 250M tokens = $37.50/day
With Compression:
- 50 messages avg 2500 tokens each = 125,000 tokens per conversation
- 1000 conversations = 125M tokens = $18.75/day
Savings: 50% reduction in token costs

Plus, the summarization model cost is minimal:

Summarization Cost
1 summary per conversation × 1000 summaries = 1000 summarizations
Each summarization: ~2000 tokens input, ~500 tokens output
1000 × 2500 tokens × $0.15/1M = $0.38/day
Net savings: $37.50 - $18.75 - $0.38 = $18.37/day
Annual savings: ~$6,700

Fine-Tuning Compression Settings

The key parameters to tune:

Threshold Selection

Threshold Configuration
triggers:
- type: "tokens"
value: 3000 # 75% of 4000 token context window

Why 75%? Leave room for the response generation. If you set it at 100%, you risk context overflow before compression triggers.

Message Retention

Retention Settings
keep:
messages: 10 # Keep recent context intact
tokens: 1000 # Ensure recent messages fit

More retained messages = better conversation continuity, but higher token usage. I found 10-15 messages is a good balance for support conversations.

Summarization Model Choice

Model Selection
summarization:
model: "gpt-4o-mini" # Fast and cheap
# model: "gpt-4o" # Better quality, 10x more expensive

For support conversations, GPT-4o-mini summaries are sufficient. The summary doesn’t need to be perfect - it just needs to preserve key facts.

Custom Summarization Prompts

For specialized domains, you can customize the summarization prompt:

Custom Summarization Prompt
summarization:
custom_prompt: |
Summarize this technical support conversation, preserving:
- User's original problem statement
- Technical details mentioned (error codes, versions, configs)
- Troubleshooting steps attempted
- Any unresolved issues or pending actions
Exclude conversational filler and greetings.
Focus on facts that would help continue the support session.

This ensures critical information isn’t lost in summarization.

Production Configuration Example

Here’s my production configuration after tuning:

astrbot/config/production.yaml
# Persona Configuration
persona:
name: "TechCorp Support"
system_prompt: |
You are a technical support assistant for TechCorp.
Expertise: Account management, API integration, billing.
Tone: Professional but approachable.
Always verify user identity before account changes.
Escalate to human support for: security issues, refunds >$500.
# Platform-specific overrides
platform_overrides:
qq:
tone: "casual"
telegram:
tone: "professional"
# Context Management
context_compression:
enabled: true
strategy: "summarization"
triggers:
- type: "tokens"
value: 3000
- type: "messages"
value: 40
keep:
messages: 12
tokens: 1200
summarization:
model: "gpt-4o-mini"
trim_tokens: 3500
custom_prompt: |
Summarize this TechCorp support conversation, preserving:
- User account details mentioned
- Technical issues and error messages
- Solutions provided and their outcomes
- Pending actions or follow-ups needed
# Session Management
sessions:
timeout:
inactive_minutes: 30
absolute_hours: 8
storage:
type: "sqlite"
path: "data/sessions.db"
context_limits:
max_tokens_per_session: 10000
max_messages_per_session: 100

Troubleshooting Common Issues

Persona Not Applied

If the bot ignores persona settings:

  1. Check file path matches configuration
  2. Verify YAML syntax is valid
  3. Restart AstrBot after config changes
Debug Persona Loading
# Check if persona file is loaded
astrbot --debug --show-config
# Look for persona section in output

Context Lost Too Aggressively

If the bot forgets recent context:

Adjust Retention Settings
keep:
messages: 20 # Increase from 10
tokens: 2000 # Increase from 1000

Poor Quality Summaries

If summaries lose important information:

  1. Use a better summarization model (GPT-4o instead of GPT-4o-mini)
  2. Add a custom prompt with explicit preservation rules
  3. Increase the trim_tokens target to allow more detailed summaries
Better Summarization
summarization:
model: "gpt-4o"
trim_tokens: 5000 # More room for detail
custom_prompt: |
Create a detailed summary preserving all technical specifics...

Monitoring and Metrics

Track compression effectiveness:

Enable Logging
logging:
level: "INFO"
context_compression: true
metrics:
token_usage: true
compression_ratio: true

Watch for:

  • Compression frequency (too often = threshold too low)
  • Average context size before/after compression
  • User satisfaction with conversation continuity

Summary

In this post, I showed how to configure persona and context compression in AstrBot. The key points are:

  1. Persona defines bot personality through system prompts - essential for brand consistency
  2. Context compression reduces token costs by summarizing old messages instead of keeping full history
  3. Tune thresholds based on your use case - balance between cost savings and conversation quality
  4. Monitor compression effectiveness and adjust settings based on real usage patterns

With proper persona and context compression configuration, my AstrBot deployment maintains consistent personality while reducing token costs by approximately 50%. The configuration requires upfront tuning, but the cost savings and improved user experience are worth the effort.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments