How to Set Up Persona and Context Compression in AstrBot: Complete Guide

Mar 3, 2026

Problem

When I deployed my AstrBot chatbot for customer support, I faced two problems:

The bot responded with generic, robotic messages that didn’t match our brand voice
Long conversations quickly hit token limits, causing context loss and high API costs

Here’s what a typical conversation looked like without proper persona configuration:

User: How do I reset my password?
Bot: To reset your password, please follow these steps:
1. Go to the login page
2. Click "Forgot Password"
3. Enter your email
4. Check your inbox for reset instructions

Is there anything else I can help you with?

And when conversations exceeded context limits:

User: What was the issue I mentioned earlier?
Bot: I apologize, but I don't have access to our previous conversation context.
Could you please remind me what issue you were referring to?

Environment

AstrBot 3.5.0
Python 3.11
OpenAI GPT-4o-mini (primary LLM)
Platform: QQ + Telegram

What Happened?

I deployed AstrBot with default settings. The bot worked, but conversations felt impersonal and memory management was inefficient.

My initial configuration looked like this:

llm:
  provider: "openai"
  model: "gpt-4o-mini"
  api_key: "${OPENAI_API_KEY}"

platforms:
  - qq
  - telegram

# No persona or context settings

Without persona configuration, the bot had no defined personality. Without context compression, every message in the conversation history was sent to the LLM, quickly consuming tokens.

I needed to:

Define a persona that matches our support voice
Enable context compression to manage token usage

Understanding Persona in AstrBot

A persona in AstrBot is defined through a system prompt that shapes how the bot responds. It’s not just about “being friendly” - it determines response style, expertise level, and behavioral constraints.

The persona gets injected as the first message in every LLM call:

User Message → System Prompt (Persona) + Conversation History → LLM Response

Without a persona, the LLM defaults to a generic assistant behavior, which explains the robotic responses I saw.

Setting Up a Persona

Basic Configuration

I created a persona configuration file:

persona:
  name: "Support Assistant"
  system_prompt: |
    You are a helpful technical support assistant for TechCorp.
    - Be concise but thorough
    - Use a friendly, professional tone
    - Acknowledge user frustration before providing solutions
    - Format responses with bullet points for clarity
    - If unsure, admit it and suggest escalation paths

  tone: "professional"
  language: "user_preferred"

After applying this configuration, the same password reset query produced:

User: How do I reset my password?
Bot: I understand password issues can be frustrating! Here's how to reset it:

- Head to our login page at login.techcorp.com
- Click "Forgot Password" below the login button
- Enter your registered email address
- Check your inbox (and spam folder) for the reset link

Need help with any step? I'm here to guide you through it.

Much better - the tone matches our brand and the response is more helpful.

Platform-Specific Personas

I have users on both QQ (casual community) and Telegram (business partners). Different platforms need different personas:

platforms:
  qq:
    persona: "casual_friend"
    system_prompt: |
      Hey! I'm your TechCorp buddy on QQ.
      Feel free to ask me anything - I'll help you out!
      Keep it casual and fun.

  telegram:
    persona: "professional_support"
    system_prompt: |
      Hello! I'm your TechCorp enterprise support assistant.
      How may I assist with your business needs today?
      Maintain professional tone for B2B communications.

This creates a flow like:

┌─────────────────┐
│  Base Persona   │
│   Template      │
└────────┬────────┘
         │
    ┌────┴────┐
    │         │
    ▼         ▼
┌───────┐  ┌──────────┐
│  QQ   │  │ Telegram │
│Casual │  │Business  │
└───────┘  └──────────┘

Context Compression: The Token Problem

Even with a good persona, I hit another wall. Long conversations consumed excessive tokens.

Here’s what happens without context management:

Message 1:  500 tokens  (user + response)
Message 2:  800 tokens  (includes Message 1 context)
Message 3:  1200 tokens (includes Messages 1-2 context)
Message 4:  1600 tokens (includes Messages 1-3 context)
...
Message 20: 8000+ tokens (full conversation history)

Each message sends the entire conversation history to the LLM. At message 20, I’m paying for 8000+ tokens per request.

With GPT-4o-mini at $0.15/1M input tokens, a 50-message conversation with 10 users costs approximately:

50 messages × 5000 avg tokens × 10 users = 2.5M tokens
2.5M tokens × $0.15/1M = $0.375 per conversation set

For 1000 daily conversations: $375/day in input tokens alone.

Enabling Context Compression

AstrBot provides built-in context compression through summarization. Instead of keeping full message history, it summarizes older messages when token thresholds are reached.

Configuration

context_compression:
  enabled: true
  strategy: "summarization"

  triggers:
    - type: "tokens"
      value: 3000    # Compress when context exceeds 3000 tokens
    - type: "messages"
      value: 50      # Or when conversation exceeds 50 messages

  keep:
    messages: 10     # Always keep last 10 messages
    tokens: 1000     # Reserve 1000 tokens for recent messages

  summarization:
    model: "gpt-4o-mini"  # Use cheaper model for summarization
    trim_tokens: 4000     # Target 4000 tokens after compression

  placeholder: "[Previous conversation summarized]"

How Compression Works

The compression flow:

Conversation History
        │
        ▼
Token Count Check
        │
   ┌────┴────┐
   │         │
Under      Over
Limit      Limit
   │         │
   │         ▼
   │    Summarize Old Messages
   │         │
   │         ▼
   │    Summary + Recent Messages
   │         │
   └────┬────┘
        │
        ▼
   Send to LLM

When the trigger threshold is hit:

Keep the last N messages intact
Send older messages to summarization model
Replace old messages with summary
Continue conversation with compressed context

Example of Compressed Context

Original conversation (8000 tokens):

User: I can't log into my account
Bot: Have you tried resetting your password?
User: Yes, but the reset email isn't arriving
Bot: Let me check your account status...
User: I'm using a corporate email, could that be the issue?
Bot: Corporate emails sometimes block our domain...
[... 15 more messages of troubleshooting ...]
User: So what was the solution you mentioned earlier?

After compression (2500 tokens):

[Previous conversation summarized: User had login issues with corporate email.
Initial password reset attempts failed. Investigated email filtering.
Found that corporate firewall was blocking reset emails. IT team
needed to whitelist @techcorp.com domain.]

User: So what was the solution you mentioned earlier?
Bot: The solution is to have your IT team whitelist our domain...

The summary preserves the key information: the problem, diagnosis, and solution path. But instead of 8000 tokens, it’s now 2500 tokens.

Cost Savings

With compression enabled:

Without Compression:
- 50 messages avg 5000 tokens each = 250,000 tokens per conversation
- 1000 conversations = 250M tokens = $37.50/day

With Compression:
- 50 messages avg 2500 tokens each = 125,000 tokens per conversation
- 1000 conversations = 125M tokens = $18.75/day

Savings: 50% reduction in token costs

Plus, the summarization model cost is minimal:

1 summary per conversation × 1000 summaries = 1000 summarizations
Each summarization: ~2000 tokens input, ~500 tokens output
1000 × 2500 tokens × $0.15/1M = $0.38/day

Net savings: $37.50 - $18.75 - $0.38 = $18.37/day
Annual savings: ~$6,700

Fine-Tuning Compression Settings

The key parameters to tune:

Threshold Selection

triggers:
  - type: "tokens"
    value: 3000    # 75% of 4000 token context window

Why 75%? Leave room for the response generation. If you set it at 100%, you risk context overflow before compression triggers.

Message Retention

keep:
  messages: 10     # Keep recent context intact
  tokens: 1000     # Ensure recent messages fit

More retained messages = better conversation continuity, but higher token usage. I found 10-15 messages is a good balance for support conversations.

Summarization Model Choice

summarization:
  model: "gpt-4o-mini"  # Fast and cheap
  # model: "gpt-4o"     # Better quality, 10x more expensive

For support conversations, GPT-4o-mini summaries are sufficient. The summary doesn’t need to be perfect - it just needs to preserve key facts.

Custom Summarization Prompts

For specialized domains, you can customize the summarization prompt:

summarization:
  custom_prompt: |
    Summarize this technical support conversation, preserving:
    - User's original problem statement
    - Technical details mentioned (error codes, versions, configs)
    - Troubleshooting steps attempted
    - Any unresolved issues or pending actions

    Exclude conversational filler and greetings.
    Focus on facts that would help continue the support session.

This ensures critical information isn’t lost in summarization.

Production Configuration Example

Here’s my production configuration after tuning:

# Persona Configuration
persona:
  name: "TechCorp Support"
  system_prompt: |
    You are a technical support assistant for TechCorp.
    Expertise: Account management, API integration, billing.
    Tone: Professional but approachable.
    Always verify user identity before account changes.
    Escalate to human support for: security issues, refunds >$500.

# Platform-specific overrides
platform_overrides:
  qq:
    tone: "casual"
  telegram:
    tone: "professional"

# Context Management
context_compression:
  enabled: true
  strategy: "summarization"

  triggers:
    - type: "tokens"
      value: 3000
    - type: "messages"
      value: 40

  keep:
    messages: 12
    tokens: 1200

  summarization:
    model: "gpt-4o-mini"
    trim_tokens: 3500
    custom_prompt: |
      Summarize this TechCorp support conversation, preserving:
      - User account details mentioned
      - Technical issues and error messages
      - Solutions provided and their outcomes
      - Pending actions or follow-ups needed

# Session Management
sessions:
  timeout:
    inactive_minutes: 30
    absolute_hours: 8

  storage:
    type: "sqlite"
    path: "data/sessions.db"

  context_limits:
    max_tokens_per_session: 10000
    max_messages_per_session: 100

Troubleshooting Common Issues

Persona Not Applied

If the bot ignores persona settings:

Check file path matches configuration
Verify YAML syntax is valid
Restart AstrBot after config changes

# Check if persona file is loaded
astrbot --debug --show-config

# Look for persona section in output

Context Lost Too Aggressively

If the bot forgets recent context:

keep:
  messages: 20     # Increase from 10
  tokens: 2000     # Increase from 1000

Poor Quality Summaries

If summaries lose important information:

Use a better summarization model (GPT-4o instead of GPT-4o-mini)
Add a custom prompt with explicit preservation rules
Increase the trim_tokens target to allow more detailed summaries

summarization:
  model: "gpt-4o"
  trim_tokens: 5000  # More room for detail
  custom_prompt: |
    Create a detailed summary preserving all technical specifics...

Monitoring and Metrics

Track compression effectiveness:

logging:
  level: "INFO"
  context_compression: true

metrics:
  token_usage: true
  compression_ratio: true

Watch for:

Compression frequency (too often = threshold too low)
Average context size before/after compression
User satisfaction with conversation continuity

Summary

In this post, I showed how to configure persona and context compression in AstrBot. The key points are:

Persona defines bot personality through system prompts - essential for brand consistency
Context compression reduces token costs by summarizing old messages instead of keeping full history
Tune thresholds based on your use case - balance between cost savings and conversation quality
Monitor compression effectiveness and adjust settings based on real usage patterns

With proper persona and context compression configuration, my AstrBot deployment maintains consistent personality while reducing token costs by approximately 50%. The configuration requires upfront tuning, but the cost savings and improved user experience are worth the effort.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!