
What Makes an AI Agent Useful vs Annoying in Slack? A $1/Day Case Study

Problem

I deployed an AI agent to our Slack workspace. Three weeks later, half the team had muted it.

The agent was “helpful”—it answered questions with detailed explanations, provided context, and offered suggestions. But nobody wanted that. They wanted answers, not essays.

A Reddit thread changed my perspective. A team reported their AI agent handling 40 questions per day for about $1, with 65% of usage from team members who’d been there 3+ months. That last number caught my attention: experienced users aren’t supposed to need training wheels. If seniors kept using it, something was different.

I analyzed what went wrong with my agent and what made theirs work. The difference wasn’t features or model choice. It was behavior.

What I Did Wrong

My initial system prompt looked like this:

bad_prompt.txt
You are a helpful AI assistant. Answer questions thoroughly and provide
context to help users understand the topic better. Be friendly and
offer additional suggestions when relevant.

The agent lived up to every word. When someone asked “What’s the API endpoint for user creation?”, it responded:

annoying_response.txt
The API endpoint for user creation is a crucial part of your system
architecture. In RESTful API design, user creation typically follows
the POST method convention. According to best practices established
by REST API guidelines, you should use a POST request to /api/users.
This endpoint should accept JSON data containing user details like
name, email, and password. Remember to implement proper validation
and error handling. The HTTP status code should be 201 Created on
success. You might also want to consider rate limiting and
authentication for security...

150 words to answer a 7-word question. And none of it was our actual endpoint—it was generic advice from training data.

The team’s reaction was predictable:

  • Senior engineers ignored it completely
  • New hires used it once, then learned to ask humans
  • The only people engaging were asking trivial questions

The Four Principles That Changed Everything

I rewrote the prompt based on the successful case study. Here’s what changed.

Principle 1: Search Before Answering from Memory

The successful agent didn’t answer from training data first. It searched:

search_first.txt
Before answering from your training data:
1. Search the team's documentation
2. Search recent Slack messages
3. Search the codebase if relevant
4. Only then answer from general knowledge

Why this matters: Team-specific knowledge beats generic knowledge. When someone asks about our deployment process, they want our process, not a DevOps tutorial.

I implemented this by connecting the agent to our knowledge sources:

Query Flow
User: "How do we deploy to staging?"
|
v
Search Notion docs
|
v
Found: "Deployment Guide v2"
|
v
Extract relevant section
|
v
Return with citation

Now when someone asks about deployment, they get our actual process, sourced from our docs.
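
The search order above can be sketched as a small fallback chain. This is a minimal sketch under stated assumptions, not the agent's actual code: `SearchHit`, `answer_with_search_first`, and the stub searchers are hypothetical names, and a real setup would put each source (Notion, Slack, the codebase) behind its own search function.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SearchHit:
    source: str  # which knowledge source matched, e.g. "docs"
    title: str   # document or message title, used for the citation
    text: str    # the extracted section


# A searcher takes the question and returns a hit, or None if nothing matched.
Searcher = Callable[[str], Optional[SearchHit]]


def answer_with_search_first(question: str, searchers: list[Searcher]) -> str:
    """Try each team source in order; fall back to general knowledge last."""
    for search in searchers:
        hit = search(question)
        if hit is not None:
            return f"{hit.text}\nFound in: {hit.title} ({hit.source})"
    # Nothing team-specific matched, so label the answer as generic.
    return "No team source found. Answering from general knowledge."


# Stub searchers for illustration only (hypothetical placeholders).
def search_docs(question: str) -> Optional[SearchHit]:
    if "deploy" in question.lower():
        return SearchHit("docs", "Deployment Guide v2", "<relevant section>")
    return None


def search_slack(question: str) -> Optional[SearchHit]:
    return None


print(answer_with_search_first("How do we deploy to staging?", [search_docs, search_slack]))
```

The order of the list is the priority order, so adding or reordering sources does not touch the answering logic.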

Principle 2: Admit Uncertainty

My original agent would guess. Confidently. Wrongly.

The fix:

confidence_levels.txt
Confidence levels:
- HIGH: Direct quote from docs or recent Slack
- MEDIUM: Inferred from team patterns
- LOW: General knowledge that may not apply
If LOW confidence:
- Say "I'm not certain this applies to your setup"
- Suggest who to ask: "You might want to check with @devops"

This changed team trust dramatically. Before, wrong answers meant people stopped asking. After, the agent would say things like:

honest_response.txt
I found something similar in our docs, but it's from 6 months ago.
I'm not certain it's still current—you might want to check with @sarah.

Paradoxically, admitting ignorance built more trust than confident guessing.
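
If the confidence level is available in code rather than only in the prompt, the hedging rule could be applied after the model responds. The sketch below assumes exactly that; `Confidence` and `format_answer` are hypothetical names, and the sample answer text is a placeholder.

```python
from enum import Enum
from typing import Optional


class Confidence(Enum):
    HIGH = "high"      # direct quote from docs or recent Slack
    MEDIUM = "medium"  # inferred from team patterns
    LOW = "low"        # general knowledge that may not apply


def format_answer(answer: str, confidence: Confidence, contact: Optional[str] = None) -> str:
    """Append a hedge, and a suggested contact if known, when confidence is LOW."""
    if confidence is not Confidence.LOW:
        return answer
    hedged = f"{answer}\nI'm not certain this applies to your setup."
    if contact:
        hedged += f" You might want to check with {contact}."
    return hedged


print(format_answer("<general-knowledge answer>", Confidence.LOW, contact="@devops"))
```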

Principle 3: No Unsolicited Information

This was the hardest habit to break. LLMs love to be “helpful.”

New rule:

no_unsolicited.txt
Answer ONLY what was asked.
- If asked for an endpoint, give the endpoint
- Do NOT add: best practices, security tips, alternatives
- Do NOT suggest: "You might also want to consider..."
- Save context for follow-up questions

The same API endpoint question now gets:

concise_response.txt
POST /api/users
Found in: docs/api.md#endpoints

Two lines. That’s it. If they want more context, they’ll ask.

Principle 4: Brevity by Default

length_rules.txt
Response length guidelines:
- Default: Under 200 words
- If detail requested: Be thorough
- If asked to explain: Provide context
- Default assumption: User needs quick answer, not tutorial

38% of queries in the case study were doc searches. Doc searches should return results, not essays.
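
A prompt rule is not a guarantee, so the default limit can also be checked in code. This is a minimal sketch, assuming the response text is available before it is posted; `MAX_WORDS` and `within_default_length` are hypothetical names.

```python
MAX_WORDS = 200  # default limit taken from the length rules above


def within_default_length(response: str, detail_requested: bool = False) -> bool:
    """Return True if the response respects the default word limit."""
    if detail_requested:
        # The limit only applies when the user did not ask for detail.
        return True
    return len(response.split()) < MAX_WORDS


print(within_default_length("POST /api/users\nFound in: docs/api.md#endpoints"))
```

A failed check could trigger a retry with a stricter instruction, or simply be logged to see how often the model ignores the rule.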

The Rewritten System Prompt

Here’s what the final prompt looks like:

system_prompt.txt
You are a team assistant in Slack. Follow these rules strictly:
1. SEARCH FIRST
- Check team docs before answering from memory
- Cite your sources when possible
2. BE HONEST ABOUT UNCERTAINTY
- If unsure, say so
- Suggest who to ask: "I'm not certain, check with @person"
3. NO UNSOLICITED ADVICE
- Answer the question asked
- Don't add tips, alternatives, or related info
- Wait for follow-up questions
4. KEEP IT SHORT
- Default: under 200 words
- Provide detail only when asked
- Use bullet points for lists
5. RESPECT THE CHANNEL
- Match technical depth to audience
- Don't spam busy channels

Results After Two Months

The numbers:

Usage Stats
Questions answered: ~40/day
Cost: ~$1/day
Cost per answer: $0.025
Experienced user adoption: 65%

The 65% figure is what matters most. It means the agent isn’t just training wheels for new hires—it’s a tool experienced team members actually find useful.
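
The cost-per-answer figure follows directly from the two numbers above it. Both inputs are approximate, so the result is too; the 30-day month is an assumption added here for the monthly estimate.

```python
questions_per_day = 40
cost_per_day = 1.00  # USD

cost_per_answer = cost_per_day / questions_per_day
cost_per_month = cost_per_day * 30  # assuming a 30-day month

print(f"Cost per answer: ${cost_per_answer:.3f}")
print(f"Cost per month:  ${cost_per_month:.2f}")
```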

Query breakdown:

Query Types
Doc searches: 38%
Status checks: 24%
Thread summaries: 18%
Misc: 20%

Each query type has different behavior:

  • Doc searches: Return results, cite sources
  • Status checks: Pull from Linear/Jira, format concisely
  • Thread summaries: Key points only, bullet format
  • Misc: Apply the four principles
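
One way to wire a different behavior to each query type is a lookup table of handlers. This is a sketch only: the handler names and their placeholder return values are hypothetical, and real handlers would call the docs search, Linear/Jira, or the Slack thread history.

```python
from typing import Callable


def handle_doc_search(query: str) -> str:
    return f"[doc results with citation for: {query}]"


def handle_status_check(query: str) -> str:
    return f"[concise ticket status for: {query}]"


def handle_thread_summary(query: str) -> str:
    return f"[bullet-point summary for: {query}]"


def handle_misc(query: str) -> str:
    return f"[short answer following the four principles for: {query}]"


HANDLERS: dict[str, Callable[[str], str]] = {
    "doc_search": handle_doc_search,
    "status_check": handle_status_check,
    "thread_summary": handle_thread_summary,
}


def dispatch(query_type: str, query: str) -> str:
    """Route a query to its handler; unknown types fall back to misc."""
    handler = HANDLERS.get(query_type, handle_misc)
    return handler(query)


print(dispatch("status_check", "Is the release done?"))
```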

What Still Went Wrong

Even with these principles, I made mistakes.

Over-engineering query detection:

bad_classification.py
# TOO COMPLEX
def classify_query(query):
    # 50+ lines of regex patterns
    # ML model for intent classification
    # Multiple fallback strategies
    pass

Simple worked better:

simple_classification.py
def classify_query(query):
    # Lowercase once so keyword matching is case-insensitive
    q = query.lower()
    # Checks run top to bottom; the first matching type wins
    if any(k in q for k in ["where", "find", "look for"]):
        return "doc_search"
    elif any(k in q for k in ["status", "progress", "done"]):
        return "status_check"
    elif any(k in q for k in ["summarize", "summary", "catch up"]):
        return "thread_summary"
    else:
        return "misc"
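
One caveat with plain substring matching is that a keyword can match inside a longer word: "done" appears in "abandoned" and "where" in "somewhere". If that causes misroutes, a whole-word variant stays almost as simple. The sketch below is an alternative, not the version described above; `RULES` and `PATTERNS` are hypothetical names, and the keyword lists are copied from the simple classifier.

```python
import re

# Keyword lists copied from the simple classifier; list order decides ties.
RULES = [
    ("doc_search", ["where", "find", "look for"]),
    ("status_check", ["status", "progress", "done"]),
    ("thread_summary", ["summarize", "summary", "catch up"]),
]

# One compiled pattern per type; \b makes keywords match whole words only.
PATTERNS = [
    (
        query_type,
        re.compile(r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b", re.IGNORECASE),
    )
    for query_type, keywords in RULES
]


def classify_query(query: str) -> str:
    """Return the first query type whose keywords appear as whole words."""
    for query_type, pattern in PATTERNS:
        if pattern.search(query):
            return query_type
    return "misc"


for q in ["Where is the deploy guide?", "Is the release done?", "We abandoned the migration"]:
    print(q, "->", classify_query(q))
```

The trade-off is that whole-word matching misses inflected forms such as "finding", so the keyword lists may need a few extra entries.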

Ignoring channel context:

An agent responding in #general should be more careful than one in #engineering. I added:

channel_context.py
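def adjust_response(channel, response):
    if channel == "#general":
        # Extra concise, fewer technical terms.
        # simplify() is a helper assumed to be defined elsewhere.
        return simplify(response)
    elif channel == "#engineering":
        # Can use technical shorthand
        return response
    # Any other channel: return the response unchanged
    return response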

No usage analytics:

Initially, I had no visibility into what was being asked. I added tracking:

analytics.py
def log_query(query_type, confidence, response_length, source):
    # Track query patterns
    # Identify doc gaps
    # Measure follow-up rate
    pass
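
The stub above can be filled in with a few in-memory counters. This is a minimal sketch, not the tracking code actually used: the `QueryLog` class, the `is_follow_up` flag, and the rule that a missing source counts as a doc gap are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QueryLog:
    """In-memory counters; a real deployment would persist these somewhere."""
    total: int = 0
    follow_ups: int = 0
    total_words: int = 0
    by_type: Counter = field(default_factory=Counter)
    by_confidence: Counter = field(default_factory=Counter)
    gaps_by_type: Counter = field(default_factory=Counter)

    def log_query(self, query_type: str, confidence: str, response_length: int,
                  source: Optional[str], is_follow_up: bool = False) -> None:
        self.total += 1
        self.total_words += response_length
        self.by_type[query_type] += 1          # track query patterns
        self.by_confidence[confidence] += 1
        if source is None:
            # No team source was found: a candidate documentation gap.
            self.gaps_by_type[query_type] += 1
        if is_follow_up:
            self.follow_ups += 1

    def follow_up_rate(self) -> float:
        """Share of logged queries that were follow-ups to a previous answer."""
        return self.follow_ups / self.total if self.total else 0.0


log = QueryLog()
log.log_query("doc_search", "HIGH", 8, "docs/api.md#endpoints")
log.log_query("doc_search", "LOW", 40, None, is_follow_up=True)
print(log.by_type, log.gaps_by_type, log.follow_up_rate())
```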

This revealed that “where is” queries often failed because docs were outdated. Fixing the docs reduced failures by 30%.

When This Approach Works

These principles work when:

  • Your team asks lookup-based questions
  • You have documented knowledge to search
  • Users want speed over conversation
  • Cost matters ($1/day vs. $10+/day for verbose agents)

They don’t work when:

  • Questions require complex reasoning
  • Your knowledge base is empty or outdated
  • Users expect conversational interaction
  • You need multi-step workflows

The One Thing That Matters Most

I spent weeks thinking the problem was features. More integrations. Better models. Smarter reasoning.

The solution was simpler: respect the user’s time.

An agent that gives a short answer quickly is trusted more than one that buries the same answer in an essay. An agent that says “I don’t know” is trusted more than one that guesses. An agent that answers the question asked is trusted more than one that answers every possible question.

The 65% experienced user adoption isn’t about intelligence. It’s about behavior.

Before vs After
BEFORE:
User: "What's the API endpoint for user creation?"
Agent: [150 words of generic REST tutorial, no actual endpoint]
AFTER:
User: "What's the API endpoint for user creation?"
Agent: "POST /api/users - found in docs/api.md"

Same model. Same integrations. Different behavior.

Final Words + More Resources

My intention with this article was to share my knowledge and experience in a way that helps others. If you want to get in touch, you can reach me by email.

