How Effective Are LLMs for Real Development? What Backend Engineers Actually Report

Mar 28, 2026

The Question That Won’t Go Away

I’ve been using AI coding assistants daily for backend development. The marketing sounds incredible: “Transform your development workflow,” “10x your output,” “Ship faster than ever.”

But when I looked at my actual work, the results didn’t match the promises. Some days AI felt like a superpower. Other days it felt like a distraction.

I wanted to know: Am I using these tools wrong? Or is the reality more nuanced than the marketing suggests?

Then I found a Reddit thread in r/Backend where experienced developers were having the same conversation. The responses were refreshingly honest.

What Actually Works: The Good

SQL Generation

The most consistent positive feedback? SQL generation.

My experience:
- Prompt: "Find active users who haven't logged in for 30 days"
- AI output: Correct query in 3 seconds
- My time saved: About 10 minutes of docs lookup

Before AI:
SELECT user_id FROM users WHERE status = 'active' AND last_login_at < NOW() - INTERVAL '30 days';
(Would need to verify syntax, check INTERVAL format for PostgreSQL)

After AI:
Query generated correctly, syntax verified, edge cases handled.

One developer put it simply: “AI works best for me as a SQL starter or regex generator.”

The pattern is clear: when the problem is well-defined and the syntax is standard, AI accelerates significantly.
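
Even a correct-looking query deserves a quick run against fixture data. The sketch below is one way to do that, assuming an in-memory SQLite database standing in for PostgreSQL (so the cutoff is computed in Python instead of with INTERVAL). The table layout, the sample rows, and the names find_inactive_users and build_fixture are hypothetical. It also surfaces an edge case: a NULL last_login_at never satisfies the < comparison, so users who have never logged in are silently left out.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def find_inactive_users(conn: sqlite3.Connection, days: int = 30) -> list[int]:
    """Return ids of active users whose last login is older than `days` days."""
    cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).isoformat(
        timespec="seconds"
    )
    rows = conn.execute(
        "SELECT user_id FROM users "
        "WHERE status = 'active' AND last_login_at < ? "
        "ORDER BY user_id",
        (cutoff,),
    ).fetchall()
    return [row[0] for row in rows]


def build_fixture() -> sqlite3.Connection:
    """Create a throwaway database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "user_id INTEGER PRIMARY KEY, status TEXT, last_login_at TEXT)"
    )
    now = datetime.now(timezone.utc)

    def days_ago(n: int) -> str:
        return (now - timedelta(days=n)).isoformat(timespec="seconds")

    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [
            (1, "active", days_ago(45)),    # stale login: should be returned
            (2, "active", days_ago(5)),     # recent login: should be excluded
            (3, "disabled", days_ago(90)),  # not an active account
            (4, "active", None),            # never logged in: NULL comparison excludes it
        ],
    )
    return conn


if __name__ == "__main__":
    print(find_inactive_users(build_fixture()))
```

Whether never-logged-in users should count as inactive is a business rule, which is exactly the kind of detail the prompt has to spell out.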

Well-Defined Codebases

Another observation that matched my experience:

“LLMs are effective at large tasks in well defined code bases where there are a LOT of examples to pull from.”

This explains why AI works better in some projects than others. If your codebase has consistent patterns, clear conventions, and abundant examples, AI can recognize and replicate those patterns.

Good for AI:
- Django project with 50 models following same patterns
- React components with consistent structure
- Express routes with standardized middleware

Bad for AI:
- New project with no established patterns
- Mixed coding styles across files
- Legacy code with inconsistent conventions

The CRUD Reality

“In reality most is just the same old CRUD development.”

This comment hit home. Most backend work isn’t novel algorithm design. It’s create, read, update, delete operations with some business logic layered on top.

AI handles CRUD operations well because:

- The patterns are well-documented
- Examples abound in training data
- Requirements are usually clear

What Doesn’t Work: The Bad

Architecture and Design Decisions

The most striking limitation developers reported:

“Architecture, code structure, and abstractions are still all done entirely manually.”

I’ve experienced this repeatedly. When I ask AI to help with system design, I get generic responses:

My Question: "How should I structure this e-commerce backend?"

AI Response:
- "Consider microservices for scalability"
- "Use repository pattern for data access"
- "Implement caching for performance"

What I actually needed:
- Analysis of my specific team size (3 developers)
- Understanding of our deployment constraints
- Knowledge of our expected traffic patterns
- Budget considerations for infrastructure

The AI gave me options. It didn't help me decide.

The Overengineering Problem

One developer shared a concrete example that illustrates a key AI limitation:

Task: Implement a function using an Erlang library

LLM Approach:
- Generated 100+ lines of code
- Created unnecessary abstraction layers
- Missed idiomatic Erlang patterns
- Reinvented error handling that library already provided

Manual Rewrite:
- 50 lines of code
- Used idiomatic patterns from library docs
- Leveraged existing OTP behaviors
- Integrated properly with ecosystem

Result: The "time saved" by AI was actually time spent understanding and then discarding its output.

This resonated with my experience. AI often generates code that works but isn’t idiomatic. It misses existing abstractions because it doesn’t understand the full context of your project.

Documentation vs AI

“For everything else, when I’ve used it, the problem has been better solved by just reading the docs.”

This was a common sentiment. AI is faster for quick lookups, but official documentation often provides:

- Edge cases AI misses
- Version-specific information
- Security considerations
- Best practices from the maintainers

The Lead Developer Perspective

Perhaps the most sobering comment came from a team lead:

“As a lead, who helps a team of 13 very regularly: it still sucks a lot.”

When your job involves debugging complex interactions, understanding multi-system dependencies, and making architectural decisions, AI’s limitations become very apparent.

Why This Pattern Exists

I’ve been trying to understand why AI works well in some contexts and fails in others. Here’s what I’ve observed:

The Training Data Problem

High-quality outputs:
- Common frameworks (React, Django, Rails, Express)
- Well-documented libraries (Lodash, NumPy, Pandas)
- Standard patterns (CRUD, auth, validation)

Lower-quality outputs:
- Niche libraries with limited training data
- Internal proprietary systems
- Novel architectural patterns
- Cutting-edge frameworks released after training cutoff

The Context Understanding Gap

AI generates code based on patterns it has seen. It doesn’t understand:

- Your specific business requirements
- Your team’s conventions and preferences
- Your existing architecture constraints
- The technical debt you’re navigating

The Idiom Blindness

AI often generates code that works but isn’t idiomatic:

-- AI generated (works but not idiomatic for this codebase):
SELECT u.id, u.email, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active'
GROUP BY u.id, u.email
HAVING COUNT(o.id) > 0;

-- Idiomatic for our codebase (uses our naming conventions):
SELECT
    u.user_id,
    u.email_address,
    COUNT(o.order_id) AS total_orders
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE u.is_active = TRUE
GROUP BY u.user_id, u.email_address
HAVING COUNT(o.order_id) > 0;

Both queries are valid SQL and express the same logic. But the second matches our conventions, making it easier to maintain and less likely to cause confusion during code review.
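
If the naming convention extends to the real column names (an assumption, since the schema itself is not shown here), a mismatch like this can be caught mechanically before review. The sketch below asks SQLite to prepare a query against an empty copy of a hypothetical schema. The table definitions and the helper name compiles_against_schema are invented for illustration, and the queries are adapted slightly for SQLite.

```python
import sqlite3

# Hypothetical schema that follows the naming conventions described above.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    email_address TEXT,
    is_active INTEGER
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    user_id INTEGER
);
"""

GENERATED = """
SELECT u.id, u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active'
GROUP BY u.id, u.email
HAVING COUNT(o.id) > 0
"""

IDIOMATIC = """
SELECT u.user_id, u.email_address, COUNT(o.order_id) AS total_orders
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE u.is_active = 1
GROUP BY u.user_id, u.email_address
HAVING COUNT(o.order_id) > 0
"""


def compiles_against_schema(query: str) -> bool:
    """Return True if SQLite can prepare `query` against the empty schema."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        # EXPLAIN forces name resolution without needing any rows.
        conn.execute(f"EXPLAIN {query}")
        return True
    except sqlite3.OperationalError:
        # Raised for unknown tables or columns, among other problems.
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("generated:", compiles_against_schema(GENERATED))
    print("idiomatic:", compiles_against_schema(IDIOMATIC))
```

This only checks that names resolve. It says nothing about whether the query returns the right rows, so it complements tests but does not replace them.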

My Practical Approach

After months of trial and error, I’ve developed a workflow that maximizes AI’s strengths while avoiding its weaknesses.

Use AI For

✓ SQL query generation and optimization
✓ Regular expressions (always test thoroughly)
✓ Boilerplate code (models, basic CRUD)
✓ Documentation generation
✓ Test scaffolding
✓ Code explanation (with verification)
✓ Finding syntax for common patterns
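
"Always test thoroughly" can be as small as a table of inputs with expected results. The sketch below is illustrative only: the ISO-date pattern stands in for whatever an assistant might suggest, and the names looks_like_iso_date, is_valid_iso_date, CASES, and failing_cases are hypothetical. It shows a pattern that gets the shape right but still accepts an impossible date, so the check is paired with datetime.strptime.

```python
import re
from datetime import datetime

# Hypothetical pattern of the kind an assistant might suggest for ISO dates.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def looks_like_iso_date(value: str) -> bool:
    """Shape check only: four digits, two digits, two digits."""
    return ISO_DATE.fullmatch(value) is not None


def is_valid_iso_date(value: str) -> bool:
    """Shape check plus a real calendar check."""
    if not looks_like_iso_date(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


# Inputs paired with the result we expect from a correct validator.
CASES = [
    ("2026-03-28", True),
    ("2026-13-45", False),    # right shape, impossible month and day
    ("2026-3-28", False),     # missing zero padding
    ("2026-03-28\n", False),  # trailing newline
    ("", False),
]


def failing_cases(check) -> list[str]:
    """Return the inputs for which `check` disagrees with the expectation."""
    return [value for value, expected in CASES if check(value) != expected]


if __name__ == "__main__":
    print("regex only:", failing_cases(looks_like_iso_date))
    print("regex + strptime:", failing_cases(is_valid_iso_date))
```

The regex alone fails the "2026-13-45" case, which is the kind of gap that only shows up once the edge cases are written down.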

Avoid AI For

✗ Architecture decisions
✗ Complex system integration
✗ Business logic with specific rules
✗ Niche libraries or frameworks
✗ Security-critical code
✗ Performance-critical code
✗ Code I can't fully explain

My New Workflow

1. Define clearly before generating
   - Specify input/output contracts
   - List edge cases to handle
   - Note business rules

2. Generate, then verify
   - Run tests immediately
   - Check against official docs
   - Compare to existing patterns in codebase

3. If I can't explain it, rewrite it
   - Never commit code I don't understand
   - Better to write from scratch than debug unfamiliar AI code

A Real Example From My Work

I recently needed to implement a user notification system. Here’s how AI helped and where it fell short.

What AI Generated Quickly

# AI generated this correctly in seconds
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional

class NotificationType(Enum):
    EMAIL = "email"
    SMS = "sms"
    PUSH = "push"

@dataclass
class Notification:
    user_id: int
    type: NotificationType
    title: str
    body: str
    created_at: datetime
    read_at: Optional[datetime] = None

This saved me about 15 minutes of boilerplate typing.

What AI Got Wrong

# AI suggested this service:
class NotificationService:
    async def send(self, notification: Notification) -> bool:
        if notification.type == NotificationType.EMAIL:
            return await self._send_email(notification)
        elif notification.type == NotificationType.SMS:
            return await self._send_sms(notification)
        elif notification.type == NotificationType.PUSH:
            return await self._send_push(notification)
        return False

# Problems I discovered during review:
# 1. No rate limiting (spam risk)
# 2. No retry logic for failed sends
# 3. No audit logging (compliance requirement)
# 4. No user preference checking
# 5. No batching for performance
# 6. Missing circuit breaker for third-party services

The AI code worked for the happy path. But production requires handling failures, rate limits, compliance, and user preferences—none of which AI knew about.
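
To make one of those gaps concrete, here is a minimal sketch of retry logic with exponential backoff. It is not the retry used in production, which is not shown in this article; the helper retry_async and the exception TransientSendError are hypothetical names chosen for illustration.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TransientSendError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


async def retry_async(
    operation: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.1,
) -> T:
    """Call `operation` up to `attempts` times, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except TransientSendError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = 0

    async def flaky_send() -> str:
        """Fail twice, then succeed, to exercise the retry path."""
        global calls
        calls += 1
        if calls < 3:
            raise TransientSendError("provider timeout")
        return "sent"

    print(asyncio.run(retry_async(flaky_send, base_delay=0.01)), "after", calls, "calls")
```

Only errors marked as transient are retried. Anything else propagates immediately, which keeps a bug in the sender from being retried as if it were a network blip.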

What I Actually Needed

class NotificationService:
    def __init__(
        self,
        rate_limiter: RateLimiter,
        audit_logger: AuditLogger,
        user_prefs: UserPreferenceStore,
        circuit_breaker: CircuitBreaker,
    ):
        self.rate_limiter = rate_limiter
        self.audit_logger = audit_logger
        self.user_prefs = user_prefs
        self.circuit_breaker = circuit_breaker

    async def send(self, notification: Notification) -> SendResult:
        # Check user preferences first
        if not await self.user_prefs.is_enabled(
            notification.user_id, notification.type
        ):
            return SendResult.SKIPPED

        # Apply rate limiting
        allowed = await self.rate_limiter.check(
            f"notif:{notification.user_id}",
            limit=10,
            window=3600,
        )
        if not allowed:
            return SendResult.RATE_LIMITED

        # Send through the circuit breaker; log the outcome and re-raise failures
        try:
            async with self.circuit_breaker:
                result = await self._dispatch(notification)
                await self.audit_logger.log(notification, result)
                return result
        except ThirdPartyError as e:
            await self.audit_logger.log_failure(notification, e)
            raise

This is the code that actually went to production. The structure is similar to what AI suggested, but the implementation addresses our specific requirements—things AI couldn’t know.
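
The service above depends on types that are not shown. As a rough sketch of what two of them could look like, assuming a single-process, in-memory setup: SendResult as an enum (the SENT member is a guess; only SKIPPED and RATE_LIMITED appear above) and a sliding-window RateLimiter whose check signature matches the call in send. The injected clock parameter is an addition for testability, and a real deployment would more likely use a shared store than process memory.

```python
import time
from collections import defaultdict, deque
from enum import Enum
from typing import Callable


class SendResult(Enum):
    SENT = "sent"  # assumed member; not shown in the article's code
    SKIPPED = "skipped"
    RATE_LIMITED = "rate_limited"


class RateLimiter:
    """In-memory sliding-window limiter (illustrative, single process only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    async def check(self, key: str, limit: int, window: float) -> bool:
        """Record a hit for `key` and return True if it is within the limit."""
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= window:
            hits.popleft()
        if len(hits) >= limit:
            return False
        hits.append(now)
        return True
```

Passing the clock in makes the window logic testable without sleeping: a test can advance a fake clock past the window and confirm that requests are allowed again.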

The Honest Assessment

After tracking my productivity for several months:

Task                    | AI Speedup | Notes
------------------------|------------|---------------------------
SQL queries             | 70%        | Almost always correct
Boilerplate models      | 60%        | Minor tweaks needed
Documentation           | 50%        | Good starting point
Test scaffolding        | 40%        | Need to add edge cases
Feature implementation  | 10-20%     | Context matters heavily
Architecture design     | 0%         | AI provides options, not decisions
Bug fixing              | -10%       | Understanding AI code takes time
Code review             | -20%       | More code to review

The productivity gains are real but concentrated. AI is a specialized tool, not a general-purpose accelerator.

What This Means For Developers

The developers who get the most from AI coding tools:

- Know when to use AI: they recognize problems AI handles well
- Verify everything: they test AI output against docs and edge cases
- Maintain ownership: they never commit code they can’t explain
- Understand limitations: they don’t ask AI for what it can’t provide
- Keep fundamentals sharp: they can work without AI when needed

The developers who struggle with AI:

- Trust it blindly: they commit without understanding
- Expect too much: they ask AI for architecture decisions
- Lose ownership: they can’t debug their own codebase
- Skip verification: they miss bugs AI introduces

The Bottom Line

AI coding tools are genuinely useful for specific tasks: SQL generation, regex patterns, boilerplate code, and documentation. The productivity gains in these areas are real and significant.

But the marketing oversells. AI cannot make architecture decisions, understand business context, or generate idiomatic code for unfamiliar libraries. The fundamental skills of software engineering—system design, problem decomposition, and technical judgment—remain essential.

The most effective approach I’ve found: use AI as an accelerator for routine tasks, not as a replacement for engineering judgment. The productivity gains come from knowing the difference.

What works:
- SQL, regex, boilerplate in well-defined codebases
- Documentation and explanation
- Getting started with standard patterns

What doesn't work:
- Architecture and design decisions
- Complex integrations
- Novel or niche technology
- Understanding your specific business context

The key insight:
AI is a tool that amplifies existing expertise.
It doesn't replace the need for that expertise.

The honest truth: AI has made me faster at certain tasks, but it hasn’t fundamentally changed how I approach software development. The hard problems still require human thinking. The easy problems are just easier now.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Reddit r/Backend Discussion on LLM Effectiveness
👨‍💻 Claude Code Documentation
👨‍💻 AI Coding Tools Research
