When a product says “AI-powered,” it usually means one of two things: there’s a genuine large language model involved, or there’s a very confident marketing team. The difference matters enormously for support triage.
Keyword-based triage — routing emails based on whether the subject line contains “refund,” “cancel,” or “bug” — has existed for years. It works, badly, for simple cases, but it breaks the moment customers stop writing the words you expect.
“I’m done with this” could be a churn signal. It could also be a satisfied customer finishing an onboarding process. Keyword matching can’t tell the difference. A language model can.
What LLMs actually do differently
A large language model doesn’t scan for specific words. It reads the email the way a human would — understanding the full meaning of the message in context.
That means:
Intent over phrasing. “I’d like to stop my subscription” and “can you please just cancel everything” and “I’ve decided to move on” all express the same intent. Keyword matching catches the first one and misses the other two. An LLM categorizes all three identically.
Urgency from tone. “Hi team, when you get a chance, could you help me with my invoice?” is low urgency. “This is the third time I’ve emailed — I need this fixed TODAY” is high urgency. That distinction comes from reading tone and context, not keywords.
Distinguishing noise from signal. Out-of-office auto-replies, promotional emails from vendors, newsletters that somehow make it to your support address — these should be filtered out before they ever reach your team. An LLM can reliably identify them. A keyword list needs to be maintained manually and still misses edge cases.
How MailBridge does it
When an email arrives at your inbound address, it goes through a structured pipeline before it ever reaches your Slack channel.
Sanitization first. Before anything touches the LLM, the email body is cleaned. Quoted reply chains (the “> Original message” blocks) are stripped. Email headers, HTML artifacts, and tracking pixels are removed. What remains is just the actual content of the message.
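That cleaning step can be sketched roughly like this. The patterns shown are a simplified illustration; a production sanitizer handles far more reply formats, encodings, and edge cases:

```python
# A minimal sanitization sketch: drop quoted reply chains and strip HTML
# artifacts before classification. Illustrative, not MailBridge's actual code.
import re

def sanitize(body: str) -> str:
    kept = []
    for line in body.splitlines():
        # Drop quoted reply lines ("> ...") and "On ... wrote:" attribution headers.
        if line.lstrip().startswith(">"):
            continue
        if re.match(r"On .+ wrote:$", line.strip()):
            continue
        kept.append(line)
    text = "\n".join(kept)
    # Remove leftover HTML tags (tracking pixels, formatting artifacts).
    text = re.sub(r"<[^>]+>", "", text)
    return text.strip()

raw = """Hi, my invoice is wrong.

On Mon, Jan 6, 2025, support wrote:
> Thanks for reaching out!
<img src="https://tracker.example/pixel.gif" width="1" height="1">"""
print(sanitize(raw))  # → Hi, my invoice is wrong.
```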
LLM classification. The cleaned message is passed to a language model with a structured prompt that asks it to determine:
- The category (billing, technical, feature request, general inquiry, etc.)
- The urgency level (high, medium, low)
- A short, human-readable summary
- Whether the message is a real customer email or noise (auto-reply, spam, newsletter)
The model returns a structured JSON response. Not a paragraph. Not a confidence score. A clean, typed object that the system can act on deterministically.
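In code, that contract looks roughly like a typed object plus strict parsing. This is a sketch; the field names and category list are illustrative, not MailBridge's actual schema:

```python
# Sketch of a structured-output contract: the model is prompted to return
# JSON matching this shape, and the response is parsed into a typed object.
import json
from dataclasses import dataclass

CATEGORIES = {"billing", "technical", "feature_request", "general"}
URGENCIES = {"high", "medium", "low"}

@dataclass(frozen=True)
class Triage:
    category: str
    urgency: str
    summary: str
    is_noise: bool

def parse_triage(raw: str) -> Triage:
    data = json.loads(raw)
    triage = Triage(**data)  # raises TypeError on missing or extra fields
    if triage.category not in CATEGORIES or triage.urgency not in URGENCIES:
        raise ValueError("value outside schema")
    return triage

# Example of a well-formed model response:
raw = '{"category": "billing", "urgency": "high", "summary": "Duplicate charge on invoice", "is_noise": false}'
print(parse_triage(raw))
```

Because the result is a typed object rather than free text, everything downstream can branch on it deterministically.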
Routing based on the result. Once we have a category, the routing engine checks your configured rules. If you’ve set up “billing emails → #billing-support” and “bug reports → #bugs,” those rules fire on the category — not on keyword guessing.
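In sketch form, the routing step is then a plain lookup on the category, with a default destination for anything unmatched. Channel names and rule shape here are illustrative:

```python
# Category-based routing rules with a default-channel fallback.
# Illustrative configuration, not MailBridge's actual rule format.
RULES = {
    "billing": "#billing-support",
    "technical": "#bugs",
}
DEFAULT_CHANNEL = "#support-inbox"

def route(category: str) -> str:
    return RULES.get(category, DEFAULT_CHANNEL)

print(route("billing"))  # → #billing-support
print(route("general"))  # → #support-inbox
```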
Why structured output matters
One thing that sets production AI pipelines apart from demos is structured output. It’s easy to prompt an LLM and get a paragraph back. It’s harder to get a reliable, parseable response that a downstream system can act on without hallucinating extra fields or dropping required ones.
MailBridge uses a strict output schema and validates the model’s response before acting on it. If the model returns something malformed — which happens, rarely — the email is flagged for human review rather than silently dropped or misrouted.
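That validate-or-flag behavior can be sketched as a single guard function. The field names and return shape are illustrative:

```python
# Fail-safe handling of model output: validate the JSON against the expected
# field set; anything malformed is routed to human review, never dropped.
import json

REQUIRED_FIELDS = {"category", "urgency", "summary", "is_noise"}

def triage_or_flag(raw_model_output: str) -> dict:
    try:
        data = json.loads(raw_model_output)
        if set(data) != REQUIRED_FIELDS:  # hallucinated or dropped fields
            raise ValueError("schema mismatch")
        return {"action": "route", "triage": data}
    except (ValueError, TypeError):
        return {"action": "human_review", "triage": None}

print(triage_or_flag("not json at all"))
# → {'action': 'human_review', 'triage': None}
```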
What the LLM doesn’t see
Privacy matters. MailBridge sanitizes emails before classification, and the model prompt is designed to extract only the metadata needed for routing: category, urgency, and summary.
The model doesn’t need to see — and isn’t given — financial account numbers, medical information, or other sensitive data in order to categorize an email. The sanitization layer strips patterns that match these formats before the content ever reaches the model.
Your customers’ sensitive information stays in your system. The LLM sees enough to route. No more.
The honest limitation
LLMs aren’t perfect classifiers, and we won’t pretend otherwise. Occasionally a billing complaint gets categorized as a general inquiry. Occasionally a feature request gets marked as medium urgency when it should be low.
But the failure modes are different from keyword matching. A keyword system misses everything that doesn’t use the right words. An LLM system makes occasional judgment calls that a human might make differently. One fails systematically; the other fails the way people do.
The practical result: MailBridge triage is accurate enough that most teams never need to touch the routing configuration after initial setup. The categories your team cares about — billing, bugs, feature requests, churn signals — route correctly, consistently.
And when something does get miscategorized, it lands in your default channel rather than being lost. A human catches it. The system learns from the miss.
The bottom line
AI triage isn’t magic, and it’s not keyword matching with extra steps. It’s a language model reading email the way a human support lead would — and doing it at the speed and scale a human can’t sustain.
That’s what the “AI-powered” label should mean.