← ALL WRITINGSWritings6 MIN READ

Mar 2026 · 6 min

Building an LLM prompt-injection firewall in 48 hours

How SentryML detects prompt-injection attacks in real time — with explainability, sub-millisecond latency, and an open-source SDK.

The problem

LLM-powered products are vulnerable to prompt injection — users can smuggle instructions that override your system prompt, exfiltrate data, or hijack tool calls. Most teams either ignore it until an incident, or bolt on heavyweight security that slows every request and gives no explanation when something is blocked.

At the NatWest × Google Cloud hackathon, the brief was clear: build something a real product team could drop inline, not a research demo. The guardrail had to be fast, explainable, and shippable.

What we built

SentryML is a security layer that sits between user input and your LLM. A three-line Python SDK — configure, guard, ship — scans every message before it reaches the model. If an attack is detected, it raises a structured exception with severity, attack type, and token-level SHAP explanations so engineers know *why* something was blocked.

Under the hood: a fine-tuned DistilBERT classifier for speed, semantic similarity against known attack archetypes as a fallback, and a FastAPI service deployed to GCP Cloud Run. A React dashboard gives real-time visibility into threats, latency, and carbon cost per scan.

Results

97% detection accuracy on our evaluation set, with sub-millisecond inference latency — fast enough to run inline on every request without users noticing. Pitched to NatWest and Google Cloud industry judges, then published as open-source Python and JavaScript SDKs on PyPI and npm.

The design choice that mattered most was explainability. Blocking without context trains teams to disable the guardrail. SHAP token attribution and a human-readable threat summary mean security engineers can trust and tune the system instead of fighting it.

Stack

PythonFastAPIDistilBERTSHAPFirebaseGCP Cloud RunReact

More writings

usmanmateen · 2026