Social Dolphin Services
SDS · Field notes

AI doxxing is a design failure, not a bug

Prompts are not a security boundary. Tool wrappers are.

Type: Field note
Date: 14 May 2026
Audience: Founders and engineering leaders shipping AI features

On May 13, MIT Technology Review published Eileen Guo's piece on a quiet pattern showing up across the public chatbot frontier: people are getting WhatsApp messages, phone calls, and unsolicited contact from strangers because a chatbot handed out their personal number as the answer to someone else's question.

Daniel Abraham is an Israeli software developer. He posted his cell number once, in 2015, on a Quora-like local site. In mid-March 2026 he started getting WhatsApp messages from strangers asking for PayBox customer service help: Gemini had been telling people his personal number was the PayBox support line. PayBox confirmed it has no WhatsApp support number. Meira Gilbert, a PhD candidate at the University of Washington, searched Gemini for her colleague Yael Eiger's contact info and got Eiger's personal cell number, which Eiger had shared once in 2025 for a technology workshop. A Reddit user reported a full month of harassment from strangers seeking a lawyer, a locksmith, and a product designer, all directed to him by a chatbot that confidently misidentified him as each of them.

DeleteMe, a service that removes personal data from the public web, says customer queries about generative AI privacy are up 400% over seven months. Of those queries, 55% mention ChatGPT, 20% Gemini, 15% Claude, and 10% other tools.

This is not a quirky bug. It is a design failure that any team shipping AI features has to take seriously, because the same pattern, scaled down, is on every AI feature roadmap right now.

The wrong frame: "we just need better guardrails"

The reflex inside an AI team that ships something like this is to add guardrails to the prompt. "Refuse to return personal contact information." "Do not provide phone numbers, addresses, or emails of private individuals." The leak gets treated as an oversight to patch with one more instruction.

Guo's reporting documents how that frame fails in practice. ChatGPT initially blocks "find this person" requests, then helpfully offers what the article calls "investigative-style alternatives," suggesting users provide neighborhood information or co-owner names to "narrow things down." Grok, prompted with "[name] address," returns home addresses, phone numbers, work addresses, and addresses for people with similar-sounding names in nearly all cases. Gemini surfaces information that was technically public somewhere on the web but effectively buried until the chatbot promoted it to a direct answer.

The pattern across all three is the same: the policy was in the prompt, and the prompt is not the right control surface. A motivated user can negotiate the guardrail away. The prompt drifts. The next model update changes how it responds to the same instruction. The guardrail does not survive contact with adversarial use, and frankly it does not survive contact with curious use either.

The deeper issue is structural. These systems were architected on the assumption that "if it is publicly accessible somewhere on the web, the AI can use it and surface it." Yael Eiger's number was technically public; she shared it once for a workshop. The architectural decision to treat "available" and "visible" as the same thing is what turns a buried 2015 forum post into a service-desk number that strangers message in 2026.

The right frame: AI doxxing is a design failure

We treat AI features the way you would treat any new hire on day one. Trained on your playbook. Scoped to what it can touch. Supervised on the decisions that actually matter. The same posture that prevents an over-eager new employee from sending a customer a competitor's price list also prevents an AI feature from leaking a phone number.

Concretely, four design moves separate a feature that can leak from a feature that cannot.

Privacy by design, not as a layer added later

The cleanest fix is that the model never sees the leakable data in the first place. AI features get abstracted profiles: roles, tags, capabilities, internal IDs. They do not get raw names, raw phone numbers, or raw addresses. When the model never sees "Jane Doe, 555-123-4567, 123 Oak Street," the model cannot leak it.

This is a stronger statement than "we redact PII before training." It is the stance that, for every AI feature, you ask "what is the minimum the model needs to do its job," and you build the prompt assembly to provide only that. Most features need a role, a status, an action, a category. Very few features need a phone number, and the ones that do can use a tokenized reference that resolves to the real value only inside the trusted application layer.
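
A minimal sketch of that prompt-assembly step, in Python. Everything here is hypothetical and illustrative: the `Customer` record, the in-memory `TokenVault`, and `build_prompt_context` are names invented for this example, and a real vault would be a persistent, access-controlled store rather than a dict. The point is the shape: the model receives a role, a status, and an opaque `contact_ref`; only application code holding the vault can turn that reference back into a phone number.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    # Hypothetical record shape held only by the trusted application layer.
    internal_id: str
    name: str
    phone: str
    role: str
    status: str


class TokenVault:
    """Maps opaque tokens to raw values. Lives in the trusted layer only."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)
        self._values[token] = value
        return token

    def resolve(self, token: str) -> str:
        # Called by application code (e.g. a dialer), never by the model.
        return self._values[token]


def build_prompt_context(customer: Customer, vault: TokenVault) -> dict[str, str]:
    """Give the model the minimum it needs: ID, role, status, opaque reference."""
    return {
        "customer_id": customer.internal_id,
        "role": customer.role,
        "status": customer.status,
        "contact_ref": vault.tokenize(customer.phone),
    }


vault = TokenVault()
jane = Customer("cust_042", "Jane Doe", "555-123-4567", "account_owner", "active")
context = build_prompt_context(jane, vault)
# context carries no name and no phone number, only an opaque contact_ref.
```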

Policy in the tools, not in the prompts

Prompts are not a security boundary. Tool wrappers are. A chatbot that has a "look up customer service contact" tool can have that tool refuse to return any number not in an explicit allow-list of verified business support numbers. The tool refuses regardless of how the user phrases the question, regardless of what the model wants to do, regardless of how persuasive the conversation gets. The boundary is in code, not in instructions.
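
A minimal sketch of such a tool wrapper, assuming a Python tool layer. `VERIFIED_SUPPORT_NUMBERS`, `ToolRefusal`, and `lookup_support_contact` are hypothetical names, and the number is a placeholder. The design choice worth noticing: the tool accepts a business identifier, never a free-form number, so the only values it can ever return are ones someone deliberately put on the allow-list.

```python
# Hypothetical allow-list of verified business support numbers (placeholder value).
VERIFIED_SUPPORT_NUMBERS: dict[str, str] = {
    "acme-billing": "+1-800-555-0100",
}


class ToolRefusal(Exception):
    """Raised when a tool call falls outside the tool's policy."""


def lookup_support_contact(business_id: str) -> str:
    """Tool exposed to the model.

    Returns a number only if it is on the verified allow-list. The model
    cannot supply or widen the list, however the request is phrased.
    """
    number = VERIFIED_SUPPORT_NUMBERS.get(business_id)
    if number is None:
        # The refusal is enforced here, in code, not by prompt instructions.
        raise ToolRefusal(f"no verified support number for {business_id!r}")
    return number
```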

The same pattern handles every category. A scheduling tool with no concept of personal phone numbers cannot accidentally include one. A document retrieval tool with an explicit content classification cannot retrieve documents flagged as containing sensitive PII for a chat surface that is not authorized to see them. The model is reduced from "intelligent agent that decides what to do" to "decision-maker that picks among well-scoped tools," and the scoping is where safety lives.
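
The document-retrieval case can be sketched the same way. This is an illustration under stated assumptions: the three-level `Sensitivity` scale, the `SURFACE_CLEARANCE` table, and the naive keyword match are all invented for the example; a real system would use its own classification scheme and search backend. What matters is that the clearance filter runs inside the tool, before any matching, and that a surface nobody has scoped gets nothing.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    # Hypothetical classification levels; higher means more sensitive.
    PUBLIC = 0
    INTERNAL = 1
    SENSITIVE_PII = 2


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    classification: Sensitivity


# Hypothetical: the most sensitive level each chat surface is cleared to see.
# No chat surface is cleared for SENSITIVE_PII.
SURFACE_CLEARANCE = {
    "public_chat": Sensitivity.PUBLIC,
    "support_agent_console": Sensitivity.INTERNAL,
}


def retrieve(query: str, corpus: list[Document], surface: str) -> list[Document]:
    """Retrieval tool: filter by clearance first, then match."""
    clearance = SURFACE_CLEARANCE.get(surface)
    if clearance is None:
        return []  # fail closed for unscoped surfaces
    visible = [d for d in corpus if d.classification <= clearance]
    terms = query.lower().split()
    # Naive keyword match standing in for a real search backend.
    return [d for d in visible if any(t in d.text.lower() for t in terms)]
```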

This is the same pattern we wrote about in our piece on production-grade AI managers, applied to a different failure mode. The "AI manager ordered 6,000 napkins" failure and the "AI chatbot gave out a personal phone number" failure have the same root cause: tools that are too broad, and policy that lives in prompts.

Defense in depth: input redaction plus output filtering

Even with the first two moves in place, you build the third move as if both of them will sometimes fail.

On the input side, the gateway layer scrubs incoming requests for personal identifiers before the prompt is assembled. If a user pastes a third party's number, email, or address into a chat, the system either tokenizes the value or refuses to send it forward. The agent sees the request without the leakable payload.
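
A minimal sketch of the input-side step, assuming a Python gateway. The regular expressions are deliberately crude placeholders: they miss many real formats and over-match others, and production systems typically rely on a dedicated PII-detection library with locale-aware rules. `redact_input` is a hypothetical name. It swaps each detected identifier for a placeholder and returns the placeholder-to-value map, which stays in the trusted layer and never reaches the prompt.

```python
import re

# Illustrative patterns only: they miss many real formats and over-match
# others (an ISO date such as 2026-05-14 looks like a phone number here).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact_input(message: str) -> tuple[str, dict[str, str]]:
    """Swap detected identifiers for placeholders before prompt assembly.

    Returns the scrubbed message plus a placeholder -> original value map.
    The map stays in the trusted layer; only the scrubbed text goes forward.
    """
    vault: dict[str, str] = {}

    def make_replacer(kind: str):
        def replace(match: re.Match) -> str:
            placeholder = f"[{kind}_{len(vault) + 1}]"
            vault[placeholder] = match.group(0)
            return placeholder
        return replace

    # Emails are scrubbed first so digits inside them are not split off as phones.
    for kind, pattern in PII_PATTERNS.items():
        message = pattern.sub(make_replacer(kind), message)
    return message, vault


scrubbed, held_back = redact_input("call me at 555-123-4567 or mail jane@example.com")
# scrubbed == "call me at [PHONE_2] or mail [EMAIL_1]"
```

Refusing instead of tokenizing is a small change under the same assumptions: reject the request whenever the returned map is non-empty.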

On the output side, every response is scanned before it leaves the system. Phone numbers, email addresses, address-shaped strings, and government ID formats are all checked against the response schema for that endpoint. The schema says what each endpoint is allowed to return; the scan rejects anything outside it. If the schema for a help-bot endpoint says "may return business category and city, never a phone or email," and the model emits one anyway, the response is rewritten or rejected before the user sees it.
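
The output-side check can be sketched as a per-endpoint allow-list of identifier kinds. `ENDPOINT_ALLOWED_KINDS`, `OUTPUT_DETECTORS`, `FALLBACK`, and `filter_output` are hypothetical names; the detectors are illustrative regexes, not a complete PII detector, and this sketch covers only phone, email, and US SSN patterns (address detection is harder and is omitted). It rejects the whole response rather than trying to patch it, and an endpoint with no declared schema is allowed to emit none of these kinds.

```python
import re

# Hypothetical per-endpoint schema: which identifier kinds each endpoint may emit.
ENDPOINT_ALLOWED_KINDS: dict[str, set[str]] = {
    "help_bot": set(),  # business category and city only; never phone or email
    "verified_directory": {"PHONE"},
}

# Illustrative detectors only; real systems need broader, locale-aware rules.
OUTPUT_DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

FALLBACK = "Sorry, I can't share contact details here."


def filter_output(endpoint: str, response: str) -> str:
    """Return the response unchanged, or a fallback if it breaks the schema."""
    allowed = ENDPOINT_ALLOWED_KINDS.get(endpoint, set())  # unknown: allow nothing
    for kind, pattern in OUTPUT_DETECTORS.items():
        if kind not in allowed and pattern.search(response):
            return FALLBACK  # reject the whole response rather than patch it
    return response
```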

Neither layer alone is sufficient. Together, they turn a single failure into a near-miss instead of an incident.

Curated inputs, not the open web

The structural fix that closes most of the remaining surface area is the input pipeline itself. AI features should retrieve from curated, purpose-limited stores, not from the open web. Public marketing content, official help docs, internal SOPs that have been reviewed for sensitive data: these are the safe inputs. Web scrapes, data broker feeds, and "we ingested everything we could find" corpora are not.
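
A minimal sketch of an ingestion gate for the retrieval index, assuming each document carries a source type and a review flag. `APPROVED_SOURCE_TYPES`, `SourceDoc`, `admit_to_index`, and `build_index` are hypothetical names, and the source-type labels are placeholders for whatever taxonomy your pipeline uses. The gate is an allow-list, so a new feed (a scrape, a broker export) stays out until someone deliberately approves it.

```python
from dataclasses import dataclass

# Hypothetical source types the retrieval index may ingest.
APPROVED_SOURCE_TYPES = {"marketing_site", "help_docs", "internal_sop"}


@dataclass(frozen=True)
class SourceDoc:
    doc_id: str
    source_type: str
    reviewed_for_pii: bool
    text: str


def admit_to_index(doc: SourceDoc) -> bool:
    """Ingestion gate: only curated, purpose-limited content gets in."""
    if doc.source_type not in APPROVED_SOURCE_TYPES:
        return False  # web scrapes, broker feeds, bulk crawls
    if doc.source_type == "internal_sop" and not doc.reviewed_for_pii:
        return False  # internal SOPs need a sensitive-data review first
    return True


def build_index(docs: list[SourceDoc]) -> dict[str, SourceDoc]:
    return {d.doc_id: d for d in docs if admit_to_index(d)}
```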

The MIT Tech Review piece notes that 31 of 578 California-registered data brokers reported sharing consumer data with GenAI developers in the past year. That kind of pipeline is the structural precondition for incidents like Daniel Abraham's. Refusing to ride on it, at the architecture level, is the move.

What this article is not

  • Not a critique aimed at Google, OpenAI, or xAI specifically. These companies face genuinely difficult problems at the public-chatbot frontier, with training corpora that predate the privacy posture they would have chosen with hindsight. The point is not "those companies got it wrong"; the point is that the same failure mode will land in any AI feature that ships with the same architectural assumptions, including features we build ourselves.
  • Not a claim that we have a productized "AI privacy gateway" or "AI doxxing prevention" SKU for sale. We do not. The patterns above are the architectural posture we bring to AI engagements. The implementation is scoped to your stack, your data classes, and your specific use cases inside an engagement.
  • Not a regulatory or legal compliance piece. GDPR, CCPA, COPPA, and HIPAA all touch this surface; the architectural moves above generally make compliance easier, but compliance is a separate conversation with your legal counsel and our compliance scaffolding work.
  • Not a one-size-fits-all blueprint. The right input-redaction patterns for a healthcare chatbot are different from the right ones for a customer support assistant on a marketing site. The principles transfer; the implementation does not.

One-sentence takeaway

AI doxxing is not what happens when a chatbot gets confused; it is what happens when a system was architected on the assumption that "publicly accessible somewhere" means "fine to surface as a direct answer," and the fix is in the architecture, not in the prompt.

Talk to us

If you are shipping an AI feature this quarter and the question "could this leak something it should not" is open, the next move is a 30-minute conversation. We will walk through the data the feature touches, the tool boundaries it should have, and where input and output filtering would catch the failure modes most likely to land in your specific use case. If a deeper engagement is the right next step, we will scope it on the call. If it is not, we will tell you what we would tighten and you can take it from there.

We do not take every engagement, and we will tell you on the call whether we are the right partner.

Sources