Mentioned but not cited: the five states of AI-answer visibility
A five-state model for AI-answer visibility — full, mention-only, citation-only, third-party, invisible — and why the mention–citation gap is the cheapest thing to measure and the most expensive to ignore.
Series: AI Visibility / GEO. Published: 2026-02-10.
Buyers used to open Google, type "best issue tracker for startups," and scan ten blue links. Now many of them ask ChatGPT or Perplexity instead, and the assistant just names a few tools and cites a few URLs. If your product isn't in that short answer, the buyer never learns it exists, and your Google rank had nothing to do with it.
I've spent a good while building tools that measure this, and the first thing I had to fix wasn't code. It was the question. "Are we visible in AI search?" is too coarse to act on, because there are several different ways to be visible and they need different fixes.
Mention is not citation
Here's the trap. You ask an assistant a question and your brand shows up in the prose: "Linear is popular with startups." You're mentioned. Great. But scroll to the sources and your domain isn't in them — it cited a Reddit thread and a competitor's comparison page. The model talks about you using someone else's words. You have visibility with zero attribution and zero click.
The opposite happens too: your domain gets cited as a source, but the model never says your name in the answer the user reads. You did the work to be the reference, and the user walks away remembering a different brand.
Treating both of those as "visible = yes" hides the exact problem you'd want to fix. So I model every answer as landing a brand in exactly one of five states:
| State | Definition | What it usually means for you |
|---|---|---|
| Full visibility | Named in the answer and your domain is cited | The goal. You're the recommendation and the source. |
| Mention-only | Named in the answer, but your domain isn't cited | You're known, but losing the attribution and the click. |
| Citation-only | Your domain is cited, but you're not named in the answer | You're the source, but the user remembers someone else. |
| Third-party mention | Named only inside a cited third-party source, not in the answer itself | You exist in the model's world entirely through other people. |
| Invisible | Neither named nor cited | You're not in the conversation at all. |
Once you split it this way, the strategy writes itself. A pile of mention-only states is a citation problem — your content isn't structured to be quoted as a source. A pile of citation-only states is a naming/brand problem. Invisible across the board is a content-and-authority problem. One number ("visibility %") would have averaged all three into a shrug.
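Here is a minimal sketch of how one answer could be mapped to a single state. The three boolean inputs, the `State` enum, and the precedence order (direct signals win over third-party ones) are my assumptions for illustration, not a description of any particular tool's internals.

```python
from enum import Enum


class State(Enum):
    FULL = "full visibility"
    MENTION_ONLY = "mention-only"
    CITATION_ONLY = "citation-only"
    THIRD_PARTY = "third-party mention"
    INVISIBLE = "invisible"


def classify(named_in_answer: bool,
             domain_cited: bool,
             named_in_third_party_source: bool = False) -> State:
    """Map the extracted signals for one answer to exactly one state.

    Assumed precedence: being named or cited directly always outranks
    appearing only inside someone else's cited page.
    """
    if named_in_answer and domain_cited:
        return State.FULL
    if named_in_answer:
        return State.MENTION_ONLY
    if domain_cited:
        return State.CITATION_ONLY
    if named_in_third_party_source:
        return State.THIRD_PARTY
    return State.INVISIBLE


# Named in the prose, but only a Reddit thread was cited.
print(classify(named_in_answer=True, domain_cited=False).value)
```

Because the checks run in a fixed order, every answer lands in exactly one bucket, which is what makes the per-state counts add up to the run count.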
The single most useful derived metric I track is the mention–citation gap: mention rate minus citation rate, both measured over the same set of runs. A big positive gap means people hear about you secondhand but you're not the source of truth, which makes it the most fixable state and the most expensive one to ignore.
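As a rough sketch, the gap can be computed from a list of per-run states. The state labels and the choice to count only direct mentions (full or mention-only) toward the mention rate are assumptions made for this example.

```python
from collections import Counter


def mention_citation_gap(states: list[str]) -> dict:
    """Compute mention rate, citation rate, and their gap over a set of runs.

    Assumed labels: "full", "mention_only", "citation_only",
    "third_party", "invisible". Third-party mentions are not counted
    as mentions here, which is an assumption of this sketch.
    """
    runs = len(states)
    if runs == 0:
        raise ValueError("need at least one run")
    counts = Counter(states)
    mentions = counts["full"] + counts["mention_only"]
    citations = counts["full"] + counts["citation_only"]
    return {
        "runs": runs,
        "mention_rate": mentions / runs,
        "citation_rate": citations / runs,
        "gap": (mentions - citations) / runs,
    }


sample = (["full"] * 2 + ["mention_only"] * 5
          + ["citation_only"] * 1 + ["invisible"] * 2)
print(mention_citation_gap(sample))
```

In this made-up sample of 10 runs the brand is named in 7 and cited in 3, so the gap is 0.4: known, but mostly through other people's pages.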
You can't measure it once
The second thing that surprised me: AI answers are not stable. Ask the same question twice and you can get different brands, in a different order, with different sources. Reported figures put run-to-run brand consistency at around 30%, and cited sources drift 40–60% from month to month. A single query therefore tells you almost nothing.
The fix is boring but non-negotiable: sample each prompt many times (I use at least 30 runs for live measurement) and always report the run count next to the number. "73% visible" means nothing without "over how many runs." A metric you can't reproduce isn't a metric; it's a vibe.
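One way to keep the run count attached to the number is to report a rate together with an uncertainty interval. The sketch below uses a Wilson score interval, which is my choice for illustration and not something the measurement above depends on; the function names are hypothetical.

```python
import math


def wilson_interval(hits: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%)."""
    if runs <= 0 or not 0 <= hits <= runs:
        raise ValueError("need runs > 0 and 0 <= hits <= runs")
    p = hits / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def report(hits: int, runs: int) -> str:
    """Format a rate so the run count always travels with it."""
    low, high = wilson_interval(hits, runs)
    return (f"{hits / runs:.0%} visible over {runs} runs "
            f"(95% interval {low:.0%} to {high:.0%})")


print(report(22, 30))
```

With 22 visible answers out of 30 runs, the interval spans roughly 56% to 86%, a reminder of how wide the uncertainty still is at 30 runs.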
I open-sourced the measurement
I pulled a generalized slice of this out of the production work and put it on GitHub as geocheck — a small tool that scores a brand's AI-answer visibility with these five states and a set of transparent, unit-tested formulas (share of voice, citation position, the mention–citation gap, a "blind spot" rate for when you're invisible but a competitor isn't).
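To give a feel for what such formulas can look like, here is a sketch of share of voice and a blind-spot rate. These definitions are my assumptions for illustration and may differ from the ones geocheck actually implements; each run is represented as the set of brands that were visible (named or cited) in that answer.

```python
def share_of_voice(runs: list[set[str]], brand: str) -> float:
    """Brand appearances divided by all brand appearances across runs.

    Assumed definition for this sketch only.
    """
    total = sum(len(visible) for visible in runs)
    ours = sum(1 for visible in runs if brand in visible)
    return ours / total if total else 0.0


def blind_spot_rate(runs: list[set[str]], brand: str,
                    competitors: set[str]) -> float:
    """Fraction of runs where the brand is absent but a competitor is visible."""
    if not runs:
        raise ValueError("need at least one run")
    blind = sum(
        1 for visible in runs
        if brand not in visible and visible & competitors
    )
    return blind / len(runs)


# Four hypothetical runs for brand "A" with competitor "B".
runs = [{"A", "B"}, {"B"}, {"A"}, set()]
print(share_of_voice(runs, "A"))
print(blind_spot_rate(runs, "A", {"B"}))
```

The run with no brands at all does not count as a blind spot here, because no competitor gained anything from it.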
Two design choices I'd repeat:
- It runs offline with no API key. A mock provider replays recorded fixtures, so the demo, the tests, and CI are fully reproducible. The numbers in the demo are synthetic: they're there to show the shape of the output, not to rank real brands.
- The extraction step is graded, not assumed. Deciding "is this brand named? is its domain cited?" is itself a model task that can be wrong, so it's checked against a hand-labeled gold set on every CI run (F1 0.985, Cohen's κ 0.976) and the build fails if it regresses. The same harness can validate an LLM-as-judge against the human labels before you trust it, which is the only honest way to use a model to grade a model.
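The grading idea can be sketched as a small gate that compares extractor output with gold labels and fails when agreement drops. The function names and the threshold values below are hypothetical, chosen for the example and not taken from geocheck's CI configuration.

```python
def f1_score(gold: list[bool], pred: list[bool]) -> float:
    """F1 for the positive class (for example, "brand is named")."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def cohens_kappa(gold: list[bool], pred: list[bool]) -> float:
    """Agreement between two binary labelings, corrected for chance."""
    n = len(gold)
    observed = sum(g == p for g, p in zip(gold, pred)) / n
    p_gold, p_pred = sum(gold) / n, sum(pred) / n
    expected = p_gold * p_pred + (1 - p_gold) * (1 - p_pred)
    if expected == 1.0:
        return 1.0  # both labelings are constant and identical
    return (observed - expected) / (1 - expected)


def check_extractor(gold: list[bool], pred: list[bool],
                    min_f1: float = 0.95, min_kappa: float = 0.90) -> None:
    """Raise if the extractor regresses below the (hypothetical) thresholds."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    f1, kappa = f1_score(gold, pred), cohens_kappa(gold, pred)
    if f1 < min_f1 or kappa < min_kappa:
        raise AssertionError(f"extractor regressed: F1={f1:.3f}, kappa={kappa:.3f}")


gold = [True, True, True, False, False, False, True, False]
pred = [True, True, False, False, False, False, True, False]
print(round(f1_score(gold, pred), 3), cohens_kappa(gold, pred))
```

Calling `check_extractor(gold, pred)` on this toy data raises, because one missed mention in eight labels is enough to fall below both thresholds.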
The takeaway
If you're trying to win AI search, stop asking whether you're "visible" and start asking which state you're in, for which prompts, over enough runs to mean something. Mention-only and citation-only need opposite fixes, and the gap between them is the cheapest thing to measure and the most expensive thing to ignore.
If you want to see the states on your own brand, geocheck runs in one command: uv run geocheck check. Point it at a real provider with your own key when you want live numbers.