CitedMetrics

How we measure AI visibility: full methodology

Published 2026-06-10 · Updated 2026-06-10 · David King

TL;DR: we measure what AI engines tell buyers about a brand using 50–150 real buying-intent prompts, each run three times per engine across ChatGPT, Claude, Gemini and Perplexity, scored against a fixed rubric, with every finding linked to a dated raw response. This page publishes the method in full, both so audit buyers can verify what they paid for and so anyone evaluating an AI-visibility vendor knows what to demand.

How are prompts selected?

Prompts are generated from the client's category, not from its keywords. They cover four question types: comparison questions ("best X for Y"), problem questions ("how do I solve Z"), validation questions ("is [brand] worth it"), and alternative-seeking questions ("[competitor] alternatives"). The set mixes branded and non-branded prompts, each mapped to a funnel stage. Every audit uses 50–150 distinct prompts, all disclosed in the report's prompt appendix.
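As an illustration only, the four question types can be expanded from category inputs with simple templates. This is a minimal sketch, not the production prompt generator: the template wording, the `build_prompts` helper, the `branded` flag and the sample inputs are all assumptions, and funnel-stage mapping is left out.

```python
# Hypothetical templates, one per question type described above.
TEMPLATES = {
    "comparison": "best {category} for {use_case}",
    "problem": "how do I solve {problem}",
    "validation": "is {brand} worth it",
    "alternatives": "{competitor} alternatives",
}


def build_prompts(category, use_cases, problems, brand, competitors):
    """Expand the templates into a flat list of prompt records."""
    prompts = []
    for use_case in use_cases:
        text = TEMPLATES["comparison"].format(category=category, use_case=use_case)
        prompts.append({"type": "comparison", "branded": False, "text": text})
    for problem in problems:
        text = TEMPLATES["problem"].format(problem=problem)
        prompts.append({"type": "problem", "branded": False, "text": text})
    # Validation prompts name the client brand, so they count as branded.
    text = TEMPLATES["validation"].format(brand=brand)
    prompts.append({"type": "validation", "branded": True, "text": text})
    for competitor in competitors:
        text = TEMPLATES["alternatives"].format(competitor=competitor)
        # Assumption: "branded" tracks the client brand only, not competitors.
        prompts.append({"type": "alternatives", "branded": False, "text": text})
    return prompts


prompts = build_prompts(
    category="invoicing software",
    use_cases=["freelancers", "small agencies"],
    problems=["late invoice payments"],
    brand="Acme",
    competitors=["Globex"],
)
for p in prompts:
    print(p["type"], "|", p["text"])
```

Keeping the type and the branded flag on each record makes it straightforward to report results by question type later.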

How is each answer scored?

  • Mentioned — the brand appears anywhere in the answer.
  • Cited — the brand's own site (or a page about it) is used as a source.
  • Recommended — the engine names the brand as a pick, not just a mention.
  • Invisible — none of the above. These zero-visibility rows are usually the reason an audit gets commissioned.

Sentiment and factual accuracy are coded separately: what the engine says about the brand, and whether it's true. False claims (discontinued products, wrong pricing, phantom acquisitions) are flagged as hallucination findings with the raw response attached.
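The four rubric labels above can be expressed as a small scoring function. The sketch below is a simplification under stated assumptions: the `Answer` type, `score_answer`, and the sample brand and domain are hypothetical; mention detection is a plain case-insensitive substring match; the citation check only looks at the brand's own domain and ignores third-party pages about the brand; and the recommended flag is assumed to come from a human reviewer rather than being detected automatically.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Answer:
    text: str
    source_urls: list = field(default_factory=list)
    recommended: bool = False  # assumed to be set by a reviewer


def is_brand_domain(url, brand_domain):
    """True if the URL's host is the brand domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    domain = brand_domain.lower()
    return host == domain or host.endswith("." + domain)


def score_answer(answer, brand, brand_domain):
    """Return the rubric labels for one engine answer."""
    labels = {
        # Naive substring match; real coding would need to handle aliases.
        "mentioned": brand.lower() in answer.text.lower(),
        "cited": any(is_brand_domain(u, brand_domain) for u in answer.source_urls),
        "recommended": answer.recommended,
    }
    labels["invisible"] = not any(labels.values())
    return labels


visible = score_answer(
    Answer("For small teams we suggest Acme.", ["https://www.acme.com/pricing"], True),
    brand="Acme",
    brand_domain="acme.com",
)
missing = score_answer(Answer("Try Globex instead."), brand="Acme", brand_domain="acme.com")
```

Sentiment and factual accuracy are not modelled here, since the text codes them separately.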

How is engine randomness handled?

Every prompt runs three times per engine in clean sessions — no prior context, generic IPs. Scores are averaged and the variance is reported, not hidden. High-variance prompts are flagged: an answer that names you once in three runs is a coin-flip, not a presence.
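To make the averaging concrete, here is a minimal sketch of summarizing repeated runs of one prompt on one engine. The `summarize_runs` helper is hypothetical, and the high-variance rule is an assumption: the text does not state a threshold, so this version flags any prompt whose runs disagree.

```python
from statistics import mean, pvariance


def summarize_runs(hits):
    """Summarize repeated runs of one prompt on one engine.

    hits holds one 0/1 value per run (1 = the brand met the criterion).
    """
    return {
        "runs": len(hits),
        "rate": mean(hits),
        "variance": pvariance(hits),
        # Assumption: any disagreement between runs counts as high variance.
        "high_variance": 0 < sum(hits) < len(hits),
    }


summary = summarize_runs([1, 0, 0])  # brand named once in three runs
print(summary)
```

With one hit in three runs the rate is 1/3 and the population variance is 2/9, so the prompt is flagged rather than counted as a stable presence.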

What's in the evidence chain?

Every number in the report traces to raw engine responses with model versions and run dates. The raw export ships with the report so a client's team can re-check any claim. We consider this table stakes; the market mostly doesn't provide it.
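One way to picture a raw-export row is a record that carries the response together with its model version and run date. The `RunRecord` fields, the JSON Lines layout and the sample values below are assumptions for illustration; the actual export format is not specified here.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RunRecord:
    prompt_id: str       # hypothetical identifier linking to the prompt appendix
    prompt_text: str
    engine: str
    model_version: str
    run_date: str        # ISO 8601 date, e.g. "2026-06-10"
    run_index: int       # 1, 2 or 3 for the three repeat runs
    raw_response: str


def to_jsonl(records):
    """Serialize records as JSON Lines, one run per line."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


record = RunRecord(
    prompt_id="P-001",
    prompt_text="best invoicing software for freelancers",
    engine="example-engine",
    model_version="example-model-v1",
    run_date="2026-06-10",
    run_index=1,
    raw_response="For freelancers, Acme and Globex are common picks.",
)
export = to_jsonl([record])
```

Because every line is self-describing, a client's team could filter the export by prompt, engine or date and re-read the exact response behind any reported number.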

How is AI visibility calculated?

Four rates, all over repeated runs. Mention rate: runs naming the brand ÷ total runs. Citation rate: runs where the brand's own site is a source ÷ total runs. Recommendation rate: runs endorsing the brand ÷ total runs. Share of voice: the brand's mentions ÷ all brand mentions in the category. Every rate is reported per engine and overall, with the run count attached — a rate without its denominator is marketing, not measurement.
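The four rates can be sketched as a single pass over scored runs. This is an illustrative sketch under assumptions: each run is a hypothetical dict holding the engine name and the sets of brands that were mentioned, cited and recommended in that run, and one "mention" for share of voice means one brand named in one run.

```python
from collections import defaultdict


def visibility_rates(runs, brand):
    """Compute the four rates per engine and overall, with run counts."""
    groups = defaultdict(list)
    for run in runs:
        groups[run["engine"]].append(run)
        groups["overall"].append(run)

    report = {}
    for name, group in groups.items():
        total = len(group)
        brand_mentions = sum(brand in r["mentioned"] for r in group)
        all_mentions = sum(len(r["mentioned"]) for r in group)
        report[name] = {
            "runs": total,  # the denominator travels with every rate
            "mention_rate": brand_mentions / total,
            "citation_rate": sum(brand in r["cited"] for r in group) / total,
            "recommendation_rate": sum(brand in r["recommended"] for r in group) / total,
            "share_of_voice": brand_mentions / all_mentions if all_mentions else 0.0,
        }
    return report


runs = [
    {"engine": "engine_a", "mentioned": {"Acme", "Globex"}, "cited": {"Acme"}, "recommended": {"Globex"}},
    {"engine": "engine_a", "mentioned": {"Globex"}, "cited": set(), "recommended": {"Globex"}},
    {"engine": "engine_b", "mentioned": {"Acme"}, "cited": set(), "recommended": {"Acme"}},
    {"engine": "engine_b", "mentioned": set(), "cited": set(), "recommended": set()},
]
report = visibility_rates(runs, "Acme")
```

In this sample, Acme is named in 2 of 4 runs overall (mention rate 0.5) and accounts for 2 of the 4 brand mentions (share of voice 0.5), while on `engine_a` alone its share of voice is 1 of 3.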

How should you judge any AI-visibility audit (including ours)?

  • Enough prompts to matter — 50+, not 5.
  • More than one engine — an audit of ChatGPT alone misses the other engines where buyers ask.
  • Repeat runs — single runs are anecdotes.
  • A competitor benchmark — your score means nothing without share-of-voice context.
  • Technical root causes — knowing you're invisible isn't enough; you need to know why.