Why run every prompt three times?

The same prompt can produce different answers. One run is an example, not a measure. We run each prompt three times, average the results, and report the spread.

Why do model versions and run dates matter?

Engines change often. An undated claim cannot be checked later. Every finding includes the model version and run date, so another person can repeat the test.

How we measure AI visibility: full methodology

Published 2026-06-10 · Updated 2026-07-23 · David King

Illustration: a controlled prompt set passing through four parallel systems and repeated observation layers into an evidence archive (original editorial illustration).

The methodology fixes the inputs, repeats the observations, and retains variance instead of hiding it.

TL;DR: we test 50–150 real buyer prompts across ChatGPT, Claude, Gemini, and Perplexity. We run each prompt three times and use one fixed scoring guide. Every finding links to a dated raw answer. This page explains the full method so buyers can check our work.

How are prompts selected?

Prompts come from the client's market and buyer journey, not a keyword export. We test comparisons, problems, brand checks, and rival options. The set includes branded and non-branded prompts at each buying stage. Each audit uses 50–150 prompts, all listed in the report.

How is each answer scored?

Mentioned — the brand appears anywhere in the answer.
Cited — the brand's own site (or a page about it) is used as a source.
Recommended — the engine names the brand as a pick, not just a mention.
Invisible — none of the above. The zero rows are usually the reason an audit gets commissioned.

We score tone and facts in separate fields. The tone field records what the engine says about the brand. The facts field records whether each claim is true. Wrong prices, old products, and false company events are flagged, with the raw answer attached.
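As a rough illustration of the scoring guide above, one answer's score could be held in a small record like the one below. This is a minimal sketch, not our production schema: the class name `AnswerScore`, its field names, and the free-text `tone` value are all assumptions made for the example. It treats Mentioned, Cited, and Recommended as independent flags and derives Invisible as "none of the above".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnswerScore:
    """Score for one raw answer. All names here are illustrative assumptions."""
    mentioned: bool = False    # brand appears anywhere in the answer
    cited: bool = False        # brand's own site (or a page about it) is a source
    recommended: bool = False  # engine names the brand as a pick
    tone: str = "neutral"      # what the engine says about the brand (assumed label)
    factual_errors: List[str] = field(default_factory=list)  # flagged false claims

    @property
    def invisible(self) -> bool:
        # Invisible means none of the three visibility flags is set.
        return not (self.mentioned or self.cited or self.recommended)

    @property
    def needs_fact_review(self) -> bool:
        # Tone and facts are separate: a positive answer can still be wrong.
        return bool(self.factual_errors)


# Hypothetical example: a mention that quotes an outdated price.
score = AnswerScore(mentioned=True, tone="positive",
                    factual_errors=["outdated price"])
```

Keeping `tone` and `factual_errors` apart means a flattering but inaccurate answer still gets flagged.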

How is engine randomness handled?

Each prompt runs three times per engine in a clean session. There is no prior chat context. We average the scores and show the spread. If a brand appears once in three runs, we flag the result as unstable.
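The repeat-run handling above can be sketched as follows. The function name `summarize_runs` and the output keys are hypothetical. "Spread" is taken here to be the range between the lowest and highest run score, which is an assumption, since the text does not define it. The instability flag implements only the stated rule (the brand appears in exactly one of several runs); how other mixed outcomes are treated is not specified here.

```python
from statistics import mean


def summarize_runs(appearances):
    """Summarize repeat runs of one prompt on one engine.

    appearances: list of bool, one entry per run (True if the brand appeared).
    """
    if not appearances:
        raise ValueError("need at least one run")
    values = [1.0 if seen else 0.0 for seen in appearances]
    hits = sum(1 for seen in appearances if seen)
    return {
        "runs": len(values),
        "mean": mean(values),
        # Assumed definition of spread: max score minus min score.
        "spread": max(values) - min(values),
        # Stated rule: one appearance across several runs is unstable.
        "unstable": len(values) > 1 and hits == 1,
    }


# Hypothetical example: the brand shows up in one of three clean sessions.
summary = summarize_runs([True, False, False])
```

A result that appears in every run gets a spread of zero and no flag, so stable and unstable findings are easy to tell apart in the report.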

What's in the evidence chain?

Every report number traces to a raw answer with a model version and date. The client receives the full export and can check any claim. Evidence is part of the product, not an optional appendix.
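One way to picture the evidence chain is as a lookup from each reported number to the raw answers behind it. The sketch below is an assumption-laden illustration: the `Evidence` record, its field names, the `untraceable_findings` helper, and the identifiers in the example are all hypothetical, not the format of the actual export.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Evidence:
    """One archived raw answer. Field names are illustrative assumptions."""
    prompt: str
    engine: str
    model_version: str
    run_date: date
    raw_answer: str


def untraceable_findings(findings, archive):
    """Return ids of findings that cannot be traced to complete evidence.

    findings: dict mapping finding id -> list of evidence ids.
    archive:  dict mapping evidence id -> Evidence.
    """
    bad = []
    for finding_id, evidence_ids in findings.items():
        records = [archive.get(eid) for eid in evidence_ids]
        complete = bool(records) and all(
            rec is not None
            and rec.model_version.strip() != ""
            and rec.raw_answer.strip() != ""
            for rec in records
        )
        if not complete:
            bad.append(finding_id)
    return bad


# Hypothetical archive with a single dated, versioned raw answer.
archive = {
    "e1": Evidence("best CRM for startups?", "ChatGPT",
                   "example-model-v1", date(2026, 6, 1), "raw answer text"),
}
findings = {"mention_rate": ["e1"], "citation_rate": ["e2"], "share_of_voice": []}
missing = untraceable_findings(findings, archive)
```

In this example `missing` lists the two findings that point at no archived answer, which is the kind of gap the evidence chain is meant to rule out.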

How is AI visibility calculated?

We report four rates over all repeat runs. Mention rate is the share of runs that name the brand. Citation rate is the share that use the brand's site as a source. Recommendation rate is the share that endorse the brand. Share of voice is the brand's share of all brand mentions, competitors included. Each rate includes its run count and is split by engine.
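The four rates can be sketched in a few lines. This is a minimal illustration under stated assumptions: each run is a dict with an `engine` name and three sets of brand names (`mentioned`, `cited`, `recommended`), and a brand counts at most once per run toward share of voice. The function names, the dict keys, and the brand names "Acme" and "Rival" are hypothetical.

```python
from collections import defaultdict


def visibility_rates(runs, brand):
    """Compute the four rates for one brand over a list of runs."""
    n = len(runs)
    if n == 0:
        raise ValueError("need at least one run")
    mention_runs = sum(1 for r in runs if brand in r["mentioned"])
    cited_runs = sum(1 for r in runs if brand in r["cited"])
    recommended_runs = sum(1 for r in runs if brand in r["recommended"])
    # All brand mentions across all runs, competitors included.
    total_mentions = sum(len(r["mentioned"]) for r in runs)
    return {
        "runs": n,  # every rate is reported with its run count
        "mention_rate": mention_runs / n,
        "citation_rate": cited_runs / n,
        "recommendation_rate": recommended_runs / n,
        "share_of_voice": mention_runs / total_mentions if total_mentions else 0.0,
    }


def visibility_rates_by_engine(runs, brand):
    """Split the same rates by engine."""
    grouped = defaultdict(list)
    for r in runs:
        grouped[r["engine"]].append(r)
    return {engine: visibility_rates(rs, brand) for engine, rs in grouped.items()}


# Hypothetical data: four runs across two engines.
runs = [
    {"engine": "ChatGPT", "mentioned": {"Acme", "Rival"}, "cited": {"Acme"}, "recommended": set()},
    {"engine": "ChatGPT", "mentioned": {"Rival"}, "cited": set(), "recommended": {"Rival"}},
    {"engine": "Claude", "mentioned": {"Acme"}, "cited": set(), "recommended": {"Acme"}},
    {"engine": "Claude", "mentioned": set(), "cited": set(), "recommended": set()},
]
overall = visibility_rates(runs, "Acme")
per_engine = visibility_rates_by_engine(runs, "Acme")
```

With this data, Acme is mentioned in two of four runs and accounts for two of the four brand mentions overall, so both its mention rate and its share of voice come out at 0.5. The per-engine split shows the two measures diverging: on ChatGPT the mention rate is still 0.5, but share of voice drops to one third because Rival is mentioned twice.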

How should you judge any AI-visibility audit (including ours)?

Enough prompts to matter — 50+, not 5.
More than one engine — ChatGPT alone ignores where half your buyers ask.
Repeat runs — single runs are anecdotes.
A competitor benchmark — your score means nothing without share-of-voice context.
Technical root causes — knowing you're invisible isn't enough; you need to know why.