An AI visibility audit — also called an AEO audit or AI SEO audit — measures whether, and how, AI answer engines such as ChatGPT, Perplexity, Google AI Overviews, and Gemini mention your brand when people ask questions in your category, and then diagnoses why. A traditional SEO audit asks where you rank in a list of links. An AI visibility audit asks a different question entirely: when an engine synthesizes an answer from many sources, are you one of the sources it cites — and is it citing you accurately and favorably? Because the outcome is different, the measurement method is different too.
From AEO Agency Team
This article explains, step by step, how a rigorous audit is run, what it measures, and what separates a credible audit from a checklist of generic tips.
Scope. This methodology covers brand and category visibility inside generative answer engines. It assumes you already have a functioning website and conventional SEO basics in place; an AI visibility audit builds on top of those and does not replace them.
Core Components of an AEO/GEO Audit
An AEO/GEO audit examines six core components:
- Prompt-Set & Engine Coverage — The audit first defines the real questions buyers ask in your category — cold, solution, and branded prompts — and tests them across ChatGPT, Perplexity, Google AI Overviews, and Gemini, per language and market.
- Cold-Prompt Baseline Measurement — Each prompt is run in clean, non-personalized sessions and repeated multiple times, because AI answers are non-deterministic. It records whether you are cited, how often, in what position, and with what sentiment — a measurement, not a single screenshot.
- Retrievability & Crawlability — Checks whether AI crawlers can reach and parse your site: robots.txt rules for AI user-agents and tokens (GPTBot, PerplexityBot, Google-Extended), server-side rendering, clean content chunking, and valid structured data.
- Citation-Gap Analysis — When you are not cited, the audit inventories who is — competitors, directories, or third-party editorial — because the type of source winning a prompt reveals the actual path to inclusion.
- Entity & Authority Signals — Assesses whether engines recognize your brand as a coherent, corroborated entity: Organization schema with sameAs links, consistent identity across the web, and independent third-party mentions.
- Content & Synthesis-Fit — Evaluates whether your pages are answer-first, evidence-backed, and phrased to match real prompts, so engines preferentially lift them into synthesized answers.
Beyond these six components, the audit quantifies your share of voice against named competitors and turns every finding into a prioritized roadmap.
Step 1 — Define the prompt set and the engines
Everything downstream depends on this step, and most weak audits skip it. Before measuring anything, you define the exact set of questions a real buyer would ask an AI engine in your category, and the engines you will test against.
Three classes of prompt matter, and they are not interchangeable:
- Cold category prompts — unbranded questions a prospect asks before they know you exist (e.g. “who does AEO audits in Greece”, “how do I improve my brand’s visibility in ChatGPT”). These are the prompts that win new customers, and the hardest to influence.
- Solution prompts — questions about the problem you solve (e.g. “why isn’t my site cited by AI”, “how to measure AI search visibility”). These test topical authority.
- Branded prompts — questions that name you directly. These test whether the engine describes you accurately.
The engines tested typically include ChatGPT, Perplexity, Google AI Overviews, and Gemini, because each retrieves and synthesizes differently — a brand can be strong in one and invisible in another. The prompt set is also defined per language and per market, since results in Greek and English for the same query are rarely the same.
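In practice the prompt set works as a test matrix: every prompt crossed with every engine and every language/market pair. The Python sketch below shows that matrix under stated assumptions. The example prompts, the placeholder brand "Example Agency", the `PromptCase` structure, and the (language, market) pairs are all hypothetical, and a real audit would localize each prompt's text per language rather than reuse the English wording.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical prompt set grouped by class; real prompts come from buyer research.
PROMPTS = {
    "cold": ["who does AEO audits in Greece"],
    "solution": [
        "why isn't my site cited by AI",
        "how to measure AI search visibility",
    ],
    "branded": ["what does Example Agency do"],  # placeholder brand name
}
ENGINES = ["ChatGPT", "Perplexity", "Google AI Overviews", "Gemini"]
# Assumed (language, market) pairs; prompt text would be localized per language.
MARKETS = [("el", "GR"), ("en", "GR")]


@dataclass(frozen=True)
class PromptCase:
    prompt_class: str
    prompt: str
    engine: str
    language: str
    market: str


def build_test_matrix(prompts, engines, markets):
    """Cross every prompt with every engine and every (language, market) pair."""
    cases = []
    for (prompt_class, texts), engine, (language, market) in product(
        prompts.items(), engines, markets
    ):
        for text in texts:
            cases.append(PromptCase(prompt_class, text, engine, language, market))
    return cases


matrix = build_test_matrix(PROMPTS, ENGINES, MARKETS)
print(len(matrix))  # 4 prompts x 4 engines x 2 markets = 32
```

Keeping the matrix explicit is what makes later re-measurement like-for-like: the same cells can be rerun with the same protocol.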
Step 2 — Baseline measurement with cold-prompt testing
This is the heart of the audit, and the part that cannot be faked with a tool dashboard alone. Each prompt is run cold — in clean, non-personalized sessions, so the result reflects what a stranger would see rather than what the engine already knows about you. For each prompt and each engine, the audit records:
- Citation presence — are you mentioned at all?
- Citation share — of all sources cited in the answer, what proportion are yours?
- Position within the synthesis — are you the lead reference or a footnote?
- Sentiment — is the way the engine describes you accurate and favorable, or neutral, or wrong?
- Cited sources — which pages and domains the engine actually pulled from.
Because AI answers are non-deterministic, each prompt is run multiple times and results are aggregated, not read from a single response. A single screenshot is an anecdote; a repeated, structured run is a measurement. This distinction is what makes a baseline defensible when a skeptical client asks “how do you know?”
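A minimal sketch of how repeated cold runs might be reduced to the measures listed above, assuming each run has already been captured as an ordered list of cited domains plus a reviewer's sentiment label. `RunResult` and `aggregate_runs` are illustrative names, not a standard tool, and citation share is pooled across runs here, which is one reasonable choice among several.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """One cold run of one prompt on one engine (illustrative structure)."""
    cited_domains: list  # domains cited in the answer, in order of appearance
    sentiment: str = "n/a"  # reviewer's label, e.g. "favorable", "neutral", "wrong"


def aggregate_runs(runs, brand_domain):
    """Aggregate repeated runs of one prompt on one engine."""
    present = [r for r in runs if brand_domain in r.cited_domains]
    total_citations = sum(len(r.cited_domains) for r in runs)
    brand_citations = sum(r.cited_domains.count(brand_domain) for r in runs)
    # Position of the first brand citation in each run; 1 means the lead reference.
    positions = [r.cited_domains.index(brand_domain) + 1 for r in present]
    return {
        "runs": len(runs),
        "presence_rate": len(present) / len(runs) if runs else 0.0,
        "citation_share": brand_citations / total_citations if total_citations else 0.0,
        "mean_position": mean(positions) if positions else None,
        # Sentiment is only tallied for runs where the brand actually appears.
        "sentiment": dict(Counter(r.sentiment for r in present)),
    }
```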
Step 3 — Retrievability audit: can the engines reach you?
Before content can be cited, it has to be reachable and parseable by the systems doing the citing. This step checks the mechanical layer:
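- AI crawler access — whether AI user-agents (e.g. GPTBot, PerplexityBot) and control tokens such as Google-Extended are allowed in robots.txt, and whether anything blocks the crawlers at the server or CDN level. Google-Extended has no crawler of its own (Google fetches pages with Googlebot), and it does not govern Google AI Overviews, which follow Googlebot's rules.
- Rendering — whether key content is present in the served HTML or hidden behind client-side rendering the crawler may not execute.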
- Content chunking — whether the page is structured into clean, self-contained blocks an engine can extract one at a time, rather than long undifferentiated text.
- Structured data — whether Article, FAQPage, Organization, and other relevant schema are present and valid.
- Entity clarity — whether each page states unambiguously what it is about, so the engine can resolve the topic.
A site can have excellent content and still be invisible because an AI crawler was quietly blocked. This step catches that before anyone touches the content.
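The robots.txt part of this check can be automated with Python's standard `urllib.robotparser`. The sketch below assumes you already have the robots.txt text in hand; it only evaluates robots rules, so server- or CDN-level blocks still need to be tested with real requests. The `AI_AGENTS` list mirrors the tokens named above, and `check_ai_access` is an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Tokens named in the audit. Google-Extended is a robots.txt control token,
# not a separate crawler, but its rules are evaluated the same way here.
AI_AGENTS = ["GPTBot", "PerplexityBot", "Google-Extended"]


def check_ai_access(robots_txt, url, agents=AI_AGENTS):
    """Return {agent: allowed} for one URL, judged only from robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(check_ai_access(SAMPLE_ROBOTS, "https://example.com/services"))
# {'GPTBot': False, 'PerplexityBot': True, 'Google-Extended': True}
```

In this sample, GPTBot is blocked by its own group while the other agents fall through to the wildcard group, which is exactly the kind of quiet block the step is meant to catch.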
Step 4 — Citation gap analysis: where the answers come from instead
When you are not cited, the engine is citing someone else — and that “someone else” is the most useful diagnostic in the whole audit. This step inventories the sources the engines actually pull from for your priority prompts and classifies them:
- Competitors — direct rivals who have earned the citation.
- Directories and listings — aggregators (industry directories, “best agencies” lists) that AI engines lean on heavily for “who does X” questions.
- Third-party editorial — articles, roundups, and references published by others.
- Reference sources — encyclopedic or authoritative domains.
The pattern reveals the path to inclusion. If “who” prompts are answered almost entirely from directories and third-party lists, the fix is off-site presence, not another blog post on your own domain. If competitors win on content, the gap is in your pages. Naming the actual cited sources turns a vague goal (“be more visible”) into a concrete target list.
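Classifying cited sources can start as simply as mapping domains to the four types above and counting. In this sketch the `SOURCE_TYPES` map and every domain in it are hypothetical; in practice the map is built by hand from the sources observed in the baseline runs, and anything unmapped is counted as unclassified so it gets reviewed.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical domain-to-type map, built by hand from the baseline's cited sources.
SOURCE_TYPES = {
    "rival-agency.example": "competitor",
    "best-agencies.example": "directory",
    "industry-news.example": "editorial",
    "en.wikipedia.org": "reference",
}


def classify_citations(cited_urls, source_types=SOURCE_TYPES):
    """Count cited URLs by source type; unmapped domains count as 'unclassified'."""
    counts = Counter()
    for url in cited_urls:
        # Normalize the host so www and non-www variants map to the same entry.
        domain = urlparse(url).netloc.lower().removeprefix("www.")
        counts[source_types.get(domain, "unclassified")] += 1
    return counts
```

If `directory` dominates the counts for "who" prompts, that points to off-site presence rather than new on-site content.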
Step 5 — Entity and authority audit
Generative engines reason about entities, not just pages. This step assesses whether your brand is recognized as a coherent, trustworthy entity in your category:
- Entity resolution — does the engine recognize your organization as a distinct entity, with consistent identity across the web?
- Knowledge graph and structured identity — Organization schema with sameAs links to your verified profiles, and consistent name and details across directories and platforms.
- Third-party corroboration — independent sources that mention or describe you, which engines use as evidence of legitimacy.
Authority here is not a single number; it is the consistency and corroboration of your identity across sources the engine trusts. Inconsistent or thin entity signals are a common, invisible reason a brand never enters the candidate set for category answers.
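A minimal sketch of two structured-identity checks using Python's standard `json` module: one function emits a schema.org `Organization` block with `sameAs` links, and another flags profiles whose listed name drifts from the canonical one. The function names and the `profiles` mapping (source label to listed name, gathered manually) are hypothetical.

```python
import json


def organization_jsonld(name, url, same_as):
    """Build a minimal schema.org Organization block with sameAs profile links."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": list(same_as),
    }
    return json.dumps(data, indent=2)


def _normalize(name):
    # Collapse whitespace and ignore case so trivial formatting differences pass.
    return " ".join(name.split()).casefold()


def find_name_mismatches(canonical_name, profiles):
    """Return the profiles whose listed name differs from the canonical name."""
    target = _normalize(canonical_name)
    return {
        source: listed
        for source, listed in profiles.items()
        if _normalize(listed) != target
    }
```

The JSON string is what would be embedded in a `<script type="application/ld+json">` element on the site; the mismatch check covers the "consistent name and details" half of the step.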
Step 6 — Content and synthesis-fit analysis
This step evaluates whether your content is written in a way that engines preferentially pull into a synthesized answer. Research into generative engine optimization has found that certain content characteristics measurably increase visibility inside AI-generated answers, including: original statistics and data, cited sources, direct quotations, authoritative and specific language, and clear question-and-answer structure. The audit checks your priority pages against these patterns:
- Answer-first structure — does each section lead with the direct answer the engine can lift?
- Data and evidence — does the page contain original, citable figures rather than only opinion?
- Sourcing — are claims backed by references the engine can verify?
- Question matching — do headings and FAQ blocks mirror the way real prompts are phrased?
The output is a page-by-page list of what to add or restructure so the content fits how synthesis actually selects sources.
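The four checks above can be roughed out as text heuristics, which helps triage many pages before a human review. This is a hypothetical sketch: the 40-word lead threshold, the digit test for data, the link test for sourcing, and the question-mark test for prompt matching are crude proxies chosen for illustration, not criteria taken from the research mentioned above.

```python
import re


def synthesis_fit_flags(sections, max_lead_words=40):
    """Flag sections that miss the answer-first, evidence, sourcing, and question patterns.

    sections: list of (heading, body) pairs. Every check is a crude, assumed proxy.
    """
    report = []
    for heading, body in sections:
        # Treat the first sentence as the "answer" an engine would lift.
        first_sentence = re.split(r"(?<=[.!?])\s+", body.strip(), maxsplit=1)[0]
        flags = []
        if len(first_sentence.split()) > max_lead_words:
            flags.append("lead sentence too long to lift as a direct answer")
        if not re.search(r"\d", body):
            flags.append("no figures or data")
        if not re.search(r"https?://", body):
            flags.append("no visible source links")
        if not heading.rstrip().endswith("?"):
            flags.append("heading not phrased as a question")
        report.append((heading, flags))
    return report
```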
Step 7 — Competitive share of voice
Across the full prompt set, the audit measures who gets cited and how often — your share of voice against named competitors, per engine. This converts scattered observations into a single comparative picture: where you lead, where a competitor owns the answer, and where the category answer cites no one strongly (an opening). Share of voice is tracked per engine because dominance in one does not transfer to another.
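Share of voice reduces to counting citations per engine and normalizing over you and your named competitors. A minimal sketch, assuming citations have already been attributed to brands as `(engine, brand)` pairs; `share_of_voice` is an illustrative name, and citations of brands outside the named set are ignored.

```python
from collections import defaultdict


def share_of_voice(citations, brands):
    """Per-engine share of voice among the named brands.

    citations: iterable of (engine, brand) pairs, one per citation observed
    across all prompts and runs. Only brands in `brands` are counted.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for engine, brand in citations:
        if brand in brands:
            counts[engine][brand] += 1
    result = {}
    for engine, per_brand in counts.items():
        total = sum(per_brand.values())
        result[engine] = {b: per_brand.get(b, 0) / total for b in brands}
    return result
```

Normalizing within each engine, rather than across all engines, keeps a lead in one engine from masking a gap in another.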
Step 8 — Prioritized roadmap
A list of problems is not an audit; a sequenced plan is. This step turns every finding into an ordered set of actions, prioritized by expected impact against effort. Typically that means fixing retrievability blockers first (they gate everything else), then closing the highest-value citation gaps, then strengthening entity signals and content. Each recommendation ties back to a specific prompt or finding from the measurement, so the rationale is traceable rather than generic.
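The ordering rule described above (retrievability blockers first, then expected impact against effort) can be expressed as a sort key. The `Action` fields and the 1 to 5 scoring scales are assumptions for illustration; the `finding` field keeps each action traceable to the measurement that motivated it.

```python
from dataclasses import dataclass


@dataclass
class Action:
    title: str
    finding: str  # the prompt or finding this action traces back to
    impact: int  # 1-5, estimated by the auditor (assumed scale)
    effort: int  # 1-5, estimated by the auditor (assumed scale)
    blocker: bool = False  # True for retrievability blockers that gate everything else


def prioritize(actions):
    """Blockers first, then by impact-to-effort ratio, highest first."""
    return sorted(actions, key=lambda a: (not a.blocker, -a.impact / a.effort))
```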
Step 9 — Re-measurement and monitoring
AI answers change — engines update models, re-crawl, and re-rank sources continuously, so a single audit is a snapshot, not a verdict. A credible engagement re-runs the same prompt-set protocol after changes are implemented, using the identical method as the baseline, so movement is measured like-for-like rather than estimated. Ongoing monitoring tracks citation share and share of voice over time and flags regressions caused by competitor activity or engine updates.
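Because re-measurement uses the identical protocol, results can be compared cell by cell. This sketch assumes citation frequencies keyed by `(engine, prompt)` from two runs of the same prompt matrix; it compares only cells present in both, and the 0.2 regression threshold is an arbitrary illustrative value.

```python
def compare_runs(baseline, current, regression_threshold=0.2):
    """Like-for-like comparison of citation frequency per (engine, prompt).

    baseline/current: dict mapping (engine, prompt) -> citation frequency in [0, 1],
    produced by the same protocol. Cells missing from either side are skipped.
    """
    deltas = {}
    regressions = []
    for key in baseline.keys() & current.keys():
        # Round to absorb floating-point noise before comparing with the threshold.
        delta = round(current[key] - baseline[key], 6)
        deltas[key] = delta
        if delta <= -regression_threshold:
            regressions.append(key)
    return deltas, sorted(regressions)
```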
What a credible audit actually measures
The metrics below are what separate a measured audit from an opinion. A trustworthy audit reports, per engine and per prompt set:
- Citation share — your proportion of cited sources in the answer.
- Citation frequency — how often you appear across repeated runs.
- Position within the synthesis — lead reference versus passing mention.
- Sentiment — accuracy and favorability of how you are described.
- Share of voice — your citations relative to named competitors.
If an audit cannot produce these numbers and explain how they were gathered, it is describing AI visibility rather than measuring it.
Why the methodology matters more than the label
The market has produced a cluster of competing terms — AEO, GEO, AIO, LLMO — that largely describe the same underlying work: getting accurately cited inside AI-generated answers. The rigor of an audit is not in which label it carries. It is in whether it measures real engine behavior with a repeatable method, names the actual sources answers come from, and produces metrics it can defend when challenged. A checklist of tips is not an audit. A repeatable measurement is.




