Free GEO Audit: Check Your Site's AI Search Readiness
A GEO audit answers one blunt question: can AI search engines actually find, read, and cite your pages? You can check the major signals by hand in about 30 minutes — robots.txt rules for AI crawlers, content structure, schema, llms.txt, and citation signals — or run a scanner and get a 0-100 readiness score in seconds. This page walks through exactly what a generative engine optimization audit inspects, how to do it manually, and how to read the result so you know what to fix first.
What a GEO audit actually checks
A generative engine optimization (GEO) audit is not an SEO report with a new label. SEO asks whether you rank in a list of blue links. A GEO audit asks whether an AI answer engine can retrieve your content and reproduce it as a cited source. Those are different mechanics, so they need different checks.
A complete audit covers five layers. The first two are hard technical gates: if you fail them, nothing downstream matters. The remaining three are quality signals that decide whether you actually get pulled into an answer.
- AI crawler access: Does robots.txt allow the bots that matter, namely GPTBot and OAI-SearchBot (OpenAI), ClaudeBot and Claude-SearchBot (Anthropic), PerplexityBot, and Google-Extended (Google's control token for Gemini; AI Overviews draw on the regular Googlebot index)? A single overly broad Disallow can lock all of them out.
- Extractability: Can the answer be lifted as plain text? Content rendered only by client-side JavaScript, buried in images, or trapped in complex tables is hard for retrieval systems to parse cleanly.
- Content structure: Clear H1/H2 hierarchy, a direct answer near the top, short self-contained paragraphs, and question-shaped headings. AI systems excerpt passages, not whole pages, so each section should stand alone.
- Structured data: Schema.org markup (Article, FAQPage, HowTo, Product, Organization) gives machines an unambiguous read of what the page is and who wrote it.
- Citation and authority signals: Named authors, dates, sources, an llms.txt file, and corroboration elsewhere on the web. These influence whether a model trusts you enough to cite you by name.
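The structure and structured-data layers can be spot-checked from raw HTML with nothing but the Python standard library. The sketch below is one possible approach, not how any particular scanner works: the class name `StructureAudit`, the function `audit_structure`, and the sample markup are all hypothetical. It counts H1s, lists H2s, and pulls the `@type` values out of any JSON-LD blocks.

```python
import json
from html.parser import HTMLParser


class StructureAudit(HTMLParser):
    """Collects H1/H2 text and JSON-LD @type values from raw HTML."""

    def __init__(self):
        super().__init__()
        self.headings = {"h1": [], "h2": []}
        self.schema_types = []
        self._capture = None  # "h1", "h2", or "jsonld" while inside that element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.headings:
            self._capture, self._buffer = tag, []
        elif tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._capture, self._buffer = "jsonld", []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._capture in self.headings and tag == self._capture:
            # Collapse internal whitespace so headings compare cleanly.
            self.headings[tag].append(" ".join("".join(self._buffer).split()))
            self._capture = None
        elif self._capture == "jsonld" and tag == "script":
            self._record_schema("".join(self._buffer))
            self._capture = None

    def _record_schema(self, raw_json):
        try:
            block = json.loads(raw_json)
        except json.JSONDecodeError:
            return  # malformed JSON-LD is simply ignored in this sketch
        for item in block if isinstance(block, list) else [block]:
            if isinstance(item, dict) and "@type" in item:
                types = item["@type"]
                self.schema_types.extend(types if isinstance(types, list) else [types])


def audit_structure(html):
    """Return heading counts and schema types found in the given HTML string."""
    parser = StructureAudit()
    parser.feed(html)
    parser.close()
    return {
        "h1_count": len(parser.headings["h1"]),
        "h2": parser.headings["h2"],
        "schema_types": parser.schema_types,
    }


# Hypothetical page used only to exercise the sketch.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "GEO audit"}
</script>
</head><body>
<h1>Free GEO Audit</h1>
<h2>What a GEO audit checks</h2>
<h2>How to read your score</h2>
</body></html>
"""

report = audit_structure(SAMPLE_HTML)
```

A result with `h1_count` other than 1, or an empty `schema_types` list, points at the structure or structured-data layer as the place to look first.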
Run the audit manually: a do-it-now checklist
You can validate the technical gates yourself right now with a browser and your site's source. Work top to bottom — the access checks come first because they gate everything else.
- 1. Pull your robots.txt — Visit yourdomain.com/robots.txt. Confirm none of the AI user-agents below are disallowed and that no blanket `User-agent: *` rule accidentally blocks them.
- 2. Check JavaScript dependence — Disable JavaScript in your browser (or view the raw HTML with View Source) and reload a key page. If the main content vanishes, retrieval crawlers likely miss it too.
- 3. Scan your heading structure — One H1, descriptive H2s, and a one- or two-sentence answer directly under each heading. If a section only makes sense in the context of the whole page, tighten it.
- 4. Validate schema — Run a key URL through Google's Rich Results Test or Schema.org validator. Confirm the page type, author, and publish date are present and error-free.
- 5. Look for llms.txt — Check yourdomain.com/llms.txt. If it returns a 404, you have no machine-readable map of your best content — add one (template below).
- 6. Spot-check citation signals — Does the page name a real author, cite sources, and carry a visible date? Search your brand or topic in ChatGPT or Perplexity and note whether you appear at all.
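Step 2 can also be approximated without a browser: fetch the raw HTML and check whether a sentence you expect on the page is present before any JavaScript runs. The following is a minimal sketch under stated assumptions. The helper names (`visible_text`, `survives_without_js`, `fetch_raw_html`), the user-agent string, and both sample pages are hypothetical, and a real crawler may treat markup differently.

```python
import urllib.request
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Gathers text outside <script> and <style>, i.e. what a non-JS client can read."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(raw_html):
    """Return the page text with markup, scripts, and styles removed."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def survives_without_js(raw_html, key_phrase):
    """True if key_phrase appears in the HTML as served, before any script runs."""
    return key_phrase.lower() in visible_text(raw_html).lower()


def fetch_raw_html(url, timeout=10):
    """Download a page as served (no JavaScript execution). Not called below."""
    request = urllib.request.Request(url, headers={"User-Agent": "geo-audit-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Two hypothetical pages: one server-rendered, one an empty client-side shell.
SERVER_RENDERED = "<html><body><h1>Pricing</h1><p>Plans start at $10 per month.</p></body></html>"
SPA_SHELL = '<html><body><div id="root"></div><script>renderApp("Plans start at $10 per month.")</script></body></html>'

server_ok = survives_without_js(SERVER_RENDERED, "Plans start at $10")
shell_ok = survives_without_js(SPA_SHELL, "Plans start at $10")
```

To try it on a live page, pass `fetch_raw_html("https://yourdomain.com/pricing")` in place of the sample strings and pick a phrase that only appears in the main content.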
The AI crawlers your robots.txt controls
Crawler access is the part of a GEO audit people fail most often, usually by accident. Privacy plugins, CDN bot-blocking, or a copy-pasted robots.txt can quietly disallow the bots that feed AI search. It is worth knowing what each one does, because blocking a training crawler is a different decision than blocking a search-and-cite crawler.
Roughly, the bots split into three jobs. Training crawlers (GPTBot, ClaudeBot, CCBot) feed long-term model knowledge; Google-Extended belongs in this group too, although it is a robots.txt control token honored by Google's existing crawlers rather than a separate bot. Search/retrieval crawlers (OAI-SearchBot, PerplexityBot, Claude-SearchBot, Googlebot) build the indexes queried at answer time and are the ones that produce citations. User-triggered fetchers (ChatGPT-User, Perplexity-User) grab a page live when someone asks a question about it.
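The three-way split above can be written down as a small lookup table, which makes it easy to see which blocked agents actually cost you citations. This is an illustrative sketch: the role labels and the `citation_risk` helper are hypothetical names, and the mapping simply restates the grouping in the paragraph above.

```python
# Role of each user-agent token, as grouped in the text above.
BOT_ROLES = {
    "GPTBot": "training",
    "Google-Extended": "training",
    "ClaudeBot": "training",
    "CCBot": "training",
    "OAI-SearchBot": "search",
    "PerplexityBot": "search",
    "Claude-SearchBot": "search",
    "Googlebot": "search",
    "ChatGPT-User": "user-triggered",
    "Perplexity-User": "user-triggered",
}

# Roles that feed answers at query time and therefore produce citations.
CITATION_ROLES = {"search", "user-triggered"}


def citation_risk(blocked_agents):
    """Return the blocked agents whose role produces citations, sorted by name."""
    return sorted(
        agent for agent in blocked_agents
        if BOT_ROLES.get(agent) in CITATION_ROLES
    )


# Hypothetical list of agents a site has disallowed.
at_risk = citation_risk(["GPTBot", "PerplexityBot", "ChatGPT-User", "CCBot"])
```

Agents that are not in the table are ignored here; a stricter version might flag unknown tokens for manual review.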
If your goal is to be cited, the search and user-triggered bots are non-negotiable — block those and you remove yourself from AI answers entirely. Here is a permissive robots.txt baseline that welcomes the main ones and points to your llms.txt:
# robots.txt: allow AI search/citation crawlers

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Google-Extended
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml

# AI content map
# https://yourdomain.com/llms.txt
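To check an existing robots.txt against this list, Python's standard `urllib.robotparser` gives a quick first pass. Treat the sketch below as an approximation: the `ai_access_report` helper and the sample file are hypothetical, and the standard-library parser's matching rules may differ from how an individual crawler interprets the same file.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this guide.
AI_AGENTS = [
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "PerplexityBot",
    "Perplexity-User",
    "ClaudeBot",
    "Claude-SearchBot",
    "Google-Extended",
]


def ai_access_report(robots_txt, path="/"):
    """Map each AI user-agent to True/False: may it fetch `path` under these rules?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in AI_AGENTS}


# Hypothetical robots.txt with a blanket rule that quietly blocks most AI bots.
RISKY_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /
"""

report = ai_access_report(RISKY_ROBOTS_TXT)
blocked = [agent for agent, allowed in report.items() if not allowed]
```

In this sample only OAI-SearchBot gets through: GPTBot is blocked by name, and every other agent falls back to the blanket `User-agent: *` rule.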
Add an llms.txt to map your best content
llms.txt is an emerging convention — a Markdown file at your site root that gives language models a curated, low-noise index of your most important pages. It is not an official standard the way robots.txt is, and no platform has publicly confirmed it as a ranking factor, so treat it as a low-cost bet rather than a guarantee.
The upside is real even with that caveat: it reduces the work a model does to understand your site, and it lets you choose which pages represent you instead of leaving that to a crawler guessing from your nav. Keep it short, plain-language, and link-forward.
A minimal llms.txt looks like this:
# Your Company

> One-sentence description of what you do and who you serve.

## Core pages

- [Product](https://yourdomain.com/product): What it does, in one line.
- [Pricing](https://yourdomain.com/pricing): Plans and what each includes.
- [About](https://yourdomain.com/about): Who's behind this and credentials.

## Key guides

- [GEO audit guide](https://yourdomain.com/geo-audit): How to check AI search readiness.
- [llms.txt explained](https://yourdomain.com/llms-txt): What the file is and how to write one.
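Because llms.txt is only a convention, there is no official validator to lean on. The sketch below is a hypothetical sanity check that assumes the layout used in the template above (one `#` title, one `>` summary, `##` sections containing Markdown links). The names `parse_llms_txt` and `llms_txt_problems` are illustrative and not part of any published tool.

```python
import re

# Matches "- [Title](https://example.com/page): optional note"
LINK = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>https?://[^)\s]+)\)(?::\s*(?P<note>.*))?$")


def parse_llms_txt(text):
    """Parse the template layout into a title, a summary, and per-section links."""
    result = {"title": None, "summary": None, "sections": {}}
    section = None
    for line in (ln.strip() for ln in text.splitlines()):
        if not line:
            continue
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()
        elif line.startswith("> ") and result["summary"] is None:
            result["summary"] = line[2:].strip()
        elif line.startswith("## "):
            section = line[3:].strip()
            result["sections"][section] = []
        elif section is not None:
            match = LINK.match(line)
            if match:
                result["sections"][section].append((match["title"], match["url"]))
    return result


def llms_txt_problems(text):
    """Return a list of human-readable problems; an empty list means it looks fine."""
    parsed = parse_llms_txt(text)
    problems = []
    if not parsed["title"]:
        problems.append("missing '# Title' line")
    if not parsed["summary"]:
        problems.append("missing '> summary' line")
    if not any(parsed["sections"].values()):
        problems.append("no linked pages under any '## Section'")
    return problems


# The template from this guide, used as test input.
TEMPLATE = """\
# Your Company
> One-sentence description of what you do and who you serve.
## Core pages
- [Product](https://yourdomain.com/product): What it does, in one line.
- [Pricing](https://yourdomain.com/pricing): Plans and what each includes.
- [About](https://yourdomain.com/about): Who's behind this and credentials.
## Key guides
- [GEO audit guide](https://yourdomain.com/geo-audit): How to check AI search readiness.
- [llms.txt explained](https://yourdomain.com/llms-txt): What the file is and how to write one.
"""

parsed = parse_llms_txt(TEMPLATE)
problems = llms_txt_problems(TEMPLATE)
```

Running the same check against your own file after each edit is a cheap way to catch a broken link line or a missing summary.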
How to read your readiness score
A GEO score (typically 0-100) is a triage tool, not a grade. Its only job is to tell you what to fix first. A scanner weights the hard gates — crawler access and extractability — more heavily than the soft signals, because a beautifully structured page that GPTBot can't reach scores zero in practice.
Be honest about what a score can and can't tell you. AI platforms are opaque: none publish their exact citation criteria, and rankings shift as models update. A score measures whether you've cleared the known technical and structural barriers — it cannot promise a citation. Anyone selling certainty about AI search is guessing.
Read your result roughly like this, then fix top-down: access issues first, structure and schema next, citation signals last.
- 80-100 — Crawlers are in, content is extractable, schema and llms.txt are present. You're eligible to be cited — now compete on content quality and authority.
- 50-79 — Reachable but leaking signals. Usually missing schema, no llms.txt, thin structure, or JavaScript-dependent content. Steady, fixable wins.
- Below 50 — A hard gate is failing — often a robots.txt rule blocking AI bots or content that only renders client-side. Fix the gate before touching anything else.
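To make the weighting idea concrete, here is one way a gate-weighted score could be computed. Every number in it is an assumption chosen for illustration: the weights, the 0.5 gate threshold, and the cap of 49 are hypothetical and do not describe how any specific scanner scores a site. Only the three bands come from the list above.

```python
# Hypothetical weights per audit layer (they sum to 100); hard gates weigh more.
WEIGHTS = {
    "crawler_access": 30,
    "extractability": 25,
    "content_structure": 15,
    "structured_data": 15,
    "citation_signals": 15,
}
HARD_GATES = ("crawler_access", "extractability")
GATE_MIN = 0.5   # assumed: a hard gate below this pass ratio counts as failing
GATE_CAP = 49    # assumed: a failing gate keeps the score in the lowest band


def readiness_score(checks):
    """checks maps each layer name to a pass ratio between 0.0 and 1.0."""
    raw = sum(weight * checks.get(name, 0.0) for name, weight in WEIGHTS.items())
    score = round(raw)
    if any(checks.get(gate, 0.0) < GATE_MIN for gate in HARD_GATES):
        score = min(score, GATE_CAP)
    return score


def band(score):
    """Map a 0-100 score to the three bands described above."""
    if score >= 80:
        return "80-100"
    if score >= 50:
        return "50-79"
    return "below 50"


# Hypothetical audit results.
all_clear = dict.fromkeys(WEIGHTS, 1.0)
bots_blocked = {**all_clear, "crawler_access": 0.0}
no_schema_or_citations = {**all_clear, "structured_data": 0.0, "citation_signals": 0.0}

scores = {
    "all_clear": readiness_score(all_clear),
    "bots_blocked": readiness_score(bots_blocked),
    "no_schema_or_citations": readiness_score(no_schema_or_citations),
}
```

The cap is the design choice worth noticing: without it, a site that blocks every AI crawler but does everything else well would land at 70, which would hide the one problem that matters most.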
See your AI search readiness score
The free Am I Citable scanner runs this entire audit for you in seconds. Enter your URL and it checks AI-crawler access across GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot, and Google-Extended, inspects your content structure and extractability, validates your schema, and looks for an llms.txt and citation signals. You get a single 0-100 readiness score that tells you exactly which gate to fix first — and if you're missing an llms.txt, it generates one you can drop straight into your site root. It's the fastest way to turn the manual checklist above into a prioritized to-do list.
Run the Free Scan
FAQ
How is a GEO audit different from an SEO audit?
An SEO audit optimizes for ranking in a list of links: keywords, backlinks, page speed, Core Web Vitals. A GEO audit optimizes for being retrieved and cited inside an AI-generated answer. It checks things SEO tools typically ignore: whether AI crawlers like GPTBot and PerplexityBot are allowed in robots.txt, whether your content is extractable as clean passages, and whether you have an llms.txt. There's overlap, since good structure and schema help both, but a page can rank well on Google and still be invisible to ChatGPT and Perplexity.
Does a GEO audit guarantee that AI engines will cite my site?
No, and be skeptical of any tool that claims it will. An audit clears the known technical and structural barriers so you're eligible to be cited; it can't force a model to choose you. AI platforms don't publish their citation criteria, and behavior changes as models update. What you control is removing the blockers (crawler access, extractability, structure, schema, citation signals), which is exactly what an audit measures.
How often should I run a GEO audit?
Re-run after any change that touches crawl access or rendering, such as a new CDN, a robots.txt edit, a CMS migration, or a switch to a JavaScript framework, since those can silently break AI-crawler access. Beyond that, a quarterly check is reasonable: the crawler landscape and conventions like llms.txt are still evolving, and new AI user-agents appear regularly. A quick scan catches regressions before they cost you citations.