geo-seo-audit

skill
Security Audit
Failed
Health: Warning
  • License — MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 5 GitHub stars
Code: Failed
  • rm -rf — Recursive force deletion command in install.sh
Permissions: Passed
  • Permissions — No dangerous permissions requested
Purpose
This Python toolkit performs Generative Engine Optimization (GEO) audits for websites. It analyzes a site's content and schema to measure its visibility across major AI search engines, outputting a comprehensive Markdown report.

Security Assessment
Overall risk: Medium. The tool makes external network requests to fetch website data and query search engines like Google and Brave via public proxies or APIs. There are no hardcoded secrets, and it does not request dangerous local permissions. However, the automated code checks flagged a `FAIL` for a recursive force deletion command (`rm -rf`) inside the `install.sh` script. While this is a common pattern for cleaning up installation directories in open-source scripts, it poses a potential risk if the paths are manipulated or if the script is run in an unintended environment. Users should inspect the bash script before executing it.

Quality Assessment
The project is very new and currently has low community visibility, evidenced by only 5 GitHub stars. However, it is under active development, with its most recent code push happening today. The repository is properly documented with a clear description and a comprehensive README. It uses a standard MIT license, which is favorable for open-source adoption and trusted use.

Verdict
Use with caution. The active maintenance and safe permissions are promising, but the low community adoption and the risky `rm -rf` flag in the installation script warrant a manual review of the code before integrating it into your environment.
SUMMARY

Evidence-based GEO audit toolkit for Claude Code. Per-LLM visibility (ChatGPT, Gemini, Perplexity, Claude, Le Chat), JSON-LD validation, AI crawler taxonomy, RRF measurement. Free, no paid APIs, audit-only.

README.md


GEO Audit Toolkit

A Claude Code skill that produces evidence-based Generative Engine Optimization audits for any website. Audit-only — no CRM, no proposals, no pricing tiers. Free public APIs and respectful scraping; no paid LLM or SERP keys required.


What it does

One command (/geo audit <url>), one Markdown report. The pipeline:

  1. Detects locale of the audited site (HTML lang → Content-Language → TLD → fallback en)
  2. Samples 8 pages with strategic diversity (homepage + service + article + about + case-study)
  3. Audits robots.txt with a 3-category bot taxonomy (training / RAG-search / user-request)
  4. Per-page schema + content health (JSON-LD validity, Enhanced Entity Pages visible-facts coverage, BLUF compliance, Mount AI anti-patterns)
  5. Measures RRF visibility on Google + Brave when a positioning is supplied (10 fan-out queries × 2 engines)
  6. Renders a unified report with per-LLM presence assessment (ChatGPT, Gemini, Perplexity, Claude, Le Chat) + an action plan tagged with per-LLM impact
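Step 1's fallback chain (HTML lang → Content-Language → TLD → en) can be sketched with the standard library alone; this is a hypothetical simplification, not the actual `scripts/site_locale.py`:

```python
import re

# Illustrative subset of country-code TLD → locale mappings
TLD_LOCALES = {"fr": "fr", "de": "de", "es": "es", "it": "it"}

def detect_locale(html: str, headers: dict, url: str) -> str:
    # 1. <html lang="..."> attribute
    m = re.search(r'<html[^>]*\blang=["\']?([a-zA-Z-]+)', html)
    if m:
        return m.group(1).split("-")[0].lower()
    # 2. Content-Language response header
    lang = headers.get("Content-Language")
    if lang:
        return lang.split(",")[0].split("-")[0].strip().lower()
    # 3. country-code TLD of the host
    tld = url.rstrip("/").split("/")[2].rsplit(".", 1)[-1].lower()
    if tld in TLD_LOCALES:
        return TLD_LOCALES[tld]
    # 4. default
    return "en"
```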

Output lands at site-client/<slug>/audits/<YYYY-MM-DD>-audit.md. The audit itself is in the audited site's language by default.


Per-LLM retrieval map (rev. 2026-04-26)

| LLM (in scope) | Backend | Engine the toolkit measures |
| --- | --- | --- |
| ChatGPT / SearchGPT | Google (OpenAI–Google deal, 2025) | google (via Startpage proxy) |
| Gemini / Google AI Overviews | Google | google |
| Perplexity | Multi (Google + Brave) | google + brave |
| Claude | Brave (Anthropic–Brave, 2025) | brave |
| Le Chat (Mistral) | Brave (Brave–Mistral, 2025) | brave |

Bing Copilot, Meta AI, and Grok are deliberately out of scope. The mapping was validated empirically: ChatGPT citations show 80–90% overlap with Google SERPs versus ~30% with Bing. See docs/LLM_RETRIEVAL_ARCHITECTURE.md.
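As a data structure, the retrieval map above amounts to a simple lookup; this constant is illustrative of the table, not the toolkit's internals:

```python
# Which measured SERP engine(s) feed each in-scope LLM's citations,
# per the retrieval map above
LLM_ENGINES = {
    "ChatGPT / SearchGPT": ("google",),
    "Gemini / Google AI Overviews": ("google",),
    "Perplexity": ("google", "brave"),
    "Claude": ("brave",),
    "Le Chat": ("brave",),
}

def engines_for(llm: str) -> tuple:
    """Engines whose SERPs a given LLM's answers draw on; empty if out of scope."""
    return LLM_ENGINES.get(llm, ())
```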


Quick start

Install (macOS / Linux)

git clone https://github.com/Ghanyte/geo-seo-audit.git
cd geo-seo-audit
./install.sh

Install (Windows / PowerShell)

git clone https://github.com/Ghanyte/geo-seo-audit.git
cd geo-seo-audit
.\install-win.ps1

PowerShell needs Administrator privileges or Developer Mode for symlinks.

Requirements

  • Python 3.9+
  • Claude Code CLI
  • Git
  • 3 Python deps (requests, beautifulsoup4, lxml) auto-installed by install.sh

No paid API key. No browser automation.

Run an audit

/geo audit https://example.com

With positioning + competitors (recommended for any commercial site):

/geo audit https://example.com --positioning "your target query" --vs comp1.com,comp2.com

Optional: Brave Search API key (recommended for repeat use)

The toolkit measures Brave (the SERP behind Claude and Le Chat citations) via two interchangeable paths:

  • HTML scraper (default) — zero setup, but slow (8–15 s/query) and exposed to layout changes / soft-throttle.
  • Official Brave Search API — ~10× faster (~1.1 s/query), stable, with a free tier of 2,000 requests/month at brave.com/search/api/ (no credit card required).

To switch on the API path, set the env var once and the toolkit auto-detects it on every run:

export BRAVE_SEARCH_API_KEY="BSA…your-key…"

Or pass it inline for a one-off run:

python3 -m scripts.run_audit https://example.com \
  --positioning "your target query" \
  --brave-api-key "BSA…your-key…"

The toolkit prints which path is active at the start of phase 5 (⚡ Brave: using official API vs 🐢 Brave: using HTML scraper). At the default fan-out size (~10 queries × Brave), the free tier covers ~50 audits/month.
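The auto-detection described above presumably reduces to an environment-variable lookup with a CLI override; a hypothetical sketch, not the toolkit's actual code:

```python
import os
from typing import Optional

def resolve_brave_key(cli_key: Optional[str] = None) -> Optional[str]:
    """Pick the Brave path: an inline --brave-api-key wins, then the
    BRAVE_SEARCH_API_KEY env var; no key means the HTML scraper."""
    key = cli_key or os.environ.get("BRAVE_SEARCH_API_KEY")
    print("⚡ Brave: using official API" if key else "🐢 Brave: using HTML scraper")
    return key
```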


Commands

| Command | What it does |
| --- | --- |
| /geo audit <url> | Full multi-page audit + per-LLM action plan (default skill) |

That's the only user-facing command. The sub-skills are wired up internally:

  • geo-audit — orchestrator
  • geo-schema — JSON-LD audit (Schema.org + Enhanced Entity Pages)
  • geo-content-health — BLUF compliance + Mount AI anti-patterns
  • geo-crawlers — robots.txt + 3-category bot taxonomy
  • geo-rrf — Reciprocal Rank Fusion measurement on Google + Brave
  • geo-client — per-site folder manager (site-client/<slug>/)
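For illustration, the core of a JSON-LD validity check like geo-schema's can be sketched with the standard library; a simplification, since the real check also covers Enhanced Entity Pages coverage:

```python
import json
import re

def extract_jsonld(html: str) -> tuple:
    """Return (parsed JSON-LD blocks, parse-error messages) from a page."""
    blocks, errors = [], []
    pattern = r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>'
    for raw in re.findall(pattern, html, flags=re.DOTALL | re.IGNORECASE):
        try:
            blocks.append(json.loads(raw))
        except json.JSONDecodeError as exc:
            errors.append(str(exc))
    return blocks, errors
```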

You can also invoke each module directly via Python:

python3 -m scripts.checks.schema https://example.com
python3 -m scripts.checks.content_health https://example.com
python3 -m scripts.checks.crawlers https://example.com
python3 -m scripts.rrf example.com --queries "query 1, query 2" --vs comp.com
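The RRF step fuses the Google and Brave rankings with the standard Reciprocal Rank Fusion formula, score(d) = Σ 1/(k + rank of d in each list). A minimal sketch; k=60 is the conventional constant and an assumption about the toolkit's choice:

```python
def rrf_scores(rankings: list, k: int = 60) -> dict:
    """Fuse several ranked result lists; higher score = more visible across engines."""
    scores = {}
    for ranking in rankings:
        for rank, domain in enumerate(ranking, start=1):
            scores[domain] = scores.get(domain, 0.0) + 1.0 / (k + rank)
    return scores

# Example: fuse a Google and a Brave ranking for one query
google = ["example.com", "comp1.com", "comp2.com"]
brave = ["example.com", "comp2.com"]
fused = sorted(rrf_scores([google, brave]).items(), key=lambda kv: -kv[1])
```

A domain ranked first by both engines scores 2/(k+1), so cross-engine presence dominates a single high rank on one engine.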

Output structure

site-client/<slug>/audits/<YYYY-MM-DD>-audit.md

  1. État des lieux par LLM         ← verdict table + 5 LLM detail blocks
  2. Plan d'action                  ← unified action plan, sorted by per-LLM impact
                                      + experimental sub-bucket
  3. Ce que l'audit ne mesure pas   ← what the audit does not measure

  Annexes techniques (technical annexes):
    A. RRF mesuré (détail)          ← measured RRF, in detail
    B. Crawlers IA (robots.txt)     ← AI crawlers
    C. Schema multi-page
    D. Content health multi-page
    E. Findings — vue agrégée       ← aggregated findings view

  Glossaire technique               ← technical glossary

The 5 verdicts in section 1 are: 🟢 Présence forte (strong presence) · 🟢 Présent (present) · 🟡 Présence partielle (partial presence) · 🔴 Absent · ⚪ Non mesuré (not measured).

When the target leads but the gap to its closest competitor is below 5%, the audit flags ⚠️ leadership fragile (écart X.X%), where écart gives the gap percentage.
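The fragile-leadership check can be expressed directly; a sketch, since the exact gap formula the toolkit uses is an assumption:

```python
from typing import Optional

def leadership_flag(scores: dict, target: str, threshold: float = 5.0) -> Optional[str]:
    """Warn when the target leads the fused ranking by < `threshold` percent."""
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    if len(ranked) < 2 or ranked[0][0] != target:
        return None  # target doesn't lead, or no competitor measured
    leader, runner_up = ranked[0][1], ranked[1][1]
    gap = (leader - runner_up) / leader * 100
    if gap < threshold:
        return f"⚠️ leadership fragile (écart {gap:.1f}%)"
    return None
```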


Methodology

Every claim in an audit is anchored to either a verifiable site signal (HTML / JSON-LD / SERP rank) or a public dataset query. The toolkit deliberately avoids:

  • Composite "AI readiness" scores (they hide which signal is failing; each signal is reported individually instead)
  • Unsourced GEO statistics (no "AI traffic converts 4.4× better" without primary source)
  • Single-study correlations promoted as critical (tagged medium or experimental, not high)
  • Sales framing (no pricing tiers, MRR projections, traffic-uplift estimates)

Reference docs:

  • docs/LLM_RETRIEVAL_ARCHITECTURE.md — per-LLM retrieval mapping and its empirical validation


Repo layout

geo-seo-audit/
├── CLAUDE.md                    # operating rules (loaded each session)
├── README.md                    # this file
├── LICENSE                      # MIT
├── requirements.txt             # 3 runtime deps
├── install.sh / install-win.ps1 # symlink skills into ~/.claude/skills/
├── uninstall.sh
├── geo/SKILL.md                 # top-level Claude Code skill router
├── skills/                      # 6 sub-skills (audit-only)
│   ├── geo-audit/
│   ├── geo-schema/
│   ├── geo-content-health/
│   ├── geo-crawlers/
│   ├── geo-rrf/
│   └── geo-client/
├── scripts/                     # core pipeline
│   ├── run_audit.py             # orchestrator (entry point)
│   ├── checks/                  # schema, content_health, crawlers
│   ├── serp/                    # google (Startpage), brave, cache
│   ├── rrf.py                   # Reciprocal Rank Fusion
│   ├── fanout.py                # query expansion
│   ├── llm_impact.py            # finding → per-LLM impact map
│   ├── i18n.py                  # FR translation + glossary
│   ├── client_paths.py          # per-site folder resolver
│   ├── site_crawler.py          # multi-page sampler
│   └── site_locale.py           # locale detection
├── schema/                      # 7 JSON-LD starter templates
├── docs/                        # methodology refs + 8 archived sources
├── tasks/                       # lessons + durable plans
└── site-client/                 # audit output (gitignored, README only)

Uninstall

./uninstall.sh

Removes the symlinks from ~/.claude/skills/. Doesn't touch the repo, your audits in site-client/, or installed Python packages.


License

MIT — see LICENSE.


Contributing

Pull requests welcome. Before opening one:

  • Check tasks/lessons.md for the project's documented design choices
  • Run an audit on at least one real site to verify your change doesn't break the pipeline
  • Keep the audit-only philosophy intact (no commercial framing, no sales tooling)
  • Tag finding confidence honestly (high requires reproduced evidence; single studies are medium-high at most)
