How to run your own AI search visibility audit (step-by-step, free template)

The exact audit I run for paying clients, written so you can run it yourself in an afternoon. Twenty buyer prompts, four AI engines, a 5-tier citation rubric, and a free Google Sheet template. No tools required beyond a browser.

19 min read · By Faz
AUDIT VIEW: one prompt, four engines

Prompt: “What are the best AI search visibility tools for B2B SaaS?”

  • ChatGPT: Tier 3, recommended with reasoning
  • Claude: Tier 2, passing mention in a list
  • Perplexity: Tier 1, source-only, not in the answer
  • Google AI Overview: Tier 0, invisible

Pattern: single-engine deficit. Strong on ChatGPT, weak on Google. Fix the source set Google’s overview pulls from.

What this audit produces. One prompt, four engines, tier-scored. Twenty prompts gives you the full diagnostic.

The short version. An AI search visibility audit is 20 buyer prompts, run three times each across ChatGPT, Claude, Perplexity, and Google AI Overview, scored on a 5-tier citation rubric. Three to four hours of work. The output is a list of three to five fixes, in priority order. The free Google Sheet template at the bottom of this post will do the math for you.


Most “how to audit AI visibility” guides on Google right now are written by tool vendors selling a $99/month dashboard. That dashboard does what a free spreadsheet, a list of 20 prompts, and four browser tabs do, and worse, it hides the actual reasoning from you.

I run Zilwaris, a small AI search content agency for B2B SaaS. We sell a productized audit for $500. This is the same audit, written so you can do it yourself. Most readers will not. The ones who do, and find gaps that justify the $500, will hire us. The ones who do it well will save themselves the cost. Both outcomes are fine.

Here is what we will cover.

  1. What an AI search visibility audit actually measures (and what it cannot)
  2. The seven steps, in order
  3. Common mistakes that invalidate the result
  4. When the DIY version breaks down
  5. The free Google Sheet template

If you only have 30 minutes, jump to step 4 and run a stripped-down version on one ChatGPT prompt. If you want the full diagnostic, set aside three to four hours.

What an AI search visibility audit actually measures

The audit answers one question. When a buyer in your category asks ChatGPT, Claude, Perplexity, or Google’s AI Overview a real purchase-intent question, do you show up, and how?

That is it. Anything more elaborate is sold as the same thing. It is not the same thing.

The audit does not measure your website traffic. It does not predict revenue. It does not tell you whether ChatGPT is your “biggest channel.” Those are different questions and different work. What the audit does is establish a baseline of citations across the four engines that matter for your buyers, identify the pattern of where you are missing, and turn that into a short, prioritized list of fixes. For what happens after a successful audit and a content engagement, see the case study on an AI image generation tool: ChatGPT moved from no measurable referrals to the third-largest traffic source overall in 30 days.

Five things to keep in mind before you start.

One. AI engine responses are non-deterministic. Run the same prompt twice and you will sometimes get two different answers. This is not a bug, it is how large language models work. Any audit that runs each prompt once and reports a single result is reporting noise. We run three.

Two. Logged-in accounts give contaminated answers. ChatGPT, Claude, and Gemini personalize responses based on history. If you ask “what’s the best AI image tool” while logged into the account where you tested 40 AI image tools last month, the answer reflects your history, not a new buyer’s experience. Use logged-out browsers or incognito sessions.

Three. Citation is not the same as recommendation. There is a difference between being listed in a source panel, being mentioned in passing, and being recommended as the answer. The audit scores all three separately because they require different fixes.

Four. Your competitors are not always who you think they are. The right comparison set is the three names AI engines actually surface for your buyer’s queries, not the three names on your internal pitch deck.

Five. Twenty prompts is enough. More than 20 produces marginal extra signal at significant extra cost in time. Fewer than 15 misses too much. Twenty buyer questions, run three times each across four engines, gives you 240 data points. That is a defensible sample.

The seven steps

Step 1. Pick the three right competitors

Most audits skip this step or do it badly. They list whoever the founder named in the last sales call. That set is biased toward the obvious players. The point of an AI search audit is to see who AI is actually recommending, which often is not who you think.

The right way to pick competitors:

  1. Open ChatGPT in an incognito window. Ask: “What are the top tools for [your category]?” Capture the names mentioned.
  2. Repeat the same prompt in Perplexity, Claude, and Google AI Overview.
  3. Across the four engines, you will see a pattern. Three or four names will dominate. Those are your real comparison set.
  4. If your name appears, congratulations. If it does not, that is the audit’s first finding.

Pick the three that show up most consistently. Add a fourth name only if there is a clear “challenger” you want to track. More than four competitors makes the spreadsheet unreadable.

One nuance for B2B SaaS. Sometimes the AI surfaces an enterprise leader (HubSpot, Salesforce) when your real competitive set is smaller. Use your judgment. If you sell a $99/month niche tool and AI is recommending a $50K/year platform, you are not actually competing with that platform for the same buyer. Pick competitors at your tier.
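If you want to make the tally explicit rather than eyeballing it, here is a minimal sketch of the counting step. Every name and engine output below is a hypothetical placeholder; paste in whatever names your four incognito sessions actually surfaced.

```python
# Count how many of the four engines surface each name at least once.
# All names below are hypothetical placeholders.
from collections import Counter

names_per_engine = {
    "ChatGPT":    ["Acme", "Bolt", "Corely", "Acme"],
    "Claude":     ["Acme", "Corely", "Dex"],
    "Perplexity": ["Bolt", "Acme", "Corely"],
    "Google AIO": ["Acme", "Bolt"],
}

engine_counts = Counter()
for engine, names in names_per_engine.items():
    for name in set(names):          # one vote per engine, not per mention
        engine_counts[name] += 1

# The three names surfaced by the most engines are your comparison set.
comparison_set = [name for name, _ in engine_counts.most_common(3)]
print(comparison_set)                # e.g. ['Acme', 'Bolt', 'Corely']
```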

Step 2. Build your buyer query map (20 prompts)

The single biggest mistake people make in AI audits is testing prompts they wrote, not prompts buyers actually ask. “What is [my brand]?” tells you nothing. A buyer in research mode does not ask that.

You need 20 prompts split across four categories. Five prompts each.

Category A. Category-level discovery (5 prompts).

This is how a buyer who does not yet know your brand searches for the category. The most important category. If you do not show up here, you are invisible to new buyers.

  • “What are the best [category] tools for [ICP]?”
  • “Top [category] for [use case] in 2026”
  • “What [category] do [ICP type] companies use?”
  • “Which [category] is best for [specific scenario]?”
  • “[Category] recommendations for [budget tier]”

Category B. Problem-solution (5 prompts).

The buyer has a problem and is asking how to solve it, not which tool to buy. Your category may or may not surface here.

  • “How do I [solve the core problem your product solves]?”
  • “[Problem] best practices in 2026”
  • “What’s the best way to [achieve outcome]?”
  • “How do small teams handle [problem]?”
  • “[Problem]: software vs DIY”

Category C. Head-to-head comparison (5 prompts).

The buyer is in late-stage evaluation. These prompts are gold for diagnosing your competitive position.

  • “[Your brand] vs [Competitor 1]: which is better?”
  • “Alternative to [Competitor 1] for [ICP]”
  • “[Competitor 1] vs [Competitor 2]: pros and cons”
  • “Why choose [your brand] over [Competitor 1]?”
  • “Cheapest [category] comparable to [Competitor 1]”

Category D. Branded recall (5 prompts).

The buyer already knows your name and is researching you. Last category because it is the easiest to score and the least diagnostic.

  • “What is [your brand]?”
  • “Who founded [your brand]?”
  • “[Your brand] reviews”
  • “[Your brand] pricing”
  • “Is [your brand] worth it?”

If buyers have known your name for a year and you still cannot pass the branded recall set, you have an entity-recognition problem, not a content problem. That is a useful diagnostic on its own.

Write the 20 prompts down before you run anything. Once you start running them, do not adjust the prompt mid-audit. Tweaking a prompt because the answer was disappointing destroys the audit’s integrity.
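If it helps to freeze the wording programmatically, here is a minimal sketch that fills the templates once, up front. Every field value (brand, category, ICP, competitor) is a hypothetical placeholder, and only two templates per category are shown; fill in the rest from the lists above.

```python
# Substitute your details into the prompt templates once, before any runs,
# so the exact wording is fixed. All field values are hypothetical placeholders.
fields = {
    "brand": "Acme Analytics",
    "category": "product analytics",
    "icp": "B2B SaaS startups",
    "competitor_1": "Bolt",
}

templates = {
    "A": [  # category-level discovery (add the remaining templates from the list above)
        "What are the best {category} tools for {icp}?",
        "Top {category} for {icp} in 2026",
    ],
    "C": [  # head-to-head comparison
        "{brand} vs {competitor_1}: which is better?",
        "Alternative to {competitor_1} for {icp}",
    ],
}

prompts = [
    (category, template.format(**fields))
    for category, category_templates in templates.items()
    for template in category_templates
]
for category, text in prompts:
    print(category, text)
```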

Step 3. Set up the spreadsheet

The free template is linked at the bottom of this post. If you want to build your own, the column structure is:

  • Prompt #: 1 through 20
  • Category: A, B, C, or D
  • Prompt text: the exact text you ran
  • Engine: ChatGPT, Claude, Perplexity, or Google AI Overview
  • Run: 1, 2, or 3
  • Citation tier: 0 to 4 (see step 5)
  • Competitor 1 tier: 0 to 4
  • Competitor 2 tier: 0 to 4
  • Competitor 3 tier: 0 to 4
  • Sources cited: URLs in the source panel (Perplexity and ChatGPT show this)
  • Notes: anything unusual (hallucinated description, sentiment issue, etc.)

That is 11 columns and 240 rows (20 prompts × 4 engines × 3 runs). The spreadsheet does the math.
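If you would rather generate the blank capture grid than type 240 rows by hand, here is a minimal sketch that writes it as a CSV you can import into Google Sheets. The prompt texts below are placeholders standing in for the 20 prompts from step 2.

```python
# Build the blank 240-row capture sheet: 20 prompts x 4 engines x 3 runs.
# Prompt texts are placeholders for the list from step 2.
import csv

engines = ["ChatGPT", "Claude", "Perplexity", "Google AI Overview"]
prompts = [(n, "A", f"placeholder prompt {n}") for n in range(1, 21)]

columns = ["Prompt #", "Category", "Prompt text", "Engine", "Run",
           "Citation tier", "Competitor 1 tier", "Competitor 2 tier",
           "Competitor 3 tier", "Sources cited", "Notes"]

with open("audit_capture.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(columns)
    for number, category, text in prompts:
        for engine in engines:
            for run in (1, 2, 3):
                # Tier, competitor, source, and note cells stay blank until scored.
                writer.writerow([number, category, text, engine, run] + [""] * 6)
```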

Step 4. Run the prompts across four engines

This is the slowest part. Set aside two hours and do it in one sitting if possible. Switching contexts mid-audit makes you sloppy.

For each prompt, in this order:

  1. Open a fresh incognito window. Log out of all engines if you cannot use incognito.
  2. Run the prompt in ChatGPT. Capture the full response. Note any sources. Score yourself and each competitor 0 to 4 (rubric below). Move to row 2 of the spreadsheet.
  3. Run the same prompt in Claude. Same capture and scoring.
  4. Run the same prompt in Perplexity. Same capture and scoring.
  5. Run the same prompt in Google. Capture the AI Overview if one appears (not all queries trigger one). Same scoring.
  6. Close the window. Open a new incognito window. Repeat the entire prompt across all four engines.
  7. Repeat once more, for run 3.

Why fresh windows for each run? Because even within an incognito session, some engines (especially ChatGPT) will weight context from earlier in the same session. Closing and reopening is the only way to get three independent samples. The non-determinism of large language model responses is documented in the original generative engine optimization paper by Aggarwal et al. (2023), which is also where the GEO term originates.

You can speed this up by batching: do all 20 prompts in ChatGPT first, then all 20 in Claude, etc. The trade-off is that you are then running three sequential runs in the same engine before switching, which works but introduces a small bias from session warming. For most B2B SaaS use cases the bias is small enough to ignore.

Two hours later, twenty prompt blocks are full. Each cell is one observation. The medians self-populate. Time for the rubric.

Step 5. Score using the 5-tier citation typology

This is where most DIY audits fall apart. They use a binary “mentioned or not mentioned” score, which loses the most useful information. Citation type matters more than citation count.

The five tiers, from worst to best:

  • Tier 0, Invisible: not mentioned anywhere in the response or source panel. You are not in the answer at all.
  • Tier 1, Source-only: your URL appears in the source panel but the engine did not use it in the answer. Example: Perplexity lists you in sources but the bullet points cite competitors.
  • Tier 2, Passing mention: named in the answer but not as a recommendation, often in a list of “options.” Example: “Tools include Asana, Trello, ClickUp, and Linear.”
  • Tier 3, Recommended: recommended as a fit for the buyer’s situation, with a reason. Example: “For small teams that prefer keyboard shortcuts, Linear is the better choice because…”
  • Tier 4, Top recommendation: the first or only recommendation, treated as the answer rather than an option. Example: “The best AI image tool for solo creators is Midjourney.”

Score yourself on this rubric for every prompt-engine-run combination. Score each competitor on the same rubric. The scoring is subjective. That is fine. Reasonable people will score within a half-tier of each other. Use your gut and stay consistent. If you want to automate parts of this once you have done it manually, the tools I actually pay for are documented in the GEO tools I actually use for client work.

For each prompt, take the median tier across the three runs. Median, not mean. This protects against one anomalous run pulling the score in either direction.
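The median step is mechanical enough to show directly. A minimal sketch, with hypothetical run scores standing in for whatever you captured:

```python
# Median tier per prompt-engine pair across the three runs.
# Run scores below are hypothetical; yours come from the capture sheet.
from statistics import median

runs = {
    # (prompt number, engine): [run 1, run 2, run 3]
    (1, "ChatGPT"):    [3, 3, 2],
    (1, "Google AIO"): [0, 1, 0],
}

# One anomalous run cannot drag the score the way a mean would.
medians = {key: median(scores) for key, scores in runs.items()}
print(medians)   # {(1, 'ChatGPT'): 3, (1, 'Google AIO'): 0}
```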

Step 6. Find the pattern

Now you have 80 median scores: 20 prompts × 4 engines, one number each. Compare them against your three competitors.

The patterns to look for, in priority order:

Pattern 1. The category invisibility gap. If your median score is 0 or 1 across most Category A prompts (category-level discovery) while at least one competitor scores 3 or 4, you have an entity recognition problem. AI engines do not yet associate your brand with the category. This is the most common pattern for B2B SaaS under $5M ARR. Fix: third-party authority content, listicle inclusions, Wikipedia entity creation if you qualify, structured data.

Pattern 2. The single-engine deficit. You score 3 or 4 on three engines and 0 or 1 on the fourth. This usually means you are missing from a specific source the underperforming engine relies on. ChatGPT leans on Reddit, news, and Wikipedia. Perplexity leans on real-time web search. Claude leans on official documentation. Diagnose which source set you are missing and fix that source set, not your whole content stack.

Pattern 3. The comparison gap. You score well on Category D (branded recall) and weakly on Category C (head-to-head). Buyers in late-stage research are not finding you in the comparison. Fix: write or earn comparison content. “X vs Y” pages, third-party reviews, comparison tables on review sites.

Pattern 4. The misrepresentation pattern. You appear, but the answer describes you incorrectly. Wrong pricing, wrong feature set, wrong founder, outdated positioning. Fix: update structured data, About page, pricing page, and earn fresh third-party mentions that correct the record.

Pattern 5. The dominance pattern. One competitor is at tier 3 or 4 across most prompts in most engines. They have built durable AI search authority. Fix: this is a slow ground game, not a quick fix. Plan a 6 to 12 month content investment.

Most audits surface a combination of two or three patterns at once. That is normal. Pick the one that explains the most data points and start there.
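Two of these patterns are mechanical enough to flag in code once the medians exist. A minimal sketch, with hypothetical Category A scores that happen to mirror the example at the top of this post (strong on three engines, weak on Google):

```python
# Flag pattern 1 (category invisibility) and pattern 2 (single-engine deficit)
# from Category A medians. All scores below are hypothetical.
from statistics import median

you = {          # tier per Category A prompt, per engine
    "ChatGPT":    [3, 3, 2, 3, 4],
    "Claude":     [2, 3, 3, 2, 3],
    "Perplexity": [3, 2, 3, 3, 3],
    "Google AIO": [0, 1, 0, 0, 1],
}
best_competitor = {
    "ChatGPT":    [3, 4, 3, 3, 3],
    "Claude":     [3, 3, 3, 3, 4],
    "Perplexity": [3, 3, 4, 3, 3],
    "Google AIO": [3, 3, 3, 4, 3],
}

your_medians = {engine: median(scores) for engine, scores in you.items()}
comp_medians = {engine: median(scores) for engine, scores in best_competitor.items()}

# Pattern 1: you sit at tier 0-1 everywhere while a competitor reaches tier 3+.
if all(m <= 1 for m in your_medians.values()) and max(comp_medians.values()) >= 3:
    print("Pattern 1: category invisibility gap")

# Pattern 2: strong on three engines, weak on exactly one.
weak = [e for e, m in your_medians.items() if m <= 1]
strong = [e for e, m in your_medians.items() if m >= 3]
if len(weak) == 1 and len(strong) >= 3:
    print(f"Pattern 2: single-engine deficit on {weak[0]}")
```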

Step 7. Build the prioritized fix list

The deliverable of an audit is not a 40-page report. It is a list of three to five things to do next, in priority order, with rough effort estimates.

Use a simple impact-vs-effort frame:

  • High impact, low effort. Do this week. Examples: fixing structured data, updating an About page that AI is summarizing wrong, adding a feature comparison table to your homepage.
  • High impact, high effort. Plan over the next quarter. Examples: building a 2,500-word comparison piece, earning a Wikipedia entry, getting onto an authority listicle.
  • Low impact, low effort. Do if you have spare time, otherwise ignore.
  • Low impact, high effort. Do not do.

Three to five items in the top two quadrants. That is the audit’s output.

Sequence matters. Almost always: fix the entity recognition problem first (people need to know who you are before they can recommend you), then fix the category content problem, then the comparison content problem, then the long-tail.

Common mistakes that invalidate the audit

Six failures I have watched founders make. Each one breaks the audit’s integrity badly enough that the result is worse than not doing the audit at all.

Running each prompt once. One run is noise. The number you see is in a range of plus or minus one tier. If you act on a single-run result you will fix the wrong thing roughly a third of the time. Three runs minimum.

Testing while logged in. You see what you trained the engine to show you. The audit is meant to show what a new buyer sees.

Cherry-picking prompts. If a prompt gives you a bad result and you tweak it until you get a better result, you are no longer auditing. You are reassuring yourself.

Wrong competitors. If you compare yourself against a tier-1 enterprise leader and you sell to startups, you will conclude you are losing badly when in fact you are winning your actual segment. Use the competitor selection method in step 1.

Confusing source-only citations with recommendations. Being listed in Perplexity’s source panel is not the same as being recommended. Use the 5-tier rubric.

Acting on the audit without re-running it. AI engines update. The audit is a baseline. Re-run it monthly while you are actively making changes, then quarterly once you stabilize. Without re-runs you cannot tell whether your changes worked or whether the engines just shifted underneath you.

When the DIY version breaks down

I should be honest about what this audit cannot do for you.

It will not help you write the fix. The fix list at the end of step 7 says “build a comparison piece” or “fix the entity recognition gap.” Those are projects, not tasks. Each one is a week or two of senior content work, and the difference between a piece that gets cited by AI engines and a piece that does not is not a five-bullet checklist.

It will not catch slow drift. Doing this audit once is a snapshot. Doing it monthly is a habit. Most founders do it once, get useful diagnostics, and then never run it again. Six months later they have no idea whether the work they did moved the needle.

It assumes you have time. Three to four hours of focused work, a calm room, no Slack. Many founders do not have this kind of time on demand.

It cannot tell you whether the gaps you find are worth fixing. That is a strategy question, not an audit question. Sometimes the right answer is “we do not need to win this category in AI search yet, our growth is coming from outbound.” The audit will tell you the truth. It will not tell you what to do with the truth.

If any of those four limits matter for your situation, hire someone, not necessarily us. We sell a productized version of this exact audit at zilwaris.com/audit for $500. It includes a 30-minute fit call before purchase, the same 20-prompt buyer query map, the same 4-engine 3-run methodology, a competitor gap report, and a 20-minute Loom walkthrough of the findings. Seven business days from kickoff to delivery. The fit call is free. If you are not a fit, we say no before you pay.

A sample completed audit (mini)

To make the rubric concrete, here is what the output looks like for a hypothetical B2B SaaS at the end of step 6. Numbers are illustrative.

AUDIT DIAGNOSTIC: what a completed audit looks like

  • Category discovery (5 prompts · A1-A5): you 0.5, best competitor 3.5. Pattern: category invisibility.
  • Problem-solution (5 prompts · B1-B5): you 1.0, best competitor 2.5. Pattern: mild gap.
  • Head-to-head (5 prompts · C1-C5): you 2.0, best competitor 3.0. Pattern: comparison gap.
  • Branded recall (5 prompts · D1-D5): you 3.5, best competitor 4.0. Pattern: acceptable.

The diagnosis writes itself. The brand is recognized when buyers know to ask for it (Category D). It is invisible at the category-discovery stage (Category A). The fix list, in order:

  1. Fix entity recognition. Get listed on three category authority sites in the next 30 days. Update structured data on the homepage.
  2. Build one flagship category piece. Aim to be cited as the answer, not as an option, for at least one Category A prompt within 90 days.
  3. Add three comparison pages over the next 90 days, one for each direct competitor.

Three to five things, in order, with rough timelines. That is what the audit produces.

The free Google Sheet template

Here is a copy of the spreadsheet I use for client audits, stripped down for self-service use.

Download the AI Search Visibility Audit template (Excel, 19 KB)

The workbook has four tabs.

  • Setup: your brand, three competitors, and twenty prompts in one place.
  • Capture: twenty self-contained prompt blocks. Each block shows the prompt at the top, then a small grid: four engines down the rows (ChatGPT, Claude, Perplexity, Google AIO), three runs across, the median auto-calculated, three competitor scores beside, and a notes column. Color-coded by tier.
  • Diagnostic: auto-calculated medians per category with the pattern flagged and a priority fix recommendation.
  • Engine notes: a quick reference for what to capture from each engine.

No email gate. No popup. Open it directly in Excel, Google Sheets (File → Import), or Numbers. The formulas survive the import.

If you find the template useful, the most useful thing you can do is publish your own audit. The methodology becomes more credible the more it gets used. The Zilwaris site has a public case study at zilwaris.com/case-study-ai-image-tool that documents what running this audit and the recommendations look like end-to-end on a real engagement.

FAQ

How long does this take?

Three to four hours for the full audit. 30 minutes for a stripped-down version with one prompt category. The reason it takes time is the three-run requirement, which is not optional. Anything faster is not an audit, it is a hot take.

Which engines should I test?

ChatGPT, Claude, Perplexity, and Google AI Overview. Those four cover roughly 95% of consumer-grade AI search use in the United States as of mid-2026. Add Copilot if you sell into Microsoft-heavy enterprise. Skip Gemini consumer if your buyers do not use Android, the signal mostly overlaps with Google AI Overview.

What if my category is too niche for AI engines to know it?

That is a finding. If a buyer in your category cannot ask AI a question and get any useful answer, your audit will show across-the-board zeros and the diagnosis is “the category is pre-AI.” The fix is different from the patterns above. You are creating the category in AI engines’ minds, which is content investment without competitive comparison. Reach out, this case is rare enough that I will help diagnose for free.

Should I rerun the audit monthly or quarterly?

Monthly while you are actively making changes. Quarterly once you have stabilized. Annual is too slow. Weekly is too noisy.

What about Reddit and YouTube?

Both matter as input sources to AI engines, especially ChatGPT. The audit measures the output, not the input. If your audit shows a ChatGPT-specific deficit, the most likely cause is missing Reddit presence in your category subreddits. That is a fix, not a separate audit.

How do I score “appeared in source panel but not in answer”?

Tier 1 (source-only). The 5-tier rubric in step 5 covers this explicitly.

What if my brand has a generic name and gets confused with other things?

Add a context tag to your branded recall prompts. “[Brand] [category]” instead of just “[Brand].” That isolates the entity. If you still see confusion, you have a brand-disambiguation problem and need structured data plus authority signals to clear it up.

Can I just hire you to do this?

Yes. The productized audit is $500 at zilwaris.com/audit: same methodology, 7 business days, and a free 30-minute fit call before purchase. The fit call is two-way; it is where we say no to engagements the audit will not help.

What to do next

Block three hours on your calendar this week. Do step 1 (pick competitors) and step 2 (build the prompt set) on day one. Run the audit on day two. Score and diagnose on day three. Send the fix list to whoever owns content at your company.

Most companies that run this audit find a fix that more than pays for the time spent. Many find a fix worth tens of thousands of dollars in pipeline they were leaving on the table.

If you want a second pair of eyes on your fix list, or you want to skip the three hours, the productized audit is at zilwaris.com/audit. The fit call is the place to start.

And if you run the audit yourself and find it useful, tell me what you found. I read every email at faz@zilwaris.com.


Want this run for your B2B SaaS?

Founding pricing for the first 5 clients. Methodology fully public. Month-to-month, cancel anytime.

Apply to work together
