Measurement
Methodology

How we calculate results. What we measure, what we exclude, and why each decision was made.

Overview

What We Measure

Modern Discovery measures how AI systems represent brands, products, and companies in generated answers. The core question is simple: when a buyer or researcher asks an AI engine about your category, are you present, how prominently, and how does that compare to direct competitors?

We do not scrape search rankings. We do not estimate traffic. We query AI engines directly, the same way real users do, and measure the output. This is downstream measurement: we observe what AI systems say, not what we predict they will say.

Coverage spans ChatGPT (GPT-4), Gemini, Claude, Perplexity, and Google AI Mode. Each model is queried independently. Results are not pooled or averaged at the response level. Cross-model consistency is itself a signal we report.

Process

How a Run Works

01
Query Construction

A structured query set is constructed for each entity, whether a brand, a product, or a company. Queries are classified across five intent categories: awareness, comparison, recommendation, authority, and use-case. Each category maps to a distinct phase of the buyer decision journey. Query sets are versioned and immutable: once a run begins, the query set does not change.
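
One way to picture a versioned, immutable query set is as a frozen record. The following is a minimal sketch, not the production data model: the class names, the `version` string format, and the example entity "ExampleCo" are all hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError
from enum import Enum


class Intent(Enum):
    """The five intent categories named in the methodology."""
    AWARENESS = "awareness"
    COMPARISON = "comparison"
    RECOMMENDATION = "recommendation"
    AUTHORITY = "authority"
    USE_CASE = "use-case"


@dataclass(frozen=True)
class Query:
    text: str
    intent: Intent


@dataclass(frozen=True)
class QuerySet:
    entity: str                 # hypothetical field names throughout
    version: str
    queries: tuple[Query, ...]  # a tuple, so the collection itself cannot be mutated


query_set = QuerySet(
    entity="ExampleCo",
    version="qs-v1",
    queries=(
        Query("What is ExampleCo?", Intent.AWARENESS),
        Query("Which widget vendors should I consider?", Intent.RECOMMENDATION),
    ),
)

# Any attempt to change the set after construction fails loudly.
try:
    query_set.version = "qs-v2"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Under this sketch, changing coverage means building a new `QuerySet` with a new version rather than editing an existing one.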

02
Multi-Model Execution

Each query is submitted to ChatGPT, Gemini, Claude, Perplexity, and Google AI Mode. Runs are executed independently per model. No response is shared or interpolated across models. This isolates each model's behavior so cross-model consistency can be measured directly, not inferred.
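
The isolation described here can be sketched as a loop that keeps every model's responses in its own bucket. This is an illustration only: the `clients` mapping and the stub client are hypothetical stand-ins, and no real engine API is called.

```python
from typing import Callable, Dict, List

MODELS = ("ChatGPT", "Gemini", "Claude", "Perplexity", "Google AI Mode")


def run_independently(
    queries: List[str],
    clients: Dict[str, Callable[[str], str]],
) -> Dict[str, Dict[str, str]]:
    """Return {model: {query: response}}. Nothing is shared across models."""
    results: Dict[str, Dict[str, str]] = {}
    for model, ask in clients.items():
        # Each model sees only the query text, never another model's output.
        results[model] = {query: ask(query) for query in queries}
    return results


def make_stub_client(model: str) -> Callable[[str], str]:
    """Hypothetical placeholder for a real engine client."""
    def ask(query: str) -> str:
        return f"[{model}] answer to: {query}"
    return ask


clients = {model: make_stub_client(model) for model in MODELS}
results = run_independently(["Which widget vendors should I consider?"], clients)
```

Keeping results keyed by model is what lets agreement between models be computed later rather than assumed.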

03
Signal Extraction

For each response, we extract: mention presence (was the entity named), mention context (positive, neutral, qualified, negative), citation presence (was a URL cited), citation domain (owned, competitor, neutral), and competitive context (which other entities appeared). Sentiment scoring is deliberately excluded from headline metrics. See Design Choices below.
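
As a rough illustration of these signals, the sketch below uses naive substring and URL matching. It assumes far simpler extraction than a production pipeline would need, it leaves out mention context entirely, and every name in it (`Signals`, `extract_signals`, the example domains) is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set
from urllib.parse import urlparse

# Simplified URL matcher: stops at whitespace and common closing punctuation.
URL_PATTERN = re.compile(r"https?://[^\s)\]>,]+")


@dataclass
class Signals:
    mentioned: bool                   # mention presence
    cited: bool                       # citation presence
    citation_domains: Dict[str, str]  # url -> "owned" | "competitor" | "neutral"
    co_mentions: List[str]            # competitive context


def classify_domain(url: str, owned: Set[str], competitor: Set[str]) -> str:
    host = (urlparse(url).hostname or "").removeprefix("www.")
    if host in owned:
        return "owned"
    if host in competitor:
        return "competitor"
    return "neutral"


def extract_signals(response: str, entity: str, competitors: List[str],
                    owned: Set[str], competitor_domains: Set[str]) -> Signals:
    lowered = response.lower()
    urls = URL_PATTERN.findall(response)
    return Signals(
        mentioned=entity.lower() in lowered,
        cited=bool(urls),
        citation_domains={u: classify_domain(u, owned, competitor_domains) for u in urls},
        co_mentions=[c for c in competitors if c.lower() in lowered],
    )


signals = extract_signals(
    "Top picks include ExampleCo and RivalCorp. Sources: "
    "https://www.exampleco.com/guide https://reviews.example.org/widgets",
    entity="ExampleCo",
    competitors=["RivalCorp", "OtherCo"],
    owned={"exampleco.com"},
    competitor_domains={"rivalcorp.com"},
)
```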

04
Gap Identification

Citation voids are queries where the entity is mentioned but no supporting source is cited. Consensus gaps are queries where the entity is present in some models but absent in others. Competitive deltas measure the difference in mention rate between the entity and named competitors across the same query set. These three outputs drive the diagnostic layer.
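
The three gap types can be expressed as small functions over per-model observations. This is a sketch under an assumed data shape, `obs[model][query] = {"mentions": set, "cited": bool}`, where `cited` records whether the entity's mention came with a source. The shape, the function names, and the sample data are illustrative.

```python
def citation_voids(obs, entity):
    """(model, query) pairs where the entity is mentioned with no cited source."""
    return [(model, query)
            for model, by_query in obs.items()
            for query, o in by_query.items()
            if entity in o["mentions"] and not o["cited"]]


def consensus_gaps(obs, entity):
    """Queries where the entity is present in some models but absent in others."""
    queries = sorted({q for by_query in obs.values() for q in by_query})
    gaps = []
    for query in queries:
        present = [entity in by_query[query]["mentions"]
                   for by_query in obs.values() if query in by_query]
        if any(present) and not all(present):
            gaps.append(query)
    return gaps


def mention_rate(obs, name):
    cells = [o for by_query in obs.values() for o in by_query.values()]
    return sum(name in o["mentions"] for o in cells) / len(cells)


def competitive_delta(obs, entity, competitor):
    """Entity mention rate minus competitor mention rate over the same query set."""
    return mention_rate(obs, entity) - mention_rate(obs, competitor)


obs = {
    "ChatGPT": {
        "q1": {"mentions": {"ExampleCo", "RivalCorp"}, "cited": False},
        "q2": {"mentions": {"RivalCorp"}, "cited": False},
    },
    "Gemini": {
        "q1": {"mentions": {"ExampleCo", "RivalCorp"}, "cited": True},
        "q2": {"mentions": {"ExampleCo", "RivalCorp"}, "cited": True},
    },
}

voids = citation_voids(obs, "ExampleCo")
gaps = consensus_gaps(obs, "ExampleCo")
delta = competitive_delta(obs, "ExampleCo", "RivalCorp")
```

In this sample, the entity has one citation void (ChatGPT on q1), one consensus gap (q2), and trails the competitor by 0.25 in mention rate.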

05
Metric Calculation

Shortlist Share is calculated as the fraction of recommendation and comparison queries where the entity appears in a top-position mention. AVI (AI Visibility Index) is the aggregate of mention rate, citation rate, competitive delta, and cross-model consistency for the entity across all query categories. Both metrics are calculated per model and as a cross-model composite.

06
Prescription Generation

Gaps map directly to recommended actions. Citation voids point to structured data improvements and to closing source authority gaps. Consensus gaps point to content investment in the specific query categories where models disagree. Competitive deltas identify the specific queries where a competitor owns the AI answer. Each recommendation is tied to a measured gap, not generated from heuristics.

Metric Definitions

The Numbers

Shortlist Share

Proprietary Headline Metric

The fraction of recommendation and comparison queries where the entity appears in a top-position mention. Calculated separately per model and as a cross-model composite.

Shortlist Share isolates the queries with the highest commercial intent: the moment a buyer asks "which options should I consider" or "compare these vendors." A brand with strong awareness but low Shortlist Share is being discovered but not recommended.
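
A per-model Shortlist Share can be sketched as a simple fraction. The text does not define what counts as a "top-position mention", so the `top_k` cutoff below is an assumption, as are the function name and the sample data.

```python
SHORTLIST_INTENTS = {"recommendation", "comparison"}


def shortlist_share(results, entity, top_k=3):
    """Fraction of recommendation and comparison queries where the entity
    is among the first `top_k` brands mentioned (assumed cutoff).

    results: list of (intent, brands_in_order_of_mention) for ONE model.
    """
    eligible = [ranked for intent, ranked in results if intent in SHORTLIST_INTENTS]
    if not eligible:
        return 0.0
    hits = sum(entity in ranked[:top_k] for ranked in eligible)
    return hits / len(eligible)


results = [
    ("recommendation", ["RivalCorp", "ExampleCo", "OtherCo"]),          # hit
    ("comparison", ["RivalCorp", "OtherCo", "ThirdCo", "ExampleCo"]),   # 4th: miss
    ("awareness", ["ExampleCo"]),                                       # not eligible
    ("recommendation", ["ExampleCo"]),                                  # hit
]

share = shortlist_share(results, "ExampleCo")  # 2 of 3 eligible queries
```

The awareness query is ignored even though the entity appears in it, which is the distinction between being discovered and being recommended.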

AVI (AI Visibility Index)

Composite of mention rate, citation rate, competitive delta, and cross-model consistency across all five query categories. Calculated per model and as a cross-model aggregate.

AVI provides the full-spectrum view. High mention rate with low citation rate indicates a content authority gap. High single-model AVI with low cross-model AVI indicates a consensus gap. Each component is auditable independently.
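
The weighting of the four components is not published here, so the following is purely a hypothetical sketch of how a composite could be formed. It assumes equal weights, components normalized to the 0 to 1 range, and a competitive delta rescaled from the range -1 to 1. None of these choices should be read as the actual AVI formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AviComponents:
    mention_rate: float             # 0..1
    citation_rate: float            # 0..1
    competitive_delta: float        # -1..1 (entity rate minus competitor rate)
    cross_model_consistency: float  # 0..1


def avi_sketch(c: AviComponents, weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Hypothetical equal-weight composite; the real weighting is not stated."""
    delta_scaled = (c.competitive_delta + 1) / 2  # map -1..1 onto 0..1
    parts = (c.mention_rate, c.citation_rate, delta_scaled, c.cross_model_consistency)
    return sum(w * p for w, p in zip(weights, parts))


components = AviComponents(
    mention_rate=0.8,
    citation_rate=0.2,   # high mentions, low citations: a content authority gap
    competitive_delta=0.0,
    cross_model_consistency=0.6,
)
score = avi_sketch(components)
```

Keeping the components as separate fields is what makes each one auditable on its own, whatever the final weighting is.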

Cross-Model Consistency

Measures the degree to which AI models agree on which brands appear for a given query. A brand that appears consistently across ChatGPT, Gemini, and Perplexity on the same query has high cross-model consistency. A brand present on only one model has a measurable consensus gap, which is a different problem from simply having a low AVI.
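
One simple way to quantify agreement on which brands appear is the mean pairwise Jaccard similarity between the brand sets each model returns for a query. This is an assumed formulation offered for illustration, not necessarily the measure used in reporting.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Overlap of two brand sets; two empty sets count as full agreement."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cross_model_agreement(brands_by_model: dict) -> float:
    """Mean pairwise Jaccard similarity across models for ONE query."""
    pairs = list(combinations(brands_by_model.values(), 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


brands_by_model = {
    "ChatGPT": {"ExampleCo", "RivalCorp"},
    "Gemini": {"ExampleCo", "RivalCorp"},
    "Perplexity": {"RivalCorp"},
}

agreement = cross_model_agreement(brands_by_model)  # (1.0 + 0.5 + 0.5) / 3
```

Here ExampleCo is missing from one of three models, which lowers agreement for the query even though its mention rate on the other two is perfect.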

Citation Rate

The share of mentions accompanied by a cited source URL. A high mention rate with a low citation rate points to citation voids: the AI engine references the brand but cites no auditable source. Citation voids indicate either weak domain authority in AI-visible content or a structured data gap. Both are actionable.
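
As a worked example, citation rate reduces to a count over mentions. The input shape (one list of cited URLs per mention) and the sample URLs are assumptions for illustration.

```python
def citation_rate(cited_urls_per_mention):
    """Share of mentions that carry at least one cited source URL."""
    if not cited_urls_per_mention:
        return 0.0
    cited = sum(1 for urls in cited_urls_per_mention if urls)
    return cited / len(cited_urls_per_mention)


# Four mentions of the entity; two of them came with a source.
mentions = [
    ["https://exampleco.com/docs"],
    [],
    [],
    ["https://reviews.example.org/widgets"],
]

rate = citation_rate(mentions)
void_indexes = [i for i, urls in enumerate(mentions) if not urls]
```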

Design Choices

Why We Built It This Way

Methodology decisions involve tradeoffs. This section explains the choices we made and what we traded away.

Why We Exclude Sentiment from Headline Metrics

Sentiment scoring introduces noise into AI visibility measurement. A brand mentioned in a qualified or cautious context is still present in the AI answer, and that presence is what matters for discovery. We capture mention context in the raw signal layer, but do not incorporate it into AVI or Shortlist Share calculations. Enterprise buyers do not need a score that conflates presence with perception. Those are different problems.

Why Query Sets Are Versioned and Immutable

Changing the query set mid-run invalidates the comparison. If we change which queries we ask, we cannot tell whether a metric shift reflects real change in AI behavior or a change in what we measured. Every query set receives a version identifier. When we update coverage by adding a new query category or expanding competitor scope, a new version is issued and a new run begins. Historical runs are never retroactively modified.
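
How version identifiers are assigned is not specified. One hypothetical approach is to derive the identifier from the query content itself, so that any change to the set necessarily produces a new version and runs can be checked for comparability. The sketch below assumes that approach, and the function and field names are illustrative.

```python
import hashlib
import json


def query_set_version(queries):
    """Content-derived identifier: same queries give the same id, in any order."""
    canonical = json.dumps(sorted(queries), ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def comparable(run_a, run_b):
    """Two runs are comparable only if they asked exactly the same queries."""
    return run_a["query_set_version"] == run_b["query_set_version"]


base = [
    ("awareness", "What is ExampleCo?"),
    ("recommendation", "Which widget vendors should I consider?"),
]
expanded = base + [("authority", "Who are the experts on widgets?")]

run_1 = {"query_set_version": query_set_version(base)}
run_2 = {"query_set_version": query_set_version(list(reversed(base)))}
run_3 = {"query_set_version": query_set_version(expanded)}
```

Under this scheme, `run_1` and `run_2` are comparable, while `run_3` belongs to a new version and starts a new series.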

Why We Run Each Model Independently

Different AI models weight different sources, apply different retrieval logic, and produce meaningfully different outputs for identical queries. Cross-model consistency, the degree to which models agree on which brands appear for a given query, is itself a signal. A brand that appears on ChatGPT but not Gemini has a measurable consensus gap. Pooling model runs would destroy that signal.

Why Shortlist Share Is the Headline Metric

AVI measures presence across all query types. Shortlist Share isolates the queries that matter most commercially: the moment a buyer asks "what are the top options?" or "compare X vs Y." These are the queries where AI-generated answers have the highest direct influence on vendor selection. A brand that is present in awareness queries but absent from recommendation queries has a measurable shortlist problem.

Versioning

Methodology Versions

v1.1, March 2026
  • Shortlist Share added as headline metric alongside AVI. Calculated from the existing recommendation and comparison query categories.
  • Five query categories fully deployed: awareness, comparison, recommendation, authority, use-case.
  • Cross-model consistency reporting added as a standalone diagnostic output.
  • Source attribution layer in active development (owned / competitor / neutral classification).
v1.0, February 2026
  • Initial production methodology. AVI and ASOV as primary metrics.
  • Three initial query categories: awareness, comparison, recommendation.
  • Validated against three production entities: Carter's, Edible Arrangements, and a B2C platform.

Questions

Talk to the Team

If you have questions about how a specific metric is calculated or want to understand what a run covers for your category, reach out directly.

We'll respond within one business day.