
Methodology

How we measure AI attention on PeerPush, what we hide, and when we decide there is enough data to publish a number.

What counts as “AI traffic”

We classify every request hitting PeerPush by user-agent and entry point. “AI traffic” combines three categories:

  • AI crawlers: crawlers operated by AI providers to collect content for model training or AI search (e.g., GPTBot, ClaudeBot, PerplexityBot, Google-Extended).
  • AI agents: assistants fetching pages on behalf of a user during a chat (e.g., ChatGPT-User, Claude-User).
  • MCP clients: explicit tool calls into PeerPush's Model Context Protocol endpoint.

Search-engine crawlers (Googlebot, Bingbot) and SEO crawlers (AhrefsBot, SemrushBot) are categorized separately and excluded from the “AI” share.
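A minimal sketch of classifying a request by entry point and user-agent, assuming case-insensitive substring matching on well-known user-agent tokens and a hypothetical `/mcp` endpoint path. The enum, token lists, and function names are illustrative, not PeerPush's actual code, and the token lists are deliberately short.

```python
from enum import Enum


class TrafficClass(Enum):
    AI_CRAWLER = "ai_crawler"
    AI_AGENT = "ai_agent"
    MCP_CLIENT = "mcp_client"
    SEARCH_CRAWLER = "search_crawler"
    SEO_CRAWLER = "seo_crawler"
    OTHER = "other"


# Hypothetical, abbreviated token lists; real user-agent strings embed these names.
AI_AGENT_TOKENS = ("chatgpt-user", "claude-user")
AI_CRAWLER_TOKENS = ("gptbot", "claudebot", "perplexitybot")
SEARCH_CRAWLER_TOKENS = ("googlebot", "bingbot")
SEO_CRAWLER_TOKENS = ("ahrefsbot", "semrushbot")
MCP_PATH = "/mcp"  # hypothetical path of the MCP endpoint

AI_CLASSES = {TrafficClass.AI_CRAWLER, TrafficClass.AI_AGENT, TrafficClass.MCP_CLIENT}


def classify(user_agent: str, path: str) -> TrafficClass:
    # Entry point first: any call into the MCP endpoint counts as an MCP client.
    if path == MCP_PATH or path.startswith(MCP_PATH + "/"):
        return TrafficClass.MCP_CLIENT
    ua = user_agent.lower()
    # Ordered checks; the first token list with a match decides the class.
    for tokens, cls in (
        (AI_AGENT_TOKENS, TrafficClass.AI_AGENT),
        (AI_CRAWLER_TOKENS, TrafficClass.AI_CRAWLER),
        (SEARCH_CRAWLER_TOKENS, TrafficClass.SEARCH_CRAWLER),
        (SEO_CRAWLER_TOKENS, TrafficClass.SEO_CRAWLER),
    ):
        if any(token in ua for token in tokens):
            return cls
    return TrafficClass.OTHER


def is_ai_traffic(user_agent: str, path: str) -> bool:
    """True only for the three categories that make up the “AI” share."""
    return classify(user_agent, path) in AI_CLASSES
```

Checking the entry point before the user-agent means an MCP call counts as an MCP client whatever agent string it carries; that ordering is an assumption of this sketch.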

Platform totals vs per-segment shape

Platform-wide totals are shown as real numbers: how many times AI systems read product data, how many upvotes builders have cast, and how many countries generate genuine human discovery. For per-category and per-product signals we publish shape instead: share %, ordinal rank, relative bars, and percentage change versus the prior window. This keeps thin per-subject counts and the catalog's exact size off the public surface while still showing where attention is moving. Headline numbers appear only after a window has accumulated at least 50 distinct visitors; below that floor, the page shows a “Calibrating” placeholder.
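The per-segment shape metrics and the publication floor can be sketched as follows. Only the 50-visitor floor comes from the text; the function names, the one-decimal rounding, and the choice to report no percentage change when the prior window is empty are assumptions. Note that the output rows carry no raw counts.

```python
from typing import Optional, Union

MIN_WINDOW_VISITORS = 50  # publication floor from the text


def pct_change(current: int, prior: int) -> Optional[float]:
    # With nothing in the prior window there is no meaningful percentage change.
    if prior == 0:
        return None
    return 100.0 * (current - prior) / prior


def segment_shape(
    counts: dict[str, int], prior: dict[str, int], window_visitors: int
) -> Union[str, list[dict]]:
    """Turn raw per-segment counts into publishable shape: share, rank, bar, change."""
    if window_visitors < MIN_WINDOW_VISITORS:
        return "Calibrating"
    total = sum(counts.values())
    top = max(counts.values(), default=0)
    rows = []
    ranked = sorted(counts, key=lambda name: counts[name], reverse=True)
    for rank, name in enumerate(ranked, start=1):
        rows.append({
            "name": name,
            "rank": rank,
            "share_pct": round(100.0 * counts[name] / total, 1) if total else 0.0,
            "bar": counts[name] / top if top else 0.0,  # width relative to the leader
            "change_pct": pct_change(counts[name], prior.get(name, 0)),
        })
    return rows
```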

K-anonymity floor

Public Signals surfaces never expose a named subject (a specific product, agent, category, or country) backed by fewer than 5 distinct visitors. Below the floor, the row is suppressed.
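A sketch of the floor, assuming events arrive as hypothetical `(subject, visitor_id)` pairs. What it illustrates is that the floor counts distinct visitors rather than events, so one visitor generating many views cannot push a row over it.

```python
from collections import defaultdict
from typing import Iterable

K_FLOOR = 5  # minimum distinct visitors behind any named subject


def distinct_visitors(events: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Count distinct visitor IDs per subject from (subject, visitor_id) pairs."""
    seen: dict[str, set[str]] = defaultdict(set)
    for subject, visitor_id in events:
        seen[subject].add(visitor_id)
    return {subject: len(ids) for subject, ids in seen.items()}


def publishable(events: Iterable[tuple[str, str]], k: int = K_FLOOR) -> set[str]:
    """Subjects that may be named publicly; every other row is suppressed."""
    return {s for s, n in distinct_visitors(events).items() if n >= k}
```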

Privacy and redaction

  • IP addresses are never stored. Only a salted hash is retained.
  • Query strings, search terms, and prompt text pass through a PII redactor before publication. The redactor strips email addresses, phone numbers, OAuth-token shapes, and credit-card-shaped digit sequences.
  • Anonymous-visitor identifiers rotate daily, so the same anonymous browser is intentionally unlinkable across UTC days.
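A sketch of the three privacy measures using only the Python standard library. The salts, regex patterns, and function names are hypothetical, and the patterns are deliberately coarse: they will over-redact in some cases (a long order number can look like a phone number). Keeping one random salt per UTC day and deleting it afterwards is one way to make daily identifiers unlinkable even to the operator; the text does not say how PeerPush implements rotation.

```python
import hashlib
import hmac
import re
import secrets
from datetime import date

# Hypothetical: a production salt would come from secret configuration.
IP_SALT = secrets.token_bytes(32)


def hash_ip(ip: str) -> str:
    # Keyed hash (HMAC-SHA256); only this digest is retained, never the address.
    return hmac.new(IP_SALT, ip.encode(), hashlib.sha256).hexdigest()


# One random salt per UTC day; deleting old entries makes past IDs unrecoverable.
_daily_salts: dict[date, bytes] = {}


def daily_visitor_id(client_key: str, utc_day: date) -> str:
    salt = _daily_salts.setdefault(utc_day, secrets.token_bytes(32))
    return hmac.new(salt, client_key.encode(), hashlib.sha256).hexdigest()[:16]


# Coarse, hypothetical patterns, applied in order (tokens before digit runs).
_REDACTIONS = [
    (re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"), "[token]"),  # JWT-shaped
    (re.compile(r"\bbearer\s+[\w.~+/-]+=*", re.IGNORECASE), "[token]"),
    (re.compile(r"[\w.%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[email]"),
    (re.compile(r"\b(?:\d[ -]?){12,18}\d\b"), "[card]"),  # 13-19 digits
    (re.compile(r"\+?\d(?:[\s().-]{0,2}\d){8,14}"), "[phone]"),  # 9-15 digits
]


def redact(text: str) -> str:
    for pattern, placeholder in _REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text
```

For example, `redact("mail jane.doe@example.com or call +1 (415) 555-0132, card 4111 1111 1111 1111")` yields `"mail [email] or call [phone], card [card]"`.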

How fresh is the data

Aggregations refresh on rolling schedules: short windows update every few minutes, longer windows less often. The “Data fresh as of…” timestamp on each Signals page is the refresh time of the least recently updated aggregation feeding that page.
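Reading the freshness rule as "the aggregation whose most recent refresh is oldest", the displayed timestamp reduces to a minimum over per-aggregation refresh times. The function name and the idea of a name-to-timestamp mapping are assumptions for illustration.

```python
from datetime import datetime


def data_fresh_as_of(last_refresh: dict[str, datetime]) -> datetime:
    """Timestamp to show as "Data fresh as of..." for one page.

    last_refresh maps each aggregation feeding the page (hypothetical names)
    to the time its latest refresh completed. A page is only as fresh as its
    least recently refreshed input.
    """
    if not last_refresh:
        raise ValueError("page has no aggregations")
    return min(last_refresh.values())
```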

How each panel is computed

Most-watched categories

Each category's attention combines product-page views, external-website clicks, alternative-page outbound clicks, and MCP citation impressions across its products over the last 7 days. Categories are ranked by this total, highest first, and categories below a minimum-volume floor are omitted. The growth chip marks a real week-over-week increase and appears only when a comparable prior 7-day window exists.
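A sketch of the ranking, assuming each category's 7-day figures are already summed across its products. The minimum-volume floor and the 10% margin standing in for a “genuine” increase are hypothetical values, since the text does not publish them; treating a prior window as comparable only when it also clears the floor is likewise an assumption.

```python
from dataclasses import dataclass

MIN_CATEGORY_VOLUME = 20  # hypothetical minimum-volume floor
MIN_GROWTH = 0.10         # hypothetical margin for a "genuine" increase


@dataclass
class CategoryWindow:
    """One category's 7-day attention, summed across its products."""
    product_views: int
    website_clicks: int
    alt_outbound_clicks: int
    mcp_citations: int

    @property
    def total(self) -> int:
        return (self.product_views + self.website_clicks
                + self.alt_outbound_clicks + self.mcp_citations)


def rank_categories(
    current: dict[str, CategoryWindow], prior: dict[str, CategoryWindow]
) -> list[tuple[str, int, bool]]:
    """Return (category, total, growth_chip) rows, biggest first."""
    eligible = [(name, w.total) for name, w in current.items()
                if w.total >= MIN_CATEGORY_VOLUME]
    eligible.sort(key=lambda row: (-row[1], row[0]))
    rows = []
    for name, total in eligible:
        prev = prior.get(name)
        # Only compare against a prior window that itself cleared the floor.
        comparable = prev is not None and prev.total >= MIN_CATEGORY_VOLUME
        chip = comparable and total >= prev.total * (1 + MIN_GROWTH)
        rows.append((name, total, chip))
    return rows
```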

AI Surface Area (owner dashboard)

On private owner dashboards, AI Surface Area decomposes into four streams: AI crawlers indexing the product, traffic landing from AI chat assistants, programmatic API consumers, and MCP tool calls. Click-through events to external sites are deliberately excluded so the metric reflects discovery exposure, not downstream funnel behavior. Tiles are unweighted: the total is the simple sum of the four streams.
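Because the tiles are unweighted, the total is a plain count over the four streams. This sketch assumes each event carries one of four hypothetical stream labels; any other label, including external click-throughs, is ignored.

```python
from collections import Counter
from typing import Iterable

# Hypothetical labels for the four AI Surface Area streams.
STREAMS = ("ai_crawler", "ai_assistant_landing", "api_consumer", "mcp_tool_call")


def ai_surface_area(event_streams: Iterable[str]) -> tuple[dict[str, int], int]:
    """Return per-stream tile counts and their unweighted total."""
    # Anything outside the four streams (e.g. external click-throughs) is dropped.
    counts = Counter(s for s in event_streams if s in STREAMS)
    tiles = {s: counts[s] for s in STREAMS}
    return tiles, sum(tiles.values())
```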

Hidden gems

Products that show strong AI signal but little human visibility so far, measured over the last 30 days. Human visibility is a composite of upvotes, follows, and average rating. The thresholds recalibrate as the catalog grows, and the panel stays empty until the candidate pool is large enough for percentile cuts to be meaningful.
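A sketch of the percentile cuts, assuming a top-quartile cut on AI signal and a bottom-quartile cut on human visibility. The quartile choice, the minimum pool size of 40, and the composite formula (log-scaled upvotes and follows plus rating divided by 5) are all hypothetical; the text names the composite's inputs but not their weights.

```python
import math
import statistics
from dataclasses import dataclass

MIN_POOL = 40  # hypothetical pool size below which the panel stays empty


@dataclass
class Candidate:
    name: str
    ai_signal: float   # e.g. 30-day AI reads
    upvotes: int
    follows: int
    avg_rating: float  # 0-5, 0 if unrated


def human_visibility(c: Candidate) -> float:
    # Hypothetical composite: log scaling keeps large counts from dominating.
    return math.log1p(c.upvotes) + math.log1p(c.follows) + c.avg_rating / 5.0


def hidden_gems(pool: list[Candidate]) -> list[str]:
    if len(pool) < MIN_POOL:
        return []  # too few candidates for meaningful percentile cuts
    # quantiles(n=4) returns the three quartile cut points [Q1, Q2, Q3].
    ai_q3 = statistics.quantiles([c.ai_signal for c in pool], n=4)[2]
    vis_q1 = statistics.quantiles([human_visibility(c) for c in pool], n=4)[0]
    gems = [c for c in pool
            if c.ai_signal >= ai_q3 and human_visibility(c) <= vis_q1]
    return [c.name for c in sorted(gems, key=lambda c: c.ai_signal, reverse=True)]
```

Because the cuts are percentiles of the current pool, the thresholds move automatically as the catalog grows, which is one way to get the recalibration the text describes.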

Alternative lens (Switcher pressure)

On the per-alternative lens page, switcher pressure is the share of alt-page visitors who click through to a PeerPush challenger product. The page shows the top 3 challengers only; the full graph and verbatim queries are visible only on the verified owner's private dashboard.
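A sketch of switcher pressure, assuming the alt page logs a set of distinct visitor IDs and a list of hypothetical `(visitor_id, challenger)` outbound-click pairs. The share is computed over distinct visitors, so repeat clicks by one visitor count once; the top-3 cap comes from the text, and breaking ties alphabetically is an assumption.

```python
from collections import Counter
from typing import Iterable


def switcher_pressure(
    alt_page_visitors: set[str],
    outbound_clicks: Iterable[tuple[str, str]],
    top_n: int = 3,
) -> tuple[float, list[str]]:
    """Share of alt-page visitors who clicked a challenger, plus the top challengers."""
    if not alt_page_visitors:
        return 0.0, []
    # Keep clicks from counted visitors; dedupe so repeat clicks count once.
    pairs = {(v, c) for v, c in outbound_clicks if v in alt_page_visitors}
    switchers = {v for v, _ in pairs}
    per_challenger = Counter(c for _, c in pairs)  # distinct visitors per challenger
    top = sorted(per_challenger, key=lambda c: (-per_challenger[c], c))[:top_n]
    return len(switchers) / len(alt_page_visitors), top
```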

Unmet demand

PeerPush search is hybrid: pgvector cosine similarity combined with full-text search via reciprocal rank fusion. Even when the catalog has nothing genuinely close, hybrid search still returns weakly related neighbors, so zero-result queries are rare and not a useful signal. Instead, the panel tracks three complementary signals over the last 30 days:

  • Weak match: queries where the top result's semantic cosine similarity is below 0.55 (configurable), which suggests the catalog has nothing genuinely close. The score is captured when the search event is recorded, on the hybrid search path only; queries served by the simpler full-text fallback path are excluded.
  • Low CTR: queries with at least 50 impressions where the click-through rate is below 5%. Either the ranking is wrong, the taglines mislead, or the right product doesn't exist.
  • Re-search: queries followed by a different query from the same anonymous visitor within 60 seconds. This is a strong behavioral “I didn't find it” signal. Because anonymous identifiers rotate daily, re-searches can only be linked within a single UTC day; cross-day re-search is intentionally invisible.
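The three signals can be sketched as independent filters over logged searches. The thresholds mirror the text (0.55 cosine, 50 impressions, 5% CTR, 60 seconds); the `SearchEvent` shape, the case-insensitive query comparison, and counting exactly 60 seconds as inside the window are assumptions. Fallback-path searches are modeled as carrying no cosine score, which is how this sketch excludes them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

WEAK_MATCH_COSINE = 0.55   # configurable, per the text
LOW_CTR_MIN_IMPRESSIONS = 50
LOW_CTR_MAX = 0.05
RESEARCH_WINDOW_S = 60.0


@dataclass
class SearchEvent:
    visitor_id: str              # daily-rotating anonymous ID
    query: str
    ts: float                    # seconds since epoch
    top_cosine: Optional[float]  # None when served by the full-text fallback


def is_weak_match(e: SearchEvent) -> bool:
    # Fallback-path searches carry no cosine score, so they never qualify.
    return e.top_cosine is not None and e.top_cosine < WEAK_MATCH_COSINE


def low_ctr_queries(impressions: dict[str, int], clicks: dict[str, int]) -> set[str]:
    return {q for q, n in impressions.items()
            if n >= LOW_CTR_MIN_IMPRESSIONS and clicks.get(q, 0) / n < LOW_CTR_MAX}


def re_searched_queries(events: list[SearchEvent]) -> set[str]:
    """Queries the same visitor replaced with a different one within the window."""
    # Grouping by a daily-rotating ID makes this within-day by construction.
    by_visitor: dict[str, list[SearchEvent]] = defaultdict(list)
    for e in events:
        by_visitor[e.visitor_id].append(e)
    flagged = set()
    for history in by_visitor.values():
        history.sort(key=lambda e: e.ts)
        for prev, nxt in zip(history, history[1:]):
            different = nxt.query.strip().lower() != prev.query.strip().lower()
            if different and nxt.ts - prev.ts <= RESEARCH_WINDOW_S:
                flagged.add(prev.query)
    return flagged
```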

Every tab applies the k-anonymity floor of 5 distinct visitors before a query surfaces, and every query string passes through the PII redactor.
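The hybrid ranking described at the start of this section fuses the vector and full-text result lists. A minimal sketch of reciprocal rank fusion follows; k = 60 is the constant commonly used with RRF, and PeerPush's actual value is not stated. Any document that appears in either list receives a positive score, which is why fused results are almost never empty.

```python
from collections import defaultdict

RRF_K = 60  # conventional RRF constant; an assumption here


def reciprocal_rank_fusion(*rankings: list[str], k: int = RRF_K) -> list[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing a vector ranking `["a", "b", "c"]` with a full-text ranking `["b", "d"]` gives `["b", "a", "d", "c"]`: `b` wins because it appears in both lists.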

AI citation freshness (owner dashboard)

On private owner dashboards, we record when each tracked AI agent bucket (Claude, ChatGPT, Perplexity, Gemini, Copilot) last cited the product. Severity bands: fresh under 14 days, aging 14-30 days, stale beyond 30 days, and never cited when the bucket has no citation on record. Buckets that have never cited the product are listed explicitly so the gap is visible. “Cited” means at least one MCP result impression for the product attributed to an agent in that bucket.
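A sketch of the severity bands, assuming a hypothetical mapping from bucket name to the time of the latest attributed MCP result impression. Listing every tracked bucket, including those with no entry, is what keeps never-cited gaps visible. The text does not pin down the exact boundaries, so this sketch treats under 14 days as fresh and 14 to 30 days inclusive as aging.

```python
from datetime import datetime, timedelta
from typing import Optional

BUCKETS = ("Claude", "ChatGPT", "Perplexity", "Gemini", "Copilot")


def band(last_cited: Optional[datetime], now: datetime) -> str:
    if last_cited is None:
        return "never cited"
    age = now - last_cited
    if age < timedelta(days=14):
        return "fresh"
    if age <= timedelta(days=30):
        return "aging"
    return "stale"


def citation_freshness(last_cited: dict[str, datetime], now: datetime) -> dict[str, str]:
    """Band every tracked bucket; buckets without an entry come back "never cited"."""
    return {b: band(last_cited.get(b), now) for b in BUCKETS}
```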

AI coverage gap

For each of the top 30 categories (by total events in the last 30 days) and each tracked AI agent bucket (Claude, ChatGPT, Perplexity, Gemini, Copilot), we compute the share of approved and published products in the category that the agent has cited at least once in the last 30 days. Categories with fewer than 10 approved products are excluded, because in a category that small a single product swings the percentage sharply. The panel is suppressed until total MCP citation volume crosses the privacy floor.
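A sketch of the matrix computation under simplifying assumptions: the same approved, published product set serves both as the denominator and for the 10-product exclusion, and the privacy-floor suppression of the whole panel is not modeled. Argument names are hypothetical.

```python
BUCKETS = ("Claude", "ChatGPT", "Perplexity", "Gemini", "Copilot")
TOP_CATEGORIES = 30
MIN_PRODUCTS = 10


def coverage_matrix(
    category_events: dict[str, int],            # category -> total events, last 30 days
    products_by_category: dict[str, set[str]],  # approved, published products
    cited: set[tuple[str, str]],                # (product, bucket) pairs cited in last 30 days
) -> dict[str, dict[str, float]]:
    """Per category and bucket: fraction of products cited at least once."""
    top = sorted(category_events, key=lambda c: (-category_events[c], c))[:TOP_CATEGORIES]
    matrix: dict[str, dict[str, float]] = {}
    for cat in top:
        products = products_by_category.get(cat, set())
        if len(products) < MIN_PRODUCTS:
            continue  # too few products for a stable percentage
        matrix[cat] = {
            b: sum((p, b) in cited for p in products) / len(products)
            for b in BUCKETS
        }
    return matrix
```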

Retention

Raw event rows are kept for 540 days, then dropped by an automated retention policy. Aggregated daily, weekly, and monthly rollups (which contain no per-visitor data) are kept indefinitely.

Methodology v2, last updated June 2026.