
The AI Citation Index: Where Do ChatGPT, Perplexity, and Gemini Get Their Data?
Perplexity averages 21.87 citations per answer vs ChatGPT’s 7.92. Reddit accounts for 46.7% of Perplexity’s top citations. Here’s the complete citation source breakdown.
Across AI platforms, YouTube has overtaken Reddit as the #1 social citation source, at 39.2% of social citations. Each platform has radically different citation patterns.
The AI Citation Landscape in 2026
Understanding where AI models get their information is the foundation of any GEO strategy. If you don’t know what sources an AI model draws on to generate its recommendation, you’re optimizing blind. Fortunately, we now have substantial research data that maps the citation landscape in detail.
Three major studies provide the backbone of our understanding. The XFunnel study analyzed over 40,000 AI-generated responses across ChatGPT, Perplexity, and Gemini, mapping citation patterns, source diversity, and platform-specific behaviors. The Profound study examined 30 million AI citations to quantify which domains get cited most frequently and how citation patterns differ by query type. And the Semrush Reddit study specifically analyzed Reddit’s role in AI citations, revealing that Reddit has become the single most important third-party platform for AI visibility.
According to Joel House, founder of MentionLayer and author of AI for Revenue, "Understanding the citation index isn’t academic — it’s the difference between investing blindly and investing strategically. When you know that Reddit accounts for 46.7% of Perplexity’s citations, you stop guessing where to focus and start building where it matters."
The headline findings reshape how marketers should think about their digital presence. Earned media accounts for 90% of LLM citations, meaning AI models overwhelmingly reference third-party sources rather than brand-owned content. Your website matters, but the internet’s conversation about your brand matters 9x more (90% of citations versus 10%). This isn’t a theoretical finding. It’s backed by analysis of millions of AI citations.
The citation landscape is also remarkably fragmented. Only 11% of domains appear across all four major AI platforms. Our AI Visibility Index study confirmed the same pattern at the *business* level: of businesses that got mentioned at all, only 11% appeared in two or more AI models. The model overlap matrix from that study showed that Google AI Overview and Perplexity share the most similar worldview (84% overlap, measured from AIO to Perplexity), likely because both rely on real-time web retrieval. ChatGPT and Gemini share 70% overlap, forming the parametric-model cluster. Claude is the most independent, with consistently the lowest overlap with every other model (32–63%).
Source-level and brand-level fragmentation converge on the same strategic conclusion: you can’t optimize for "AI" as a monolith. Most sources are platform-specific; a domain that’s heavily cited by Perplexity might not appear in ChatGPT’s references at all. A cross-platform GEO strategy must therefore account for each platform’s unique source preferences, because a one-size-fits-all approach leaves gaps that competitors can exploit. Understanding how AI models choose which brands to recommend is essential context for interpreting this data.
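The 84% figure between AI Overview and Perplexity is stated in one direction, which suggests overlap is measured from one model’s mentions to another’s. The sketch below assumes a simple definition that the study itself may not use: the share of businesses mentioned by a first model that a second model also mentions. The mention sets are hypothetical, not data from the AI Visibility Index.

```python
def directional_overlap(source: set[str], target: set[str]) -> float:
    """Share of brands mentioned by `source` that `target` also mentions.

    Assumed definition: |source & target| / |source|. Because the
    denominator is the first set, overlap(A, B) can differ from overlap(B, A).
    """
    if not source:
        return 0.0
    return len(source & target) / len(source)


# Hypothetical mention sets for one local query; not data from the study.
aio = {"Acme Plumbing", "Bolt Plumbing", "City Drains", "Delta Pipes", "Echo Rooter"}
perplexity = {"Acme Plumbing", "Bolt Plumbing", "City Drains", "Delta Pipes",
              "Fox Plumbing", "Gray Valve"}

print(f"AIO -> Perplexity: {directional_overlap(aio, perplexity):.0%}")  # AIO -> Perplexity: 80%
print(f"Perplexity -> AIO: {directional_overlap(perplexity, aio):.0%}")  # Perplexity -> AIO: 67%
```

The asymmetry is the point of the design: a model that mentions few businesses can overlap heavily with a model that mentions many, while the reverse figure stays low.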
What follows is a platform-by-platform breakdown of citation sources, backed by data from these studies. This is the intelligence layer that should inform every GEO decision you make — from which forums to prioritize for citation seeding, to which publications to target for press coverage, to where to focus review-building efforts.
Where ChatGPT Gets Its Data
ChatGPT is the most widely used AI assistant, making its citation behavior the most commercially important to understand. When browsing is enabled, ChatGPT averages 7.92 citations per response — significantly fewer than Perplexity but still enough to drive meaningful referral traffic. The lower citation count means each citation carries more weight and the competition for those limited slots is more intense.
Wikipedia dominates ChatGPT’s citation landscape, accounting for 47.9% of its top-10 most-cited domains. This isn’t surprising — ChatGPT was heavily trained on Wikipedia and continues to reference it for factual, entity-level, and definitional information. For GEO purposes, this means that having a Wikipedia presence (or at minimum, being mentioned in relevant Wikipedia articles) gives you a significant advantage with ChatGPT specifically. If your brand has a Wikipedia page, keep it accurate and current. If industry-relevant Wikipedia articles exist where your brand could be legitimately mentioned, that’s a high-value target.
Reddit represents approximately 2.2% of ChatGPT’s top citations overall, but this number is misleading. For buying-intent and recommendation queries specifically — the queries that matter most for GEO — Reddit’s share jumps to approximately 8-12%. When someone asks ChatGPT "What’s the best X?" it frequently references Reddit discussions alongside its training knowledge. This means Reddit citation seeding has real ChatGPT impact, especially for product recommendation queries.
ChatGPT uses Bing’s index for web browsing, which is a crucial detail. Pages that rank well on Bing (not just Google) have an advantage in ChatGPT’s real-time search results. While Google and Bing rankings overlap significantly, there are differences. If you’re specifically optimizing for ChatGPT visibility, check your Bing rankings for key queries and consider Bing-specific SEO if there are significant gaps.
Content freshness is perhaps the most important factor for ChatGPT citations. Research shows that 76.4% of pages cited by ChatGPT were updated within the last 30 days. This has profound implications: that comprehensive guide you published in 2023 likely isn’t getting cited unless you’ve refreshed it. ChatGPT strongly favors recent content, which means a regular content refresh cadence is essential. Update your key pages monthly with new data, examples, or insights; even minor updates signal freshness to ChatGPT’s browsing function. Make sure your robots.txt isn’t blocking AI crawlers either. It’s a common mistake that can make your content invisible no matter how fresh it is.
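One way to catch the robots.txt mistake is to test your file against the user agents AI vendors publish. This is a minimal sketch using Python’s standard-library `urllib.robotparser`. The names in `AI_CRAWLERS` are examples as publicly documented by OpenAI, Perplexity, Anthropic, and Google at the time of writing (`Google-Extended` is a control token rather than a fetching crawler); verify them against each vendor’s current documentation. The robots.txt content is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Example AI user agents; confirm current names in each vendor's docs.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User",
               "PerplexityBot", "ClaudeBot", "Google-Extended"]


def blocked_ai_crawlers(robots_txt: str, url: str) -> list[str]:
    """Return the AI user agents that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as read
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks OpenAI's training crawler site-wide.
sample_robots = """
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

print(blocked_ai_crawlers(sample_robots, "https://example.com/blog/ai-citations"))
# ['GPTBot']
```

To check a live site, `RobotFileParser.set_url()` followed by `read()` fetches the file instead of parsing a string. Note that `urllib.robotparser` matches user agents by substring and applies the first matching rule, which can differ from how a given crawler resolves overlapping `Allow`/`Disallow` lines, so treat a clean result as a first pass rather than proof.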
ChatGPT’s citation behavior also varies significantly by query type. For factual queries ("Who founded Tesla?"), it cites Wikipedia and encyclopedic sources. For recommendation queries ("Best CRM for real estate"), it mixes Reddit, review sites, and recent blog posts. For comparison queries ("Salesforce vs HubSpot"), it favors review platforms and comparison articles. Aligning your content and citation seeding to the specific query types your customers use is key to maximizing ChatGPT visibility.
Where Perplexity Gets Its Data
Perplexity is the citation-heaviest AI platform, averaging 21.87 citations per response — nearly 3x ChatGPT’s average. For marketers, this is both an opportunity and a data goldmine. More citations mean more chances for your brand or content to appear. And because Perplexity shows its sources explicitly, you can reverse-engineer exactly what content drives its recommendations.
[Reddit dominates Perplexity’s citation landscape](/blog/reddit-most-important-platform), accounting for 46.7% of its top-10 most-cited domains. This is a staggering concentration. Nearly half of Perplexity’s most-referenced sources come from Reddit. If you want Perplexity visibility, Reddit is where you play. Period. Specifically, Perplexity favors Reddit threads that rank well on Google, have high engagement, and contain detailed, experience-based responses. The intersection of Google ranking + Reddit engagement is Perplexity’s sweet spot.
YouTube has emerged as the second most important social citation source for Perplexity, overtaking several traditional web domains. At 39.2% of social citations across AI platforms, YouTube has become a critical — and underappreciated — GEO channel. Perplexity increasingly references YouTube videos, especially for how-to, review, and comparison content. If your brand has YouTube content (product reviews, tutorials, comparison videos), those videos are eligible for Perplexity citation. YouTube GEO — optimizing video titles, descriptions, and content for AI citation — is an emerging discipline that few brands have tapped.
Perplexity’s live web search model makes it the most responsive platform to recent content changes. When you publish a new Reddit response or press article, Perplexity can discover and cite it within days — sometimes hours. This makes Perplexity the best platform for testing and iterating your GEO strategy. Seed a response in a Reddit thread, then check Perplexity within a week to see if it appears in relevant query results. If it does, you know your targeting is working. If it doesn’t, adjust and try again.
Perplexity also has the most transparent citation chain of any AI platform. For every response, you can see exactly which URLs were referenced. This transparency is invaluable for competitive analysis. Search for your category queries on Perplexity, examine the cited sources, and map which competitors are mentioned, where they’re mentioned, and what content is being cited. This gives you a direct blueprint for your own citation seeding targets.
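Once you have copied the cited URLs from a handful of Perplexity answers for your category queries, tallying them by domain shows which sources dominate. A minimal sketch, assuming the URLs were collected by hand (no Perplexity API is used here) and that stripping a leading `www.` is enough normalization; the URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_share(cited_urls: list[str]) -> list[tuple[str, float]]:
    """Return (domain, share of all citations) pairs, most-cited first."""
    hosts = []
    for url in cited_urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]  # treat www.example.com and example.com as one domain
        hosts.append(host)
    counts = Counter(hosts)
    total = sum(counts.values())
    return [(host, n / total) for host, n in counts.most_common()]


# Hypothetical citations copied from Perplexity answers for one category query.
cited = [
    "https://www.reddit.com/r/smallbusiness/comments/abc/best_crm/",
    "https://www.g2.com/categories/crm",
    "https://www.reddit.com/r/sales/comments/def/crm_advice/",
    "https://www.reddit.com/r/CRM/comments/ghi/hubspot_vs_salesforce/",
    "https://www.youtube.com/watch?v=xyz",
]

for domain, share in domain_share(cited):
    print(f"{domain}: {share:.0%}")
# reddit.com: 60%
# g2.com: 20%
# youtube.com: 20%
```

Subdomains such as `old.reddit.com` are counted separately from `reddit.com` under this simple normalization; merge them if your data mixes the two.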
"Perplexity is the canary in the coal mine for GEO," says Joel House. "Because its citation chain is fully transparent, it’s the best platform for testing whether your seeding strategy is working. If a Reddit response you placed shows up in Perplexity within a week, you know your targeting is correct. If it doesn’t, you adjust before scaling."
One underappreciated aspect of Perplexity’s citation behavior is its preference for primary sources and first-person accounts. When Perplexity has a choice between a listicle blog post and a detailed Reddit comment from someone with direct experience, it tends to favor the Reddit comment. This aligns with the broader trend toward experience-based content in AI citations, where first-person writing has been associated with a 1.67x citation improvement. For citation seeding, this means your Reddit responses should be written from genuine personal experience, not as generic recommendations.
Where Gemini Gets Its Data
Gemini operates on a fundamentally different architecture than ChatGPT and Perplexity because it’s built on Google’s own search index. This gives it access to Google’s entire web graph, including signals that other AI platforms can’t access. Gemini powers Google’s AI Overviews, which now appear in 48% of search queries — making it the most widely deployed AI recommendation engine by query volume.
Gemini’s citation distribution is more even across source types compared to Perplexity’s Reddit-heavy pattern. It draws from a broad mix of web pages, news articles, review sites, forums, and Google’s own properties. Reddit accounts for approximately 6.6% of AI Overview citations — lower than Perplexity but still significant. YouTube, Google Business Profiles, and Google Shopping results all receive preferential weighting within Gemini’s ecosystem, reflecting Google’s natural bias toward its own properties.
For local businesses, Gemini is the most important AI platform to optimize for. Google Business Profile data feeds directly into Gemini’s recommendations. When someone asks Gemini "best pizza restaurant near me" or "top-rated plumber in Austin," Google Business Profile is the primary data source. This means NAP consistency (Name, Address, Phone), review volume on Google Reviews, and category accuracy on GBP are critical Gemini-specific GEO factors. Local businesses that optimize their GBP thoroughly often see disproportionate AI Overview visibility.
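A quick NAP consistency check normalizes each listing’s name, address, and phone number before comparing them, so that formatting differences like "St." versus "Street" don’t count as mismatches. A minimal sketch: the listings, the abbreviation table, and the assumption of 10-digit North American phone numbers are all hypothetical, and a real audit would need a fuller address normalizer.

```python
import re

# Hypothetical abbreviation table; extend it for your own addresses.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "ste": "suite"}


def normalize_text(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def normalize_address(address: str) -> str:
    words = normalize_text(address).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def normalize_phone(phone: str) -> str:
    digits = re.sub(r"\D", "", phone)
    return digits[-10:]  # assumption: 10-digit numbers, country code dropped


def nap_mismatches(listings: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Return each NAP field whose normalized values differ across listings."""
    normalizers = {"name": normalize_text, "address": normalize_address,
                   "phone": normalize_phone}
    mismatches = {}
    for field, normalize in normalizers.items():
        values = {normalize(listing[field]) for listing in listings.values()}
        if len(values) > 1:
            mismatches[field] = values
    return mismatches


# Hypothetical listings for one business across three sources.
listings = {
    "google_business_profile": {"name": "Acme Plumbing",
                                "address": "120 Main St., Austin, TX 78701",
                                "phone": "(512) 555-0142"},
    "yelp": {"name": "Acme Plumbing",
             "address": "120 Main Street, Austin, TX 78701",
             "phone": "+1 512-555-0142"},
    "website": {"name": "Acme Plumbing & Drains",
                "address": "120 Main St, Austin, TX 78701",
                "phone": "512.555.0142"},
}

print(nap_mismatches(listings))
```

Here the only flagged field is the business name, because "Acme Plumbing & Drains" on the website doesn’t match the other listings; the address and phone variants normalize to the same values.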
Gemini’s AI Overviews have a particular pattern that marketers need to understand: when an AI Overview appears, organic click-through rates below it drop by 34.5%. This means that if your brand isn’t in the AI Overview, you’re losing traffic even if you rank organically on the page. The AI Overview captures attention and provides the answer, reducing the need for users to click through to individual results. This makes Gemini/AI Overview visibility a prerequisite for maintaining organic traffic, not just an optional add-on.
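To see what the 34.5% drop means for your own traffic, multiply a query’s baseline organic click-through rate by 1 − 0.345. A minimal sketch with hypothetical impression and CTR numbers, assuming the reported drop applies uniformly to your listing (real drops will vary by query and position):

```python
AIO_CTR_DROP = 0.345  # reported drop in organic CTR below an AI Overview


def estimated_clicks(impressions: int, baseline_ctr: float, overview_shown: bool) -> float:
    """Estimate organic clicks, assuming the drop applies uniformly."""
    ctr = baseline_ctr * (1 - AIO_CTR_DROP) if overview_shown else baseline_ctr
    return impressions * ctr


# Hypothetical query: 20,000 monthly impressions at a 5% organic CTR.
without_aio = estimated_clicks(20_000, 0.05, overview_shown=False)
with_aio = estimated_clicks(20_000, 0.05, overview_shown=True)
print(f"{without_aio:.0f} clicks without an AI Overview, {with_aio:.0f} with one")
# 1000 clicks without an AI Overview, 655 with one
```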
Gemini also has the strongest preference for [structured data](/blog/what-is-structured-data-ai) of any AI platform. Schema markup (Organization, Product, FAQ, Review, LocalBusiness) directly influences how Gemini understands and presents your brand. Brands with comprehensive schema markup appear in AI Overviews at a meaningfully higher rate than brands without. If you only implement schema for one AI platform’s benefit, implement it for Gemini — the ROI is highest because of AI Overviews’ massive query coverage.
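For reference, Organization markup is typically embedded as JSON-LD inside a `<script type="application/ld+json">` tag. A minimal sketch that generates one with Python’s `json` module; the brand name, URLs, and `sameAs` profiles are hypothetical placeholders. Listing your Wikipedia and social profiles in `sameAs` is a common way to connect the markup to the entity other sources describe, though nothing here guarantees AI Overview inclusion.

```python
import json


def organization_jsonld(name: str, url: str, logo: str, same_as: list[str]) -> str:
    """Return an Organization JSON-LD snippet wrapped in a script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "sameAs": same_as,  # other profiles describing the same entity
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical brand details; replace with your own.
snippet = organization_jsonld(
    name="Example Co",
    url="https://www.example.com",
    logo="https://www.example.com/logo.png",
    same_as=[
        "https://en.wikipedia.org/wiki/Example_Co",
        "https://www.linkedin.com/company/example-co",
    ],
)
print(snippet)
```

Validate the output with Google’s Rich Results Test or the Schema Markup Validator before deploying it.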
Citation Patterns Across the Buying Journey
AI citation behavior isn’t uniform across all query types. The sources AI models cite change dramatically depending on where the user is in their buying journey. Understanding these patterns lets you align your GEO strategy to the queries that matter most for conversion.
Early stage (awareness): Earned media dominates. When users ask broad informational queries like "What is CRM?" or "How does music licensing work?", AI models cite authoritative publications, Wikipedia, and educational content. At this stage, press coverage in industry publications and comprehensive educational content on your website have the highest citation probability. Reddit and review sites have less influence here because the user isn’t yet looking for recommendations.
Mid stage (consideration): UGC and peer reviews take over. When users move to comparison and recommendation queries like "Best CRM for small business" or "Salesforce vs HubSpot for real estate teams", the citation mix shifts dramatically toward user-generated content. Reddit threads, Quora answers, review sites (G2, Capterra, Trustpilot), and comparison articles become the primary sources. This is where citation seeding has the highest impact, because mid-stage queries are where AI models actively look for authentic user opinions and recommendations. The shift from "what is it" to "which one should I choose" maps directly to a shift from editorial sources to community sources.
Late stage (decision): Brand-owned and competitor content. When users ask specific queries like "Does HubSpot integrate with Stripe?" or "FreshBooks pricing 2026", AI models cite the brand’s own website, documentation, and pricing pages alongside competitor comparison content. At this stage, having accurate, current, and comprehensive product information on your website matters most. Outdated pricing pages, incomplete feature documentation, and missing integration information all reduce your visibility for decision-stage queries.
The strategic implication is that you need different GEO tactics for different journey stages. Press and authoritative content for awareness. Citation seeding and review building for consideration. Website optimization and schema markup for decision. Most brands under-invest in the consideration stage — the Reddit and review layer — which is exactly where AI models look hardest for authentic recommendations. The brands that dominate mid-funnel citations capture the highest-intent traffic because they’re being recommended at the moment of decision.
There’s also a recency dimension to buying-journey citations. AI models apply freshness weighting more aggressively for recommendation queries than for informational queries. A Wikipedia article about CRM from 2020 still gets cited for "What is CRM?" queries. But for "Best CRM in 2026?" queries, content from the last 30 days dominates. This means your citation seeding and content refresh cadence should prioritize mid-funnel and late-funnel content, where freshness has the biggest impact on visibility.
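A refresh queue is one way to put the 30-day freshness window into practice. A minimal sketch, assuming you track each page’s funnel stage and last-updated date yourself; the page paths and dates are hypothetical, and ordering consideration pages ahead of decision pages is an illustrative choice, not a rule from the data.

```python
from dataclasses import dataclass
from datetime import date

FRESHNESS_WINDOW_DAYS = 30

# Lower number = refresh sooner; mid- and late-funnel pages go first.
STAGE_PRIORITY = {"consideration": 0, "decision": 1, "awareness": 2}


@dataclass
class Page:
    url: str
    stage: str  # "awareness", "consideration", or "decision"
    last_updated: date


def refresh_queue(pages: list[Page], today: date) -> list[str]:
    """Return stale pages ordered by funnel priority, oldest first within a stage."""
    stale = [p for p in pages
             if (today - p.last_updated).days > FRESHNESS_WINDOW_DAYS]
    stale.sort(key=lambda p: (STAGE_PRIORITY[p.stage], p.last_updated))
    return [p.url for p in stale]


# Hypothetical site inventory.
pages = [
    Page("/what-is-crm", "awareness", date(2025, 1, 10)),
    Page("/best-crm-for-real-estate", "consideration", date(2025, 11, 2)),
    Page("/pricing", "decision", date(2026, 1, 5)),
    Page("/hubspot-vs-salesforce", "consideration", date(2025, 6, 15)),
]

print(refresh_queue(pages, today=date(2026, 1, 20)))
# ['/hubspot-vs-salesforce', '/best-crm-for-real-estate', '/what-is-crm']
```

The pricing page drops out because it was updated 15 days earlier, inside the window.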
Actionable Takeaways: Where to Focus Your Effort
Let me synthesize the citation data into a concrete action framework. Here’s where each citation source maps to a GEO tactic:
| Citation Source | AI Platform Weight | GEO Tactic | MentionLayer Module |
|---|---|---|---|
| Reddit threads | Perplexity (46.7%), ChatGPT (8-12% for recs), Gemini (6.6%) | Citation seeding in high-authority threads | Citation Engine |
| Wikipedia | ChatGPT (47.9%), Claude (high) | Wikipedia mention strategy, entity accuracy | Entity Sync |
| News publications | All platforms (authority signal) | Press campaigns, thought leadership | PressForge |
| Review platforms | All platforms (trust signal) | Multi-platform review building | Review Engine |
| YouTube | Perplexity (high), Gemini (high) | YouTube GEO, video content optimization | YouTube GEO |
| Google Business | Gemini/AI Overviews (dominant for local) | GBP optimization, review management | Entity Sync |
| Schema markup | Gemini (highest impact), all platforms | Organization, Product, FAQ, Review schema | Technical GEO |
| Brand website | All platforms (10% of citations) | First-person content, regular updates | Content optimization |
If you can only do one thing: Focus on Reddit citation seeding. Reddit is the single highest-leverage source because it’s weighted heavily by Perplexity (46.7% of its top-10 cited domains), increasingly referenced by ChatGPT for recommendation queries, included in Gemini’s AI Overviews, and present in Claude’s training data. No other source gives you cross-platform coverage like Reddit does.
If you can do three things: Add entity consistency (particularly Wikipedia and Google Business Profile) and review volume building (particularly Google Reviews and industry-specific platforms). These two additions cover the Wikipedia-heavy ChatGPT channel and the GBP-heavy Gemini channel, giving you broad AI platform coverage.
If you’re going all-in: Layer in press campaigns targeting the publications each AI model cites for your category (use Perplexity’s visible sources to identify them), YouTube content optimization for Perplexity and Gemini visibility, and comprehensive schema markup for Gemini’s AI Overviews. The brands running all six pillars simultaneously create a citation moat that’s extremely difficult for competitors to breach.
"We built MentionLayer’s citation engine around this exact data," says Joel House. "Every thread we target, every response we generate, is informed by which sources each AI platform actually cites. The brands using data-driven citation strategies consistently outperform those taking a spray-and-pray approach."
The data is clear: AI models have specific, measurable source preferences, and those preferences vary by platform. A targeted strategy that aligns your efforts to each platform’s citation behavior will dramatically outperform a generic "optimize everything" approach. Use the citation data in this article to prioritize your efforts, allocate your budget, and measure your progress against the sources that actually drive AI recommendations.
For a deeper dive into platform-specific strategies, read Platform-by-Platform GEO. For the complete Reddit strategy, see Why Reddit Is the Most Important Platform for AI Visibility. And for the full framework that ties citation sources to business outcomes, start with What Is GEO: The Complete Guide. Agencies looking to operationalize these insights can explore how MentionLayer automates the full cycle from audit through citation seeding and monitoring.
Before you decide which sources to chase, see which ones are already citing you. A free AI visibility audit checks where your brand shows up across ChatGPT, Perplexity, Gemini, and Claude and emails back a source-by-source gap report in about 20 minutes.
Frequently Asked Questions
Does Wikipedia still matter for AI citations?
Enormously — but primarily for ChatGPT. Wikipedia accounts for 47.9% of ChatGPT’s top-10 citations, making it the single most important source for ChatGPT visibility. It’s also significant for Claude, which was heavily trained on Wikipedia content. For Perplexity and Gemini, Wikipedia’s influence is lower. If your brand or relevant industry topics have Wikipedia pages, keeping them accurate and current is high-value GEO work.
Why does Perplexity cite so many more sources than ChatGPT?
Perplexity is architecturally a search engine with AI synthesis, while ChatGPT is an AI model with optional search. Perplexity performs live web searches for every query and cites its sources transparently (21.87 per response). ChatGPT relies more on its training knowledge and uses web browsing supplementally (7.92 per response). This fundamental design difference means Perplexity is more responsive to recent content and more influenced by what currently ranks on the web.
How quickly does new content get picked up by AI models?
It varies by platform. Perplexity can discover and cite new content within days or even hours, since it performs live web searches. ChatGPT with browsing typically picks up well-indexed content within 1-2 weeks. Gemini’s AI Overviews reference Google’s index, so content that’s indexed by Google is immediately available. Claude’s training data updates less frequently, so new content may take months to influence Claude’s responses. For fastest impact, target Perplexity first.
Which platform should I focus on first?
Start with Perplexity for two reasons: it’s the most responsive to GEO efforts (live web search, Reddit-heavy citations) and it provides the most transparent feedback loop (visible source citations). Use Perplexity as your testing ground to validate that your citation seeding is working. Once you see results on Perplexity, expand to ChatGPT and Gemini. Reddit-based citation seeding is particularly effective because Reddit is heavily weighted by Perplexity and increasingly referenced by all other platforms.
Do paid/sponsored results appear in AI citations?
Generally no. AI models overwhelmingly cite organic, earned sources rather than paid or sponsored content. Research shows that 90% of LLM citations are earned media. Paid search ads, sponsored content, and advertorials are not typically cited by AI models. This is one reason GEO is fundamentally an earned media strategy rather than a paid media strategy. The exception is Google’s AI Overviews, which occasionally integrate Shopping results for product queries — but even there, organic sources dominate.