How LLM bots are distorting web traffic
As large language models (LLMs) and AI-powered agents increasingly browse the web, they’re silently reshaping how traffic data is collected, analyzed, and interpreted. These bots mimic human-like navigation to retrieve, summarize, or test content, and they’re doing it a lot.
According to multiple analytics platforms, more than half of all internet traffic now comes from bots. And while some bots (like Google’s crawler) serve legitimate purposes, the new generation of AI-driven crawlers, from ChatGPT, Perplexity, Claude, and others, is introducing a new layer of noise into conversion data and experimentation results.
The new era of bot traffic
Bot activity isn’t new, but the scale and sophistication of LLM-based bots have changed the game.
Traditional web crawlers were easy to identify through user agents or IP ranges. LLM bots, however, often run client-side JavaScript, execute server-side fetch requests, and render dynamic content, behavior that was previously exclusive to real users. Worse still, many of them can't be reliably identified by user agents or IP ranges alone.
This creates a serious measurement blind spot:
- It inflates visitor counts from non-human sessions.
- It distorts engagement metrics such as session duration or page depth.
- It corrupts conversion data where bot-driven events appear as legitimate signups or interactions.
For growth and product teams relying on precise experimentation frameworks, this is not a trivial nuisance, but a threat to decision quality.
Why growth and product teams should care
Conversion rates are a foundational KPI. When bots count as visitors but don't convert, your conversion rate drops artificially. Conversely, if a bot triggers events or API calls that mimic conversions, you end up overestimating success.
Think about this:
- A product page viewed 10,000 times by humans with 200 conversions yields a 2% conversion rate.
- Add 3,000 bot pageviews from AI scrapers, and suddenly your dashboard shows 1.5% (the quick sketch below walks through the math).
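A minimal sketch of that dilution math, using the same illustrative numbers:

```typescript
// Conversion rate as an analytics dashboard sees it: every session counts,
// whether it came from a human or a bot.
function observedConversionRate(conversions: number, humanViews: number, botViews: number): number {
    return conversions / (humanViews + botViews);
}

console.log(observedConversionRate(200, 10_000, 0));     // 0.02   -> the true 2% rate
console.log(observedConversionRate(200, 10_000, 3_000)); // ~0.0154 -> reported as 1.5%
```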
That’s not a marketing problem. That’s a data integrity problem.
Misinterpreted data cascades through everything: experiment evaluations, cohort analyses, and growth loops. If your experimentation platform is fed polluted data, it’s effectively optimizing for non-human behavior.
Filtering bots after the data hits your analytics tool means your metrics are already contaminated.
Bots that render client-side JavaScript execute your tracking code, so their sessions are logged, their “pageviews” counted. Even if you later exclude them from reports, your experiment exposure counts, cookie quotas, and real-time segments are already skewed.
This makes early-stage product experiments, especially those running on limited samples, dangerously unreliable. A single spike in LLM traffic can flip your experiment winner overnight.
How leading platforms handle bot traffic
The trend is clear: everyone is building stronger detection. Most platforms rely heavily on static user-agent lists and after-the-fact event filtering. Let's take a look.
IAB blocklists
Many analytics and experimentation tools rely on the IAB/ABC International Spiders & Bots List to classify non-human traffic and exclude it from reports.
Amplitude blocks ingestion based on User-Agent matches to the IAB list, Optimizely applies IAB filtering across Web and Feature Experimentation, and Adobe Analytics lets you enable IAB rules and add custom bot rules (user agents and IP ranges).
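The mechanics boil down to string matching. Here's a simplified sketch of that approach; the patterns below are illustrative, since the actual IAB/ABC list is licensed and far more extensive, and the event shape is hypothetical:

```typescript
// Illustrative patterns only; the licensed IAB/ABC Spiders & Bots List covers
// thousands of entries and is updated regularly.
const BOT_PATTERNS: RegExp[] = [
    /bot|crawler|spider/i,
    /GPTBot|ClaudeBot|PerplexityBot|CCBot/i,
];

interface TrackedEvent {
    name: string;
    userAgent: string;
}

// Exclude events whose user agent matches a known bot pattern before they
// reach reports, so non-human sessions never count toward metrics.
function excludeBotEvents(events: TrackedEvent[]): TrackedEvent[] {
    return events.filter(
        (event) => !BOT_PATTERNS.some((pattern) => pattern.test(event.userAgent)),
    );
}
```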
SDK configs
In some platforms, client-side filtering happens by default because the browser SDKs already handle it. You need to pay closer attention, though, when implementing experimentation tools on the server side or through non-browser SDKs.
In Optimizely Feature Experimentation, for example, you need to make sure you're passing the user agent on requests to unlock bot filtering.
Mixpanel filters a handful of bots by default but recommends setting $ignore or pattern-matching user agents to catch the rest. If you don’t pass the right attributes, bots slip into your metrics.
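For instance, here's a rough sketch of how that looks with Optimizely's JavaScript SDK for Feature Experimentation; the reserved $opt_user_agent attribute and the createUserContext/decide calls reflect our reading of Optimizely's docs, so verify them against the current version before relying on this:

```typescript
import { createInstance } from '@optimizely/optimizely-sdk';

const optimizelyClient = createInstance({sdkKey: 'YOUR_SDK_KEY'});

async function decideForVisitor(userId: string, userAgent: string) {
    await optimizelyClient?.onReady();

    // Forward the raw user agent so the platform can exclude known bots
    // from experiment results.
    const user = optimizelyClient?.createUserContext(userId, {
        '$opt_user_agent': userAgent,
    });

    // 'my_flag' is a placeholder flag key.
    return user?.decide('my_flag');
}
```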
Experiment-level filters
For experimentation-focused platforms, there is usually an option to filter bots only from experiments instead of applying a general filter.
Statsig and Croct, for example, remove known bots from exposure data to keep experiment analysis clean. However, Statsig still serves flags/variants to bots unless you explicitly segment and override them. Croct, on the other hand, serves bots with the default content.
Server-side global filtering
Tealium’s server-side filtering drops events whose user agents match extensive bot patterns (from generic ‘bot/spider’ to specific crawlers). Similarly, Croct drops entire sessions and events for visitors who don't have any client-side events, indicating they're just bots indexing page content.
Blocking at the edge/server avoids polluting downstream analytics or billing in the first place.
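As a rough illustration of that kind of heuristic (not Croct's actual implementation), a server-side sweep could discard sessions that never produced a single client-side event:

```typescript
type EventSource = 'client' | 'server';

interface SessionEvent {
    sessionId: string;
    source: EventSource;
    name: string;
}

// Heuristic sketch: a session made up entirely of server-side events (page
// fetches, SSR calls) that never fired client-side tracking is very likely
// a crawler indexing content rather than a real visitor.
function dropLikelyBotSessions(events: SessionEvent[]): SessionEvent[] {
    const humanSessions = new Set(
        events
            .filter((event) => event.source === 'client')
            .map((event) => event.sessionId),
    );

    return events.filter((event) => humanSessions.has(event.sessionId));
}
```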
Device intelligence and behavior signals
Vendors like Fingerprint detect automation using device fingerprints and behavioral signals to stop fake signups, scripted browsing, and ATO attempts, complementing list-based filters with real-time risk scoring.
WAF-level defenses
At the infra layer, platforms like Vercel ship managed rulesets that challenge non-browser traffic and maintain an “AI bots” list to log or deny crawlers like GPTBot/Perplexity, while allowing verified bots.
Caveat: reverse proxies in front of your app can degrade detection accuracy and increase false challenges.
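For a sense of what such a rule does, here's a minimal edge-style sketch using standard Fetch API Request/Response objects and an illustrative crawler list; a managed WAF goes further and verifies legitimate bots against published IP ranges, which this sketch doesn't attempt:

```typescript
// Illustrative user-agent tokens for AI crawlers; managed rulesets maintain
// far broader, continuously updated lists.
const AI_CRAWLER_TOKENS = ['GPTBot', 'ClaudeBot', 'PerplexityBot', 'CCBot'];

export async function handleRequest(
    request: Request,
    next: (request: Request) => Promise<Response>,
): Promise<Response> {
    const userAgent = request.headers.get('user-agent') ?? '';

    if (AI_CRAWLER_TOKENS.some((token) => userAgent.includes(token))) {
        // Deny (or just log) AI crawlers before they reach the app,
        // its analytics, and its experiments.
        return new Response('Forbidden', {status: 403});
    }

    return next(request);
}
```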
Historical data
If you enable filtering mid-experiment or after data collection, most tools won’t retroactively fix old events. Optimizely (and others) explicitly warn you may need to discard or manually re-filter early data to avoid bias.
How Croct is solving this
At Croct, we’ve built bot filtering directly into the personalization and experimentation layer, not just analytics. This means we exclude bots before they even touch your metrics, keeping your quotas safe and untouched.
Our approach combines:
- User-agent intelligence, using an actively maintained database of known bots and crawlers.
- Dual-level filtering for both client and server side, ensuring bots will never see personalized or variant content.
- Experiment integrity, excluding bots from A/B tests from the start.
This ensures your monthly visitor counts reflect real users, your conversion rates remain trustworthy, and your experiments produce statistically valid results, free from automated bias.
Croct’s bot filtering is enabled by default across all plans, so you don’t need to worry about complex setup or data cleanup.
The bottom line
AI bots are not going away. If anything, they're becoming harder and harder to distinguish from humans.
For growth and product teams, this means adapting analytics pipelines and experimentation frameworks to recognize and neutralize their impact.
Croct’s real-time filtering ensures your insights, experiments, and personalization decisions are based on real human behavior, not synthetic noise. Want to learn more about this? Check our documentation on bot filtering or talk to our team about ensuring your experiments stay human-first.