The reality of AI
and bot traffic
What 10 billion requests, broken crawlers, and WordPress infrastructure reveal about the new reality of bot traffic.
Your analytics are lying to you: a significant share of your site's “traffic” isn't human.
Much of the available advice hasn't helped much. It tells you either to block everything or to allow everything because AI is the future. Neither approach actually helps you run a WordPress site.
Over the past year, bot traffic has become much more than a security or SEO issue. It is now an infrastructure issue. Crawlers are hitting dynamic endpoints, getting caught in query-string loops, bypassing cache, and creating traffic patterns that look less like normal indexing and more like broken automation at scale.
Key findings
To understand what changed, we looked at industry research, spoke with engineers and practitioners, and analyzed more than 10 billion requests across Kinsta-managed infrastructure. What emerged was not a case for blocking everything or allowing everything, but a case for better judgment.
Insights from
our contributors
“From an infrastructure perspective, there's no such thing as 'just bot traffic.' Every request is real work. At scale, inefficient crawling stops being a traffic problem and becomes a resource problem.”
“Most of what we're seeing isn't malicious. It's bots behaving inefficiently at scale, and that's where the real problems begin.”
“The misconception is thinking that bot traffic is a simple ‘block or allow’ problem. In reality, it's a matter of policy, visibility, and economic control.”
More bots aren't the problem.
What changed is how they behave.
For years, the conversation about bot traffic focused on volume.
Teams tracked how much of their traffic was automated, filtered out the most obvious bad actors, and moved on. That approach worked when most bots behaved predictably: crawling pages, indexing content, and leaving.
That model no longer works. Over the past two years, the web has been flooded by bots built not only to index content for search results, but also to ingest content at scale for model training, retrieval-augmented generation, and user-triggered queries. These crawlers are more aggressive, faster, and fundamentally less well-behaved than anything that came before.
By late 2025, AI crawlers accounted for 4.2% of HTML requests on Cloudflare's network, and when combined with traffic from crawlers such as Googlebot, that figure reached 8.5%. At the same time, teams responsible for operating and managing sites began to notice patterns such as repeated requests, loops, and high-volume hits on low-value endpoints that looked nothing like traditional crawling.
That 4.2% is an annual average. The actual figure ranged from 2.4% in early April to 6.4% in late June, nearly three times higher within a single year. GPTBot alone grew 305% between May 2024 and May 2025. Of all AI crawling activity, 80% is used exclusively for model training rather than for search or user queries. That activity generates no referral traffic back to your site.
Cloudflare Radar 2025 Year in Review
Googlebot alone accounts for about 4.5% of HTML traffic, more than all non-Google AI bots combined. It crawled 11.6% of unique web pages, compared with GPTBot's 3.6%, and peaked at 11% of all HTML requests in late April. Blocking it to reduce server load would be one of the most counterproductive decisions a site owner could make.
Cloudflare Radar 2025 Year in Review
The big picture
Most bots aren't attacking.
They're just stuck in loops.
Most AI crawlers are built to follow every link they find and record every unique page address. That approach works well on simple sites. But modern sites, especially eCommerce stores, generate slightly different URLs for what is essentially the same page.
For example, the team observed meta-externalagent, the Facebook/Meta AI crawler, repeatedly cycling through query-string variations on multiple sites. To a human, a product link with a color filter, a cart link with a quantity, or a calendar page with a sort order looks like the same page. To a bot following URLs, each one looks completely new.
So the bot follows the first link… that page generates another variation, which the bot also follows. Then another. And another… It can't recognize that it's stuck in a loop, and some of these loops ran undetected for several days before infrastructure rules caught them.
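To make the loop concrete, here is a minimal Python sketch of how several query-string variations collapse into one page once presentation-only parameters are ignored. The parameter names (color, sort, quantity) and the example.com URLs are hypothetical, chosen only to mirror the examples above. A crawler that compares raw URLs, as described here, would count every variant as a new page.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that change presentation, not the underlying page.
IGNORED_PARAMS = {"color", "sort", "quantity"}

def canonical_url(url: str) -> str:
    """Drop ignored query parameters and sort whatever remains."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

variants = [
    "https://example.com/product/shirt?color=blue",
    "https://example.com/product/shirt?color=red&sort=price",
    "https://example.com/product/shirt?sort=price&color=blue&quantity=2",
]

raw_count = len(set(variants))                               # distinct raw URLs
canonical_count = len({canonical_url(u) for u in variants})  # distinct pages
print(raw_count, canonical_count)
```

Which parameters are safe to ignore depends entirely on the site, so the ignore list is the part that needs real judgment.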
This kind of behavior doesn't always come from highly sophisticated systems.
As Cloudflare's David Belson points out, not all bots operate with the same level of discipline: “There are people who yesterday didn't even know what they were doing, but today vibe coded a bot and let it loose on the internet, without even bothering to check robots.txt.”
7.67 million requests hit add-to-cart URLs in 24 hours
Even Google's crawler, the one you definitely can't block, got caught in the same trap.
To put these numbers in perspective, 3.75 million requests in 24 hours works out to roughly one request every 23 milliseconds, all day and all night, each one handled by the server as a fresh request rather than something that can be served from cache.
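The 23-millisecond figure can be checked with simple arithmetic, assuming the requests are spread evenly over the day. The short Python sketch below applies the same calculation to the 7.67 million figure in the heading above; the function name is only illustrative.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in 24 hours

def ms_between_requests(requests_per_day: float) -> float:
    """Average gap between requests in milliseconds, assuming an even spread."""
    return MS_PER_DAY / requests_per_day

print(round(ms_between_requests(3_750_000), 2))  # 23.04
print(round(ms_between_requests(7_670_000), 2))  # 11.26
```

Real traffic arrives in bursts rather than evenly, so peak rates would be higher than these averages suggest.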
At scale, this kind of behavior isn't always intentional.
"You can't just spray and pray… you've got to act like a responsible end user," Belson explains. "You can't be hammering a website with requests."
Your server doesn't know it's
talking to a bot
The behavior itself is not the issue. If every request were cheap, loops and repeated visits wouldn't matter much.
On a simple static page, most requests can be served from cache. The server returns a cached version of the page, and the cost per request remains low.
That model breaks down quickly on real-world WordPress sites, especially those running WooCommerce, search, filtering, or plugin-heavy functionality.
A large portion of traffic doesn't even hit static pages. It hits dynamic endpoints such as cart, checkout, and add-to-cart URLs.
These are not cacheable in the same way. They require the server to do actual work every time.
Each request triggers
A PHP thread (previously known as a PHP worker) is reserved for the full duration of every request. Under sustained bot load, threads are exhausted and legitimate visitors wait.
Dynamic pages query your database on every load. No cache layer can absorb this at scale.
Cart and checkout pages create or validate sessions, adding overhead even for bots that never convert.
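A rough way to reason about thread exhaustion is sketched below in Python. It assumes each uncached request holds one PHP thread for its whole duration, as described above. The pool size (8 threads) and request time (0.4 seconds) are illustrative assumptions, not measured values, and the bot rate simply borrows the one-request-every-23-milliseconds figure from earlier as a hypothetical load on a single site.

```python
def max_requests_per_second(threads: int, avg_seconds_per_request: float) -> float:
    """Upper bound on sustained uncached throughput when each request holds one thread."""
    return threads / avg_seconds_per_request

def threads_busy(requests_per_second: float, avg_seconds_per_request: float) -> float:
    """Average threads occupied (Little's law: arrival rate x time per request)."""
    return requests_per_second * avg_seconds_per_request

THREADS = 8                    # assumed pool size
AVG_SECONDS = 0.4              # assumed duration of one uncached PHP request
bot_rps = 3_750_000 / 86_400   # about 43.4 requests per second

capacity = max_requests_per_second(THREADS, AVG_SECONDS)  # 20 requests per second
demand = threads_busy(bot_rps, AVG_SECONDS)                # about 17.4 threads' worth of work
print(capacity, round(demand, 1), demand > THREADS)
```

Under these assumptions the bot load alone asks for more than twice the available threads, which is the point at which human visitors start queuing.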
The SEO cost
Is this an attack? Normal bot activity? Something in between? That ambiguity is exactly what makes it hard to remedy. Because the same patterns affect both performance and discoverability, the right response depends on what you're trying to protect.
Choose what you're
optimizing for
After seeing how bots behave and the impact they can have, the natural reaction is: Block them. But blocking bots indiscriminately isn't the answer, nor is leaving the door wide open.
As Belson puts it: "You need to take the first step and put a bouncer outside the door to decide who gets in and who doesn't."
Not all bots are harmful, and not all traffic should be treated the same way. Some bots drive discoverability, some consume resources without adding value, and others fall somewhere in between.
Even at the network level, the goal isn't to eliminate bots entirely. "I'm not a person who would tell anybody to block all bots," Belson says. "There's real value in some of that traffic."
The challenge now isn't deciding whether bots are good or bad. It's understanding how different decisions affect your site and which impacts you're willing to accept.
As Cristian Lopez, Managing Editor at HostingAdvice, puts it: "The misconception is thinking it's simply a 'block or allow' issue. In reality, it's now a question of policy, visibility, and economic control."
Discoverability and performance
Search crawlers are essential for helping people discover your site, but they don't always operate efficiently, which calls for a balancing act. Blocking them too aggressively can limit your visibility in search results, while allowing unrestricted access can introduce unnecessary load, particularly when they start hitting dynamic pages that require real processing rather than being served from cache.
The goal is not to choose one over the other, but to control how much of each you allow, based on how your site actually behaves.
Access and resource cost
Some bots provide indirect value — AI systems referencing your content, tools indexing your pages, or services aggregating data across the web — but every request comes with a cost in terms of CPU usage, database queries, memory, and bandwidth. As that activity scales, those costs don't stay marginal; they accumulate and start to have a noticeable impact.
Not all access needs to be unrestricted. The value a bot provides should be weighed against the cost it introduces.
Control and simplicity
In simple cases, automation can handle bot management effectively, but the right approach ultimately depends on the type of site you're running, the kind of traffic you're seeing, and what matters most for your goals. Relying entirely on automation can simplify things, but it also means you are not shaping how those decisions are made for your specific site.
The best systems don't force a choice between ease and control. They allow you to start simple and adjust where it matters.
This overlap causes confusion. Traffic spikes, performance drops, and it's not always clear whether to block, allow, or ignore, even for the same pattern on two different sites.
The question to ask is not "Should I allow bots?" but "Which bots, on which parts of my site, under what conditions?"
Answering this question requires a different way of thinking. We explore that in the next section.
A better way to decide what
to allow, verify, or block
There is no universal bot policy that works for every site. A WooCommerce store, a content site, a business website, and a staging environment do not face the same risks, and do not require the same solutions.
The right approach depends on what your site does, what kind of traffic it receives, and what you're trying to optimize for. In most setups, this level of decision-making is handled by infrastructure tooling rather than manually configured per-request, but understanding the logic helps you know what's running on your behalf, and when it makes sense to adjust.
What matters here is not just traffic, but the kind of visibility you want, whether it's search rankings, AI citations, or direct user visits.
Your performance issues are likely bots hitting WooCommerce's add-to-cart and checkout endpoints. These bypass the page cache entirely and force PHP execution and database queries on every single request. The fix isn't blocking everything. The goal is protecting specific high-cost paths.
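As a minimal sketch of what "specific high-cost paths" could mean in practice, the Python function below flags requests that target the cart, checkout, or add-to-cart endpoints named in this report. The function name and the example.com URLs are hypothetical, and a real store may expose other uncached paths that would need adding.

```python
from urllib.parse import urlsplit, parse_qs

HIGH_COST_PREFIXES = ("/cart", "/checkout")  # path prefixes named in this report
HIGH_COST_PARAMS = {"add-to-cart"}           # query parameter named in this report

def is_high_cost(url: str) -> bool:
    """True when the URL targets a path that likely bypasses the page cache."""
    parts = urlsplit(url)
    # Simple prefix match; note it would also match an unrelated path like /cartoons.
    if parts.path.startswith(HIGH_COST_PREFIXES):
        return True
    params = parse_qs(parts.query, keep_blank_values=True)
    return any(name in params for name in HIGH_COST_PARAMS)

print(is_high_cost("https://example.com/shop?add-to-cart=42"))  # True
print(is_high_cost("https://example.com/product/blue-shirt"))   # False
```

Keeping the rule this narrow leaves product and category pages untouched, which matches the goal of protecting expensive paths rather than blocking crawlers outright.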
Restrict the /cart, /checkout, and ?add-to-cart= paths via robots.txt.
… /shop?add-to-cart=, and /checkout in robots.txt.
Search crawlers need /shop, /product/, and category pages for your store to rank. Restrict them from specific dynamic endpoints only, not the whole site.
The configurations above represent what managing bot traffic looks like when done by hand. In practice, Kinsta's Bot Protection handles the majority of these patterns automatically. Enable your desired level of protection once, and our system takes care of the rest (no per-path rules or manual exceptions required).
Most systems weren't designed
for this level of control
Most platforms either manage bot traffic automatically by making decisions behind the scenes or expose controls that require manual configuration.
Automatic systems catch obvious threats and allow known crawlers, but they don't account for how traffic behaves on specific parts of your site, or what it costs in context. In some cases, legitimate AI crawlers get blocked at the edge, creating a discoverability blind spot most teams never know about.
Manual controls offer more flexibility. But they often require a level of precision that most site owners don't have time to manage continuously. And without guidance, they're easy to misconfigure.
What's missing isn't just control, but usable control.
The ability to adjust behavior where it matters, without breaking essential traffic, and without rebuilding your entire approach from scratch every time something changes.
Most sites don't need complete automation or complete control. They need the ability to make targeted decisions without having to rebuild their entire traffic strategy every time patterns change.
At this point, the challenge isn't identifying bot traffic. It's managing it in a way that reflects how your site actually works.
What the right response looks
like in different situations
By this point, the pattern is clear: there isn't a single rule that works everywhere. The right response depends on what kind of site you're running, what kind of traffic you're seeing, and how urgent the situation is.
What follows isn't a checklist. It's a way to think through what to do next based on where you are right now.
Start with visibility, then make one targeted decision
Before making changes, look at what your traffic actually consists of. You're not trying to identify every bot. You're looking for patterns: repeated requests to the same types of URLs, especially ones that shouldn't matter to a crawler, like cart endpoints or parameter-heavy pages. Most analytics tools or server logs will give you enough visibility to spot this activity.
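One way to look for the pattern described above is to count how many distinct query strings each user agent requests on the same path. The Python sketch below assumes the log has already been parsed into (user agent, URL) pairs; the sample entries, the threshold of three variants, and the function name are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def query_variation_hotspots(requests, min_variants=3):
    """Return {(user_agent, path): distinct_query_count} for pairs at or above min_variants."""
    seen = defaultdict(set)
    for user_agent, url in requests:
        parts = urlsplit(url)
        seen[(user_agent, parts.path)].add(parts.query)
    return {
        key: len(queries)
        for key, queries in seen.items()
        if len(queries) >= min_variants
    }

# Hypothetical, already-parsed log entries: (user agent, requested URL).
sample = [
    ("meta-externalagent", "/shop?add-to-cart=1"),
    ("meta-externalagent", "/shop?add-to-cart=2"),
    ("meta-externalagent", "/shop?add-to-cart=2&quantity=3"),
    ("Googlebot", "/product/blue-shirt"),
    ("Googlebot", "/product/blue-shirt"),
]

print(query_variation_hotspots(sample))
```

A path that shows many query variants from one agent is a candidate for a closer look, not proof of a loop; legitimate pagination can produce the same shape.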
Most platforms already filter the most clearly broken patterns, such as obvious loops or known abusive traffic. Make sure those baseline protections are active and give them time to run. They're usually conservative by design, which means they reduce noise without affecting legitimate visitors or search crawlers.
Once you see a pattern, act on that pattern (not everything at once). If bots are repeatedly hitting dynamic endpoints, limit access to those paths. If specific crawlers are scraping content aggressively, decide whether that access is worth the cost. The goal at this stage isn't perfection; it's to reduce unnecessary load without introducing new problems.
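"Limit access" can take several forms. One option is rate limiting requests to dynamic paths per client, sketched below in Python as a sliding-window counter. This is an illustration of the idea under simple assumptions, not how any particular platform implements it: in practice this logic usually lives at the edge or web-server layer, and the client identifier, limit, and window shown here are hypothetical.

```python
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of allowed requests

    def allow(self, client: str, now: float) -> bool:
        hits = self._hits[client]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True

limiter = SlidingWindowLimiter(limit=2, window_seconds=1.0)
decisions = [limiter.allow("example-bot", t) for t in (0.0, 0.1, 0.2, 1.05, 1.15)]
print(decisions)  # [True, True, False, True, True]
```

Timestamps are passed in explicitly so the behavior is easy to test; a live deployment would read them from a clock.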
Apply this process across a few different client sites. The patterns you see on an e-commerce site, a content site, and a service site will be different, but consistent enough to build a repeatable approach for client conversations.
Bot traffic isn't going anywhere.
Your strategy should
By this point, the pattern is clear. Bot traffic is no longer something you can treat as occasional noise or filter out at the edges. It is a constant, evolving part of how websites are accessed and stressed.
What makes it difficult isn't just the volume, but the overlap. The same systems that help people find your site can also consume its resources, and patterns that appear to be normal crawling can behave like inefficient automation at scale.
So no single rule works everywhere.
The right approach depends on your site, your traffic, and what you're trying to protect. It requires understanding how your site actually behaves, and making decisions that reflect that reality.
This kind of shift isn't entirely new in the web's evolution. As Jordan Sprogis, Contributing Expert at HostingAdvice, puts it: "It's not that different from where SSL was, where it was a paid add-on for a long time… now, SSL certificates are included in just about every hosting package."
Most of the time, the goal is to reduce unnecessary load, preserve visibility where it matters, and maintain a system you can rely on as things change. What comes next will be harder to categorize. Agentic traffic, automated tools built to take actions, is already showing up in infrastructure data. Google recently announced a dedicated user-agent to log when its AI agents interact with sites. The responsible platforms will identify themselves, respect crawl delays, and avoid hammering endpoints that serve no purpose. Others won't. The line between a human visitor and an agent will continue to blur.
And when automated traffic inflates your visit counts, raw numbers no longer reflect reality. The signals that matter are the correlated ones: branded search volume, direct traffic, engagement quality, and revenue tied to real visitor behavior. If those metrics are also moving, you know you're visible where it counts.
Control how bots interact with your WordPress site without breaking search visibility
Kinsta's Bot Protection gives you environment-level control with sensible defaults, so you can manage how different types of traffic interact with your site without blocking search engines or compromising discoverability.

This report was brought to you by Kinsta.
Kinsta is a premium managed WordPress hosting platform with more than 230,000 customers worldwide. #1 on G2 for satisfaction. 24/7 expert support in 10 languages.
