The reality of AI and bot traffic

What 10 billion requests, broken crawlers, and WordPress infrastructure reveal about the new reality of bot traffic.

Kinsta Research • 2026

Your analytics are lying to you: a significant share of your site's "traffic" is not human.

Most of the advice out there hasn't helped much. You're told either to block everything or to allow everything because AI is the future. Neither approach actually helps you manage a WordPress site.

Over the past year, bot traffic has become much more than a security or SEO issue. It is now an infrastructure issue. Crawlers are hitting dynamic endpoints, getting caught in query-string loops, bypassing cache, and creating traffic patterns that look less like normal indexing and more like broken automation at scale.

Key findings

300%
increase in AI bot traffic in one year
Akamai Digital Fraud & Abuse Report 2025
Over the past year, AI-driven bot traffic went from background noise to a growing concern about how sites are crawled and processed.
1 in 31
web visits is now an AI bot
TollBit State of the Bots Q4 2025
In early 2025, that ratio was closer to 1 in 200. By the end of the year it had reached 1 in 31 across TollBit's networks, a huge shift in a very short period.
3.75M
add-to-cart hits from one bot in 24 hours
Kinsta infrastructure data
This is what happens when bot traffic breaks dynamic caching, requests expensive endpoints, or pushes a crawler into consuming a large volume of resources.
550M
requests filtered by a single loop rule in 30 days
Kinsta infrastructure data
One misbehaving pattern triggered enough traffic to justify its own mitigation rule. That tells you the problem is not just quantity. It is repetition, loops, and waste.

To understand what changed, we looked at industry research, spoke with engineers and practitioners, and analyzed more than 10 billion requests across Kinsta-managed infrastructure. What emerged was not a case for blocking everything or allowing everything, but a case for better judgment.

Insights from our contributors

From an infrastructure perspective, there's no such thing as 'just bot traffic.' Every request is real work. At scale, inefficient crawling stops being a traffic problem and becomes a resource problem.

Daniel Pataki
CTO, Kinsta

Most of what we're seeing isn't malicious. It's bots behaving inefficiently at scale, and that's where the real problems start.

David Belson
Head of Data Insights, Cloudflare

The misconception is thinking bot traffic is a simple 'block or allow' problem. In reality, it's a question of policy, visibility, and economic control.

Cristian Lopez
Managing Editor, HostingAdvice

The shift nobody expected

More bots aren't the problem. What changed is how they behave.

For years, the conversation about bot traffic focused on volume.

Teams tracked how much of their traffic was automated, filtered out the most obvious bad actors, and moved on. That approach worked when most bots behaved predictably: crawling pages, indexing content, and leaving.

That model no longer works. Over the past two years, the web has been flooded with bots built not only to index content for search results, but also to ingest content at scale for model training, retrieval-augmented generation, and user-triggered queries. These crawlers are more aggressive, faster, and fundamentally less well-behaved than anything that came before.

By late 2025, AI crawlers accounted for 4.2% of HTML requests on Cloudflare's network, and when combined with traffic from crawlers such as Googlebot, that figure reached 8.5%. At the same time, teams responsible for operating and managing sites began to see patterns such as repeated requests, loops, and high-volume hits on low-value endpoints that looked nothing like traditional crawling.

4.2%

That 4.2% is an annual average. The actual figure ranged from 2.4% in early April to 6.4% in late June, nearly a threefold difference within a single year. GPTBot alone grew 305% between May 2024 and May 2025. Of all AI crawling activity, 80% is used exclusively for model training, not for search or user queries. That activity sends no referral traffic back to your site.

Cloudflare Radar 2025 Year in Review

8.5%

Googlebot alone accounts for about 4.5% of HTML traffic, more than all non-Google AI bots combined. It crawled 11.6% of unique web pages, compared with GPTBot's 3.6%, and peaked at 11% of all HTML requests in late April. Blocking it to reduce server load would be one of the most counterproductive decisions a site owner could make.

Cloudflare Radar 2025 Year in Review

The bigger picture

This shift is not only affecting infrastructure; it is also changing how content is discovered. Crawlers are spending more time on low-value URLs, while AI systems increasingly surface answers without sending traffic back to the original page. The result: bot behavior directly affects both your infrastructure costs and your visibility. How bots interact with your site has never mattered more.

Bot behavior in practice

Most bots aren't attacking. They're just stuck in loops.

Most AI crawlers are built to follow every link they find and record each unique page address. That approach works well on simple sites. But modern sites, especially eCommerce stores, generate slightly different URLs for essentially the same page.

For example, the team observed meta-externalagent, the Facebook/Meta AI crawler, repeatedly cycling through query-string variations across multiple sites. To a human, a product link with a color filter, a cart link with a quantity, or a calendar page with a sort order all look like the same page. To a bot following URLs, each one looks completely new.

The human sees one product page
vs
The bot sees 6 completely different URLs

crawl_trace.log
/product
The bot finds a product page
/product?color=red
It follows a color filter link
/product?color=red&size=M
The page generates a size variation
/product?color=red&size=M&sort=asc
Sorting creates another unique URL
/product?color=red&size=M&sort=asc&page=2
Pagination adds one more combination
/product?color=red&size=M&sort=asc&page=2&stock=true
The stock filter doubles the number of URLs again
loop detected...

So the bot follows the first link... that page generates another variation, which the bot also follows. And then another. And another... It cannot recognize that it is stuck in a loop, and some of these loops ran for several days undetected before infrastructure rules caught them.
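
A minimal Python sketch of why the trace explodes: every added query parameter produces a string the crawler has never seen, even though all six URLs render the same product. The URLs come from the trace; the `FACET_PARAMS` set and the `canonical_key` helper are hypothetical, and treating every one of these parameters (including `page`) as a variant of the same page is an assumption a real site would need to check.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumption: these parameters only filter or sort the same product page.
FACET_PARAMS = {"color", "size", "sort", "page", "stock"}

TRACE = [
    "/product",
    "/product?color=red",
    "/product?color=red&size=M",
    "/product?color=red&size=M&sort=asc",
    "/product?color=red&size=M&sort=asc&page=2",
    "/product?color=red&size=M&sort=asc&page=2&stock=true",
]


def canonical_key(url: str) -> str:
    """Drop facet parameters so URL variants of one page share a single key."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in FACET_PARAMS]
    query = urlencode(sorted(kept))
    return parts.path + ("?" + query if query else "")


# A link-following bot that deduplicates on the raw string sees every variant as new.
naive_seen = set(TRACE)
# Deduplicating on the canonical key collapses the variants into one page.
canonical_seen = {canonical_key(url) for url in TRACE}

print(len(naive_seen), "distinct URLs by raw string")
print(len(canonical_seen), "distinct page(s) after canonicalization")
```

A crawler that deduplicated on something like `canonical_key` would stop after the first request. One that compares raw strings keeps going for as long as the site keeps generating new combinations.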

This kind of behavior doesn't always come from highly sophisticated systems.

As David Belson of Cloudflare points out, not all bots operate with the same level of discipline: "There are people who didn't even know what they were doing yesterday, but today they vibe-coded a bot and turned it loose on the internet without even bothering to check robots.txt."

7.67 million requests hit add-to-cart URLs in 24 hours

Even Google's crawler, the one you definitely can't block, got caught in the same trap.

Bot         Requests   Share
ClaudeBot   3.75M      48.9%
BLEXBot     1.84M      24%
GPTBot      0.98M      12.8%
Googlebot   0.71M      9.3%
AhrefsBot   0.39M      5.1%

To put those numbers in perspective, 3.75 million requests in 24 hours works out to roughly one request every 23 milliseconds, all day and all night, each one handled by the server as a new request rather than something that can be served from cache.
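
The arithmetic behind that figure, and behind the percentage shares in the breakdown, can be checked with a few lines of Python. The request counts are the ones reported in the breakdown; the variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Add-to-cart requests in 24 hours, in millions (figures from the breakdown above).
hits_millions = {
    "ClaudeBot": 3.75,
    "BLEXBot": 1.84,
    "GPTBot": 0.98,
    "Googlebot": 0.71,
    "AhrefsBot": 0.39,
}

total = sum(hits_millions.values())
top_bot_requests = hits_millions["ClaudeBot"] * 1_000_000

# Average gap between consecutive requests, in milliseconds.
interval_ms = SECONDS_PER_DAY / top_bot_requests * 1000
# The same figure expressed as a sustained request rate.
rate_per_second = top_bot_requests / SECONDS_PER_DAY

print(f"total: {total:.2f}M requests")
print(f"ClaudeBot: one request every {interval_ms:.1f} ms ({rate_per_second:.1f} per second)")
for bot, hits in hits_millions.items():
    print(f"{bot}: {hits / total:.1%} of add-to-cart requests")
```

Expressed as a rate, the same figure is a little over 43 requests per second, sustained for a full day.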

At scale, this kind of behavior isn't always intentional.

"You can't just spray and pray… you've got to act like a responsible end user," Belson explains. "You can't be hammering a website with requests."

Where the system breaks

Your server doesn't know it's talking to a bot

The behavior itself is not the issue. If every request were cheap, loops and repeated visits wouldn't matter much.

On a simple static page, most requests can be served from cache. The server returns a cached version of the page, and the cost per request remains low.

That model breaks down quickly on real-world WordPress sites, especially those running WooCommerce, search, filtering, or plugin-heavy functionality.

A large portion of traffic doesn't even hit static pages. It hits endpoints like:

?add-to-cart=
Filtered product pages
Search queries
Wishlist actions
AJAX-powered interactions
Calendar views with query parameters

These are not cacheable in the same way. They require the server to do actual work every time.

Each request triggers

PHP execution

A PHP thread (previously known as a PHP worker) is reserved for the full duration of every request. Under sustained bot load, the threads run out and legitimate visitors wait.

Database queries

Dynamic pages query your database on every load. No cache layer can absorb this at scale.

Session handling

Cart and checkout pages create or validate sessions, adding overhead even for bots that never convert.
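
To see how quickly threads run out, a rough estimate with Little's law (average concurrency equals arrival rate times time per request) is enough. This is a sketch under loudly stated assumptions: the 200 ms per request and the pool of 8 threads are hypothetical values chosen for illustration, and it treats the 3.75 million daily add-to-cart requests from this report as if they all reached a single site.

```python
SECONDS_PER_DAY = 86_400


def busy_threads(requests_per_day: float, seconds_per_request: float) -> float:
    """Little's law: average threads in use = arrival rate x time per request."""
    return requests_per_day / SECONDS_PER_DAY * seconds_per_request


BOT_REQUESTS_PER_DAY = 3_750_000  # add-to-cart figure quoted in this report
SECONDS_PER_REQUEST = 0.2         # assumption: 200 ms of PHP and database work
THREAD_POOL_SIZE = 8              # assumption: hypothetical pool size

occupied = busy_threads(BOT_REQUESTS_PER_DAY, SECONDS_PER_REQUEST)
remaining = max(0.0, THREAD_POOL_SIZE - occupied)

print(f"average threads busy with bot requests: {occupied:.1f}")
print(f"threads left for human visitors: {remaining:.1f}")
```

Under these assumptions the bot alone keeps more threads busy than the pool contains, so requests queue before a single human visitor is served. Different timings or pool sizes change the numbers but not the shape of the problem.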

The SEO cost

Google explicitly calls out faceted navigation and parameter-based URLs as a source of crawl inefficiency, where bots can explore near-infinite variations of the same page. Because each variation looks new, crawlers keep requesting them, consuming resources and slowing down the discovery of pages that actually matter.

Is this an attack? Normal bot activity? Something in between? That ambiguity is exactly what makes it hard to remedy. Because the same patterns affect both performance and discoverability, the right response depends on what you're trying to protect.

The trade-offs

Choose what you're optimizing for

After seeing how bots behave and the impact they can have, the natural reaction is: Block them. But blocking bots indiscriminately isn't the answer, nor is leaving the door wide open.

As Belson puts it: "You need to take the first step and put a bouncer outside the door to decide who gets in and who doesn't."

Not all bots are harmful, and not all traffic should be treated the same way. Some bots drive discoverability, some consume resources without adding value, and others fall somewhere in between.

Even at the network level, the goal isn't to eliminate bots entirely. "I'm not a person who would tell anybody to block all bots," Belson says. "There's real value in some of that traffic."

The challenge now isn't deciding whether bots are good or bad. It's understanding how different decisions affect your site and which trade-offs you're willing to accept.

As Cristian Lopez, Managing Editor at HostingAdvice, puts it: "The misconception is thinking it's simply a 'block or allow' issue. In reality, it's now a question of policy, visibility, and economic control."

01

Discoverability and performance

Search crawlers are essential for helping people discover your site, but they don't always operate efficiently, which calls for a balancing act. Blocking them too aggressively can limit your visibility in search results, while allowing unrestricted access can introduce unnecessary load, particularly when they start hitting dynamic pages that require real processing rather than being served from cache.

The goal is not to choose one over the other, but to control how much of each you allow, based on how your site actually behaves.

Search visibility vs. Server load
02

Access and resource cost

Some bots provide indirect value — AI systems referencing your content, tools indexing your pages, or services aggregating data across the web — but every request comes with a cost in terms of CPU usage, database queries, memory, and bandwidth. As that activity scales, those costs don't stay marginal; they accumulate and start to have a noticeable impact.

Not all access needs to be unrestricted. The value a bot provides should be weighed against the cost it introduces.

Indirect reach vs. Infrastructure cost
03

Control and simplicity

In simple cases, automation can handle bot management effectively, but the right approach ultimately depends on the type of site you're running, the kind of traffic you're seeing, and what matters most for your goals. Relying entirely on automation can simplify things, but it also means you are not shaping how those decisions are made for your specific site.

The best systems don't force a choice between ease and control. They allow you to start simple and adjust where it matters.

Ease of management vs. Precision and override

This overlap causes confusion. Traffic spikes, performance drops, and it's not always clear whether to block, allow, or ignore, even for the same pattern on two different sites.

The question isn't:

"Should I allow bots?"

It's:

"Which bots, on which parts of my site, under what conditions?"

Answering this question requires a different way of thinking. We explore that in the next section.

The decision framework

A better way to decide what to allow, verify, or block

There is no universal bot policy that works for every site. A WooCommerce store, a content site, a business website, and a staging environment do not face the same risks, and do not require the same solutions.

The right approach depends on what your site does, what kind of traffic it receives, and what you're trying to optimize for. In most setups, this level of decision-making is handled by infrastructure tooling rather than manually configured per-request, but understanding the logic helps you know what's running on your behalf, and when it makes sense to adjust.

What matters here is not just traffic, but the kind of visibility you want, whether it's search rankings, AI citations, or direct user visits.

01 What type of site are you managing?
02 What matters most right now?
Cart endpoint protection
Recommended approach

Your performance issues are likely bots hitting WooCommerce's add-to-cart and checkout endpoints. These bypass the page cache entirely and force PHP execution and database queries on every single request. The fix isn't blocking everything. The goal is protecting specific high-cost paths.

How to handle common crawler patterns

Googlebot / Bingbot: Allow with path restrictions
Fully allow, but block access to /cart, /checkout, and ?add-to-cart= paths via robots.txt.

AI training crawlers: Verify
GPTBot, ClaudeBot, and Amazonbot gain nothing from accessing cart pages. Verify them at the WAF.

Unverified bots: Block
Unknown scrapers have no reason to access store endpoints.

Your automations: Whitelist by IP
Allow order sync tools, stock managers, and uptime monitors explicitly by IP range.

3 things to do next
1. Block all crawlers from /shop?add-to-cart= and /checkout in robots.txt.
2. If you use Kinsta, enable bot protection in MyKinsta (or in Cloudflare) and configure cart and wishlist URL patterns to block or verify.
3. Audit WooCommerce permalink settings to reduce URL parameter sprawl. Session tokens and quantity suffixes generate loop-prone URL variants.
Trade-off to watch: Do not block Googlebot from your product pages. It needs to crawl /shop, /product/, and category pages for your store to rank. Restrict it from specific dynamic endpoints only, not the whole site.
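
One way to sanity-check rules like these before publishing them is to test sample URLs against the patterns. The sketch below embeds a hypothetical robots.txt and a deliberately simplified matcher; `ROBOTS_TXT`, `disallow_patterns`, and `is_disallowed` are illustrative names, not part of any Kinsta or WooCommerce tooling. The `*` wildcard is part of the Robots Exclusion Protocol (RFC 9309) and is honored by Googlebot, but this matcher ignores User-agent groups and Allow precedence, so treat it as a rough check rather than a faithful parser.

```python
import re

# Assumption: a robots.txt along the lines the recommendations describe.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /*add-to-cart=
"""


def disallow_patterns(robots_txt: str) -> list[str]:
    """Collect Disallow values. This sketch ignores User-agent groups and Allow rules."""
    patterns = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            patterns.append(value.strip())
    return patterns


def is_disallowed(path_and_query: str, patterns: list[str]) -> bool:
    """Match from the start of the path; '*' is a wildcard, a trailing '$' anchors the end."""
    for pattern in patterns:
        anchored = pattern.endswith("$")
        body = pattern[:-1] if anchored else pattern
        regex = ".*".join(re.escape(part) for part in body.split("*"))
        if anchored:
            regex += "$"
        if re.match(regex, path_and_query):
            return True
    return False


PATTERNS = disallow_patterns(ROBOTS_TXT)
for target in ["/shop/?add-to-cart=42", "/checkout/", "/product/blue-shirt/", "/shop/"]:
    print(target, "->", "blocked" if is_disallowed(target, PATTERNS) else "crawlable")
```

Because matching is by prefix, `Disallow: /cart` would also cover a path such as /cartoon-prints. Running real URLs from your own store through a check like this is a cheap way to catch that kind of overreach while confirming that /shop and /product/ stay crawlable.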

The configurations above represent what managing bot traffic looks like when done by hand. In practice, Kinsta's Bot Protection handles the majority of these patterns automatically. Enable your desired level of protection once, and our system takes care of the rest (no per-path rules or manual exceptions required).
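
For a sense of what "by hand" involves, here is a hedged sketch of one common verification technique, forward-confirmed reverse DNS, which Google documents for Googlebot: resolve the IP to a hostname, check the hostname's domain, then resolve the hostname back and confirm it returns the same IP. This is not a description of how Kinsta or any WAF implements verification. The function names, the suffix list, and the stub records are assumptions, and some crawler operators publish IP ranges instead of reverse DNS names.

```python
import socket
from typing import Callable, Sequence

# Assumption: suffixes Google documents for Googlebot hostnames. Other operators
# differ, so check each crawler's own documentation before adding entries.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_dns(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def forward_dns(hostname: str) -> Sequence[str]:
    # gethostbyname_ex resolves IPv4 only; IPv6 would need socket.getaddrinfo.
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_crawler(
    ip: str,
    reverse: Callable[[str], str] = reverse_dns,
    forward: Callable[[str], Sequence[str]] = forward_dns,
    suffixes: tuple = TRUSTED_SUFFIXES,
) -> bool:
    """Forward-confirmed reverse DNS: IP -> hostname -> the same IP."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        if not hostname.endswith(suffixes):
            return False
        return ip in forward(hostname)
    except OSError:  # covers socket.herror and socket.gaierror
        return False


# Offline demonstration with stub resolvers. All records below are made up and
# use documentation IP ranges, so nothing here touches the network.
PTR = {
    "192.0.2.10": "crawl-192-0-2-10.googlebot.com",
    "198.51.100.7": "scraper.example.net",
    "203.0.113.9": "fake.googlebot.com",  # spoofed reverse record
}
A = {"crawl-192-0-2-10.googlebot.com": ["192.0.2.10"]}

for ip in PTR:
    ok = is_verified_crawler(ip, reverse=PTR.__getitem__, forward=lambda h: A.get(h, []))
    print(ip, "verified" if ok else "unverified")
```

The third address shows why the forward step matters: anyone can set a reverse record that claims to be a known crawler, but only the real operator controls what the hostname resolves back to.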

The current approach

Most systems weren't designed for this level of control

Most platforms either manage bot traffic automatically by making decisions behind the scenes or expose controls that require manual configuration.

Automatic systems catch obvious threats and allow known crawlers, but they don't account for how traffic behaves on specific parts of your site, or what it costs in context. In some cases, legitimate AI crawlers get blocked at the edge, creating a discoverability blind spot most teams never know about.

Manual controls offer more flexibility. But they often require a level of precision that most site owners don't have time to manage continuously. And without guidance, they're easy to misconfigure.

What's missing isn't just control, but usable control.

The ability to adjust behavior where it matters, without breaking essential traffic, and without rebuilding your entire approach from scratch every time something changes.

Most sites don't need complete automation or complete control. They need the ability to make targeted decisions without having to rebuild their entire traffic strategy every time patterns change.

At this point, the challenge isn't identifying bot traffic. It's managing it in a way that reflects how your site actually works.

The most effective approaches today don't force a choice between automation and control. They provide safe defaults while allowing targeted adjustments where it actually matters.
What to do next

What the right response looks like in different situations

By this point, the pattern is clear: there isn't a single rule that works everywhere. The right response depends on what kind of site you're running, what kind of traffic you're seeing, and how urgent the situation is.

What follows isn't a checklist. It's a way to think through what to do next based on where you are right now.

Start with visibility, then make one targeted decision

1
Understand what is actually hitting your site

Before making changes, look at what your traffic actually consists of. You're not trying to identify every bot. You're looking for patterns: repeated requests to the same types of URLs, especially ones that shouldn't matter to a crawler, like cart endpoints or parameter-heavy pages. Most analytics tools or server logs will give you enough visibility to spot this activity.
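
As a starting point for that kind of pattern spotting, the sketch below counts requests by crawler and by type of URL. It assumes access logs in the common "combined" format; the sample lines, the `KNOWN_BOTS` list, and the bucket names are invented for illustration, and user-agent strings can be spoofed, so the output shows where to look rather than who to trust.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Assumption: combined log format (request, status, size, referrer, user agent).
LOG_LINE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<target>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
KNOWN_BOTS = ("ClaudeBot", "BLEXBot", "GPTBot", "Googlebot", "AhrefsBot")


def bucket(target: str) -> str:
    """Group a request target by how expensive it is likely to be."""
    parts = urlsplit(target)
    if "add-to-cart" in parse_qs(parts.query, keep_blank_values=True):
        return "add-to-cart"
    if parts.path.startswith(("/cart", "/checkout")):
        return "cart/checkout"
    return "query string" if parts.query else "plain page"


def summarize(lines) -> Counter:
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        agent = match["agent"].lower()
        name = next((bot for bot in KNOWN_BOTS if bot.lower() in agent), "other")
        counts[(name, bucket(match["target"]))] += 1
    return counts


PREFIX = '192.0.2.10 - - [10/Jan/2026:10:00:00 +0000] '
SAMPLE = [
    PREFIX + '"GET /shop/?add-to-cart=42 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    PREFIX + '"GET /shop/?add-to-cart=42&quantity=2 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    PREFIX + '"GET /shop/?add-to-cart=43 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    PREFIX + '"GET /product/blue-shirt/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    PREFIX + '"GET /product/blue-shirt/?color=red HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

COUNTS = summarize(SAMPLE)
for (name, kind), n in COUNTS.most_common():
    print(f"{name:10} {kind:14} {n}")
```

A single crawler dominating the add-to-cart bucket is exactly the kind of repeated, low-value pattern this step is meant to surface.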

2
Let baseline protections do their job

Most platforms already filter the most clearly broken patterns, such as obvious loops or known abusive traffic. Make sure those baseline protections are active and give them time to run. They're usually conservative by design, which means they reduce noise without affecting legitimate visitors or search crawlers.

3
Make one targeted change

Once you see a pattern, act on that pattern (not everything at once). If bots are repeatedly hitting dynamic endpoints, limit access to those paths. If specific crawlers are scraping content aggressively, decide whether that access is worth the cost. The goal at this stage isn't perfection; it's to reduce unnecessary load without introducing new problems.

For agencies

Apply this process across a few different client sites. The patterns you see on an e-commerce site, a content site, and a service site will be different, but consistent enough to build a repeatable approach for client conversations.

Where we go from here

Bot traffic isn't going anywhere.
Your strategy should

By this point, the pattern is clear. Bot traffic is no longer something you can treat as occasional noise or filter out at the edges. It is a constant, evolving part of how websites are accessed and stressed.

What makes it difficult isn't just the volume, but the overlap. The same systems that help people find your site can also consume its resources, and patterns that appear to be normal crawling can behave like inefficient automation at scale.

So no single rule works everywhere.

The right approach depends on your site, your traffic, and what you're trying to protect. It requires understanding how your site actually behaves, and making decisions that reflect that reality.

This kind of shift isn't entirely new in the web's evolution. As Jordan Sprogis, Contributing Expert at HostingAdvice, puts it: "It's not that different from where SSL was, where it was a paid add-on for a long time… now, SSL certificates are included in just about every hosting package."

Most of the time, the goal is to reduce unnecessary load, preserve visibility where it matters, and maintain a system you can rely on as things change. What comes next will be harder to categorize. Agentic traffic, automated tools built to take actions, is already showing up in infrastructure data. Google recently announced a dedicated user-agent to log when its AI agents interact with sites. The responsible platforms will identify themselves, respect crawl delays, and avoid hammering endpoints that serve no purpose. Others won't. The line between a human visitor and an agent will continue to blur.

And when automated traffic inflates your visit counts, raw numbers no longer reflect reality. The signals that matter are the correlated ones: branded search volume, direct traffic, engagement quality, and revenue tied to real visitor behavior. If those metrics are also moving, you know you're visible where it counts.

The sites that navigate bot traffic well won't be the ones that blocked the most. They'll be the ones whose operators understood what they were optimizing for and made deliberate decisions about it.
Kinsta bot protection

Control how bots interact with your WordPress site without breaking search visibility

Kinsta's Bot Protection gives you environment-level control with sensible defaults, so you can manage how different types of traffic interact with your site without blocking search engines or compromising discoverability.

G2 #1 WordPress Hosting — Fall 2025

This report was brought to you by Kinsta.

Kinsta is a premium managed WordPress hosting platform with more than 230,000 customers worldwide. #1 on G2 for satisfaction. 24/7 expert support in 10 languages.
