Free Republic
Browse · Search
General/Chat
Topics · Post Article

AI Web Crawlers Are Destroying Websites in Their Never-Ending Hunger for Any and All Content
The Register ^ | Steven J. Vaughan-Nichols

Posted on 09/01/2025 1:41:35 PM PDT by nickcarraway

Opinion: With AI's rise, AI web crawlers are strip-mining the web in their perpetual hunt for ever more content to feed into their Large Language Model (LLM) mills. How much traffic do they account for? According to Cloudflare, a major content delivery network (CDN) force, 30% of global web traffic now comes from bots. Leading the way and growing fast? AI bots.

Cloud services company Fastly agrees. It reports that 80% of all AI bot traffic comes from AI data fetcher bots. So, you ask, "What's the problem? Haven't web crawlers been around since the arrival of the World Wide Web Wanderer in 1993?" Well, yes, they have. Anyone who runs a website, though, knows there's a huge, honking difference between the old-style crawlers and today's AI crawlers. The new ones are site killers.

Fastly warns that they're causing "performance degradation, service disruption, and increased operational costs." Why? Because they're hammering websites with traffic spikes that can reach up to ten or even twenty times normal levels within minutes.
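
To make the "ten or even twenty times normal" figure concrete, here is a minimal Python sketch of how a site operator might flag such a surge from per-minute request counts. The function name, the five-minute window, and the 10x threshold are illustrative assumptions, not something Fastly prescribes, and the sample numbers are invented.

```python
from collections import deque
from statistics import median


def find_spikes(requests_per_minute, window=5, factor=10.0):
    """Return indexes of minutes whose count is at least `factor` times
    the median of the preceding `window` non-spike minutes.

    `window` and `factor` are illustrative assumptions, not measured values.
    """
    recent = deque(maxlen=window)  # rolling baseline of earlier minutes
    spikes = []
    for minute, count in enumerate(requests_per_minute):
        if len(recent) == window:
            baseline = median(recent)
            if baseline > 0 and count >= factor * baseline:
                spikes.append(minute)
                continue  # keep surge traffic out of the baseline
        recent.append(count)
    return spikes


# Hypothetical traffic: steady ~100 requests/minute, then a surge.
sample = [100, 110, 90, 105, 95, 1200, 2000, 100]
print(find_spikes(sample))  # [5, 6]
```

Surge minutes are deliberately kept out of the baseline, so a long-running crawl does not gradually come to look like normal traffic.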

Moreover, AI crawlers are much more aggressive than standard crawlers. As the web hosting company InMotion Hosting notes, they tend to disregard crawl delays and bandwidth-saving guidelines, extract full page text, and sometimes attempt to follow dynamic links or scripts.

The result? If you're using a shared server for your website, as many small businesses do, even if your site isn't being shaken down for content, other sites on the same hardware with the same Internet pipe may be getting hit. This means your site's performance drops through the floor even if an AI crawler isn't raiding your website.

Smaller sites, like my own Practical Tech, get slammed to the point where they're simply knocked out of service. Thanks to Cloudflare Distributed Denial of Service (DDoS) protection, my microsite can shrug off DDoS attacks. AI bot attacks – and let's face it, they are attacks – not so much.

Even large websites are feeling the crush. To handle the load, they must increase their processor, memory, and network resources. If they don't? Well, according to most web hosting companies, if a website takes longer than three seconds to load, more than half of visitors will abandon the site. Bounce rates jump up for every second beyond that threshold.

So when AI searchbots, with Meta (52% of AI searchbot traffic), Google (23%), and OpenAI (20%) leading the way, clobber websites with as much as 30 Terabits in a single surge, they're damaging even the largest companies' site performance.

Now, if that were traffic that I could monetize, it would be one thing. It's not. It used to be that when the search indexing crawler Googlebot came calling, I could always hope that some story on my site would land on the magical first page of someone's search results. They'd visit me, they'd read the story, and two or three times out of a hundred visits, they'd click on an ad, and I'd get a few pennies of income. Or, if I had a business site, I might sell a widget or get someone to do business with me.

AI searchbots? Not so much. AI crawlers don't direct users back to the original sources. They kick our sites around, return nothing, and we're left trying to decide how we're to make a living in the AI-driven web world.

Yes, of course, we can try to fend them off with logins, paywalls, CAPTCHA challenges, and sophisticated anti-bot technologies. You know one thing AI is good at? It's getting around those walls.

As for robots.txt files, the old-school way of blocking crawlers? Many – most? – AI crawlers simply ignore them.
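
For readers who have not looked at one, robots.txt is only a request: the crawler itself decides whether to obey it. The Python sketch below uses the standard library's `urllib.robotparser` to show the decision a well-behaved crawler would make. The robots.txt contents, the `example.com` URLs, and the user agent "ExampleAIBot" are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one AI crawler, slow everyone else down.
# "ExampleAIBot" is a made-up user agent used only for illustration.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent, url):
    """What a crawler that honors robots.txt would decide for this URL."""
    return parser.can_fetch(user_agent, url)


print(may_fetch("ExampleAIBot", "https://example.com/story.html"))      # False
print(may_fetch("Googlebot", "https://example.com/story.html"))         # True
print(may_fetch("Googlebot", "https://example.com/private/draft.html")) # False
print(parser.crawl_delay("Googlebot"))                                  # 10
```

Nothing on the server enforces any of these answers. A crawler that never runs this check, or ignores the result, fetches the pages anyway.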

For example, Perplexity has been accused by Cloudflare of ignoring robots.txt files. Perplexity, in turn, hotly denies this accusation. Me? All I know is I see regular waves of multiple companies' AI bots raiding my site.

There are efforts afoot to supplement robots.txt with llms.txt files. This is a proposed standard to provide LLM-friendly content that LLMs can access without compromising the site's performance. Not everyone is thrilled with this approach, though, and it may yet come to nothing.

In the meantime, to combat excessive crawling, some infrastructure providers, such as Cloudflare, now offer default bot-blocking services to block AI crawlers and provide mechanisms to deter AI companies from accessing their data. Other programs, such as the popular open-source and free Anubis AI crawler blocker, just attempt to slow down their visits to, if you'll pardon the expression, a crawl.
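
One common way to slow automated visitors is to make each client do a small amount of computation before a page is served: trivial for one human reader, expensive across millions of crawler requests. The Python sketch below illustrates that general proof-of-work idea only. It is a toy under stated assumptions, not Anubis's actual protocol or code, and the function names, token string, and difficulty value are hypothetical.

```python
import hashlib


def solve_challenge(challenge: str, difficulty: int = 3) -> int:
    """Client side: find a nonce whose SHA-256 digest of challenge+nonce
    starts with `difficulty` zero hex digits. Expected work grows about
    16x with each extra digit."""
    prefix = "0" * difficulty
    nonce = 0
    while not hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest().startswith(prefix):
        nonce += 1
    return nonce


def verify(challenge: str, nonce: int, difficulty: int = 3) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


# Hypothetical token the server would issue for one visit.
token = "hypothetical-per-visit-token"
nonce = solve_challenge(token)
print(verify(token, nonce))  # True
```

The asymmetry is the point of the design: solving takes thousands of hashes on average at this difficulty, while checking takes one.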

In the arms race between all businesses and their websites and AI companies, eventually, they'll reach some kind of neutrality. Unfortunately, the web will be more fragmented than ever. Sites will further restrict or monetize access. Important, accurate information will end up siloed behind walls or removed altogether.

Remember the open web? I do. I can see our kids on the Internet, where you must pay cash money to access almost anything. I don't think anyone wants a Balkanized Internet, but I fear that's exactly where we're going.


TOPICS: Business/Economy; Computers/Internet; Conspiracy
KEYWORDS: ai; aidamages; searchengines; webcrawlers


To: larrytown

Yep, I am done with Cloudflare. I stop and dump the tab, screw them... I’m not going to support the site and will go elsewhere.


21 posted on 09/01/2025 2:58:46 PM PDT by Openurmind (AI - An Illusion for Aptitude Intrusion to Alter Intellect. )
[ Post Reply | Private Reply | To 17 | View Replies]

To: House Atreides

I’ve never seen a single Freeper who was a Zeeper. I’ve seen posters who don’t like Putin, but I’ve never seen a single one who loves Zelensky. Can you prove they exist?


22 posted on 09/01/2025 3:02:18 PM PDT by nickcarraway
[ Post Reply | Private Reply | To 10 | View Replies]

To: Harmless Teddy Bear

Yes, but since you are paying them, they can prove you were actually the one doing it - there are credit card records. It might as well be a signed confession.


23 posted on 09/01/2025 3:03:49 PM PDT by nickcarraway
[ Post Reply | Private Reply | To 9 | View Replies]

To: Harmless Teddy Bear

Do you have one you recommend?


24 posted on 09/01/2025 3:04:11 PM PDT by nickcarraway
[ Post Reply | Private Reply | To 9 | View Replies]

To: TexasGator

But are they any good?


25 posted on 09/01/2025 3:04:55 PM PDT by nickcarraway
[ Post Reply | Private Reply | To 13 | View Replies]

To: larrytown

But how would one get to content like yours without a Search Engine?


26 posted on 09/01/2025 3:05:55 PM PDT by nickcarraway
[ Post Reply | Private Reply | To 17 | View Replies]

To: Openurmind

It all makes sense to me - NOW

There are (unspoken) things happening.

but this now makes perfect sense as to
why some of these things are happening.


27 posted on 09/01/2025 3:14:03 PM PDT by SaveFerris (Luke 17:28 ... as it was in the Days of Lot; They did Eat, They Drank, They Bought, They Sold ......)
[ Post Reply | Private Reply | To 19 | View Replies]

To: nickcarraway
"Yes, but since you are paying them, they can prove you were actually the one doing it - there are credit card records. It might as well be a signed confession."

Who gives them a credit card?

Buy a visa card with cash. No record.

Use a free, throw away email account. No record.

You are only a product if you want to be.

28 posted on 09/01/2025 3:18:01 PM PDT by Harmless Teddy Bear ( Not my circus. Not my monkeys. But I can pick out the clowns at 100 yards.)
[ Post Reply | Private Reply | To 23 | View Replies]

To: nickcarraway

I don’t get what the bots are looking for. What “content”? What of value are they “raiding”?


29 posted on 09/01/2025 3:23:36 PM PDT by TalBlack (Their god is government. Prepare for a religious war.https://freerepublic.com/perl/post?id=4322961%2)
[ Post Reply | Private Reply | To 1 | View Replies]

To: SaveFerris

Yep... Been warning for a year now, it is the Beast. But the greedy do not care at all how much this tech costs OTHERS. All they care about is themselves and their own selfishness...

Here is a reality the greedy idiots have not thought about. The FR pays for hosting depending on how much bandwidth is used. Now they are paying a whole bunch extra for AI bots burning up bandwidth so that someone else can benefit from it. The FR is paying for someone else’s AI use... A LOT of someone else’s...

The first thing I would do right now is block “Bot” access for everything but just a few popular normal search engine webcrawlers.
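
A rough Python sketch of that kind of filter, assuming the site can inspect each request's User-Agent header. The allowlist entries, the "bot" pattern, and the function name are illustrative assumptions, and "ExampleAIBot" is a made-up crawler name.

```python
import re

# Illustrative allowlist of mainstream search crawlers; the exact list a
# site would choose is an assumption, not something the thread specifies.
ALLOWED_CRAWLERS = ("googlebot", "bingbot", "duckduckbot")

# Crude heuristic for clients that identify themselves as automated.
BOT_HINT = re.compile(r"bot|crawl|spider", re.IGNORECASE)


def should_block(user_agent: str) -> bool:
    """Block self-declared bots unless they are on the allowlist."""
    ua = user_agent.lower()
    if any(name in ua for name in ALLOWED_CRAWLERS):
        return False
    return bool(BOT_HINT.search(ua))


print(should_block("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # False
print(should_block("Mozilla/5.0 (compatible; ExampleAIBot/1.0)"))                               # True
print(should_block("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/128.0"))                  # False
```

A User-Agent string is self-reported, so a filter like this only stops crawlers that identify themselves honestly.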


30 posted on 09/01/2025 3:27:52 PM PDT by Openurmind (AI - An Illusion for Aptitude Intrusion to Alter Intellect. )
[ Post Reply | Private Reply | To 27 | View Replies]

To: nickcarraway

“But how would one get to content like yours without a Search Engine?”

AI and search engines are totally different critters. Search engine bots only hit your site a few times a week. These AI bots are overwhelming sites just like a DDoS attack.

The AI appetite is way too HUGE and aggressive...


31 posted on 09/01/2025 3:30:57 PM PDT by Openurmind (AI - An Illusion for Aptitude Intrusion to Alter Intellect. )
[ Post Reply | Private Reply | To 26 | View Replies]

To: Openurmind

So, the AI is just using the info to crunch. I know some writers are suing AI companies for using their works without permission.


32 posted on 09/01/2025 3:36:13 PM PDT by nickcarraway
[ Post Reply | Private Reply | To 31 | View Replies]

To: TalBlack
They are sucking up every bit of information on your website and using it to make "content" with no filters as to context (which is how the LLMs tell you that glue and rocks are tasty toppings for pizza) and no reference as to where the information came from.
33 posted on 09/01/2025 3:44:51 PM PDT by Harmless Teddy Bear ( Not my circus. Not my monkeys. But I can pick out the clowns at 100 yards.)
[ Post Reply | Private Reply | To 29 | View Replies]

To: nickcarraway

“So, the AI is just using the info to crunch. I know some writers are suing AI companies for using their works without permission.”

Exactly right. But the big issue is the huge load it is putting on servers because there is just so much of it from so many sources... The more load on a site, the more it costs the site owner to keep their site up because of that huge extra load. And what do the site owners get? Nothing but added costs and stolen content...

It is really bad and needs some controls put in place so that website owners can defend themselves from being overwhelmed by it...


34 posted on 09/01/2025 3:53:47 PM PDT by Openurmind (AI - An Illusion for Aptitude Intrusion to Alter Intellect. )
[ Post Reply | Private Reply | To 32 | View Replies]

To: nickcarraway

“I’ve never seen a single Freeper who was a Zeeper.”
*********************************

Ye shall know them by their ZEEPS. They also tend to deny the existence of ZEEPERS and frequently live off of OTHER-PEOPLE’S-MONEY as they seldom carry their share of the weight and contribute.


35 posted on 09/01/2025 4:10:32 PM PDT by House Atreides (I’m now ULTRA-MAGA-PRO-MA)
[ Post Reply | Private Reply | To 22 | View Replies]

To: nickcarraway

I am the IT guy for some large websites. One gets over 120 million monthly pageviews. It gets hammered by AI-related bots, but we’ve got good caching, and know how to deal with bots and such.

AI web crawlers can be a problem for some sites, but not for those with quality hosting.


36 posted on 09/01/2025 4:35:43 PM PDT by Theo (FReeping since 1997 ... drain the swamp.)
[ Post Reply | Private Reply | To 1 | View Replies]

To: Theo
Quality comes at a price.

So the big established guys are protected and the little and medium sized guys get hammered.

Now just who would that benefit?

37 posted on 09/01/2025 5:17:33 PM PDT by Harmless Teddy Bear ( Not my circus. Not my monkeys. But I can pick out the clowns at 100 yards.)
[ Post Reply | Private Reply | To 36 | View Replies]

To: bk1000

I agree. The internet is becoming useless, unless you are buying something and even that is becoming a PIA.

I am close to retirement and once I hang up my hat I will likely stop using a computer except to do my taxes.


38 posted on 09/01/2025 5:28:05 PM PDT by suijuris
[ Post Reply | Private Reply | To 6 | View Replies]

To: Harmless Teddy Bear

Screw the little guys - how is it my problem if they’re under-capitalized?

/sarc


39 posted on 09/02/2025 5:28:41 AM PDT by larrytown (A Cadet will not lie, cheat, steal, or tolerate those who do. Then they graduate...)
[ Post Reply | Private Reply | To 37 | View Replies]

To: larrytown
Lots of people will say that with a straight face.

And they will also be in favor of prohibiting the smaller websites from using anything but top of the line extremely expensive measures to defend themselves.

It's the "Saturday Night Special" rule. Small cheap guns must be taken off the market because poor people might use them to defend themselves. Yes they said it was to "fight crime" but in reality you do that by locking up criminals. Which they were and are curiously reluctant to do.

If you could not afford to buy a top of the line name brand firearm you obviously did not deserve to carry. The Second Amendment did not apply to you.

40 posted on 09/02/2025 9:51:24 AM PDT by Harmless Teddy Bear ( Not my circus. Not my monkeys. But I can pick out the clowns at 100 yards.)
[ Post Reply | Private Reply | To 39 | View Replies]



Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.

FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson