Free Republic
Browse · Search
General/Chat
Topics · Post Article

AI Web Crawlers Are Destroying Websites in Their Never-Ending Hunger for Any and All Content
The Register ^ | Steven J. Vaughan-Nichols

Posted on 09/01/2025 1:41:35 PM PDT by nickcarraway

Opinion: With AI's rise, AI web crawlers are strip-mining the web in their perpetual hunt for ever more content to feed into their Large Language Model (LLM) mills. How much traffic do they account for? According to Cloudflare, a major content delivery network (CDN) force, 30% of global web traffic now comes from bots. Leading the way and growing fast? AI bots.

Cloud services company Fastly agrees. It reports that 80% of all AI bot traffic comes from AI data fetcher bots. So, you ask, "What's the problem? Haven't web crawlers been around since the arrival of the World Wide Web Wanderer in 1993?" Well, yes, they have. Anyone who runs a website, though, knows there's a huge, honking difference between the old-style crawlers and today's AI crawlers. The new ones are site killers.

Fastly warns that they're causing "performance degradation, service disruption, and increased operational costs." Why? Because they're hammering websites with traffic spikes that can reach up to ten or even twenty times normal levels within minutes.
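A site operator who wants to know whether they are seeing this pattern could check their own logs. What follows is a minimal sketch, assuming per-minute request counts have already been extracted from the server logs. The function name, the 30-minute baseline window, and the tenfold threshold are illustrative assumptions, not anything Fastly publishes.

```python
from statistics import median

def find_spikes(requests_per_minute, baseline_window=30, factor=10.0):
    """Return indexes of minutes whose request count is at least `factor`
    times the median of the preceding `baseline_window` minutes."""
    spikes = []
    for i in range(baseline_window, len(requests_per_minute)):
        baseline = median(requests_per_minute[i - baseline_window:i])
        # Ignore windows with no traffic so an idle site is not flagged.
        if baseline > 0 and requests_per_minute[i] >= factor * baseline:
            spikes.append(i)
    return spikes

# Hypothetical data: 30 quiet minutes at about 100 requests, then a surge.
counts = [100] * 30 + [120, 1500, 2100, 110]
print(find_spikes(counts))  # [31, 32]
```

The median is used for the baseline rather than the mean so that the first minute of a surge does not inflate the baseline and hide the minutes that follow.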

Moreover, AI crawlers are much more aggressive than standard crawlers. As the web hosting company InMotion Hosting notes, they tend to disregard crawl delays and bandwidth-saving guidelines, extract full page text, and sometimes attempt to follow dynamic links or scripts.

The result? If you're using a shared server for your website, as many small businesses do, even if your site isn't being shaken down for content, other sites on the same hardware with the same Internet pipe may be getting hit. This means your site's performance drops through the floor even if an AI crawler isn't raiding your website.

Smaller sites, like my own Practical Tech, get slammed to the point where they're simply knocked out of service. Thanks to Cloudflare Distributed Denial of Service (DDoS) protection, my microsite can shrug off DDoS attacks. AI bot attacks – and let's face it, they are attacks – not so much.

Even large websites are feeling the crush. To handle the load, they must increase their processor, memory, and network resources. If they don't? Well, according to most web hosting companies, if a website takes longer than three seconds to load, more than half of visitors will abandon the site. Bounce rates jump up for every second beyond that threshold.

So when AI searchbots, with Meta (52% of AI searchbot traffic), Google (23%), and OpenAI (20%) leading the way, clobber websites with as much as 30 Terabits in a single surge, they're damaging even the largest companies' site performance.

Now, if that were traffic that I could monetize, it would be one thing. It's not. It used to be that when the search indexing crawler Googlebot came calling, I could always hope that some story on my site would land on the magical first page of someone's search results, so they'd visit me, they'd read the story, and two or three times out of a hundred visits, they'd click on an ad, and I'd get a few pennies of income. Or, if I had a business site, I might sell a widget or get someone to do business with me.

AI searchbots? Not so much. AI crawlers don't direct users back to the original sources. They kick our sites around, return nothing, and we're left trying to decide how we're to make a living in the AI-driven web world.

Yes, of course, we can try to fend them off with logins, paywalls, CAPTCHA challenges, and sophisticated anti-bot technologies. You know one thing AI is good at? It's getting around those walls.

As for robots.txt files, the old-school way of blocking crawlers? Many – most? – AI crawlers simply ignore them.

For example, Perplexity has been accused by Cloudflare of ignoring robots.txt files. Perplexity, in turn, hotly denies this accusation. Me? All I know is I see regular waves of multiple companies' AI bots raiding my site.
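For readers who have not looked at one, robots.txt is only a request: the site publishes rules, and a well-behaved crawler checks them before fetching. The sketch below shows that check using Python's standard urllib.robotparser module. GPTBot is the user-agent token OpenAI documents for its crawler; the file contents and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler, slow everyone else down.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

def build_parser(robots_text):
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)
print(parser.can_fetch("GPTBot", "https://example.com/story.html"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/story.html"))  # True
print(parser.crawl_delay("Googlebot"))                                  # 10
```

Nothing here enforces anything. A crawler that never runs a check like this, or ignores the answer, is not slowed down at all, which is exactly the complaint.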

There are efforts afoot to supplement robots.txt with llms.txt files. This is a proposed standard to provide LLM-friendly content that LLMs can access without compromising the site's performance. Not everyone is thrilled with this approach, though, and it may yet come to nothing.

In the meantime, to combat excessive crawling, some infrastructure providers, such as Cloudflare, now offer default bot-blocking services to block AI crawlers and provide mechanisms to deter AI companies from accessing their data. Other programs, such as the popular open-source and free Anubis AI crawler blocker, just attempt to slow down their visits to, if you'll pardon the expression, a crawl.
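The article does not say how a blocker slows crawlers down. One general technique for making each visit cost the client something is a proof-of-work challenge: the server hands out a random string, and the client must find a number that, hashed together with it, meets a difficulty target. The sketch below illustrates that idea only; the function names and the difficulty setting are assumptions, and it should not be read as Anubis's implementation.

```python
import hashlib
import secrets

def make_challenge():
    """Server side: issue a random challenge string."""
    return secrets.token_hex(16)

def is_valid(challenge, nonce, difficulty):
    """True if SHA-256(challenge + nonce) starts with `difficulty` hex zeros."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

def solve(challenge, difficulty):
    """Client side: brute-force a nonce. Each extra zero multiplies
    the expected work by about 16."""
    nonce = 0
    while not is_valid(challenge, nonce, difficulty):
        nonce += 1
    return nonce

challenge = make_challenge()
nonce = solve(challenge, difficulty=4)  # roughly 65,000 hashes on average
print(is_valid(challenge, nonce, 4))    # True; checking costs one hash
```

The asymmetry is the point: a single human visitor pays the cost once and barely notices, while a crawler fetching millions of pages pays it millions of times.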

In the arms race between all businesses and their websites and AI companies, eventually, they'll reach some kind of neutrality. Unfortunately, the web will be more fragmented than ever. Sites will further restrict or monetize access. Important, accurate information will end up siloed behind walls or removed altogether.

Remember the open web? I do. I can see our kids on the Internet, where you must pay cash money to access almost anything. I don't think anyone wants a Balkanized Internet, but I fear that's exactly where we're going.


TOPICS: Business/Economy; Computers/Internet; Conspiracy
KEYWORDS: ai; aidamages; searchengines; webcrawlers


This seems hard for me to believe, given how shallow today's Search Engines are. 15 years ago, you could really dig down, but now it seems like all Search Engines just show a few results, and it's hard to dig down deeper than the most common things. And remember when they used to have a cache? Where did that go?


If anyone knows differently please let me know. But Search Engines today seem to have about 40% of the value they used to have. Does anyone know of one that actually allows you to search?

1 posted on 09/01/2025 1:41:35 PM PDT by nickcarraway
[ Post Reply | Private Reply | View Replies]

To: nickcarraway
Free search engines are a thing of the past.

You are going to have to pay for service now.

2 posted on 09/01/2025 1:45:52 PM PDT by Harmless Teddy Bear ( Not my circus. Not my monkeys. But I can pick out the clowns at 100 yards.)
[ Post Reply | Private Reply | To 1 | View Replies]

To: nickcarraway

https://en.wikipedia.org/wiki/Dead_Internet_theory


3 posted on 09/01/2025 1:47:34 PM PDT by ClearCase_guy (Society has no reward for following the rules any more)
[ Post Reply | Private Reply | To 1 | View Replies]

To: nickcarraway

Oh absolutely believe it. No different than a DDoS attack... In fact a friend is battling it right now... It is getting hammered so fast and so much the security will not let it stay up for safety reasons... It is useless because of the massive amount of hits all at once...


4 posted on 09/01/2025 1:48:03 PM PDT by Openurmind (AI - An Illusion for Aptitude Intrusion to Alter Intellect. )
[ Post Reply | Private Reply | To 1 | View Replies]

To: Harmless Teddy Bear

Which ones do you pay for?


5 posted on 09/01/2025 1:49:50 PM PDT by nickcarraway
[ Post Reply | Private Reply | To 2 | View Replies]

To: nickcarraway

The bots will scream at each other while we watch cat videos. The internet will ultimately consume itself and we will be done with it. I’ll miss y’all, but I think it’s doing more harm than good.


6 posted on 09/01/2025 1:52:06 PM PDT by bk1000 (Banned from Breitbart)
[ Post Reply | Private Reply | To 1 | View Replies]

To: Harmless Teddy Bear

If you pay for a search engine, they will be tracking you a lot more.


7 posted on 09/01/2025 1:52:49 PM PDT by nickcarraway
[ Post Reply | Private Reply | To 2 | View Replies]

To: nickcarraway

A true AI, “hungry for any and all content” would quickly ferret out the Globull Climate Hoax.

Whatever they are labeling AI today, it is programmed, and the programmer(s) are human.

HIGHLY recommend: The Myth of Artificial Intelligence by someone on the ground floor, Erik J. Larson


8 posted on 09/01/2025 2:00:25 PM PDT by Ronaldus Magnus III (Do, or do not, there is no try )
[ Post Reply | Private Reply | To 1 | View Replies]

To: nickcarraway
Actually they track you less.

I know that seems counterintuitive, but they are already getting paid, by you. You just choose one that agrees not to track you and erases your data at the end of the session.

When you use the free ones they have a vested interest in sucking every bit of money they can out of selling their information on you to every buyer they can find.

If you can not tell what is being sold, you are the product.

9 posted on 09/01/2025 2:00:54 PM PDT by Harmless Teddy Bear ( Not my circus. Not my monkeys. But I can pick out the clowns at 100 yards.)
[ Post Reply | Private Reply | To 7 | View Replies]

To: nickcarraway

“ Anyone who runs a website, though, knows there’s a huge, honking difference between the old-style crawlers and today’s AI crawlers. The new ones are site killers.”
************************************

Kind of like incessant ZEEPER posters.


10 posted on 09/01/2025 2:04:06 PM PDT by House Atreides (I’m now ULTRA-MAGA-PRO-MA)
[ Post Reply | Private Reply | To 1 | View Replies]

To: nickcarraway

Fine. I hope AI likes clickbait.


11 posted on 09/01/2025 2:06:28 PM PDT by Fledermaus ("It turns out all we really needed was a new President!")
[ Post Reply | Private Reply | To 1 | View Replies]

To: Openurmind

[Oh absolutely believe it. No different than a DDoS attack... In fact a friend is battling it right now...]

That sounds correct...


12 posted on 09/01/2025 2:07:43 PM PDT by SaveFerris (Luke 17:28 ... as it was in the Days of Lot; They did Eat, They Drank, They Bought, They Sold ......)
[ Post Reply | Private Reply | To 4 | View Replies]

To: Harmless Teddy Bear

“Free search engines are a thing of the past.”

GEMINI:

No, many free search engines exist, including DuckDuckGo, Brave, and Startpage. Some, like Kagi, offer paid versions.


13 posted on 09/01/2025 2:07:54 PM PDT by TexasGator (1The 750 hp Florida Gnat)
[ Post Reply | Private Reply | To 2 | View Replies]

To: Ronaldus Magnus III

People should put fake crap and lies out. Confuse it like Norman in the Star Trek OS episode “I Mudd”.

You ask it to show an elephant and it might come back with a giraffe!


14 posted on 09/01/2025 2:09:03 PM PDT by Fledermaus ("It turns out all we really needed was a new President!")
[ Post Reply | Private Reply | To 8 | View Replies]

To: nickcarraway

Next up - Bot Blockers. “Nice website you got here; be a shame if anything happened to it.”


15 posted on 09/01/2025 2:15:08 PM PDT by Bernard (Issue an annual budget. And Issue a federal government balance sheet. Let's see what we got.)
[ Post Reply | Private Reply | To 1 | View Replies]

To: nickcarraway
80% of all AI bot traffic comes from AI data fetcher bots.

Gonna need a biologist to figure out all this.

16 posted on 09/01/2025 2:18:27 PM PDT by Libloather (Why do climate change hoax deniers live in mansions on the beach?)
[ Post Reply | Private Reply | To 1 | View Replies]

To: Openurmind

Dittos - I’ve had to shut down a semi-dormant site that had many years of content in forum structures. These AI bots slam servers, offer nothing of value to the resource owners.

Simply put, they steal content and offer no useful purpose in return. This will kill the internet as a resource, and make it just another way to control and tax information.

Even the Cloudflare solutions are limited, but it’s a start - truly amazing to see where the Cloudflare interstitial screens are showing up.

Good opportunity for some clever computer engineers to make The Next Big Thing - something that can control AI and its amoral actions.


17 posted on 09/01/2025 2:41:57 PM PDT by larrytown (A Cadet will not lie, cheat, steal, or tolerate those who do. Then they graduate...)
[ Post Reply | Private Reply | To 4 | View Replies]

To: All

AND they are making it miserable for us real people too.

I’d say that 20% of the sites I try to visit for research, entertainment or commerce (and increasing daily) have some sort of ‘bot’ filter that wants me to ‘verify that I am human’, turn off my VPN, lower my ‘shields’, and let THEM run THEIR spybots on my computer.

Unless it is something really important, I just move on to the next site that offers what I’m looking for without the verification page.

I’d estimate that the ‘prove you are human’ website sentries have cost various online sellers a thousand dollars of my money not being spent with them, and who knows how many dollars of ad money the information / entertainment websites have lost because I didn’t click through all their nonsense.

And my tin-hatted alter ego says this is just one giant step along the path to being required to have a Digital ID and a social credit score to use the internet at all, and not really that AI is costing website owners millions of dollars.


18 posted on 09/01/2025 2:46:19 PM PDT by LegendHasIt
[ Post Reply | Private Reply | To 1 | View Replies]

To: SaveFerris

A normal webcrawler only hits sites a few times a week... This AI is overwhelming the system badly... and the only way to keep it out is to shut off all your SEO bot access, which is not at all good for business.

This needs to stop... It is destroying the internet...


19 posted on 09/01/2025 2:51:37 PM PDT by Openurmind (AI - An Illusion for Aptitude Intrusion to Alter Intellect. )
[ Post Reply | Private Reply | To 12 | View Replies]

To: larrytown

Yes, see #19...


20 posted on 09/01/2025 2:52:41 PM PDT by Openurmind (AI - An Illusion for Aptitude Intrusion to Alter Intellect. )
[ Post Reply | Private Reply | To 17 | View Replies]



Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.

FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson