On Thursday the 16th, between midnight and 1 AM, the Oseox RANKING reports turned erroneous, and I quickly created a bug ticket for my favorite developer.
When I woke up on Thursday morning, my WhatsApp and inbox unsurprisingly contained several messages from users alarmed by the night’s reports.
(To clarify, as I’ve seen this hypothesis mentioned, the issue is not related to the `num=100` parameter being targeted first. We’ve always offered scraping without this parameter, and that didn’t stop Google from shutting the door.)
This blockage will, of course, also impact our indexing check on Oseox LINK and Oseox INDEX (but not the forcing feature).
Incidentally, we had just changed our methods for LINK and have been working for months on developing a new tool that heavily relies on scraping Google.
So, the year is off to a GREAT start. 🙂
The following morning, we also observed that all Google scraping professionals were affected. This, of course, is far worse than just a simple bug to fix.
Many years ago, Google wasn’t kind to scrapers, but it eventually opened the floodgates wide, even though scraping has always been officially prohibited (some tools would do well to remember that they were blocked in 2012, for example…).
The cat-and-mouse game has clearly begun: between Thursday and Friday, some solutions already stopped working… This suggests a long struggle ahead.
It will always be possible to scrape Google in one way or another. The problem will be the speed and cost, which will inevitably have consequences.
Because scraping slowly at low volume and scraping quickly at high volume are worlds apart.
It’s like alerting/monitoring: detecting a TITLE change on one URL once a day is a completely different problem from doing it intensively, at scale.
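To make the analogy concrete, a once-a-day TITLE check really can fit in a few lines. This is a minimal, naive sketch; the function names and the sample HTML are invented for the illustration, and a real monitor would fetch the page over HTTP and persist the previous value:

```python
import re

def extract_title(html: str) -> str:
    """Pull the <title> text out of an HTML document (naive regex approach)."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""

def title_changed(previous_title: str, html: str) -> bool:
    """Return True if the page's TITLE differs from the stored value."""
    return extract_title(html) != previous_title

# Illustrative check: compare yesterday's stored title to today's page.
stored = "Oseox - SEO Tools"
page = "<html><head><title>Oseox - SEO Tools</title></head></html>"
print(title_changed(stored, page))  # prints False: no alert needed
```

Doing this once a day for one URL is trivial; doing it every few minutes across millions of URLs is an infrastructure problem, which is exactly the gap the paragraph above describes.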
Scraping with or without JavaScript rendering is far from the same thing. If even Google skips JS rendering most of the time, it’s because the implications are significant…
Does Google want to kill SEO tools?
I think they don’t care much; we don’t represent much.
However, that could be an option.
Google’s Search team has been particularly mistreated on Twitter by SEOs following years of revelations about “lies” from the Mountain View giant.
In the end, their representative closest to SEOs, John Mueller, outright left Twitter.
Could this be a small revenge? Unlikely 🙂
Who does Google want to fight?
Google is facing a medium-term existential threat. They themselves announced a “code red.” ChatGPT broke all world records for the fastest adoption of a new product.
Ad revenues won’t grow endlessly, OpenAI and others are lurking. The cloud is driving growth, but for how long?
Some users have already completely changed their habits.
You’ve undoubtedly heard about Nvidia, whose market valuation exceeded Google’s in just one year. How? By selling ultra-powerful, innovative chips (GPUs). Purchase orders worth tens of billions have been confirmed by all major players.
Take xAI (Twitter/Grok/Musk), for example: they recently bought and assembled over 100,000 Nvidia GPUs in just 122 days.
They’ve thus announced building the largest “AI supercomputer” in the world.
For search and AI, it’s quite useful to have all the data 🙂 and therefore to have an index similar to Google’s.
Google has built its index over more than 25 years. Catching up on all of that quickly is a race, and crawling is essential to it.
If you analyze your site logs, you’ve probably already noticed that OpenAI’s bots sometimes behave like hooligans, respecting nothing.
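If you want to see this in your own logs, a quick tally of AI-bot hits per user agent is enough. A minimal sketch, assuming a standard combined-format access log; the sample lines are invented, and the bot names are OpenAI’s published user agents (GPTBot, ChatGPT-User, OAI-SearchBot):

```python
import re
from collections import Counter

# Matches the quoted user-agent field at the end of a combined-format log line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

AI_BOTS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot")  # OpenAI's published crawlers

def count_ai_bot_hits(log_lines):
    """Tally hits per AI bot from access-log lines (combined log format assumed)."""
    hits = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue
        user_agent = match.group(1)
        for bot in AI_BOTS:
            if bot in user_agent:
                hits[bot] += 1
    return hits

# Invented sample lines for the illustration.
sample = [
    '1.2.3.4 - - [16/Jan/2025:00:12:01 +0100] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot"',
    '5.6.7.8 - - [16/Jan/2025:00:12:02 +0100] "GET /page HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]
print(count_ai_bot_hits(sample))  # prints Counter({'GPTBot': 1})
```

Comparing those counts day over day (and against your robots.txt rules) is the quickest way to see which bots respect your crawl directives and which behave like hooligans.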
Scraping Google can be very useful for guiding one’s own crawl, for building a competitor outright, or for validating and verifying quality. Dataset providers and training-data providers have an intense need for data, and much of it passes through Google. Blocking all of this slows down the competitors who rely on those services.
This makes me lean toward this hypothesis as the primary reason: blocking AI/Search competitors.
Cost-saving?
For years, I’ve written extensively on my French blog about indexing issues.
Google has clearly changed its approach, and one reason could simply be cost-saving.
Giving away its SERPs for free to paid services isn’t very beneficial for Google, and at the end of the month, it’s Google footing the electricity bill.
This anti-scraping move could simply be the continuation of this new approach.
Towards a Google API?
A Google API that directly and quickly provides accurate results at a competitive price could be ideal for Google.
But it would kill many SEO tools that have developed significant expertise and invested in their infrastructure for scraping.
However, this is likely a market too small to really interest our best friend/enemy.
The moral
Who is the biggest scraper in the world over the past 25 years and who wouldn’t exist without scraping? Google.
It’s food for thought…
Good luck to everyone in this sector.