HUMAN BLOG

How Content Scraping and Agentic AI Seed Low-Quality Sites and Drain Publisher Revenue


Adam Sell

July 10, 2025

Agentic AI, AI, Automated Threats, content scraping, Research & Detection

Web scraping is a substantial threat to many organizations, especially those in the travel/hospitality and retail/ecommerce spaces. But scraping attacks aren’t limited to those industries. Indeed, the rate of scraping attacks on streaming and media businesses, including publishers, has risen steadily for the last three years.

The recent Quadrillion Report: 2025 Cyberthreat Benchmarks found that the rate of scraping attacks on streaming and media has jumped 56% year over year, and that the industry now accounts for 16.37% of all scraping attempts observed by the Human Defense Platform. One HUMAN customer in the industry faced hundreds of millions of scraping attempts per month across its many digital properties.

What might be more interesting than the number, though, is why.


From Scrape to Scam: How Content and IP Theft Fuels Made-For-Advertising Sites

Here’s the threat model from 10,000 feet: 

The true travesty of scraping media sites is that it turns legitimate, high-quality content into made-for-advertising (MFA) fodder. 

Threat actors republish scraped articles, headlines, or media assets on low-quality, ad-saturated, and maliciously formatted affiliate sites, and also use them to spoof legitimate publishers. The whole purpose of MFA sites is to generate impressions, not quality engagement. 

And this isn’t a small threat, either. The Human Defense Platform observed 1.68 trillion MFA-associated bid requests in the last 30 days. 
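To put that figure in perspective, the short calculation below converts the 30-day total into average daily and per-second rates. It is plain arithmetic on the number quoted above, assuming the volume is spread evenly across the window, which real traffic will not be. On that assumption it works out to roughly 56 billion bid requests per day, or about 648,000 per second.

```python
# Figure quoted in the post: 1.68 trillion MFA-associated bid requests in 30 days.
MFA_BID_REQUESTS = 1_680_000_000_000
WINDOW_DAYS = 30
SECONDS_PER_DAY = 24 * 60 * 60


def average_rates(total: int, days: int) -> tuple[float, float]:
    """Return (requests per day, requests per second), averaged over the window.

    Assumes an even spread across the window; this is an average, not a peak.
    """
    per_day = total / days
    per_second = per_day / SECONDS_PER_DAY
    return per_day, per_second


per_day, per_second = average_rates(MFA_BID_REQUESTS, WINDOW_DAYS)
print(f"{per_day:,.0f} per day, {per_second:,.0f} per second")
```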

The Business Impacts of Content Scraping

There are significant, tangible business consequences when scraped content fuels MFA sites. Those include:

Even worse, these consequences can prove challenging to recover from in a timely fashion for legitimate publishers. Websites take time to build up both search engine optimization and traffic. This means the fallout from having your content scraped for MFA sites can be fast-hitting and long-lasting.

What about AI and Content Scraping?

While many MFA schemes are built manually by threat actors, HUMAN researchers have observed the use of AI in building scam-focused websites based on scraped data. Just last year, the Phish ‘n’ Ships operation uncovered by HUMAN’s Satori Threat Intelligence and Research Team centered on pages that researchers believe were created using data scraped from legitimate websites.

Where it gets scary is when malicious agentic AI enters the equation. In this scenario, the threat model looks more like:

And this threat model is doable today. But it all starts with scraped content, as the only way for the threat actor behind this scheme to avoid simple detection measures is for their pages to have meaningful content on them. No content, no fraudster joy.

To be clear, though, the majority of AI agents are not malicious. Publishers may want to allow some AI agents to crawl their sites to surface content to potential consumers. And there’s an option for publishers to monetize access to their content by AI agents via pay-per-crawl solutions such as HUMAN’s integration with TollBit, which introduces a token system through which agents can pay for the content they access. 

Actionable Defenses

This does not mean, however, that publishers must remain at the mercy of fraudsters stealing their content for MFA sites. There are several mitigation tactics publishers can employ:

In a similar fashion to how the Human Defense Platform distinguishes good bots from bad bots, the platform can also distinguish beneficial AI agents from malicious AI agents. Allowing beneficial agents while blocking malicious ones can help a business embrace agentic AI and gain an edge on its competitors.
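As a rough illustration of the allow, charge, or block idea discussed in this post, here is a minimal policy sketch. It does not show how the Human Defense Platform or TollBit actually work. The agent names, the `AgentRequest` fields, and the `decide` function are all hypothetical, and the sketch assumes some upstream system has already verified the agent's identity and validated any pay-per-crawl token.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_PAYMENT = "require_payment"
    BLOCK = "block"


@dataclass(frozen=True)
class AgentRequest:
    agent_name: str          # identity the agent claims (hypothetical field)
    identity_verified: bool  # assumed output of an upstream verification step
    has_valid_token: bool    # assumed output of a pay-per-crawl token check


# Hypothetical publisher policy: names are placeholders, not real agents.
FREE_AGENTS = {"example-search-agent"}
PAID_AGENTS = {"example-llm-crawler"}


def decide(request: AgentRequest) -> Decision:
    """Map a classified agent request to a publisher policy decision."""
    # An unverified identity claim is treated as untrusted, whatever the name.
    if not request.identity_verified:
        return Decision.BLOCK
    if request.agent_name in FREE_AGENTS:
        return Decision.ALLOW
    if request.agent_name in PAID_AGENTS:
        # Paid agents get content only when they present a valid token.
        if request.has_valid_token:
            return Decision.ALLOW
        return Decision.REQUIRE_PAYMENT
    # Default-deny for any agent the publisher has not opted in.
    return Decision.BLOCK
```

The default-deny fallthrough is a design choice for this sketch; a publisher that wants broader discovery might default to allowing verified agents instead.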

Conclusion: A Hidden Leak in the Revenue Funnel

Scraping isn’t a future threat. It’s a present drain on publishers’ finances, intellectual property, and reputation. And the increasing infusion of AI into the mix will only serve to exacerbate these consequences. 

HUMAN can surface and stop scraping attacks in real time. HUMAN’s Scraping Defense protects web and mobile applications from web scraping, providing the highest level of detection accuracy for even the most sophisticated scraping bot and agentic AI-powered attacks.

Get visibility and control over AI agents and agentic browsers on your website.