Web scraping is a substantial threat to many organizations, especially those in the travel/hospitality and retail/ecommerce spaces. But scraping attacks aren’t limited to those industries. Indeed, the rate of scraping attacks on streaming and media businesses, including publishers, has risen steadily for the last three years.
The recent Quadrillion Report: 2025 Cyberthreat Benchmarks found that the rate of scraping attacks on streaming and media jumped 56% year over year, and that the sector now accounts for 16.37% of all scraping attempts observed by the Human Defense Platform. One HUMAN customer in the industry faced hundreds of millions of scraping attempts per month across its many digital properties.
What might be more interesting than the number, though, is why.
From Scrape to Scam: How Content and IP Theft Fuels Made-For-Advertising Sites
Here’s the threat model from 10,000 feet:
- The threat actor sends scraping-trained bots out to scrape publishers for high-value content.
- Stolen content is loaded onto a threat actor-owned website that’s chock-full of ads.
- The threat actor sends different, human-like bots to visit the website and load the ads, cashing out on the stolen content.
The true travesty of scraping media sites is that it turns legitimate, high-quality content into made-for-advertising (MFA) fodder.
Threat actors republish scraped articles, headlines, or media assets on low-quality, ad-saturated, and maliciously formatted affiliate sites, and also use them to spoof legitimate publishers. The whole purpose of MFA sites is to generate impressions, not quality engagement.
And this isn’t a small threat, either. The Human Defense Platform observed 1.68 trillion MFA-associated bid requests in the last 30 days.
The Business Impacts of Content Scraping
Scraped content that fuels MFA sites carries significant, tangible business consequences for publishers. Those include:
- Lost organic traffic due to duplicate content penalties
- Ad revenue siphoning through fraudulent impressions
- Brand dilution as real content appears on disreputable MFA clones
- Server cost spikes from constant bot hits
Even worse, these consequences can prove challenging to recover from in a timely fashion for legitimate publishers. Websites take time to build up both search engine optimization and traffic. This means the fallout from having your content scraped for MFA sites can be fast-hitting and long-lasting.
What About AI and Content Scraping?
While many MFA schemes are built manually by threat actors, HUMAN researchers have observed the use of AI in building scam-focused websites based on scraped data. Just last year, the Phish ‘n’ Ships operation uncovered by HUMAN’s Satori Threat Intelligence and Research Team centered on pages that researchers believe were created using data scraped from legitimate websites.
Where it gets scary is when malicious agentic AI enters the equation. In this scenario, the threat model looks more like:
- A custom AI-powered agent is instructed to build out a new MFA site based on celebrity gossip.
- The agent searches for news articles about [insert celebrity couple du jour here] and grabs the headline/copy/images.
- The agent alters the copy and images to avoid detection.
- Using a threat actor-provided web registrar login, the agent creates a new site and generates web pages using the stolen copy.
- The agent embeds numerous ad slots on these pages, generates programmatic ads for the pages, and loads them into a threat actor-provided supply-side platform (SSP).
- And finally, the agent builds ads promoting the new pages and feeds them into content recommendation engines for other threat actor-owned sites.
And this threat model is doable today. But it all starts with scraped content, as the only way for the threat actor behind this scheme to avoid simple detection measures is for their pages to have meaningful content on them. No content, no fraudster joy.
To be clear, though, the majority of AI agents are not malicious. Publishers may want to allow some AI agents to crawl their sites to surface content to potential consumers. And there’s an option for publishers to monetize access to their content by AI agents via pay-per-crawl solutions such as HUMAN’s integration with TollBit, which introduces a token system through which agents can pay for the content they access.
Actionable Defenses
This does not mean, however, that publishers must remain at the mercy of fraudsters stealing their content for MFA sites. There are several mitigation tactics publishers can employ:
- Rate limiting restricts the number of requests a single client can make per unit of time to blunt the high volume of scraping requests during an attack.
- Behavior fingerprinting enables significantly greater accuracy in distinguishing between actual human behavior and malicious bot activity.
- Bot detection that doesn’t impact real readers helps distinguish between bots and human activity, allowing publishers to make informed decisions based on accurate data for real human visitors.
- Watermarking or content cloaking for syndication-only elements embeds a unique identifier, such as a logo, text, or pattern, into an asset to establish ownership and prevent unauthorized use.
- Charging bots to scrape content sends automated traffic to a paywall that grants access only to bots that abide by your paid access policies.
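To make the first tactic concrete, here is a minimal sketch of a per-client sliding-window rate limiter. It is an illustration only, not how the Human Defense Platform works: the class name, the `client_key` parameter, and the thresholds are all hypothetical, and the counters live in process memory rather than in a shared store.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per client in any `window_seconds` span.

    Hypothetical illustration; names and limits are assumptions.
    """

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_key: str) -> bool:
        """Return True if this request is within the client's budget."""
        now = self.clock()
        hits = self._hits[client_key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over budget: block, challenge, or slow down
        hits.append(now)
        return True


# Example: allow 3 page requests per minute per client key.
limiter = SlidingWindowRateLimiter(max_requests=3, window_seconds=60)
decisions = [limiter.allow("203.0.113.7") for _ in range(5)]
print(decisions)  # [True, True, True, False, False]
```

Keying on an IP address alone is easy for distributed scrapers to sidestep by rotating addresses, which is one reason rate limiting appears in the list above alongside behavior fingerprinting and bot detection, not as a replacement for them.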
In a similar fashion to how the Human Defense Platform distinguishes good bots from bad bots, the platform can also distinguish beneficial AI agents from malicious AI agents. Allowing beneficial agents while blocking malicious ones can help a business embrace agentic AI and gain an edge on its competitors.
Conclusion: A Hidden Leak in the Revenue Funnel
Scraping isn’t a future threat. It’s a present drain on publishers’ finances, intellectual property, and reputation. And the increasing infusion of AI into the mix will only serve to exacerbate these consequences.
HUMAN can surface and stop scraping attacks in real time. HUMAN’s Scraping Defense protects web and mobile applications from web scraping, providing the highest level of detection accuracy for even the most sophisticated scraping bot and agentic AI-powered attacks.