HUMAN Blog

API Bot Attacks: The Hidden Threat to Application Security

By HUMAN

The primary way that applications and websites talk to each other is becoming a prime target for hackers. We’re talking about APIs, or application programming interfaces. Almost every application today has them. According to Programmable Web, there are over 20,000 public APIs available from different websites and applications, and there are likely far more than that. Counting the internal APIs that are increasingly used by companies running microservice and serverless infrastructures, APIs now likely number in the tens of millions or more. It is not an overstatement to say that hundreds of billions of dollars in online business rely on APIs to function smoothly.

Why API Bot Attacks are a Big Problem for E-commerce Businesses

This growing ubiquity makes APIs a juicy target for malicious hackers trying to exploit weaknesses in these connection points. In particular, API attacks use bot networks to execute account takeover (ATO) and carding attacks, scrape content, and disrupt e-commerce security. In our research, we found that, on many websites and applications, more than 75% of login requests from API endpoints are malicious. On some applications, as much as 20% of all product page API requests are malicious. Overall, 10-15% or more of all API requests come from malicious sources. This is a massive shift in how attacks are happening. According to our research, API attacks are growing quickly in volume and intensity across a wide range of applications in different sectors of e-commerce and media.

The growth in API attacks is driven by the simple fact that they are easier and more economical to mount, while being harder to detect, than legacy browser-based botnet attacks. Ultimately, as applications move towards even greater usage of APIs, the way your organization secures its APIs against bot attacks will become a linchpin of your online security efforts. This is particularly important because API attacks are rapidly growing more sophisticated. Developers formerly could count on rate-limiting API access, blocking protocols that were likely evidence of malicious activity, or other simple tricks to stop attackers. Smarter fraudsters are now leveraging the full power of cloud computing and distributed networks to mount attacks that are both harder to detect and constantly evolving. This creates an incredibly dynamic attack landscape with hourly changes and a high-speed cat-and-mouse game running between attackers and application security teams.

What is an API?

Web APIs are open endpoints that expose the functionality of applications to the outside world and allow developers to easily interact with applications without having to write customized code or to have a deep understanding of the applications’ structure. For example, the Google Maps API allows developers to ask geospatial questions or import the functionality of Google Maps into an external application. The Stripe API allows any developer to add Stripe payment gateway functionality to a web or mobile application with a few lines of code and serves as a secure connection back into the Stripe infrastructure.

Some APIs require that developers sign up for an API key to gain access to the API. Many APIs are entirely open because the organization publishing the API does not want to discourage usage. The key point, however, is that APIs are supposed to be open and easy to access to make it easier to interact with or consume information and data that an organization wants the world to access.

Many e-commerce businesses use APIs that serve both internal and external purposes. For example, an e-commerce vendor may have a single API with pricing and product data that feeds the company’s website, mobile application, widgets for affiliate networks, third-party reseller sites, and good bots like search engine spiders for Google Shopping. To properly secure an API, an application must be able to actively assess whether each API request is good, bad, or unknown. Good and bad requests can look very similar. For example, a legitimate request for pricing and availability data from Google Shopping could be asking for the exact same information as a malicious group mounting a scraping attack to probe competitors’ price changes and inventory. Because API attacks evolve so quickly, the proper treatment of each API request must be determined dynamically in real time rather than through a set of rigid rules.

Why It’s So Hard to Spot and Stop API Bot Attacks

Unlike requests that have to go through browsers or native app agents, APIs can serve as a direct pipeline into specific resources and actions. This makes them a very attractive attack vehicle for carding, credential stuffing and ATO, scraping, and other types of attacks. APIs are also harder to defend with traditional means because an API call offers far fewer clues as to whether it is legitimate or malicious than a traditional browser request does. More specifically, with API attacks, bots request the same information as they would via a browser attack, but they offer no information about the browser agent or version, device type, cookies, and other data that can be useful in spotting bot attacks. Because API attacks tend to be entirely virtual, they can be easily spun up, spun down, and moved from one cloud vendor to another with a rotating array of IP addresses and proxy networks for obfuscation.

For these reasons, mounting an API attack also requires far fewer resources than mounting a browser attack. Common browser bot attacks use “headless” browsers - browsers which are executed via command line and can run JavaScript, mimicking human behavior. Headless browsers are usually more expensive to use in attacks, whereas APIs allow attackers to use widely available, basic, and less expensive capabilities.

In addition, mounting bot attacks against individual mobile applications requires significant effort; each application is different and might require different bot capabilities. Attacking mobile APIs, however, is simple and can leverage the same infrastructure and attack mechanisms as attacking direct APIs and web APIs. What’s more, mobile apps are where traffic is growing the fastest. Mobile APIs therefore offer a better environment for concealing bad behavior than the far more lightly used website versions of food delivery applications like DoorDash and Caviar, e-learning apps like Quizlet, or mobility apps like Uber and Lyft.

In many cases, APIs also allow attackers to get closer access to the core application infrastructure. When an e-commerce company has a unified API that it uses to present pricing information or log-in credentials across web and mobile applications, the attacker is usually only one step away from getting access to very critical assets.

The upshot? API attacks are easier to mount, require fewer resources, and can be much harder to detect.

How to Stop API Attacks

Unfortunately, traditional methods for blocking web attacks are insufficient for blocking API attacks in real time. Web Application Firewalls (WAFs) use static methods like rate limiting API calls, blocking requests from unexpected protocols, and looking for attack signatures. WAFs cannot analyze dynamic real-time behaviors and signals. For this reason, WAFs often end up blocking legitimate traffic or allowing malicious traffic. Newer API bots are easily able to evade WAFs and traditional signature-based detection mechanisms. To beat API bots you need a new defensive methodology driven by machine learning, sophisticated behavior modeling, and a constant real-time feedback loop. We describe this methodology as “Collect, Detect, Mitigate, Learn.”
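
To make the limits of static rules concrete, here is a minimal sketch of a fixed-window, per-IP rate limiter of the kind described above. The class name, the limit of 5 requests per 60 seconds, and the example IP addresses are all assumptions chosen for illustration; this is not how any particular WAF is implemented. It shows that a single noisy client is throttled, while the same volume of requests spread across rotating IP addresses passes untouched.

```python
from collections import defaultdict


class FixedWindowRateLimiter:
    """Static rule: at most `limit` requests per IP in each `window_s`-second window."""

    def __init__(self, limit: int, window_s: int) -> None:
        self.limit = limit
        self.window_s = window_s
        self._counts = defaultdict(int)  # (ip, window index) -> request count

    def allow(self, ip: str, now_s: float) -> bool:
        key = (ip, int(now_s // self.window_s))
        self._counts[key] += 1
        return self._counts[key] <= self.limit


def count_allowed(limiter: FixedWindowRateLimiter, requests) -> int:
    """Return how many (ip, timestamp) requests the limiter lets through."""
    return sum(1 for ip, ts in requests if limiter.allow(ip, ts))


# One noisy client: 100 requests from a single IP inside one window.
single_ip = [("203.0.113.7", 0.0)] * 100
# Distributed bot: the same 100 requests spread over 50 rotating IPs.
rotating = [(f"198.51.100.{i % 50}", 0.0) for i in range(100)]

print(count_allowed(FixedWindowRateLimiter(limit=5, window_s=60), single_ip))  # 5
print(count_allowed(FixedWindowRateLimiter(limit=5, window_s=60), rotating))   # 100
```

Tightening the limit does not help much here: a limit low enough to stop the rotating bot would also start blocking legitimate users who share an IP address.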

Collect the Signals and Build the Models

The first step is to collect behavioral, network, and other fingerprints from normal users as a baseline for detecting API bot behaviors at runtime. These can include signals from how real users behave, what their Web API traffic throws off, cookies (and their absence), and signals from mobile applications such as mobile IDs and application tokens. For direct APIs, you need to look for signals in the network such as network response times and patterns, network fingerprinting, and evidence of obfuscation techniques such as the use of proxy networks. These signals should be combined with internal and external reputation feeds to evaluate the likelihood that a call is coming from a good user or a good bot rather than a malicious bot. Lastly, you must include feedback loops that are application specific, such as changes in conversion rates, log-in success rates, and traffic volumes to product pages, to name a few. All of this data can be used to build robust models of what is good, bad, and unknown API traffic. It is crucial that these models be flexible and able to incorporate data in real time to block dynamic and constantly evolving API attacks.
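
As a rough sketch of what a collected signal record might look like, the example below groups a few of the signals mentioned above into one structure and flattens it into numeric features that a model could consume. Every field and feature name here is hypothetical, and a real collector would gather far more than six signals.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RequestSignals:
    """Signals gathered for one API request (all field names are hypothetical)."""
    user_agent: Optional[str]         # None when the client sent no User-Agent
    has_cookies: bool                 # whether any cookies came with the request
    app_token_valid: Optional[bool]   # None when the request is not from a mobile app
    via_known_proxy: bool             # source IP appears on a proxy network list
    ip_reputation: float              # 0.0 (clean) to 1.0 (known bad), from a reputation feed
    requests_last_minute: int         # recent request volume from this client


def to_features(s: RequestSignals) -> Dict[str, float]:
    """Flatten signals into numeric features; missing data is itself a signal."""
    return {
        "missing_user_agent": float(not s.user_agent),
        "missing_cookies": float(not s.has_cookies),
        "invalid_app_token": float(s.app_token_valid is False),
        "via_known_proxy": float(s.via_known_proxy),
        "ip_reputation": s.ip_reputation,
        "requests_last_minute": float(s.requests_last_minute),
    }


# A scripted client calling the API directly: no browser, no cookies, behind a proxy.
scripted = RequestSignals(
    user_agent=None,
    has_cookies=False,
    app_token_valid=None,
    via_known_proxy=True,
    ip_reputation=0.8,
    requests_last_minute=120,
)
features = to_features(scripted)
print(features)
```

Note that the absence of a cookie or user agent is recorded as a feature in its own right rather than dropped, which matches the point that missing browser data is one of the few clues a direct API call gives off.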

Detect Bots by Processing API Request Signals

With the model, you can detect malicious API bots by continuously processing signals given off by each API request. You will need to use advanced machine learning and behavioral analytics built to respond at web scale and in real time. The detection model constantly compares behaviors and signals to those of real users and assigns a risk score to each API request. This allows website and application operators and security teams to spot anomalies and attach an accurate confidence level to each API call.
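
The idea of assigning a risk score to each request can be sketched with a very small stand-in for a trained model. The example below is a simplified illustration, not the detection model described in this post: it uses a logistic function over hand-picked weights, and the feature names, weights, and bias are all assumptions. In practice the weights would be learned from labeled traffic.

```python
import math
from typing import Dict

# Hypothetical weights; a real system would learn these from traffic data.
WEIGHTS: Dict[str, float] = {
    "missing_user_agent": 1.5,
    "missing_cookies": 1.0,
    "via_known_proxy": 1.5,
    "ip_reputation": 2.0,
    "requests_last_minute": 0.02,
}
BIAS = -3.0  # keeps the score low when no suspicious signal is present


def risk_score(features: Dict[str, float]) -> float:
    """Map request features to a risk score between 0 (benign) and 1 (malicious)."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


human_like = {
    "missing_user_agent": 0.0,
    "missing_cookies": 0.0,
    "via_known_proxy": 0.0,
    "ip_reputation": 0.0,
    "requests_last_minute": 3.0,
}
bot_like = {
    "missing_user_agent": 1.0,
    "missing_cookies": 1.0,
    "via_known_proxy": 1.0,
    "ip_reputation": 0.8,
    "requests_last_minute": 120.0,
}

print(f"human-like request: {risk_score(human_like):.3f}")
print(f"bot-like request:   {risk_score(bot_like):.3f}")
```

With these placeholder weights, the human-like request scores well below 0.1 and the bot-like request scores above 0.9, which is the kind of separation a downstream mitigation step can act on.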

Mitigate Bad Bots Instantly

When a malicious request is detected with high confidence, your system should block the request before it accesses the API and extracts any information in return. This decision must happen in milliseconds so that real users are not kept waiting. Broadly, we break the proper response types into four groups:

  • Hard block - block the user and terminate their session
  • Allowlist - determine the user or the bot is legitimate and allow them in
  • Rate limit - protect application performance by setting limits on how many times a partner bot or good bot can access an API
  • Redirect - send an API request to a specific URL for further instructions or further user actions

Other actions can also be applied, such as slowing or delaying the user (instead of blocking) and monitoring the activity.
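
The response types listed above can be sketched as a small decision function that maps a risk score to an action. The thresholds (0.9 and 0.5), the function name, and its parameters are assumptions made for illustration; real cutoffs would be tuned per application.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"            # allowlisted or low-risk traffic
    RATE_LIMIT = "rate_limit"  # good bot that has exceeded its quota
    REDIRECT = "redirect"      # send the client elsewhere for further checks
    HARD_BLOCK = "hard_block"  # block the request and terminate the session


def choose_action(risk: float, is_known_good_bot: bool, over_quota: bool) -> Action:
    """Pick a response for one API request (thresholds are hypothetical)."""
    if is_known_good_bot:
        # Partner bots and good bots are allowed, but only up to their quota.
        return Action.RATE_LIMIT if over_quota else Action.ALLOW
    if risk >= 0.9:
        return Action.HARD_BLOCK
    if risk >= 0.5:
        return Action.REDIRECT
    return Action.ALLOW


print(choose_action(0.97, is_known_good_bot=False, over_quota=False))  # Action.HARD_BLOCK
print(choose_action(0.60, is_known_good_bot=False, over_quota=False))  # Action.REDIRECT
print(choose_action(0.10, is_known_good_bot=True, over_quota=True))    # Action.RATE_LIMIT
```

Keeping the decision a pure function of a few inputs is one way to make it fast enough to run on every request without making real users wait.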

You can also take steps to learn more about the client behind an API request. For example, “honey pots” can present information that is hidden from normal users. Only malicious API clients would see them and seek to access them. Alternatively, smart, dynamic application security platforms can dynamically append data to requests with instructions for how the application should behave based on the analysis of the request.
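
A honey pot for an API can be as simple as a decoy endpoint that no legitimate client is ever told about. The sketch below assumes a hypothetical decoy path and a hypothetical client identifier (for example, an API key or device fingerprint); any client that requests the decoy is flagged, and stays flagged on later requests.

```python
from typing import Iterable, Set


class HoneypotTracker:
    """Flags any client that touches a decoy path (paths and IDs are hypothetical)."""

    def __init__(self, decoy_paths: Iterable[str]) -> None:
        self.decoy_paths: Set[str] = set(decoy_paths)
        self.flagged: Set[str] = set()

    def observe(self, client_id: str, path: str) -> bool:
        """Record one request; return True if the client is flagged."""
        if path in self.decoy_paths:
            self.flagged.add(client_id)
        return client_id in self.flagged


tracker = HoneypotTracker(["/api/v1/internal/price-export"])

# A normal client only calls documented endpoints.
print(tracker.observe("client-a", "/api/v1/products/42"))            # False
# A scraper that crawls every discoverable path hits the decoy.
print(tracker.observe("client-b", "/api/v1/internal/price-export"))  # True
# Once flagged, its later requests are flagged too.
print(tracker.observe("client-b", "/api/v1/products/42"))            # True
```

A flag like this would normally feed into the risk score as one more signal rather than trigger a block on its own.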

Learn Continuously, Update Constantly

For this methodology to work, you need to continuously update models of what bad API behavior looks like. This is the only way to constantly improve bot detection and accuracy. Achieving this is only possible with dynamic models that ingest data in real time and modify the model to take each new finding into account. This is the domain of continuous machine learning systems that were, until a few years back, too computationally intensive and difficult to run as real-time feedback loops.
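
The core of a real-time feedback loop is a model that updates itself with each new observation instead of being retrained in batches. The sketch below is a deliberately simple stand-in for the continuous machine learning systems described above: it keeps a running mean and standard deviation of one metric (here, a hypothetical count of log-in attempts per minute) using Welford's online algorithm, and reports how far a new value sits from that baseline. The class name and sample values are assumptions.

```python
import math


class RunningBaseline:
    """Online mean and standard deviation of one metric (Welford's algorithm)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new observation into the baseline in constant time."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def std(self) -> float:
        """Sample standard deviation; 0.0 until at least two observations exist."""
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def zscore(self, x: float) -> float:
        """How many standard deviations x is from the current baseline."""
        s = self.std()
        return (x - self.mean) / s if s > 0 else 0.0


baseline = RunningBaseline()
for logins_per_minute in [10, 12, 11, 9, 13]:
    baseline.update(logins_per_minute)

print(f"mean={baseline.mean:.1f} std={baseline.std():.2f}")
print(f"z-score of a 60/min burst: {baseline.zscore(60):.1f}")
```

Because each update touches only three numbers, the baseline can follow traffic as it changes, and a sudden burst stands out as a large z-score without any stored history.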

As API attacks continue to increase and the speed at which they evolve accelerates, protecting online applications will require far more agility and speed than traditional security tools can deliver. It also requires a far more dynamic model that incorporates continuous learning to spot and stop API attacks before they happen, with extremely high levels of accuracy. The only way to do this effectively is with machine learning and a flexible, adaptable methodology that can handle real-time detection and mitigation without users even noticing.