PerimeterX is a website security service that blocks malicious bot attacks without interrupting the user experience. Unfortunately, its advanced protective measures also deny access to crawlers and scrapers.
Fortunately, there’s a way to bypass PerimeterX to get the data you need, and we’ll show you how. But first, let’s learn more about the problem at hand.
What Is PerimeterX and How Does It Work
PerimeterX is a cybersecurity company that provides solutions to protect websites, APIs, and microservices from various types of automated attacks, like Account Takeover (ATO) and Distributed Denial-of-Service (DDoS).
It works by deploying an AI-based platform to detect and block suspicious activity in real time. It identifies and mitigates threats by combining active and passive techniques on both the server side and the client side.
These anti-bot measures stand in the way of your scraper, but you can get around them with the steps outlined below.
How to Bypass PerimeterX
The best course of action is to implement a ready-to-use PerimeterX bypass tool such as ZenRows.
Alternatively, you can reverse engineer its client-side bot detection script to understand its internals. Here are the main steps:
- Analyze the network log. By sending and analyzing GET and POST requests, you can learn much about PerimeterX’s behavior, but that alone isn’t enough to get past it.
- Deobfuscate the PerimeterX JavaScript challenge. That way, you can discover what checks the client-side bot detection performs and how to replicate the challenge-solving behavior.
- Analyze the deobfuscated script. You’ll need to figure out how the payload is encrypted and how the PerimeterX cookies are set. You can also learn how to avoid WebGL fingerprinting, automated browser checks, sandboxing checks, and user input event tracking.
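As a starting point for the network-log step, it can help to isolate the cookies the protection layer sets on each response. The following is a minimal sketch using only Python's standard library. It assumes the relevant cookie names begin with `_px`, which this article does not confirm; the helper name and the sample header values are hypothetical.

```python
from http.cookies import SimpleCookie

# Assumption: PerimeterX-related cookie names start with this prefix.
PX_COOKIE_PREFIX = "_px"


def px_cookies_from_headers(set_cookie_headers):
    """Return {name: value} for cookies whose name starts with the prefix.

    `set_cookie_headers` is a list of raw Set-Cookie header values, for
    example copied from the browser's network log.
    """
    found = {}
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)  # parse one Set-Cookie header value
        for name, morsel in jar.items():
            if name.startswith(PX_COOKIE_PREFIX):
                found[name] = morsel.value
    return found


# Hypothetical header values for illustration only.
sample_headers = [
    "_px3=abc123; Path=/; Max-Age=330",
    "session=xyz; Path=/; HttpOnly",
    "_pxvid=vid-1; Path=/",
]
print(px_cookies_from_headers(sample_headers))
```

Comparing these values across requests shows which cookies change after the JavaScript challenge runs, which narrows down where to look in the script.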
That will help you avoid PerimeterX’s bot detection measures. Beyond that, you’ll need to counter its individual detection techniques. Here are a few examples:
- Behavioral analysis: With the help of machine learning algorithms, PerimeterX can distinguish bots from humans based on their behavioral patterns. If you drive a headless browser with an automation tool like Selenium, you can script tasks like clicking buttons and filling out forms to simulate human-like interactions with the site.
- IP filtering: PerimeterX maintains a database of IPs associated with bots, datacenters, proxies, and VPNs. It assigns a score to every IP visiting a protected website and blocks requests based on that. The solution is using premium rotating proxies that provide residential IPs to avoid raising suspicion.
- Fingerprinting and blacklisting: Techniques like canvas fingerprinting allow PerimeterX to identify bots, even when using different IPs, and add them to a blacklist for future reference. The way to handle this defense is to collect data from real users’ devices and inject it into your scraper. However, you’ll need a lot of device data to avoid raising red flags.
- Checking HTTP request headers: Bots usually send request headers that differ from a real browser’s, which gives them away. PerimeterX can easily detect those and prevent access to a website. That’s why you’ll have to build a pool of real browser header sets to rotate when scraping.
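The header and IP points above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a complete bypass: the header profiles are examples you would replace with sets captured from real browsers, the proxy URLs are placeholders for a residential proxy provider, and all function names are hypothetical. Nothing is sent over the network until you call `opener.open(request)`.

```python
import itertools
import random
import urllib.request

# Example header profiles (assumption: replace with sets captured from real browsers).
HEADER_PROFILES = [
    {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/120.0.0.0 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    },
    {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:121.0) "
                      "Gecko/20100101 Firefox/121.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-GB,en;q=0.8",
    },
]

# Placeholder proxy endpoints (assumption: supplied by your proxy provider).
PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
]


def pick_headers(rng=random):
    """Return a copy of one whole header profile, so its fields stay consistent."""
    return dict(rng.choice(HEADER_PROFILES))


def proxy_rotation(proxies):
    """Return an endless round-robin iterator over the proxy list."""
    return itertools.cycle(proxies)


def prepare(url, headers, proxy):
    """Build a Request and an opener that routes traffic through `proxy`."""
    request = urllib.request.Request(url, headers=headers)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return request, urllib.request.build_opener(handler)


rotation = proxy_rotation(PROXIES)
request, opener = prepare("https://example.com/", pick_headers(), next(rotation))
# response = opener.open(request, timeout=30)  # uncomment to actually send
```

Rotating whole profiles rather than individual fields keeps the User-Agent consistent with the other headers, since a mismatch between them is itself a signal.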
As you can see, bypassing PerimeterX requires a lot of time, resources, and technical knowledge. You can try using public libraries like Puppeteer Stealth, but as their source code is openly accessible, PerimeterX likely uses it to update its defenses.
Even so, you can use private software designed to bypass such security measures, such as a web scraping API with an advanced anti-bot bypass toolkit that can handle all PerimeterX detection techniques.
With premium residential proxies, randomized HTTP request headers, JavaScript rendering, and other features, you’re far less likely to get blocked.
Conclusion
Bypassing PerimeterX is something you’ll likely have to do at some point, as it’s a security service many websites employ. If not handled properly, your scraper will get detected and blocked because of the advanced anti-bot measures PerimeterX uses.
Furthermore, creating a custom bypassing solution on your own can take a lot of time and effort, so your safest bet is going for tried-and-tested software. ZenRows will get you the data from any PerimeterX-protected website with a single API call. Use the 1,000 free API credits you get upon registration to check it out.