Blocking AI bots from your website

AI crawlers are becoming something of an epidemic on the internet for operators of small websites such as community forums. These bots wholeheartedly ignore the gentlemen’s agreement set forth in robots.txt and hit unprepared websites with what amounts to a DDoS attack. I’m working on a tool that feeds my firewalls with blocklists to keep these bots out.

I named my tool “Blockafeller”. Its aim is to let a firewall block things that are not IP addresses. Currently that means DNS names and AS numbers.
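To make the DNS half of that concrete, here is a minimal sketch of turning names into addresses a firewall can act on. This is not Blockafeller's code; the function name `resolve_names` and the injectable `resolver` parameter are my own inventions for illustration. It relies only on Python's standard `socket.getaddrinfo`.

```python
import socket


def resolve_names(names, resolver=socket.getaddrinfo):
    """Map each DNS name to the sorted list of IP addresses it resolves to.

    `resolver` is a hypothetical hook (default: socket.getaddrinfo) so the
    lookup can be swapped out, for example in tests.
    """
    result = {}
    for name in names:
        try:
            infos = resolver(name, None)
        except socket.gaierror:
            # Name does not resolve right now: record it with no addresses.
            result[name] = []
            continue
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] holds the IPv4 or IPv6 address as a string.
        result[name] = sorted({info[4][0] for info in infos})
    return result
```

Calling `resolve_names(["example.com"])` would return a dictionary with one key and whatever addresses that name resolves to at that moment. DNS answers change over time, so a list like this would need to be refreshed regularly.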

What’s an AS number? Large businesses and network operators connect to the internet as autonomous systems (AS). That means they carry their own responsibility for routing IP traffic to, from and through their networks using BGP. The details of BGP are irrelevant here, but it’s useful to know that every such network operator has a unique number that identifies it.

Let’s say you want to block the Alibaba Cloud environment at your firewall. It has a massive number of IP addresses, so where to begin? Fortunately the whole cloud platform has a single AS number, 24429, that you can look up. Blockafeller does this for you, listing all IP space allocated to Alibaba at a particular moment in time. The result is a list of close to 300 unique IP ranges that you can simply feed into your firewall, so the traffic is blocked before it hits your web server.
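Once you have such a list of ranges, it helps to merge overlapping and adjacent ones before handing them to the firewall. The sketch below shows one way to do that with Python's standard `ipaddress` module. It is an illustration and not how Blockafeller works internally. The helper names, the nftables table `inet filter` and the set name `blocklist4` are all assumptions, and the prefixes in the usage note are documentation ranges, not Alibaba's.

```python
import ipaddress


def collapse_prefixes(prefixes):
    """Parse CIDR strings and merge them into the smallest covering lists.

    Returns (ipv4_networks, ipv6_networks). The two families are collapsed
    separately because collapse_addresses() rejects mixed versions.
    """
    nets = [ipaddress.ip_network(p, strict=False) for p in prefixes]
    v4 = [n for n in nets if n.version == 4]
    v6 = [n for n in nets if n.version == 6]
    return (
        list(ipaddress.collapse_addresses(v4)),
        list(ipaddress.collapse_addresses(v6)),
    )


def nft_elements(set_name, networks, table="inet filter"):
    """Format networks as one nftables 'add element' command.

    The table and set names are placeholders; the set is assumed to exist
    already and to have been created with 'flags interval'.
    """
    if not networks:
        return ""
    joined = ", ".join(str(n) for n in networks)
    return f"add element {table} {set_name} {{ {joined} }}"
```

For example, feeding in `198.51.100.0/25`, `198.51.100.128/25`, `192.0.2.0/24` and `192.0.2.64/26` collapses to just `192.0.2.0/24` and `198.51.100.0/24`, because the two halves merge and the /26 is already covered by its /24. Fewer entries means a smaller set for the firewall to load.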

Why would you do this? If you’re primarily serving human traffic, blocking the networks of the major cloud providers takes out most of the AI bot traffic. No humans live inside these cloud datacenters, so you won’t affect your visitors. Large search engines generally operate outside the AS numbers of the cloud platforms, so Googlebot won’t come from the same ASN as the people renting servers through the Google Cloud Platform.

For now, I have generated a blocklist that will block most of the public cloud. You can get it here. It contains IP blocks for AWS, most of Microsoft, Google Cloud Platform and Alibaba Cloud. Blockafeller itself will be released some time next week.