In the “should t/suki registration be open to the public?” thread, the topic of blocking LLM scrapers with Anubis came up. Since it was off-topic, I’ve created a topic here to discuss it.
If you aren’t familiar, Anubis is a relatively new tool that blocks scrapers by requiring each connection to solve a proof-of-work challenge. That makes accessing your website costly enough that most scrapers can’t scrape all of your data. It’s successful and viral enough that it’s used by some pretty big names now, including UNESCO.
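For anyone curious what “solving a proof of work” means in practice, here is a minimal sketch in Python. It assumes a hashcash-style scheme where the client has to find a number (a nonce) so that the SHA-256 hash of the challenge plus the nonce starts with a given number of zero hex digits. The function names, the challenge string, and the difficulty value are made up for illustration; this is not Anubis’s actual code or wire format.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty: int) -> int:
    """Client side (hypothetical): try nonces until the hash has
    `difficulty` leading zero hex digits. This is the expensive part."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side (hypothetical): checking the answer takes one hash."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    # Assumed values: a real deployment would issue a fresh random challenge.
    nonce = solve("example-challenge", 3)
    print(nonce, verify("example-challenge", nonce, 3))
```

The point of the asymmetry is that the server does one hash to check the answer, while the client does thousands on average to find it. A single visitor barely notices, but a scraper making millions of requests pays that cost every time.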
However, accessibility is a concern with deploying something like Anubis. The Anubis docs say this:
Anubis is a bit of a nuclear response. This will result in your website being blocked from smaller scrapers and may inhibit “good bots” like the Internet Archive. You can configure bot policy definitions to explicitly allowlist them and we are working on a curated set of “known good” bots to allow for a compromise between discoverability and uptime.
And when Xe Iaso (the developer of Anubis) first posted about it, this is what they had to say about accessibility:
This will also lock out users who have JavaScript disabled, prevent your server from being indexed in search engines, require users to have HTTP cookies enabled, and require users to spend time solving the proof-of-work challenge.
This does mean that users using text-only browsers or older machines where they are unable to update their browser will be locked out of services protected by Anubis. This is a tradeoff that I am not happy about, but it is the world we live in now.
I think I concur with Xe Iaso that, unfortunately, something like this is necessary despite the downsides. I’ve actually had problems with LLM scrapers on a self-hosted instance of Gitea that I ran with a friend, before making git.tsuki.games. The scrapers took our service offline, and we solved it just by restricting access to logged-in users only. However, that’s not something I currently plan on doing with forum.tsuki.games.
At the moment, I’m inclined to install Anubis for forum.tsuki.games soon, maybe in two weeks or so, but until then I think it’s worthwhile to have a conversation about it.