This task documents the effort to get Anubis on XTools to help shield us from the armies of AI bots and web crawlers.
I successfully got Anubis running on a third-party wiki running on Ubuntu / Apache. Cloud VPS presents a different challenge, since TLS termination is handled by the Cloud Services web proxy.
Environment
- Project xtools, VPS instance xtools-dev06.xtools.eqiad1.wikimedia.cloud
- Debian Bullseye
- Apache for routing requests to the application.
Steps taken
Changes made in Horizon
None! We need a web proxy pointing to port 80 and an ingress security group allowing traffic, which we already have on this instance.
Apache configuration
- Enable mod_headers with sudo a2enmod headers
- Enable mod_proxy_http with sudo a2enmod proxy_http
- Adjust the conf file to add Anubis as a reverse proxy:
# Forward traffic to Anubis
<VirtualHost *:80>
    ServerAdmin tools-xtools-dev@toolforge.org
    ServerName xtools-dev.wmcloud.org
    DocumentRoot /var/www/public

    ErrorLog ${APACHE_LOG_DIR}/error-anubis.log
    CustomLog ${APACHE_LOG_DIR}/access-anubis.log combined
    LogFormat "%{X-Forwarded-For}i %t \"%r\" %>s \"%{Referer}i\" \"%{User-Agent}i\""

    # These headers need to be set or else Anubis will
    # throw an "admin misconfiguration" error.
    RemoteIPHeader X-Forwarded-For
    SetEnvIf X-Forwarded-For "(.*)" saved_x_forwarded_for=$1
    RequestHeader set "X-Real-Ip" "%{saved_x_forwarded_for}e"

    ProxyPreserveHost On
    # Replace 9000 with the port Anubis listens on
    ProxyPass / http://[::1]:9000/
    ProxyPassReverse / http://[::1]:9000/
</VirtualHost>

# Tell Apache to listen on port 3001 so that it can serve the actual website
Listen 0.0.0.0:3001
<VirtualHost *:3001>
    DocumentRoot /var/www/public
    ServerName xtools-dev.wmcloud.org

    # Same config we normally use
    # …
</VirtualHost>
- Restart Apache with sudo service apache2 restart
Installing Anubis
I went with the native approach and am using Alien to build a Debian package from an RPM package:
$ sudo apt install alien
…
Setting up alien (8.95.4) ...

$ wget https://github.com/TecharoHQ/anubis/releases/download/v1.21.1/anubis-1.21.1-1.x86_64.rpm
…
2025-07-23 05:30:00 (125 MB/s) - 'anubis-1.21.1-1.x86_64.rpm' saved [12737057/12737057]

$ sudo alien -d anubis-1.21.1-1.x86_64.rpm
…
anubis_1.21.1-2_amd64.deb generated

$ sudo apt install ./anubis_1.21.1-2_amd64.deb
…
Setting up anubis (1.21.1-2) ...
Running Anubis
For initial testing, we can use:
$ anubis -bind=0.0.0.0:9000 -target=http://0.0.0.0:3001 -cookie-domain xtools-dev.wmcloud.org

For production, we'll want a systemd service:
[Unit]
Description=Anubis Service

[Service]
User=root
Group=root
ExecStart=/usr/bin/anubis -bind 0.0.0.0:9000 -target=http://0.0.0.0:3001 -cookie-domain=xtools-dev.wmcloud.org
Restart=always

[Install]
WantedBy=multi-user.target
Enable and start the service:
- sudo systemctl enable anubis
- sudo systemctl start anubis
Conclusion
In a single word: amazeballs
Anubis was deployed with XTools 3.22 and the results are stunning. It would appear that the lion's share of the errors we were regularly receiving was due to bots hogging resources.
A typical 24-hour period prior to Anubis:
And what we have so far post-Anubis:
It's only been about 8 hours, but I think it's safe to say this Anubis thing works! Load, network, and CPU usage are dramatically lower than before. I'm not able to visualize the error rate, but I can tell you that prior to Anubis, we'd get a query timeout error (or something similar) every few minutes. So far, we have seen zero errors after deploying Anubis! :mindblown: I even tested to make sure error reporting wasn't broken. It's not. There have really been no errors at all :)
Ever since we launched XTools 3.0 in 2017, we have had to deal with abusive bots. I guesstimate my total time spent on counter-bot measures over the years must be in the hundreds of hours. Hopefully now, once and for all, that is a problem of the past!
It is also worth noting that the Page History tool (among the most popular) now caches its full results for 10 minutes like every other tool. That is surely also helping a great deal, as before it would re-query on every request. We already had counter-bot measures for this tool, so I do not believe it is a significant factor in the reduction of CPU usage and the like, but it is certainly a factor.

