Add Anubis to XTools
Closed, Resolved, Public

Description

This task documents the effort to get Anubis running on XTools to help shield us from the armies of AI bots and web crawlers.

I successfully got Anubis running on a third-party wiki running on Ubuntu / Apache. Cloud VPS presents a different challenge, since TLS termination is handled by the Cloud Services web proxy.

Environment
Steps taken

Changes made in Horizon

None! We need a web proxy pointing to port 80 and an ingress security group allowing traffic, which we already have on this instance.

Apache configuration

  1. Enable mod_headers with sudo a2enmod headers
  2. Enable mod_proxy_http with sudo a2enmod proxy_http
  3. Adjust the conf file to add Anubis as a reverse proxy:
xtools.conf
# Forward traffic to Anubis
<VirtualHost *:80>
       ServerAdmin tools-xtools-dev@toolforge.org
       ServerName xtools-dev.wmcloud.org
       DocumentRoot /var/www/public
       ErrorLog ${APACHE_LOG_DIR}/error-anubis.log
       CustomLog ${APACHE_LOG_DIR}/access-anubis.log combined
       LogFormat "%{X-Forwarded-For}i %t \"%r\" %>s \"%{Referer}i\" \"%{User-Agent}i\""

       # These headers need to be set or else Anubis will
       # throw an "admin misconfiguration" error.
       RemoteIPHeader X-Forwarded-For
       SetEnvIf X-Forwarded-For "(.*)" saved_x_forwarded_for=$1
       RequestHeader set "X-Real-Ip" "%{saved_x_forwarded_for}e"

       ProxyPreserveHost On

       # Replace 9000 with the port Anubis listens on
       ProxyPass / http://[::1]:9000/
       ProxyPassReverse / http://[::1]:9000/
</VirtualHost>

# Tell apache to listen on port 3001 so that it can serve the actual website
Listen 0.0.0.0:3001

<VirtualHost *:3001>
        DocumentRoot /var/www/public
        ServerName xtools-dev.wmcloud.org
        # Same config we normally use
        # …
</VirtualHost>
  4. Restart Apache with sudo service apache2 restart
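
To double-check the Apache side, it can help to confirm that every module the configuration above relies on is loaded. The following is only a sketch and was not part of the original setup: `missing_modules` is a hypothetical helper that reads `apachectl -M` output on stdin and prints any requested module that is absent. Note that `RemoteIPHeader` is provided by mod_remoteip and `ProxyPass` by mod_proxy, so those two are checked alongside the modules enabled in the steps above.

```shell
# Hypothetical helper: reads `apachectl -M` output on stdin and prints the
# name of every requested module that is not in the list of loaded modules.
missing_modules() {
    loaded=$(cat)
    for mod in "$@"; do
        printf '%s\n' "$loaded" | grep -q "^ *${mod}_module " || printf '%s\n' "$mod"
    done
}

# Usage on the VM (prints nothing when all four modules are loaded):
#   sudo apachectl -M | missing_modules headers proxy proxy_http remoteip
#   sudo apachectl configtest
```

If a module name is printed, enabling it with a2enmod and restarting Apache again would be the obvious next step.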

Installing Anubis
I went with the native approach and am using Alien to build a Debian package from an RPM package:

$ sudo apt install alien

Setting up alien (8.95.4) ...
$ wget https://github.com/TecharoHQ/anubis/releases/download/v1.21.1/anubis-1.21.1-1.x86_64.rpm

2025-07-23 05:30:00 (125 MB/s) - 'anubis-1.21.1-1.x86_64.rpm' saved [12737057/12737057]
$ sudo alien -d anubis-1.21.1-1.x86_64.rpm

anubis_1.21.1-2_amd64.deb generated
$ sudo apt install ./anubis_1.21.1-2_amd64.deb

Setting up anubis (1.21.1-2) ...

Running Anubis

For initial testing, we can use:

$ anubis -bind=0.0.0.0:9000 -target=http://0.0.0.0:3001 -cookie-domain xtools-dev.wmcloud.org
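
While Anubis is running in the foreground like this, a quick way to see whether each hop answers is to request it locally with curl. This is a rough sketch under a few assumptions: the ports are the ones used in this task (3001 for the site, 9000 for Anubis, 80 for the Apache front end), the helper names `http_status` and `check_hops` are made up for illustration, and a browser-like User-Agent is sent so the request resembles ordinary traffic. It only reports status codes and makes no claim about what Anubis should return.

```shell
# Print the HTTP status code for a URL ("000" means the connection failed).
http_status() {
    curl -s -o /dev/null -w '%{http_code}' \
        -A 'Mozilla/5.0 (smoke test)' \
        -H 'Host: xtools-dev.wmcloud.org' "$1"
}

# Report the status seen at each local hop: the site itself, Anubis,
# and the Apache virtual host that forwards to Anubis.
check_hops() {
    for url in http://127.0.0.1:3001/ http://127.0.0.1:9000/ http://127.0.0.1:80/; do
        printf '%s %s\n' "$url" "$(http_status "$url")"
    done
}

# Usage on the VM:
#   check_hops
```

A hop that reports 000 is not accepting connections at all, which narrows down whether the problem is in Apache, Anubis, or the proxy in front of them.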

For production, we'll want a systemd service:

/etc/systemd/system/anubis.service
[Unit]
Description=Anubis Service

[Service]
User=root
Group=root
ExecStart=/usr/bin/anubis -bind 0.0.0.0:9000 -target=http://0.0.0.0:3001 -cookie-domain=xtools-dev.wmcloud.org
Restart=always

[Install]
WantedBy=multi-user.target

Enable and start the service:

  1. sudo systemctl enable anubis
  2. sudo systemctl start anubis
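
To confirm the service came up, something along these lines may be useful. It is a minimal sketch: `anubis_service_ok` is a hypothetical helper name, and it assumes the unit is called anubis as in the file above.

```shell
# Hypothetical helper: succeeds only if the unit is both enabled and active.
anubis_service_ok() {
    [ "$(systemctl is-enabled anubis 2>/dev/null)" = "enabled" ] &&
        [ "$(systemctl is-active anubis 2>/dev/null)" = "active" ]
}

# Usage on the VM: show recent log lines if the service is not healthy.
#   anubis_service_ok && echo "anubis is up" || sudo journalctl -u anubis -n 50 --no-pager
```
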
Conclusion

In a single word: amazeballs

Anubis was deployed with XTools 3.22 and the results are stunning. It would appear that the lion's share of errors we were receiving regularly were due to bots hogging up resources.

A typical 24-hour period prior to Anubis:

Screenshot from 2025-08-20 18-36-53.png (766×2 px, 290 KB)

And what we have so far post-Anubis:

Screenshot from 2025-08-20 18-39-03.png (766×2 px, 232 KB)

It has only been about 8 hours, but I think it's safe to say this Anubis thing works! The load, network and CPU usage are dramatically lower than before. I'm not able to visualize the error rate, but I can tell you that prior to Anubis, we'd get a query timeout error (or something similar) every few minutes. So far, we have seen zero errors after deploying Anubis! :mindblown: I even tested to make sure error reporting wasn't broken. It's not. There have really been no errors at all :)

Ever since we launched XTools 3.0 in 2017, we have had to deal with abusive bots. I guesstimate my total time spent on counter-bot measures over the years must be in the hundreds of hours. Hopefully now, once and for all, that is a problem of the past!

It is also worth noting that the Page History tool (which is among the most popular) now caches its full results for 10 minutes like every other tool. That surely is also helping a great deal, as before it would re-query on every request. We already had counter-bot measures for this, so I do not believe the caching is a significant factor in the reduction of CPU usage etc., but it is certainly a factor.

Event Timeline

MusikAnimal triaged this task as High priority.
MusikAnimal changed the task status from Open to In Progress. Jul 23 2025, 4:45 PM
MusikAnimal updated the task description.

I am able to start Anubis without issue, and Apache is running without errors, but the traffic never reaches either. So I think I'm doing something wrong in Horizon, either with the web proxy (I've tried port 9000 and 443) and/or the security groups.

Also of note is we need to set a few headers for Anubis to function:

RequestHeader set "X-Real-Ip" expr=%{X-Forwarded-For}
RequestHeader set X-Forwarded-Proto "https"
RequestHeader set "X-Http-Version" "%{SERVER_PROTOCOL}s"

I realize the X-Forwarded-For syntax doesn't work here… that's not an environment variable. But at any rate, I think the need for these headers means we can't have the web proxy point directly to Anubis, which might be a problem (?)
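
One way to check whether these headers are what Anubis is missing is to bypass Apache and send them by hand. This is a sketch only: `probe_anubis` is a hypothetical helper, 203.0.113.7 is a placeholder address from the documentation range, port 9000 is the Anubis port used in this task, and `HTTP/1.1` is an assumed stand-in for what `%{SERVER_PROTOCOL}s` would expand to. It prints the status code and nothing more.

```shell
# Hypothetical helper: request Anubis directly, supplying the headers that
# Apache would otherwise add. Prints the HTTP status code only.
probe_anubis() {
    client_ip=${1:-203.0.113.7}   # placeholder client address
    curl -s -o /dev/null -w '%{http_code}\n' \
        -A 'Mozilla/5.0 (header test)' \
        -H "X-Real-Ip: $client_ip" \
        -H "X-Forwarded-For: $client_ip" \
        -H 'X-Forwarded-Proto: https' \
        -H 'X-Http-Version: HTTP/1.1' \
        http://127.0.0.1:9000/
}

# Usage on the VM:
#   probe_anubis
#   probe_anubis 198.51.100.9
```

Comparing the result with a plain `curl http://127.0.0.1:9000/` should show whether the headers make a difference.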

Change xtools-dev.wmcloud.org web proxy to point to port 9000 (not sure if should use HTTPS or not)

The VPS web proxy acts as the TLS terminator, so we shouldn't need to use HTTPS.

Add a new security group to the xtools-dev06 instance that has an Ingress port 9000, and another with port 443 (I don't know what I'm doing so I'm attempting both)

Should only be needed for 9000.

Then adjust the conf file to add Anubis as a reverse proxy:

I think this is unnecessary. We're already configuring the web proxy to point to 9000, where anubis is running.

Also of note is we need to set a few headers for Anubis to function:

RequestHeader set "X-Real-Ip" expr=%{X-Forwarded-For}
RequestHeader set X-Forwarded-Proto "https"
RequestHeader set "X-Http-Version" "%{SERVER_PROTOCOL}s"

I realize the X-Forwarded-For syntax doesn't work here… that's not an environment variable. But at any rate, I think the need for these headers means we can't have the web proxy point directly to Anubis, which might be a problem (?)

Not at all familiar with Anubis, but surely there's a way to configure Anubis to treat the XFF header as the user IP? If there's not, that would be a surprising limitation. Almost everything running behind a load balancer sends the end-user IP in XFF, so it might even be the default behaviour?

(If setting some headers is still required, we can run anubis on 9001 instead, and reverse-proxy port 9000 to 9001.)

anubis -bind localhost:9000 -target localhost:3001 -cookie-domain xtools-dev.wmcloud.org

Use of localhost looks suspicious. I think you need to bind to 0.0.0.0:9000

(Do take all this with a pinch of salt – my ops experience is very limited!)
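
Which address a service is actually bound to can be read from `ss -ltn`. As a small sketch (the helper name `listeners_for_port` is invented here), the function below filters that output for one port, so a loopback-only bind shows up as `127.0.0.1:9000` or `[::1]:9000` rather than as a wildcard address.

```shell
# Hypothetical helper: reads `ss -ltn` output (including its header line)
# on stdin and prints the local address of every listener on the given port.
listeners_for_port() {
    awk -v port="$1" 'NR > 1 {
        n = split($4, parts, ":")      # the port is whatever follows the last colon
        if (parts[n] == port) print $4
    }'
}

# Usage on the VM:
#   ss -ltn | listeners_for_port 9000
```
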

Okay, Anubis is up and running now! Thanks to @SD0001 for the pointers. That helped.

Task description denotes every step I took to get it working. The next step is to evaluate effectiveness. I'll monitor the logs and report back (xtools-dev gets a fair amount of bot traffic, too).

Note that we currently aren't customizing Anubis in any way. The out-of-the-box config was enough for my third-party wiki, but XTools might need to be a bit more strict.


Also, possible problem: XTools uses a single thread for both the db queries and all application logic, including presentation. This means if you browse to an XTools URL that takes a while (say, for a user with lots of edits), you will be in a loading state with the previous web page still visible until everything finishes. This was fine before, but now Anubis outputs an info page that the user will be stuck seeing for much longer than they should be. This could lead to confusion, where the user thinks the bot verification is stuck or taking a long time, when really it's just the SQL queries being run by XTools.

I'm hoping to figure out a way to customize the info page to say "Waiting for results from XTools to be calculated…" (or something). Anubis claims such customization can't be done without the paid tier, but it is open source…

Alternatively, we could rework the main controller to end the session early, and have it poll to fetch results from the other process. This would be quite involved, and could potentially cause more problems as every request would result in two Apache processes.

I think I have confirmed that the X-Forwarded-For value is still seen by Apache, provided the client passes the Anubis challenge. This is good because I still want to be able to block IPs/ranges as needed, assuming some will manage to pass the challenge.

That said, all these internal IPs and 200 response codes I see in the logs must be actual bots! I tried changing the Anubis -xff-strip-private option, but I always see internal IPs on port 80. That's fine though, so long as the real IPs are passed on to the application. I even removed logging for port 80.
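
For picking out addresses worth blocking, a tally of requests per client is a reasonable starting point. The sketch below assumes an access log whose first field is the client address, as in the LogFormat from the task description where `%{X-Forwarded-For}i` comes first. `top_clients` is a hypothetical helper, and the log path in the usage line is deliberately left as a placeholder.

```shell
# Hypothetical helper: reads access log lines on stdin and prints
# "<hits> <address>" for the busiest clients, most active first.
# The optional argument limits how many rows are shown (default 10).
top_clients() {
    awk '{ ip = $1; sub(/,$/, "", ip); hits[ip]++ }   # drop a trailing comma from XFF lists
         END { for (ip in hits) print hits[ip], ip }' |
        sort -k1,1nr -k2,2 | head -n "${1:-10}"
}

# Usage on the VM:
#   top_clients 20 < /path/to/access.log
```

When the header carries a comma-separated list, only the first address is counted here, which is a simplification.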

Unless I'm missing something, I think this is good to deploy to prod! Out of an abundance of caution, I'd like to set up a fresh VPS instance and switch traffic to it once it's deemed stable. This will require we request a temporary increase in our quota. I've filed T400853 for that.

New VM is running at https://xtools2.wmcloud.org, but I can't test Anubis without the XFF headers being enabled. I've filed T400964.

Scratch that. I'm using https://xtools-dev.wmcloud.org now to test the new prod setup (with an API server). Everything seems to work great! We're also using Debian Trixie which was just made available to us, so we're running on PHP 8.4 now.

The final step is to make sure nothing breaks in PHP 8.4. I'll add it to the GitHub CI workflow for starters, and we'll just need to poke around xtools-dev.wmcloud.org manually to make sure everything looks alright. If worst comes to worst, I can rebuild the VMs using Debian Bookworm (PHP 8.2), which we already know works.

MusikAnimal updated the task description.
MusikAnimal moved this task from General / other to Complete on the XTools board.

And that's a wrap! Anubis appears to be working wonders :)

Resolving!