
Geohack tool frequently triggers the Toolforge front proxy's per-tool rate limit due to too much traffic
Closed, ResolvedPublicBUG REPORT

Description

The issue seems to be known here: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#%22receiving_more_traffic_than_it_can_handle%22

In my opinion, the Wikimedia Toolforge "too much traffic" error occurs far too frequently when calling the Geohack tool.
Example: I opened the URL https://geohack.toolforge.org/geohack.php?pagename=Liste_der_Kulturdenkmale_in_B%C3%BCnsdorf&language=de&params=54.368713_N_9.743588_E_region:DE-SH_type:building&title=D%C3%B6rpstraat%2C+Kirchhof
and got the Wikimedia Toolforge error shown in the attached image.

image.png (1×1 px, 69 KB)

This error occurs many times within an hour, which in my opinion is too frequent. Maybe there is a setting to raise the traffic limit for the Geohack tool to avoid, or at least minimize, these errors?


Event Timeline

The Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a project tag more specific to this task. Thanks!

As the linked page says: you should report this to the maintainers of the Geohack project, not to Toolforge or the WMCS team.

bd808 renamed this task from Geohack tool runs too frequently into Wikimedia Toolforge Error due to too much traffic to Geohack tool frequently triggers the Toolforge front proxy's per-tool rate limit due to too much traffic.Nov 6 2025, 11:24 PM
bd808 edited projects, added Tools; removed Kubernetes.
bd808 added subscribers: taavi, Dispenser, Kolossos and 2 others.

I am reopening this because it actually is a legitimate topic of discussion. The geohack tool is linked from content space on enwiki and other wikis. It is also typically the highest activity Toolforge tool, often attempting to handle 60 requests per second or more (grafana dashboard).

The error page this bug report is about is triggered when the limit-per-tool rate counter in the Toolforge shared front proxy has seen more than 250 concurrent connections to geohack in the last 10s.
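To make the mechanism concrete, here is a minimal Rust sketch of the kind of per-tool concurrency throttle described above. This is an illustrative model only, not the actual HAProxy configuration: the real proxy tracks counters per tool over a sliding window, which this sketch does not model.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Simplified model of a per-tool concurrency throttle: a counter is
/// incremented when a request starts and decremented when it finishes;
/// a request that would push the counter past the limit is rejected
/// (surfaced to the client as a 503 in the real proxy).
struct ConcurrencyThrottle {
    in_flight: AtomicUsize,
    limit: usize,
}

impl ConcurrencyThrottle {
    fn new(limit: usize) -> Self {
        Self { in_flight: AtomicUsize::new(0), limit }
    }

    /// Returns true if the request may proceed; the caller must later
    /// call `finish` to release the slot.
    fn try_start(&self) -> bool {
        let prev = self.in_flight.fetch_add(1, Ordering::SeqCst);
        if prev >= self.limit {
            // Over the limit: undo the increment and reject.
            self.in_flight.fetch_sub(1, Ordering::SeqCst);
            false
        } else {
            true
        }
    }

    fn finish(&self) {
        self.in_flight.fetch_sub(1, Ordering::SeqCst);
    }
}
```

With a limit of 250, the error page appears exactly when `try_start` returns false for a geohack request.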

@taavi do we have any tracking that will let us see how often the concurrency limit is being tripped, broken out by tool? I think the tool-dashboard for geohack gets data from the Kubernetes cluster, which would not know when the layer above its ingress has returned a 503 response to the client.

I would very much expect that a healthy portion of the traffic to geohack is unwanted bot requests, but there really is not much the tool itself can do about that.

If the tool were made to execute faster it might fall under the 250 concurrent request threshold more often. The tool is already running with --replicas 6 so there are not likely to be more gains from that. The tool is currently running via the php 7.4 legacy image from NFS mounted code. It is possible that some performance gains could be seen from migrating to a custom build service container where both a newer PHP and the ability to run without NFS could be added.

Some of those 6 replicas were hitting the half a core CPU limit. I've bumped the CPU limit to a core (and slightly lowered the replica count from 6 to 4) to see if that has an impact. So far it seems like the request rate showing in Grafana went up even more to around 80 rps?

> @taavi do we have any tracking that will let us see how often the concurrency limit is being tripped, broken out by tool? I think the tool-dashboard for geohack gets data from the Kubernetes cluster, which would not know when the layer above its ingress has returned a 503 response to the client.

Not directly, unfortunately. One could compare the general frontend and backend error rate metrics to see the number of requests rejected on the HAProxy layer, but that is not split per tool. I did check whether T343885: [promethus,haproxy] Move to haproxy internal metrics from haproxy_exporter would help with it, but unfortunately that doesn't seem to be the case.

Would it help if I rewrote it in Rust, with some caching? Should at least lower the CPU load...

> Would it help if I rewrote it in Rust, with some caching? Should at least lower the CPU load...

Rust running from a container without NFS mounts would be about as fast as things could get. If you have the time and energy to work on that @Magnus I think it would be helpful.

Rust version is done: https://github.com/magnusmanske/geohack

Fully compatible, including paths/parameters; the change should be invisible to the user, except for being faster and without the occasional Toolforge errors.
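For reference, the `params` value the tool has to stay compatible with looks like the one in the example URL above (`54.368713_N_9.743588_E_region:DE-SH_type:building`). Here is an illustrative Rust sketch of extracting the latitude/longitude from that decimal form; it is not the tool's actual parser, which also has to handle other coordinate formats (e.g. degrees/minutes/seconds).

```rust
/// Illustrative parser for the coordinate portion of a geohack `params`
/// value in decimal form, e.g. "54.368713_N_9.743588_E_region:DE-SH_...".
/// Returns (latitude, longitude), negating for S and W hemispheres.
/// This is a sketch, not the tool's actual parsing logic.
fn parse_lat_lon(params: &str) -> Option<(f64, f64)> {
    let parts: Vec<&str> = params.split('_').collect();
    if parts.len() < 4 {
        return None;
    }
    let lat: f64 = parts[0].parse().ok()?;
    let lon: f64 = parts[2].parse().ok()?;
    let lat = if parts[1] == "S" { -lat } else { lat };
    let lon = if parts[3] == "W" { -lon } else { lon };
    Some((lat, lon))
}
```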

There is a region.php script in the original repo, which I did not carry forward. It used to be included in geohack.php, but the include is now commented out. It also serves its own webpage, apparently querying a database, but it seems to fail silently.

Could one of the Toolforge overlords switch it over (build and restart scripts are included)? I could probably do it myself, but it seems the tool has been specifically configured to handle the load, and I don't want to break anything.

Magnus triaged this task as Unbreak Now! priority.Nov 19 2025, 1:23 PM

I tried to restart the webservice as the Rust version (in the geohack subdir, ./build.sh / restart.sh), but it does not seem to be working. I added the Procfile and everything. Help?

taavi lowered the priority of this task from Unbreak Now! to Needs Triage.Nov 19 2025, 1:32 PM

It seems like your Procfile is missing the actual command to execute? Try this:

web: GEOHACK_ADDRESS=0.0.0.0 GEOHACK_PORT=$PORT ./target/bin/geohack
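On the binary's side, the `GEOHACK_ADDRESS` and `GEOHACK_PORT` variables from the Procfile would be read at startup. A minimal Rust sketch of that, assuming defaults of 127.0.0.1:8000 when the variables are unset (the real geohack binary's startup code is not shown in this task):

```rust
use std::env;

/// Build the listen address from the GEOHACK_ADDRESS and GEOHACK_PORT
/// environment variables set in the Procfile, with illustrative fallback
/// defaults for local runs.
fn listen_addr() -> String {
    let addr = env::var("GEOHACK_ADDRESS").unwrap_or_else(|_| "127.0.0.1".to_string());
    let port = env::var("GEOHACK_PORT").unwrap_or_else(|_| "8000".to_string());
    format!("{addr}:{port}")
}
```

The resulting string would then be passed to whatever listener the server uses, e.g. `TcpListener::bind(listen_addr())`.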
Magnus changed the task status from Open to In Progress.Nov 19 2025, 1:47 PM

thanks, found it myself :-)

It's too early to make full conclusions yet, but so far it's looking very promising. CPU usage is way down:

image.png (349×425 px, 16 KB)

There's also a significant reduction in network usage, which I assume is from the new caching reducing the number of MW API calls:
image.png (337×808 px, 29 KB)
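The caching that cuts down repeated MW API calls could be as simple as a keyed cache with a time-to-live. The real tool's cache design is not described in this task; this is a minimal Rust sketch of the idea:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Minimal sketch of a TTL cache for API responses: entries expire after
/// a fixed duration, so repeated requests for the same key within the
/// TTL are served from memory instead of hitting the MW API again.
struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Returns the cached value only if it has not yet expired.
    fn get(&self, key: &str) -> Option<&String> {
        self.entries
            .get(key)
            .filter(|(stored, _)| stored.elapsed() < self.ttl)
            .map(|(_, v)| v)
    }

    fn insert(&mut self, key: String, value: String) {
        self.entries.insert(key, (Instant::now(), value));
    }
}
```

A real implementation would also evict expired entries rather than only filtering them on read, to keep memory bounded.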

Fixed the CSS MIME type header. It seems to run smoothly on 1 CPU now. Feel free to raise that if it's overloading.

bd808 assigned this task to Magnus.

Thanks for working on speeding the tool up @Magnus. You really made a difference here.

We do not yet have tracking for rate limits firing, but the rewrite really does seem to have removed the problem. CPU, RAM, and network utilization continue to report as greatly reduced while the traffic the tool handles successfully has remained at or above 60 req/s. It apparently even saw a spike of 165 req/s yesterday.

I have been tailing the HAProxy logs for about 20 minutes. The tool is currently handling around 65 req/s, and the concurrency counter is generally staying below 10 concurrent connections of the 250 allowed by the throttle.