Wikistats is using a malformed user agent
Closed, Resolved · Public

Description

This email arrived today at abuse@

Hi,

I just noticed some weird stuff in our logfile:

wiki.lll.lu:80 185.15.56.1 - - [28/Dec/2023:20:36:04 +0100] "GET /api.php?action=query&meta=siteinfo&siprop=statistics&format=php&maxlag=5 HTTP/1.1" 200 713 "-" "${user_agent}"
wiki.lll.lu:80 185.15.56.1 - - [28/Dec/2023:20:36:05 +0100] "GET /api.php?action=query&meta=siteinfo&format=php&maxlag=5 HTTP/1.1" 200 2804 "-" "${user_agent}"

[Thu Dec 28 20:36:05.244793 2023] [proxy_fcgi:error] [pid 1757738] [client 185.15.56.1:54274] AH01071: Got error 'PHP message: PHP Warning:  is_readable(): open_basedir restriction in effect. File(/gitinfo/info.json) is not within the allowed path(s): (/var/www/html/:/etc/:/usr/share/php/:/usr/share/mediawiki/:/var/lib/mediawiki/:/var/www/mediawiki/) in /usr/share/mediawiki/includes/GitInfo.php on line 173

What is going on here? Shouldn't the ${user_agent} string be replaced with the actual bot's name?

And why is it probing for Git? (which we don't use)

Thanks,

Alain

This does not seem malicious or damaging but it would be nice to locate the source of this traffic and help them out with string substitution.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Responded with:

Hello, Alain.

That traffic is originating from Wikimedia Cloud Services, a public cloud used for both internal and volunteer-maintained services. I suspect that what you're seeing is resulting from a project that contains a copy/paste error, possibly a proof-of-concept experiment.

We will make an effort to locate the source of the traffic and correct the malformed requests, but it may take some time as most staff are currently on break. If this traffic is causing you immediate distress or disruption, please follow up and I'll make it a priority.

I've created a tracking task for this which you can see at https://phabricator.wikimedia.org/T354101.

-Andrew

And why is it probing for Git? (which we don't use)

The request is for the Siteinfo part of the API, which displays some statistics and info about the site and some metadata about the installation. There's certainly nothing harmful about it.

When MediaWiki is installed using Git, that metadata includes the Git revision of the install, which is why MediaWiki looks for Git information while answering the request.

Hope that answers that part.
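For reference, the two logged requests can be rebuilt from their query parameters. This is only a sketch: the parameter names and values are copied from the log lines in the report, while `siteinfo_url` and the example.org host are made up for illustration and are not taken from the Wikistats code.

```python
from urllib.parse import urlencode


def siteinfo_url(base, siprop=None):
    """Build a siteinfo query URL like the ones in the reported log lines.

    `base` is the wiki's api.php endpoint. This helper is hypothetical.
    """
    params = {"action": "query", "meta": "siteinfo"}
    if siprop:
        # siprop=statistics asks only for the statistics part of siteinfo.
        params["siprop"] = siprop
    params["format"] = "php"
    params["maxlag"] = "5"
    return base + "?" + urlencode(params)


# Placeholder host, not a real target.
stats_url = siteinfo_url("http://wiki.example.org/api.php", siprop="statistics")
general_url = siteinfo_url("http://wiki.example.org/api.php")
```

Both are plain read-only GET requests; nothing in them writes to the wiki.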

RhinosF1 triaged this task as High priority.

And that's also one of my VPS projects; I will try to get the fix deployed in the morning. Feel free to reach out if you have any questions.

RhinosF1 renamed this task from "Malformed web requests from cloud-vps to wiki.lll.lu" to "Wikistats is using a malformed user agent". Dec 30 2023, 11:05 AM
RhinosF1 moved this task from Deployment to Radar on the User-RhinosF1 board.
RhinosF1 lowered the priority of this task from High to Medium.
RhinosF1 added a subscriber: Dzahn.

I've fixed the malformed UA, although the UA that is now configured is a browser one. I'm leaving this open in case @Dzahn has anything to add when back or wants to change to a more informative UA.
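As a rough idea of what a more informative UA could look like, here is a sketch that names the tool and gives site operators somewhere to look and someone to contact. Everything in it is an assumption for illustration: `build_user_agent`, the bot name, the URLs and the contact address are invented and are not the values Wikistats uses.

```python
import urllib.request


def build_user_agent(tool, version, info_url, contact):
    """Hypothetical helper: format "<tool>/<version> (<info URL>; <contact>)"."""
    return f"{tool}/{version} ({info_url}; {contact})"


# All values below are placeholders.
ua = build_user_agent(
    "ExampleStatsBot", "1.0", "https://example.org/bot", "bot-admin@example.org"
)

# Attach the header to a request object; nothing is sent here.
req = urllib.request.Request(
    "http://wiki.example.org/api.php", headers={"User-Agent": ua}
)
```

With a string like this in the access log, an operator can tell at a glance which project sent the request and where to ask about it.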

Hi all,

Alain: Yes, this is from a user project that gathers statistics about public MediaWikis. You are right, of course: $user_agent should have been replaced with the actual user agent, which would have told you this, and it was a bug that it wasn't. Thanks for reporting it, and apologies. There isn't anything harmful about it, though.

Andrew: See above, please forward to them if needed and thanks for making the ticket. Sorry about the log spam.

RhinosF1: Thank you for fixing it! It's really appreciated :)

It was all about some bad quotes, as you can see here:

https://gitlab.wikimedia.org/cloudvps-repos/wikistats/-/merge_requests/7/diffs?commit_id=23f8d7555f09c17e7ab710ac9fd355615b811892
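The merge request above has the actual change. Purely to illustrate the general failure mode (a placeholder that is sent literally because nothing ever substitutes it), here is a small Python sketch. It is not the Wikistats code; the names `PLACEHOLDER` and `assert_no_placeholder` and the example UA value are made up.

```python
import re
from string import Template

user_agent = "ExampleStatsBot/1.0"  # placeholder value

# Bug pattern: the placeholder text is used as-is, so the server logs it literally.
literal = "${user_agent}"

# Intended behaviour: the placeholder is replaced by the configured value.
substituted = Template("${user_agent}").substitute(user_agent=user_agent)

# Cheap guard: refuse to send a header that still looks like a template.
PLACEHOLDER = re.compile(r"\$\{?\w+\}?")


def assert_no_placeholder(value):
    """Raise ValueError if `value` still contains $name or ${name}."""
    if PLACEHOLDER.search(value):
        raise ValueError(f"unsubstituted placeholder in {value!r}")
    return value
```

A guard like this, run before the request is made, would turn this kind of quoting mistake into an immediate error instead of log spam on other people's servers.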

Thanks again to RhinosF1 for fixing it.

Still keeping it open to discuss what actual UA string to use, but the bug is gone and it's now down to the expected config setting in the file under /etc.

Dzahn lowered the priority of this task from Medium to Low. Dec 31 2023, 10:26 PM

I have deployed the permanent change to the user agent and am running some updates.

We will have to keep an eye on how many wikis no longer return results as they did before.

I compared the numbers grouped by http status code for the mediawikis table and don't see a fundamental difference.
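The comparison amounts to counting rows per HTTP status code before and after the change and looking at the differences. A minimal sketch of that idea follows; the rows are invented sample data, not real numbers from the mediawikis table, and `status_counts` and `diff_counts` are hypothetical helpers.

```python
from collections import Counter


def status_counts(rows):
    """Count wikis per HTTP status code; rows are (wiki_name, status) pairs."""
    return Counter(status for _name, status in rows)


def diff_counts(before, after):
    """Return {status: after - before} for codes whose count changed."""
    codes = sorted(set(before) | set(after))
    return {
        code: after.get(code, 0) - before.get(code, 0)
        for code in codes
        if after.get(code, 0) != before.get(code, 0)
    }


# Invented sample data for illustration only.
before_rows = [("wiki-a", 200), ("wiki-b", 200), ("wiki-c", 404), ("wiki-d", 301)]
after_rows = [("wiki-a", 200), ("wiki-b", 200), ("wiki-c", 404), ("wiki-d", 200)]

changes = diff_counts(status_counts(before_rows), status_counts(after_rows))
```

An empty or small result means the new user agent did not change how the wikis respond in any fundamental way.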

We need to clean up data, such as duplicate wikis that are redirects, and delete the ones that no longer work, but that's normal maintenance work.