Page MenuHomePhabricator

Patch Demo can publicly reveal User-Agents via the CheckUser extension
Closed, ResolvedPublic3 Estimated Story PointsSecurity

Description

Proof of concept:

  1. Person A creates a patch demo wiki using https://patchdemo.wmcloud.org/ using the default settings (except with the "Use (Catalyst) Kubernetes backend (beta)" option unticked, due to T384662).
  2. Person A takes actions on the demo wiki while logged in to an account.
  3. Person B logs into the demo wiki using the account "Patch Demo", which will have been automatically added to the checkuser group (https://gitlab.wikimedia.org/repos/test-platform/catalyst/patchdemo/-/blob/6f56abcc924eab54eeef55a9d859aa6f00c6a213/new/postinstall.sh#L100-102). The credentials for this account appear to be public knowledge (see e.g., https://www.mediawiki.org/wiki/Design_System_Team/Test_environments#Patchdemo, https://gitlab.wikimedia.org/repos/test-platform/catalyst/patchdemo/-/blob/6f56abcc924eab54eeef55a9d859aa6f00c6a213/new.php#L118). This is understandable given that the point of the wiki is to easily test patches; however, this means that anyone with an Internet connection can theoretically log-into & act as this account.
  4. After logging in, Person B navigates to Special:CheckUser, and runs a check against every account listed at Special:ListUsers. Person B now knows the User-Agent(s) used by Person A (and any other people that have used the demo wiki) -- who, to my knowledge, will not have been told that their User-Agent(s) can be accessed by anyone in this way.

Notes

Details

Related Changes in GitLab:
TitleReferenceAuthorSource BranchDest Branch
move fix for T385156 to a static apache config filerepos/test-platform/catalyst/patchdemo!132jnucheT385156main
do not collect UA/client hints datarepos/test-platform/catalyst/ci-charts!69jnucheT385156main
do not collect UA/client hints datarepos/test-platform/catalyst/patchdemo!131jnucheT385156main
Customize query in GitLab

Event Timeline

Although a User-Agent string is arguably less sensitive than an IP address, to my knowledge it is still considered personal information, hence why I am submitting this security report.

So it is under the Foundation Privacy Policy, but I'm fairly certain that doesn't govern WMCS. Still, it would be nice not to disclose random user UAs if we didn't have to.

So it is under the Foundation Privacy Policy, but I'm fairly certain that doesn't govern WMCS.

Admittedly this might be a question best answered by a lawyer; but (to my knowledge) Patch Demo is operated by the WMF/a WMF team (judging by e.g. updates like https://wikitech.wikimedia.org/wiki/Catalyst/Updates/2025-01-26 being signed by a WMF software engineer), which (also to my knowledge) would mean that the Foundation Privacy Policy would apply by virtue of https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use#7.1_If_this_is_a_WMCS_Project_you_are_developing_or_administering_in_the_course_of_your_work_for_WMF

So I don't see the required privacy notice defined within that same wikitech doc section anywhere on the patchdemo site. So I'm not certain if there's been a legal exception here or not - a patchdemo maintainer might have more information on that.

So I don't see the required privacy notice defined within that same wikitech doc section anywhere on the patchdemo site. So I'm not certain if there's been a legal exception here or not - a patchdemo maintainer might have more information on that.

There has not as far as I'm aware having recently taken over patchdemo development responsibilities. Also IANAL the privacy policy seems ambiguous here to me.

While we could generate these passwords and share them on a per-environment basis (which would solve the problem of well-known passwords for all instances), it would still mean that, as in the WP:ANB post shared here by @A_smart_kitten, nothing stops folks from posting those random passwords (which will continue to be necessary to demo things like CheckUser).

Maybe we should get an explicit exception and add a notice to wikis instead. Open to other ideas.

@thcipriani - A Legal exception sounds like the best approach to me. Or at least making them aware of this issue. Do you or the Catalyst team want to get that process started? Privacy Engineering might be able to help shepherd this request to Legal via their weekly privacy check-in (cc: @sguebo_WMF).

Also IANAL the privacy policy seems ambiguous here to me.

Could you clarify which part seems ambiguous? It seems clear to me.

https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use#2._Definitions indicates that "user agent" is part of "Personal Information".

Since "Personal Information" is being collected, https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use#7.3.1_If_your_WMCS_Project_collects_Personal_Information states the requirements that must be followed, including the Privacy Statement. Notably the maintainer is supposed to "appropriate measures to protect the security of any Personal Information", which is the bug here.

The easiest solution IMO is to not store the user-agent and strip it at some proxy before CheckUser would see it. (Or some MediaWiki hack, etc.)

Note that we would likely also need to consider removing or faking any http-client-hints data. That could not be done via a modification of HTTP headers as it is usually collected via a JS API.

thcipriani triaged this task as High priority.EditedApr 3 2025, 12:42 AM

@thcipriani - A Legal exception sounds like the best approach to me. Or at least making them aware of this issue. Do you or the Catalyst team want to get that process started? Privacy Engineering might be able to help shepherd this request to Legal via their weekly privacy check-in (cc: @sguebo_WMF).

Emailed privacy@

Also IANAL the privacy policy seems ambiguous here to me.

Could you clarify which part seems ambiguous? It seems clear to me.

https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use#2._Definitions indicates that "user agent" is part of "Personal Information".

Since "Personal Information" is being collected, https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use#7.3.1_If_your_WMCS_Project_collects_Personal_Information states the requirements that must be followed, including the Privacy Statement. Notably the maintainer is supposed to "appropriate measures to protect the security of any Personal Information", which is the bug here.

Agree, cloud services terms are clear. The ambiguity is Foundation privacy policy#What this privacy policy does and does not cover.

The part that says, "some uses may be covered by separate privacy policies [e.g.,] services run by third parties (such as third-party developer projects on Wikimedia Cloud Services)" The term "third-party" is defined as "Individuals, entities, websites, services, products, and applications that are not controlled, managed, or operated by the Wikimedia Foundation."

So, third-party uses of Cloud Services "may be" (read: "are") covered by other policies. Is WMF using cloud services explicitly covered by this policy?

Regardless: I don't want to have this data.

The easiest solution IMO is to not store the user-agent and strip it at some proxy before CheckUser would see it. (Or some MediaWiki hack, etc.)

That's a good call. We're using apache for both the wikis. I tested CheckUser just now enabling mod_headers with RequestHeader unset User-Agent and that gets rid of the user agent. We could also fake one if that's helpful for developers. For now, let's remove.

Note that we would likely also need to consider removing or faking any http-client-hints data. That could not be done via a modification of HTTP headers as it is usually collected via a JS API.

hrm. Right now we can set $wgCheckUserClientHintsEnabled = false; But it's on our roadmap for users to be able to append things to LocalSettings.php (I guess we could always source some final settings file and document that...). Would that suffice?

Note that we would likely also need to consider removing or faking any http-client-hints data. That could not be done via a modification of HTTP headers as it is usually collected via a JS API.

hrm. Right now we can set $wgCheckUserClientHintsEnabled = false; But it's on our roadmap for users to be able to append things to LocalSettings.php (I guess we could always source some final settings file and document that...). Would that suffice?

We planned to drop that configuration variable, but could always keep that one long term for this use case.

I think setting that configuration value should suffice as long as there is no way to override it. It would mean that testing Client Hints data would not be possible on Patch Demo, but I think the other solution of faking the data would be harder to keep working long term.

We will also have planned other data collection, but I'll make sure we have a off switch for them too that is kept long term.

Something to potentially also consider - when a fix for this is merged, should something be done with these sorts of information that is currently stored in Patch Demo wiki databases?

As a random potential idea regarding the above: for all patch demo wikis with CheckUser installed, could a CheckUser maintenance script be run to purge all previously-collected/currently-stored CU data?

thcipriani set the point value for this task to 3.
thcipriani moved this task from Backlog to Ready on the Catalyst (olin) board.

Let me know if you need any assistance with CheckUser related questions or problems.

Following fixes are in prod now:

The CheckUser extension in new wiki environments will now gather no UA data. This applies for both legacy and Catalyst envs:

image.png (966×222 px, 16 KB)

As @thcipriani and @Dreamy_Jazz mentioned, we will need to be careful not to allow $wgCheckUserClientHintsEnabled to be overridden in the future.

I looked into possibly manually deleting the UA and hints data from existing envs but I immediately ran into a bunch of different DB tables containing portions of the data: cu_changes, cu_useragent_clienthints, cu_useragent_clienthints_map, cu_private_event and possibly more. The relations in the data not being immediately obvious to me. If we want to so dome cleanup, @A_smart_kitten's suggestion of running a maintenance script is probably the way to go, but I don't know who could write that. Probably we want to do that in a separate task though.

For now I think we can close this task if that looks good to everyone

If we want to so dome cleanup, @A_smart_kitten's suggestion of running a maintenance script is probably the way to go, but I don't know who could write that.

Random idea (probably needs double-checking to make sure it’d work): could the CUDMaxAge config variable ($wgCUDMaxAge?) temporarily be set to 0 for all wikis with CheckUser installed, and then the purgeOldData.php maintenance script run at all those wikis?

If we want to so dome cleanup, @A_smart_kitten's suggestion of running a maintenance script is probably the way to go, but I don't know who could write that.

Random idea (probably needs double-checking to make sure it’d work): could the CUDMaxAge config variable ($wgCUDMaxAge?) temporarily be set to 0 for all wikis with CheckUser installed, and then the purgeOldData.php maintenance script run at all those wikis?

This approach would work. It would however drop these rows completely which could cause issues if Patch Demo wikis were being re-used for testing and rows disappear that should have already been there. However, given that these are testing wikis that should not matter (especially as this would only be done once).

We could alternatively do the following on all Patch Demo wikis:

  1. Truncate cu_useragent_clienthints_map and cu_useragent_clienthints
  2. Blank cuc_agent from cu_changes, cule_agent from cu_log_event, and cupe_agent from cu_private_event

However, that would probably be harder to do.


I would also note that we should do something because the expiration without using the maintenance script only occurs if users make edits on the wiki (so a wiki that is no longer being used but has not been deleted will have this data). It probably makes sense to run the purge script long term independently of this task.

Random idea (probably needs double-checking to make sure it’d work): could the CUDMaxAge config variable ($wgCUDMaxAge?) temporarily be set to 0 for all wikis with CheckUser installed, and then the purgeOldData.php maintenance script run at all those wikis?

Never ran a maintenance script before, it looks like the command on the wiki env would look like: root@some-wiki:/var/www/html/w# ./maintenance/run.php extensions/CheckUser/maintenance/purgeOldData.php. Does that look correct to you folks?

This approach would work. It would however drop these rows completely which could cause issues if Patch Demo wikis were being re-used for testing and rows disappear that should have already been there. However, given that these are testing wikis that should not matter (especially as this would only be done once)

Besides the UA and hints data, what other kinds of data would be removed?

Random idea (probably needs double-checking to make sure it’d work): could the CUDMaxAge config variable ($wgCUDMaxAge?) temporarily be set to 0 for all wikis with CheckUser installed, and then the purgeOldData.php maintenance script run at all those wikis?

Never ran a maintenance script before, it looks like the command on the wiki env would look like: root@some-wiki:/var/www/html/w# ./maintenance/run.php extensions/CheckUser/maintenance/purgeOldData.php. Does that look correct to you folks?

That looks fine to me.

This approach would work. It would however drop these rows completely which could cause issues if Patch Demo wikis were being re-used for testing and rows disappear that should have already been there. However, given that these are testing wikis that should not matter (especially as this would only be done once)

Besides the UA and hints data, what other kinds of data would be removed?

All data stored in CheckUser. Things like CheckUser entries about edits, page moves, logins, etc. it doesn't mean that the edits get removed, but instead they are not visible in the CheckUser tools.

For example, anyone using Special:CheckUser on a patch demo wiki would get no results for actions performed before you run the script.

Another way to see this is if for any CheckUser interface it's as if you just created a new patch demo wiki.

Given that these are inherently test wikis, I see no issue with doing this.

Never ran a maintenance script before, it looks like the command on the wiki env would look like: root@some-wiki:/var/www/html/w# ./maintenance/run.php extensions/CheckUser/maintenance/purgeOldData.php. Does that look correct to you folks?

Given that some wikis could be running <MW1.40, and therefore might not have run.php (cf. T372963#10080076), maybe it's worth running purgeOldData.php directly? Otherwise seems okay, but I'm very far from an expert on this so I shouldn't be relied upon here!

Never ran a maintenance script before, it looks like the command on the wiki env would look like: root@some-wiki:/var/www/html/w# ./maintenance/run.php extensions/CheckUser/maintenance/purgeOldData.php. Does that look correct to you folks?

Given that some wikis could be running <MW1.40, and therefore might not have run.php (cf. T372963#10080076), maybe it's worth running purgeOldData.php directly? Otherwise seems okay, but I'm very far from an expert on this so I shouldn't be relied upon here!

Good point on that. It should be fine as long as the script runs. It may not purge anything for old patch demo wikis, as the data may have already been expired.

Given that some wikis could be running <MW1.40, and therefore might not have run.php (cf. T372963#10080076), maybe it's worth running purgeOldData.php directly? Otherwise seems okay, but I'm very far from an expert on this so I shouldn't be relied upon here!

Right now the oldest wiki we have was created in October and is running 1.43 so I think we're good there.

Besides the UA and hints data, what other kinds of data would be removed?

All data stored in CheckUser. Things like CheckUser entries about edits, page moves, logins, etc. it doesn't mean that the edits get removed, but instead they are not visible in the CheckUser tools.

For example, anyone using Special:CheckUser on a patch demo wiki would get no results for actions performed before you run the script.

Another way to see this is if for any CheckUser interface it's as if you just created a new patch demo wiki.

Given that these are inherently test wikis, I see no issue with doing this.

@thcipriani knowing that, would you be ok with running the purging script? Alternatively, I've checked which wikis have CheckUser patches, we could contact the creators and ask if they would be ok with losing access to that data:

wikicreator
7addbc8809DJacksonA
c617841011Jon (WMF)
e703a79aedNKohli (WMF)

These are meant to be ephemeral test wikis—let's go ahead and purge it. This is not data we want.

For these wikis were purging data from, did they get updates to ensure they won't start collecting the same data again?

For these wikis were purging data from, did they get updates to ensure they won't start collecting the same data again?

I updated all preexisting Patchdemo-backend wikis with the newer configuration and confirmed CU data is no longer being collected. For the Catalyst-backend wikis I added $wgCheckUserClientHintsEnabled = false but the Apache config to remove the UA header cannot be added without recreating the env, which is something we don't support yet. There is only a small bunch of Catalyst wikis without the fix and they are old, once we start reaping old environments automatically (which should happens soon) these envs will go away.

To purge the data, I added $wgCUDMaxAge = 0; to all wikis (both Patchdemo and Catalyst) and then ran the purge script. Data was remove from all of them:

[...]
Purging data from cu_changes...Purged 5 rows and 0 client hint mapping rows.
Purging data from cu_log_event...Purged 4 rows and 0 client hint mapping rows.
Purging data from cu_private_event...Purged 12 rows and 47 client hint mapping rows.
Purged 5 central index rows.
Purged 0 orphaned client hint mapping rows.
Purging data from recentchanges...Finished purging data from recentchanges.
Done.
Purging data from cu_changes...Purged 1 rows and 0 client hint mapping rows.
Purging data from cu_log_event...Purged 3 rows and 0 client hint mapping rows.
Purging data from cu_private_event...Purged 4 rows and 0 client hint mapping rows.
Purged 4 central index rows.
Purged 0 orphaned client hint mapping rows.
Purging data from recentchanges...Finished purging data from recentchanges.
Done.
[...]

For a majority of envs I executed: ./maintenance/run.php ./extensions/CheckUser/maintenance/purgeOldData.php. I had to make an exception for env 47eeff0ea4, which was created in Feb but using branch REL1_39. run.php didn't exist for that env so I executed purgeOldData.php directly:

www-data@<c>:/var/www/html/wikis/47eeff0ea4/w$ php ./extensions/CheckUser/maintenance/purgeOldData.php
Purging data from cu_changes...7 rows.
Purging data from recentchanges...0 rows.
Done.

I believe that covers enrything. Closing task

sbassett moved this task from Incoming to Completed on the Privacy Engineering board.

@jnuche et al - Should this task remain security-protected? I'm not seeing anything obvious that would require it to be.

The only thing that immediately occurs to me is whether/in case it should remain protected until the Catalyst wikis without the Apache fix applied to them (referred to in T385156#10792315) are deleted?

The only thing that immediately occurs to me is whether/in case it should remain protected until the Catalyst wikis without the Apache fix applied to them (referred to in T385156#10792315) are deleted?

Looking at the table at https://patchdemo.wmcloud.org/, I would guess that the wikis in question here are now 6682c16744 (created by Roan Kattouw (WMF)), 4d2a112178 (created by Lucas Werkmeister (WMDE)), and 6c57336911 (created by Lucas Werkmeister (WMDE)) -- those being what look like the catalyst-backend wikis with a creation date before @jnuche's comment above (2025-05-05), that have CheckUser listed on their Special:Version page.
(@jnuche or anyone else, please could you check that I haven't accidentally missed out any wikis here?)

As far as I can see, both of those wikis' creators have automatic access to Phab security tasks, so I'll be bold and ping them here: @Catrope @Lucas_Werkmeister_WMDE would you be okay with deleting/with those demo wikis being deleted in the interests of being able to make this security task public, or would you prefer for them to not be deleted?

(@jnuche or anyone else, please could you check that I haven't accidentally missed out any wikis here?)

Thanks for following up on this @A_smart_kitten. You list seems complete, there aren't any other old Catalyst environments without the Apache config fix.

I originally expected those wikis to eventually expire, unfortunately the criteria for wiki expiration that in the end was agreed upon left those wikis out. Hopefully @Catrope and @Lucas_Werkmeister_WMDE are ok with having them deleted

I’m okay with having those two wikis deleted.

I've gone ahead and deleted the last wiki without the Apache fix

EDIT: Wasn't sure about how tags are handled in this case, please adjust as necessary

jnuche changed the visibility from "Custom Policy" to "Public (No Login Required)".Jul 16 2025, 8:33 AM
jnuche changed the edit policy from "Custom Policy" to "All Users".

That's a good call. We're using apache for both the wikis. I tested CheckUser just now enabling mod_headers with RequestHeader unset User-Agent and that gets rid of the user agent.

Unfortunately, this breaks the Wikibase REST API on PatchDemo. As per T318151: 🥐 Require clients to identify themselves by providing User-Agent header, Wikibase rejects REST requests with missing or empty User-Agent headers.

We could also fake one if that's helpful for developers.

That would be helpful, indeed.

That's a good call. We're using apache for both the wikis. I tested CheckUser just now enabling mod_headers with RequestHeader unset User-Agent and that gets rid of the user agent.

Unfortunately, this breaks the Wikibase REST API on PatchDemo. As per T318151: 🥐 Require clients to identify themselves by providing User-Agent header, Wikibase rejects REST requests with missing or empty User-Agent headers.

We could also fake one if that's helpful for developers.

That would be helpful, indeed.

Split to T401354: Consider adding a fake User-Agent header to incoming requests to Patch Demo wikis :)