Page MenuHomePhabricator

Add a way to generate a list of users with an invalid signatures
Closed, ResolvedPublic

Description

@HouseBlaster asked me to generate a list of users with an invalid signatures, so that they can be notified that their signature is about to be disabled as a result of T355462: Set $wgSignatureValidation to disallow [enwiki]: https://en.wikipedia.org/wiki/User_talk:Matma_Rex#c-HouseBlaster-20240129034400-Invalid_signatures

I'm going to do this with a maintenance script. Ideally, this would be a special page or something that wouldn't require a developer to make that list, but:

  • Generating this list will probably take a few hours, or at least minutes, so making it a special page would require setting up a job to update it, rather than showing live results
  • I feel that showing that someone has an invalid signatures, without showing the signature itself, would be weird – but if we showed their signatures, that would also require adding a way for administrators to moderate them, in case trolls put something nasty in there
  • I don't want to spend more than an hour or two building this :)

Event Timeline

Change 994149 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] Add maintenance script to list users with invalid signatures

https://gerrit.wikimedia.org/r/994149

Change 994149 merged by jenkins-bot:

[mediawiki/core@master] Add maintenance script to list users with invalid signatures

https://gerrit.wikimedia.org/r/994149

Change 994141 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@wmf/1.42.0-wmf.15] Add maintenance script to list users with invalid signatures

https://gerrit.wikimedia.org/r/994141

Tested on the beta cluster English Wikipedia: (https://en.wikipedia.beta.wmflabs.org/)

matmarex@deployment-mwmaint02:~$ mwscript CheckSignatures --wiki=enwiki
Shangkuanlc
Eallanw
Terminator534
Go!Luigi
Billy.james8390
Lancetu
Ashboom
Thebl00dy8r4v3
-- 8 invalid signatures --

Change 994141 merged by jenkins-bot:

[mediawiki/core@wmf/1.42.0-wmf.15] Add maintenance script to list users with invalid signatures

https://gerrit.wikimedia.org/r/994141

Mentioned in SAL (#wikimedia-operations) [2024-01-30T14:30:54Z] <logmsgbot> lucaswerkmeister-wmde@deploy2002 Started scap: Backport for [[gerrit:994139|CommentParser: Ignore generated timestamp links (T356142)]], [[gerrit:994140|CommentParser: Ignore generated timestamp links (T356142)]], [[gerrit:994141|Add maintenance script to list users with invalid signatures (T356168)]]

Mentioned in SAL (#wikimedia-operations) [2024-01-30T14:32:26Z] <logmsgbot> lucaswerkmeister-wmde@deploy2002 matmarex and lucaswerkmeister-wmde: Backport for [[gerrit:994139|CommentParser: Ignore generated timestamp links (T356142)]], [[gerrit:994140|CommentParser: Ignore generated timestamp links (T356142)]], [[gerrit:994141|Add maintenance script to list users with invalid signatures (T356168)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-01-30T14:41:55Z] <logmsgbot> lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for [[gerrit:994139|CommentParser: Ignore generated timestamp links (T356142)]], [[gerrit:994140|CommentParser: Ignore generated timestamp links (T356142)]], [[gerrit:994141|Add maintenance script to list users with invalid signatures (T356168)]] (duration: 11m 01s)

Mentioned in SAL (#wikimedia-operations) [2024-01-30T15:29:11Z] <Lucas_WMDE> START lucaswerkmeister-wmde@mwmaint2002:~$ mwscript CheckSignatures enwiki | tee T356168

The maintenance script is still running, but so far, from a few spot checks, it seems like plenty of the users it prints are accounts with no edits who, for whatever reason, enabled fancysig and set their nickname to some random text. So I tried to put together a query for users with no edits (deleted or not) who have fancysig enabled and a nickname without a [ character (which IIUC is certainly invalid):

mysql:research@dbstore1008.eqiad.wmnet [enwiki]> SELECT COUNT(*) FROM user_properties up1 JOIN user_properties up2 ON up1.up_user = up2.up_user JOIN user ON up1.up_user = user_id JOIN actor ON actor_user = up1.up_user LEFT JOIN revision ON rev_actor = actor_id LEFT JOIN archive ON ar_actor = actor_id WHERE up1.up_property = 'fancysig' AND up1.up_value = 1 AND up2.up_property = 'nickname' AND up2.up_value NOT LIKE '%[%' AND rev_id IS NULL AND ar_id IS NULL;
+----------+
| COUNT(*) |
+----------+
|   157298 |
+----------+
1 row in set (1 min 59.984 sec)

I don’t know where those 157k users come from, but I doubt it’s worth pinging them in particular, to be honest. (The fancysig and nickname preferences both seem to be public, so it should be possible to run that query on the Toolforge replicas as well, if anyone’s very interested in the results. Make sure to use the revision_userindex and archive_userindex views if you do.)

(The maintenance script is at 186k rows output and counting, having reached user_id 17 million out of 47 million total; about 1% of users are being reported as invalid, it seems. That’s more than reported by the above query, so presumably there are plenty of users with an invalid signature who either did make some edits or whose signature is invalid in a more interesting way than just “doesn’t contain [”.)

Mentioned in SAL (#wikimedia-operations) [2024-01-30T19:27:44Z] <Lucas_WMDE> FINISHED lucaswerkmeister-wmde@mwmaint2002:~$ mwscript CheckSignatures enwiki | tee T356168 # -- 268378 invalid signatures --

I put the results in P55901 (NDAed just in case; warning, paste with 268379 lines ^^ also, if you’re piping this into anything, you might want to skip the final line “-- 268378 invalid signatures --”)

Thank you @Lucas_Werkmeister_WMDE. I guess I should have limited the script to only list active users in some manner…

I filtered that list now to only include users who edited a discussion page in the last 3 months (borrowing the query from the signatures.toolforge.org tool: https://github.com/AntiCompositeNumber/signatures/blob/5acbaed049a2d5232d6400f662c2e30febd11071/src/datasources/db.py#L61-L70).

The result is: P55995 (this paste is public, I think that's okay, since the data is available publicly in the replica databases). There are 307 usernames, most of them matching the list at https://signatures.toolforge.org/reports/en.wikipedia.org.

@HouseBlaster Does that work for you?

This is perfect; thank you so much!!