Page MenuHomePhabricator

Regularly purge expired temporary userrights from DB tables
Closed, ResolvedPublic3 Estimated Story Points

Description

New description on 2017-09-05
Later investigation unveiled that in fact it is MediaWiki that does not update user_groups table with regards to users with temporary permissions until a new promotion or demotion on that wiki happens again. It'd be nice to have a script that could purge from time to time those rows (Patch-For-Review submitted).

Old description

(change visibility) 01:39, 20 July 2017 Matiia (talk | contribs | block) changed group membership for Matiia@astwiki from (none) to CheckUser (temporary, until 01:40, 20 July 2017) (per request)

That was a one-minute temporary permission on astwiki. It seems it expired as no entry on Special:ListUsers there shows any user in the 'checkuser' group, but when I was doing T176578 I noticed that the tool was not giving bad results, but that the database has the wrong data:

MariaDB [astwiki_p]> select count(*) from user_groups where ug_group = "checkuser";
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.01 sec)

MariaDB [astwiki_p]> select ug_user from user_groups where ug_group = "checkuser";
+---------+
| ug_user |
+---------+
|   26774 |
+---------+
1 row in set (0.01 sec)

So I approached @Marostegui if he could check the live production tables and the results, which can be viewed at P6043 shows the same.

As such, it seems temporary userrights does not clean up on very short temporary userrights. We were told that this could take some minutes but this has been sitting there for more than a month. Other projects are also affected.

The system does, however, recognize the right expiry:

MariaDB [astwiki_p]>  select * from user_groups where ug_group = "checkuser";
+---------+-----------+----------------+
| ug_user | ug_group  | ug_expiry      |
+---------+-----------+----------------+
|   26774 | checkuser | 20170720014010 |
+---------+-----------+----------------+
1 row in set (0.00 sec)

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I understand that updating the very second they expire would be overkill in the current state of things, but maybe what you mention, update more often in terms of days (once a day or a week) would be a good start IMHO.

I would suggest invalid or creating tasks for the tools.

One option is to include the where condition to the view used on toollabs. Than no expired rows would be shown when select data, but the group would than missing in user_former_groups, because that is created on purge.

But there are many tables with an expiry row and it makes maintenance harder, when all get the where condition.

I know my opinion does not mean much, but I still feel this is a valid task. Patching the tools is okay, but leaving those expired groups there until somebody else is granted or removed a permission in that wiki is not okay IMHO. MediaWiki should regularly check (purge?) the user_groups table and move expired user groups automatically to user_former_groups in a regular basis. More so if user_former_groups is used by other tools or MediaWiki features such as autopromotions.

We could certainly think about running a cronjob for purging expired user groups.

We could certainly think about running a cronjob for purging expired user groups.

I definitely agree with that - however it should be done in addition to patches on the tools querying that data, not instead of.

EddieGP renamed this task from Temporary userrights not expiring in DB tables to Regularly purge expired temporary userrights from DB tables.Sep 26 2017, 11:20 PM
kaldari set the point value for this task to 3.Oct 3 2017, 10:43 PM

If this is going to be done by a maintenance script and a cron job, you may want to look at how PageTriage or PageAssessments does this as an example. The maintenance script would live in the core MediaWiki repo and the cron script that runs maintenance script the would live in the operations/puppet repo (probably in modules/mediawiki/manifests/maintenance/). @Dzahn can help with this.

Yep, once you know what command exactly you want to run you can ping me about adding that to a puppetized cron job. And yes, modules/mediawiki/manifests/maintenance/ is the right place to look at to see existing examples.

..of course pending that DBAs are ok with this running as cron.

EddieGP added a project: DBA.

..of course pending that DBAs are ok with this running as cron.

Adding DBA for that approval. Once that's done, I'll have a look at creating the maintenance script.

Legoktm subscribed.

I agree with T176754#3636245 and am declining this task. MediaWiki offers no guarantee of stability for the database schema to tools, they need to be updated to account for ug_expiry. Adding a cron job to delete rows doesn't even fix the bug as asked - the rows will still be around for a period of time after they are expired.

The bug about groups not being immediately added to user_former_groups *is* a valid bug, but a solution to that should be discussed and considered independently to this one. I don't think a cron job would be considered an acceptable solution for that bug.

Sorry, I dare to totally disagree here. A cron job will clean the expired rows from time to time which is a solution for this problem. If not immediatelly, at least some maintenance will happen, and I'm totally okay with that. People above seems to support this idea, so this closure tastes innapropriate to me with all due respect. And while not super high priority, some of us agree that this will be useful. As for user_former_groups, I'll open a new task as requested.

This is my (I hope) neutral evaluation of the issue:

  • There is no guarantee of stability for the database to tools (wikireplicas). I 100% agree with that. Tools developers should be update the tool inner working with HEAD, and mediawiki should not be blocked on an unsupported stability API (direct MySQL access through the MySQL protocol). Users wanting stability should use the API for that. The previous internal meaning of the user rights was "if the row exists, the rights is applied". After certain code change, the meaning of the rows changed to "if the rows exists and now() < expiration_time", the right is applied. Tools should be modified to reflect this meaning change
  • Purging old rows is desirable. There should be 0 expectation of immediately purge old rows (because of the above), but I do not see a problem with the rows being purged from time to time (in addition to on next modification). I understand that in some wikis that can be a long time, and it could be considered a bug on itself, but a very very low priority
  • I would keep this open with priority low IF someone wants to work on it (but no expectation from others to work on it). Given the potential nature of a purge, the change should not be expected to be deployed with high priority, not expected high prorority from roots. I can take care of reviewing and deploying it, but 0 expectation should be taken of my availability.

If someone wants to contribute code, we should be ok with it, but with 0 expectations of not being blocked if it doesn't meet proper quality control.

If purging is undesirable on production, which is something I can agree with, wikireplicas can create "compatibility layers" like a view that shows only non-expired rows as a compromise option (which is transparent to wikireplica users). Again, code would be asked by the proponents of the feature in the form of a patch to the view creation system, with no expectation from upstream developers. I think if the ticket was renamed as a problem description, rather than a single specific solution, more people would be open to the changes. I think this is a common issue that leads to frustration :-) This is just my €0.02, feel free to disagree.

@jcrespo Thank you. Point number two is what people above and myself have agreed would be optimal (old rows be purged from time to time, okay with it not being immediate).

As for the description, feel free to amend it (you or anyone) to actually describe that the second point of your statement is what is being requested here, if it helps people to better understand the issue. English is not my mothertongue and I have difficulties to properly write what I'm thinking, and I'm sure people also will realise that I'm not a developer so I'm not expected to know the inners of the whole MediaWiki, so that's an additional barrier. Again, feel free to rename the task.

TTO agreed with the regular cleanup on T176754#3637625 so EddieGP did at T176754#3637737. Other comments below say that there's willingness to work on this one. Thanks.

Legoktm was totally legitimate about closing the ticket with the given description; I do the same sometimes if the proposal is not right; changing it to indicate a particular issue involving tools only (smaller scope); instead of modifying production behaviour would be a more open approach leading to a proper solution of the original problem, rather than imposing a specific solution only.

If purging is undesirable on production, which is something I can agree with, wikireplicas can create "compatibility layers" like a view that shows only non-expired rows as a compromise option (which is transparent to wikireplica users).

Personally, I would prefer that we do the purging at the source. It's confusing to me when there are big differences between the wikireplica data and the production data, and I'd like to keep them as synced as practically possible (with appropriate differences for security and privacy). Is there a down-side to purging in production?

When digging a bit further into this, I found that it was somewhat (ableit for other reasons) discussed in T153817 and @MaxSem had https://gerrit.wikimedia.org/r/#/c/350961/ for it: When mw does the "get all groups for user X" operation, it already has to check if one of the results it got is about an expired group and filter that one out. Whenever an expired row was seen in that step, the patch would schedule a job (which just called UserGroupMembership::purgeExpired(); ) to clean up the table. I'd love to hear what the "different direction" mentioned when abandoning that patch was.

@jcrespo: Jaime, sorry but I don't see it that way. As much as I respect Legoktm, I just felt him slamming the door in fellow volunteer faces' when several people after me agreed (e.g. @TTO in T176754#3637625 and @EddieGP in T176754#3637737) this can be solved with a maintenance script and a cron job, and there are people interested and willing to work on this.

Again, if the title is the problem, feel free to amend it. Would Create a maintenance script to purge expired user rights and run it via a puppet cron be a better title? If it is not, I'm open to suggestions.

In any case, I find it concerning that valid reports can just be shut down because the reporter didn't find the best words to describe them. I think that's a disservice to MediaWiki and its thousands of daily users. You can't expect people to know as deep as you how MediaWiki works nor to write in perfect English. Spending few minutes in reading the whole ticket or asking questions if things ain't clear is always useful and certainly sheds light on the issues. Also, sometimes when you report a problem with something, you end noticing that the problem is with something else, which is what in part has happened here.

Thank you.

@EddieGP: I believe there were performance concerns with that approach. My understanding is that Aaron preferred that we do something more passive (like this).

@MarcoAurelio: I can't speak for Lego, but I imagine his concern was that this was the wrong solution to the problem of tools like https://tools.wmflabs.org/rightstool/ not working correctly. There is indeed the possibility that this solution will mask bugs in various toolforge tools (and possibly even in MediaWiki), so that's a valid point. In the long run, though, I think it makes sense for us to purge data that is no longer useful, so I'm sympathetic to implementing this.

@kaldari Tool Labs is mentioned here because I discovered the 'root' of this issue via a toolforge tool. The issue here is that MediaWiki is keeping, with regards to temporary user rights, old and incorrect data as you correctly mention as well as others. I'm certainly sorry if my meaning was not clear, and as @EddieGP mentions, tool mantainers should, in addition to this work, amend their DB queries so their results are more accurate; but this is all about how MediaWiki works, not Toolforge tools. Regards.

Change 382218 had a related patch set uploaded (by EddieGP; owner: EddieGP):
[mediawiki/core@master] Add maintenance script: Clean up expired userrights

https://gerrit.wikimedia.org/r/382218

EddieGP lowered the priority of this task from Medium to Low.Oct 4 2017, 8:00 PM

The issue here is that MediaWiki is keeping, with regards to temporary user rights, old and incorrect data as you correctly mention as well as others.

The data in user_groups isn't incorrect, there just happens to be a new column of relevant data. Tools not recognizing that new column are a valid problem. Not ensuring user_former_groups is populated correctly (and thus not ensuring that tools relying on user_former_groups work) is a valid problem. Keeping expired rows in that table is not a valid problem by itself.

Adding a cronjob is nice-to-have because it'll reduce the impact both of the mentioned valid problems will have (it won't solve any of them though) and because it's nice to clean up from time to time, even more so if it'll reduce impact other valid bugs might have. But keeping those rows around is not a problem by itself. Even less one that would be worth spending the paid time of wmf staff on it. As Jaime mentions, there's nothing wrong with a volunteer working on it if they want, as long as everybody is aware of it being low-priority. And tbh I think I'd agree to decline this if nobody would have expressed interest in working on it.

The data in user_groups isn't incorrect, there just happens to be a new column of relevant data. Tools not recognizing that new column are a valid problem.

Is "Tool X" has stopped working a valid bug? I think it is, but that should be the reported error, not assuming a specific behavior on production. The response to such a bug report could be multiple things: cron job on production (which doesn't fix the problem), job queue changes, ttl implementation on mediawiki/mysql, tool rewriting, or special views on wikireplicas with different names (like revision_userindex). Semantics, both on the database and on reporting bugs are important. Lego said (which the support of some) that the bug was invalid. That doesn't mean other tickets could be opened (or the same, with slightly different wording- in fact he proposed that himself). Invalid doesn't mean your problem isn't real, it means the, at the moment, the suggested solution (which based on the description seemed to be deleting rows exactly on expiration time, and making production changes because tool logic was outdated) was not ideal- and he is still right about that. The original description said:

" it seems temporary userrights does not clean up on very short temporary userrights"

This seems to be non-true as it is literally expressed; and for the casual reader it is likely to be marked as invalid- the correct followup is to correct the description as the knowledge of the issue advances to indicate each time the most accurate overview of the issue (and nobody has yet done that). I 100% believe that if the original description was "Tool X, Y and Z stopped working" nobody could touch the task because that is essentially true. Invalid doesn't make your problem invalid. When one has thousands of ticket to attend, clear communication is important; we are here to help, I hope you can believe that, and nobody intends badly. An "evil non-volunter" (and I am joking here, of course) closed the task, but many other "evil non-volunteers" both supported other directions and would have reopened the task (myself, for example), once there was better understanding of the problems. Let's not give more meaning to closing and opening new tasks than to updating descriptions to obsolete tasks for pure coordination. :-)

Let's stop meta-talking about the issue, and start working on it, ok? Now you will be on the point of sight if you do not propose actual, deployable code :-P

Change 382218 merged by jenkins-bot:
[mediawiki/core@master] Add maintenance script: Clean up expired userrights

https://gerrit.wikimedia.org/r/382218

Let's stop meta-talking about the issue, and start working on it, ok?

Amen.

Now you will be on the point of sight if you do not propose actual, deployable code :-P

I had already done that when you wrote that sentence, see gerritbot's comment :-P

The maintenance script is merged into mediawiki master now. I'm not going to SWAT this, so it'll go out with the train, which means the maintenance script itself will be available on all wikis on Fri, 13th Oct. That means a cronjob can't be deployed before Mon, Oct 16 (it'd throw "file not found" errors). Also likely ops want to test in on beta-cluster for a few days/weeks or run the script manually a few times to check it works fine before deploying a cron job which does it unattended. So it'll take a while until this has an effect in production.

Change 382631 had a related patch set uploaded (by EddieGP; owner: EddieGP):
[operations/puppet@production] [DNM] Add cron job for expired userrights maintenance script

https://gerrit.wikimedia.org/r/382631

If purging is undesirable on production, which is something I can agree with, wikireplicas can create "compatibility layers" like a view that shows only non-expired rows [..]

Personally, I would prefer that we do the purging at the source. It's confusing to me when there are big differences between the wikireplica data and the production data [..]

For the record, this proposal wasn't about changing data in the wiki replicas. Rather, about creating a filtered view (aka "pseudo table") that is essentially the result of SELECT * FROM user_group where ug_expiry > NOW(). This isn't too complicated , and we already do this in other cases (e.g. to provide anonymised access to user_properties, for example). The rationale for doing this would be unrelated to the current task (which is about purging redundant/unused data from the database, purely for space and query speed reasons). The schema has changed and queries that are meant to be accurate, must be updated. However for back-compat, we could hide expired rows in the replicas for tools. This remains a valid proposal even with this task closed given that purging can and may still happen days or weeks after the fact and is by no means required or important to run frequently.

Per comments given on the patch, it will need a +1 from DBA to proceed. I'm moving this on your workboard so you are aware this needs your input, without any expectations when it may be done :)

Some notes/links for easier review: Frequency for the cron is twice a month (see puppet patch), it'll run the maintenance script for all wikis. The maintenance script itself can be found here in diffusion. However the script isn't really interesting as it just calls another function, which can be found here (diffusion again) . The db queries run are basically

  1. SELECT ug_user, ug_group, ug_expiry FROM user_groups WHERE ug_expiry < $now
  2. DELETE FROM user_groups WHERE ug_expiry < $now
  3. insert the user<->group pairs we selected in the first step into user_former_groups

That requires an order-by, a limit on a loop, and a waitfor replica on every loop step. We have to guarantee the number of rows purged are smaller than a certain quantity (and that they create no lag).

Also race conditions should be eliminated by making sure concurrent runs do not insert the rows twice by locking or other concurrency control.

That requires an order-by, a limit on a loop, and a waitfor replica on every loop step

The number of rows being deleted here will virtually always be very small (one or two). The logic is that if there are many user rights that are expiring, there will also be many user rights being added from time to time, and expired user group rows are already being purged when new user rights are added. I cannot imagine a situation where there would be more than about 20 rows to purge from this script - even that would be rare, perhaps on a Wikimania wiki or something. Is it really worth the trouble to write a limit loop?

That requires an order-by, a limit on a loop, and a waitfor replica on every loop step

The number of rows being deleted here will virtually always be very small (one or two). The logic is that if there are many user rights that are expiring, there will also be many user rights being added from time to time, and expired user group rows are already being purged when new user rights are added. I cannot imagine a situation where there would be more than about 20 rows to purge from this script - even that would be rare, perhaps on a Wikimania wiki or something. Is it really worth the trouble to write a limit loop?

I would be more confident if we have some limit loops and a wait for replication. It might a small number of rows now, but this could be one of those things that are left to be running for months and no one remembers about and then it start getting bigger and bigger and causing issues.
If it is not a lot of trouble, I would prefer to have those throttling mechanisms from the start, so we can "forget" about it :-)

will always be very small

And recentchanges from wikidata will be a small percentage... https://en.wikipedia.org/wiki/Defensive_programming

Is it really worth the trouble to write a limit loop?

Yes.

will always be very small

And recentchanges from wikidata will be a small percentage... https://en.wikipedia.org/wiki/Defensive_programming

Very good point :)

Look, I ask for a loop and probably a SELECT... FOR UPDATE, that should be no more than a 3 line change; it is not like I am asking for a huge refactoring; and literally the code can be copied from the other hundreds of maintenance scripts :-D

Change 384429 had a related patch set uploaded (by EddieGP; owner: EddieGP):
[mediawiki/core@master] Be more db-friendly when purging expired userrights

https://gerrit.wikimedia.org/r/384429

@EddieGP any update on this? is this resolved (as per your submitted patch?)

JFTR, I answered marostegui on irc yesterday: That patch isn't merged yet, it especially needs review by DBA (likely @jcrespo as you already reviewed it once before?) to continue. I've got the impression that it's on their backlog (albeit low-priority, as mentioned above) and I'll just have to wait for that. Once it has been merged I'll let it ride the train and find someone to run it manually for all wikis, so that the first run is not done by an unattended cron. If that goes fine, we'll deploy the cron.

Afaics there's nothing I could do other than waiting for review at this point, let me know if I'm wrong aboout that :)

Change 384429 merged by jenkins-bot:
[mediawiki/core@master] Be more db-friendly when purging expired userrights

https://gerrit.wikimedia.org/r/384429

Next steps:

  1. Deploy https://gerrit.wikimedia.org/r/#/c/384429/ (mw: be more db-friendly)
  2. Run /usr/local/bin/foreachwiki maintenance/purgeExpiredUserrights.php on terbium (which is exactly what the cron will do)
  3. If nothing fails, deploy https://gerrit.wikimedia.org/r/c/382631/ (puppet: add cron)

@zeljkofilipin: I'd like to do #1 and #2 in swat next week if possible (most likely I'd pick some EU window, so that's why I'm pinging you). I'm not sure whether #2 is a good fit for swat though or whether it'd need it's own window. To me the key Q is how long foreachwiki will take given the number of wikis - the script itself should be fast for each wiki, but it does wait for replication to catch up after each. Just want to sort out whether this fits in swat or not before I sign it up for that.

My advice is to do this off-SWAT. Talk to @greg and ask for a window to do 1 and then 2 (foreachwiki mwscript maintenance/purgeExpiredUserrights.php no matter how fast it is, will be run on 800+ Wiki and will take some time; and long-running scripts are always being run in their special windows afaik). As for 3, @Dzahn can review it and merge it on the puppet if 1 and 2 went well. Regards.

@EddieGP I would also recommend asking for a deployment window. This does not look like a simple config change or a urgent regression fix - that is what usually gets deployed during SWAT.

@EddieGP I would also recommend asking for a deployment window.

Thanks! JFTR, I've just done that.

Hi, sorry, my bugmail backlog is woefully long right now.

Let's plan for after wmf.22 is everywhere (so next week). Do you have an idea of how long this script run will take the first time?

It looks like the DBAs have reviewed this (but Eddie moved it on their board to "done", not them?), but do either @jcrespo or @Marostegui want to be involved/around when it happens?

Hi, sorry, my bugmail backlog is woefully long right now.

As with probably anybody working for the WMF. That's totally okay, I wasn't expecting you to read through the whole task to understand which detail we need your help with after just a tiny mention of "talk to greg". That's why I sent a mail with a short summary :)

Let's plan for after wmf.22 is everywhere (so next week). Do you have an idea of how long this script run will take the first time?

Given that there's not really a lot of workload (remember, purges already do happen whenever userrights are actively changed on a wiki, so there's nothing like a backlog here) , I'd anticipate 0-10 rows to purge per wiki (with quite a lot 0 and very few 10), which shouldn't take more than about a second (moving a few rows doesn't take that long). 1s*800+ wikis ~= 15m though, and of course it might stall at some point if it has to wait for replication for whatever reason - it'll do that even if the script itself is not the cause of the lag. All in all, I think we should be safe with a regular deployment window of 1 hour, even if something goes wrong.

but Eddie moved it on their board to "done", not them?

Earlier in the history, I've moved it from 'Blocked external/Not db team' into 'Backlog' on their board when I needed codereview from them to proceed. Jaime did that codereview a while ago and the patch is now merged, so I thought I should move it out of their backlog again.

Sorry, I was about to ask when exactly to do this on Monday evening, when some fire suddenly ripped me away from it and I got distracted by other things for the rest of the week. @greg I guess we could do it next week?

Sorry, I still haven't found someone to do this deploy/script run nor do I know if our DBAs need to be around when it happens.

I'm guessing this is going to be mostly a noop...

enwiki:

reedy@terbium:~$ mwscript eval.php enwiki
> $dbw = wfGetDB( DB_SLAVE );

> $res = $dbw->select( 'user_groups', '*', [ 'ug_expiry < ' . $dbw->addQuotes( $dbw->timestamp( time() ) ) ], '', [ 'LIMIT' => 100 ] );

> echo $res->numRows();
0

Do we know of any wiki that's using the feature at all?

I'm guessing this is going to be mostly a noop...
enwiki:

Not really a surprise. Don't want to repeat myself, but: We're already cleaning up all old rows for wiki X when rights changes happen at wiki X. The maintenance script is needed for small wikis where right changes are a thing that might not happen for months. On enwiki right changes probably happen so regularly that you'll hardly ever see expired rows in the db.

Do we know of any wiki that's using the feature at all?

Plenty, just go to https://meta.wikimedia.org/w/index.php?title=Special%3ALog&type=rights and search for "temporary".

I tested this script on deploymentwiki. It works but it does not print the number of rows cleaned. Do we know a way to make that happen, if that does *not* mean to rewrite the whole script or complicated coding? Thanks.

nor do I know if our DBAs need to be around when it happens.

Probably not. I would start though with small (or isolated) wikis to make sure the throttling and all the stuff works as expected.
ie: start with dewiki or s6 (frwiki, jawiki,ruwiki).

(Not sure if that is doable)

I don't think there's even any point doing that. Very few wikis have any rows to remove (as per what @EddieGP said above). Just deploy it.

Evidence:

select count(*) from user_groups where ug_expiry < '20180303143203';

Running foreachwiki sql.php with the query above... Then doing some parsing on the log output...

reedy@terbium:~$ grep count expiry.log | grep -v "=> 0"
aswikisource:      [count(*)] => 1
avwiki:      [count(*)] => 1
azwiki:      [count(*)] => 1
bat_smgwiki:      [count(*)] => 1
bawikibooks:      [count(*)] => 1
biwiki:      [count(*)] => 1
bnwiki:      [count(*)] => 1
cswiki:      [count(*)] => 1
cswiktionary:      [count(*)] => 1
dawikibooks:      [count(*)] => 1
dawiktionary:      [count(*)] => 1
dewikisource:      [count(*)] => 1
dewikiversity:      [count(*)] => 2
diqwiki:      [count(*)] => 1
elwikiversity:      [count(*)] => 1
enwikiversity:      [count(*)] => 2
eowikinews:      [count(*)] => 1
eowiktionary:      [count(*)] => 1
eswikiversity:      [count(*)] => 1
etwikiquote:      [count(*)] => 1
euwiktionary:      [count(*)] => 1
fiwikimedia:      [count(*)] => 1
foundationwiki:      [count(*)] => 1
iawikibooks:      [count(*)] => 1
iawiktionary:      [count(*)] => 1
igwiki:      [count(*)] => 1
ikwiki:      [count(*)] => 1
incubatorwiki:      [count(*)] => 1
iswikisource:      [count(*)] => 1
itwikivoyage:      [count(*)] => 1
itwiktionary:      [count(*)] => 1
jamwiki:      [count(*)] => 1
jbowiktionary:      [count(*)] => 1
jvwiki:      [count(*)] => 1
kowikiversity:      [count(*)] => 1
kswiki:      [count(*)] => 1
kywiki:      [count(*)] => 1
lnwiktionary:      [count(*)] => 1
loginwiki:      [count(*)] => 1
mrwiki:      [count(*)] => 2
mswiktionary:      [count(*)] => 1
nlwikiquote:      [count(*)] => 1
orwikisource:      [count(*)] => 1
pawiktionary:      [count(*)] => 1
plwikinews:      [count(*)] => 1
ptwiki:      [count(*)] => 21
ptwikibooks:      [count(*)] => 1
rowikivoyage:      [count(*)] => 1
sawikiquote:      [count(*)] => 1
siwiki:      [count(*)] => 1
sswiki:      [count(*)] => 1
tawikinews:      [count(*)] => 1
tlwiki:      [count(*)] => 1
trwikibooks:      [count(*)] => 1
trwikisource:      [count(*)] => 1
ukwikibooks:      [count(*)] => 1
ukwikisource:      [count(*)] => 1
urwikibooks:      [count(*)] => 1
viwiki:      [count(*)] => 1
viwikisource:      [count(*)] => 1
votewiki:      [count(*)] => 6
wikimania2006wiki:      [count(*)] => 1
xhwiki:      [count(*)] => 1
zuwiktionary:      [count(*)] => 1
reedy@terbium:~$

The most rows to purge is ptwiki at 21. This is nothing. It's a noop for most wikis. We have maintenance scripts purging a lot more rows than that on a regular basis.

I tested this script on deploymentwiki. It works but it does not print the number of rows cleaned. Do we know a way to make that happen, if that does *not* mean to rewrite the whole script or complicated coding? Thanks.

Yeah, should be easily doable

/**
	 * Purge expired memberships from the user_groups table
	 */
	public static function purgeExpired() {
		$services = MediaWikiServices::getInstance();
		if ( $services->getReadOnlyMode()->isReadOnly() ) {
			return;
		}

		$lbFactory = $services->getDBLoadBalancerFactory();
		$ticket = $lbFactory->getEmptyTransactionTicket( __METHOD__ );
		$dbw = $services->getDBLoadBalancer()->getConnection( DB_MASTER );

		$lockKey = $dbw->getDomainID() . ':usergroups-prune'; // specific to this wiki
		$scopedLock = $dbw->getScopedLockAndFlush( $lockKey, __METHOD__, 0 );
		if ( !$scopedLock ) {
			return; // already running
		}

		$now = time();
		do {
			$dbw->startAtomic( __METHOD__ );

			$res = $dbw->select(
				'user_groups',
				self::selectFields(),
				[ 'ug_expiry < ' . $dbw->addQuotes( $dbw->timestamp( $now ) ) ],
				__METHOD__,
				[ 'FOR UPDATE', 'LIMIT' => 100 ]
			);

			if ( $res->numRows() > 0 ) {
				$insertData = []; // array of users/groups to insert to user_former_groups
				$deleteCond = []; // array for deleting the rows that are to be moved around
				foreach ( $res as $row ) {
					$insertData[] = [ 'ufg_user' => $row->ug_user, 'ufg_group' => $row->ug_group ];
					$deleteCond[] = $dbw->makeList(
						[ 'ug_user' => $row->ug_user, 'ug_group' => $row->ug_group ],
						$dbw::LIST_AND
					);
				}
				// Delete the rows we're about to move
				$dbw->delete(
					'user_groups',
					$dbw->makeList( $deleteCond, $dbw::LIST_OR ),
					__METHOD__
				);
				// Push the groups to user_former_groups
				$dbw->insert( 'user_former_groups', $insertData, __METHOD__, [ 'IGNORE' ] );
			}

			$dbw->endAtomic( __METHOD__ );

			$lbFactory->commitAndWaitForReplication( __METHOD__, $ticket );
		} while ( $res->numRows() > 0 );
	}

Just keep track of the number of rows modified, and return that at end the end of the function. I bet you can ask @EddieGP nicely to make the change ;)

Change 416231 had a related patch set uploaded (by EddieGP; owner: EddieGP):
[mediawiki/core@master] purgeExpiredUserrights: Show number of rows purged

https://gerrit.wikimedia.org/r/416231

I don't think there's even any point doing that. Very few wikis have any rows to remove (as per what @EddieGP said above). Just deploy it.

Maybe I'm overcautious here. I've never deployed a new cron/maintenance script before and in an "better safe than sorry" manner I thought we should test this attended. If the experts think that's not needed, then I'm not going to insist :-)

I tested this script on deploymentwiki. It works but it does not print the number of rows cleaned. Do we know a way to make that happen, if that does *not* mean to rewrite the whole script or complicated coding? Thanks.

Thanks for reminding me, we already had that suggestion at some point but I totally forgot about it.

Edit: JFTR, that feature doesn't block deployment of the cron/script run, it's just nice-to-have.

Change 416344 had a related patch set (by MarcoAurelio) published:
[operations/puppet@production] mediawiki: add cronjob to purge expired temporary userrights

https://gerrit.wikimedia.org/r/416344

Change 416344 abandoned by MarcoAurelio:
[DO NOT MERGE] mediawiki: add cronjob to purge expired temporary userrights

Reason:
Duplicate.

https://gerrit.wikimedia.org/r/416344

Change 416231 merged by jenkins-bot:
[mediawiki/core@master] purgeExpiredUserrights: Show number of rows purged

https://gerrit.wikimedia.org/r/416231

As a followup, and out of scope of this ticket- are the synchronous cleanups actually needed assuming this one works regularly (timid suggestion only)?

As a followup, and out of scope of this ticket- are the synchronous cleanups actually needed assuming this one works regularly (timid suggestion only)?

Not for wm. But other users mw probably don't run that maintenance script as a cron like we do. We could introduce a config variable to turn synchronous cleanup off/on (on by default, off for wm). I wonder what's the point though, because as far as I'm aware there's no harm in running it both regularly and synchronous.

Change 382631 merged by Dzahn:
[operations/puppet@production] Add cron job for expired userrights maintenance script

https://gerrit.wikimedia.org/r/382631

Thanks @Dzahn! With the cron being applied, this is now done. FTR: The cron runs on each 14th and 28th at 06:42.