Migrate CentralAuth maintenance jobs to mw-cron
Closed, Resolved (Public)

Description

Migrate the periodic MediaWiki jobs for MediaWiki-extensions-CentralAuth from mwmaint to mw-cron on Kubernetes.

  • centralauth-backfillLocalAccounts.php-loginwiki
  • centralauth-backfillLocalAccounts.php-metawiki
  • purge_expired_userrights (core)
  • purge_expired_global_rights
  • purge_temporary_accounts

Doc on the new platform

ServiceOps new will handle migrating the jobs, but would appreciate input from MediaWiki-Platform-Team on:

  • jobs that should be watched more closely
  • jobs that are low criticality and could be migrated first
  • any potential gotchas in the way these jobs use MediaWiki

Event Timeline

All of these are low-risk (in the sense that if the job stops working for a few days or weeks, that's not much of a problem, as long as we notice eventually).
They all do DB writes to primary tables, so writing incorrect data would be bad, but I don't think a host change could result in that.

backfillLocalAccounts does cross-DB reads from all the wiki databases; other than that, I don't think any of the jobs do anything unusual.

> All of these are low-risk (in the sense that if the job stops working for a few days or weeks, that's not much of a problem, as long as we notice eventually).

Thanks, we will be looking at logs once we do migrate them.

> They all do DB writes to primary tables, so writing incorrect data would be bad, but I don't think a host change could result in that.

It's a bit more than a host change: they are going to be redefined from systemd timers to Kubernetes CronJobs. They will run in the same environment as current MediaWiki production, or mw-script. That actually gives me an idea: can these scripts be run manually in between scheduled runs? If so, we could pre-test some of them by running them through mw-script to validate that they run normally in a containerized environment.

> backfillLocalAccounts does cross-DB reads from all the wiki databases; other than that, I don't think any of the jobs do anything unusual.

Thanks, I don't think that should be a problem.

Yeah, it should be fine to run any of them manually.

> All of these are low-risk (in the sense that if the job stops working for a few days or weeks, that's not much of a problem, as long as we notice eventually).
> They all do DB writes to primary tables, so writing incorrect data would be bad, but I don't think a host change could result in that.
>
> backfillLocalAccounts does cross-DB reads from all the wiki databases; other than that, I don't think any of the jobs do anything unusual.

Cross-referencing with the list of timers on mwmaint2002, it would seem that the backfillLocalAccounts jobs are not actually defined or run at the moment. Is this an oversight that should be remedied, or does that mean they don't need to run?
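
The cross-check described above amounts to a set difference between the jobs listed in the task description and the timers actually defined on the host. A minimal sketch, assuming the timer names have already been collected into a list; the `defined_timers` sample is hypothetical and is not actual output from mwmaint2002:

```python
# Job names as listed in the task description.
EXPECTED_JOBS = {
    "centralauth-backfillLocalAccounts.php-loginwiki",
    "centralauth-backfillLocalAccounts.php-metawiki",
    "purge_expired_userrights",
    "purge_expired_global_rights",
    "purge_temporary_accounts",
}


def missing_jobs(expected, defined):
    """Return the expected job names that have no matching defined timer.

    The comparison ignores case, since job names are not always written
    with the same capitalization.
    """
    defined_lower = {name.lower() for name in defined}
    return sorted(name for name in expected if name.lower() not in defined_lower)


# Hypothetical sample of timers found on a host (an assumption for illustration).
defined_timers = [
    "purge_expired_userrights",
    "purge_expired_global_rights",
    "purge_temporary_accounts",
]

for job in missing_jobs(EXPECTED_JOBS, defined_timers):
    print("not defined:", job)
```

With this sample input, the two backfillLocalAccounts jobs are reported as not defined.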

...

> Cross-referencing with the list of timers on mwmaint2002, it would seem that the backfillLocalAccounts jobs are not actually defined or run at the moment. Is this an oversight that should be remedied, or does that mean they don't need to run?

They certainly do need to run; if the timer isn't set up right, that's my oversight in the original patch. And that's both on metawiki and loginwiki.

Change #1131025 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] alertmanager: Add mediawiki-platform-task

https://gerrit.wikimedia.org/r/1131025

Change #1131025 merged by Clément Goubert:

[operations/puppet@production] alertmanager: Add mediawiki-platform-task

https://gerrit.wikimedia.org/r/1131025

Change #1143197 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:mw::maint::temporary_accounts: purge_temporary_accounts to k8s

https://gerrit.wikimedia.org/r/1143197

Change #1143198 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:mw::maint::purge_expired_userrights: purge_expired_userrights to k8s

https://gerrit.wikimedia.org/r/1143198

Change #1143199 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:mw::maint::purge_expired_userrights: purge_expired_global_rights to k8s

https://gerrit.wikimedia.org/r/1143199

Change #1143226 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:mw::maint::backfill_localaccounts: backfillLocalAccounts-loginwiki to k8s

https://gerrit.wikimedia.org/r/1143226

Change #1143227 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:mw::maint::backfill_localaccounts: backfillLocalAccounts-metawiki to k8s

https://gerrit.wikimedia.org/r/1143227

I've posted a handful of patches to migrate the periodic jobs tracked here.

@Tgr - Just to confirm per your comment in T385866#10533989: for the two expired user-rights jobs in [0] that only run twice per month, it should be safe to manually trigger a run earlier than planned in order to supervise it, correct?

[0] https://gerrit.wikimedia.org/g/operations/puppet/+/2d855c2c39a26c739f442038e75e57be33ee20f8/modules/profile/manifests/mediawiki/maintenance/purge_expired_userrights.pp

Yeah, those jobs don't presuppose anything about how often they are run. The expiry times are taken from the database.
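
To illustrate why a job of this shape is safe to trigger at any time, here is a minimal sketch of an expiry-driven purge. It is not the actual MediaWiki or CentralAuth script: the `demo_rights` table, its columns, and the `purge_expired` and `make_demo_db` helpers are all hypothetical. The point is only that the cutoff comes from stored expiry values and the current time, not from the run schedule.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table with hypothetical rights and expiry times."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE demo_rights (user_name TEXT, right_name TEXT, expiry INTEGER)"
    )
    conn.executemany(
        "INSERT INTO demo_rights VALUES (?, ?, ?)",
        [
            ("alice", "sysop", 100),
            ("bob", "bot", 200),
            ("carol", "sysop", None),  # NULL expiry: the right never expires
        ],
    )
    conn.commit()
    return conn


def purge_expired(conn, now):
    """Delete rows whose expiry is at or before `now`; return how many were removed."""
    cur = conn.execute(
        "DELETE FROM demo_rights WHERE expiry IS NOT NULL AND expiry <= ?",
        (now,),
    )
    conn.commit()
    return cur.rowcount


conn = make_demo_db()
print(purge_expired(conn, now=150))  # 1: only alice's right has expired
print(purge_expired(conn, now=150))  # 0: an extra run at the same time is a no-op
print(purge_expired(conn, now=250))  # 1: bob's right has expired by now
```

Running the purge more often than scheduled only changes when expired rows are removed, never which rows qualify.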

Great, thanks for confirming, @Tgr - I'll get started migrating these first thing next week, and I'll keep that in mind as an option.

Change #1143197 merged by Scott French:

[operations/puppet@production] P:mw::maint::temporary_accounts: purge_temporary_accounts to k8s

https://gerrit.wikimedia.org/r/1143197

purge_temporary_accounts is now migrated, which I'll verify tomorrow after the first scheduled k8s-based run (14:27).

Next up: per-wiki and global jobs in profile::mediawiki::maintenance::purge_expired_userrights, which I'll aim to do tomorrow as well.

Also stealing this from @Clement_Goubert, who I realized only today was still the assignee :)

Change #1143198 merged by Scott French:

[operations/puppet@production] P:mw::maint::purge_expired_userrights: purge_expired_userrights to k8s

https://gerrit.wikimedia.org/r/1143198

Change #1143199 merged by Scott French:

[operations/puppet@production] P:mw::maint::purge_expired_userrights: purge_expired_global_rights to k8s

https://gerrit.wikimedia.org/r/1143199

Updates:

  • The first run of purge-temporary-accounts appears to have completed successfully earlier today.
  • The purge-expired-userrights and purge-expired-global-rights jobs have now been migrated as well.
    • Their next scheduled executions are on the 14th and 17th, respectively.
    • I'll validate the former tomorrow once it has run. For the latter, we may want to trigger a manual run so as to avoid having the first run fall on a weekend.
  • Next up: the hourly loginwiki and metawiki account backfill jobs in profile::mediawiki::maintenance::backfill_localaccounts.
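
The concern above about a first run landing on a weekend can be checked mechanically. A small sketch using only the standard library; the dates are arbitrary examples rather than the actual run dates, since the thread does not state the month or year:

```python
from datetime import date


def falls_on_weekend(day):
    """Return True if `day` is a Saturday or Sunday."""
    return day.weekday() >= 5  # Monday is 0, Saturday is 5, Sunday is 6


print(falls_on_weekend(date(2024, 6, 15)))  # True: a Saturday
print(falls_on_weekend(date(2024, 6, 17)))  # False: a Monday
```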

The first post-migration run of purge-expired-userrights succeeded earlier today. I also triggered a manual run of purge-expired-global-rights, which also succeeded.

Change #1143226 merged by Scott French:

[operations/puppet@production] P:mw::maint::backfill_localaccounts: backfillLocalAccounts-loginwiki to k8s

https://gerrit.wikimedia.org/r/1143226

Change #1143227 merged by Scott French:

[operations/puppet@production] P:mw::maint::backfill_localaccounts: backfillLocalAccounts-metawiki to k8s

https://gerrit.wikimedia.org/r/1143227

Scott_French updated the task description.

The first post-migration hourly runs of centralauth-backfilllocalaccounts.php-loginwiki and centralauth-backfilllocalaccounts.php-metawiki have completed successfully. The logs contain roughly the same content as those from the last pre-migration timer runs (i.e., reporting autoCreateUser failures due to IP blocks).