
Migrate community-tech jobs to mw-cron
Closed, Resolved; Public

Description

Migrate Community-Tech periodic mediawiki jobs from mwmaint to mw-cron on kubernetes.

| Job name | Criticality | Done? |
| --- | --- | --- |
| mediawiki_job_pagetriage_cleanup_en.timer | | x* |
| mediawiki_job_pagetriage_cleanup_test2wiki.timer | | x* |
| mediawiki_job_pagetriage_cleanup_testwiki.timer | | x* |
| mediawiki_job_purge_loginnotify.timer | | x |
| mediawiki_job_pageassessments_cleanup.timer | | x |

Note: The PageTriage extension is actually owned by Moderator-Tools-Team, and T393395: Migrate moderator-tools jobs to mw-cron has been opened retroactively to capture that.

Doc on the new platform

ServiceOps new will handle migrating the jobs, but would appreciate input from Community-Tech on:

  • jobs that should be watched more
  • jobs that are low criticality and could be migrated first
  • outdated jobs that can be removed
  • any potential gotchas in the way these jobs use MediaWiki

Event Timeline

Change #1135753 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] alertmanager: Add team/project receivers for Phab

https://gerrit.wikimedia.org/r/1135753

Change #1135754 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] alertmanager: Add routing for task alerts

https://gerrit.wikimedia.org/r/1135754

Change #1135753 merged by Clément Goubert:

[operations/puppet@production] alertmanager: Add team/project receivers for Phab

https://gerrit.wikimedia.org/r/1135753

Change #1135754 merged by Clément Goubert:

[operations/puppet@production] alertmanager: Add routing for task alerts

https://gerrit.wikimedia.org/r/1135754

Change #1136038 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] PageTriage: migrate updatePageTriageQueue-test2wiki

https://gerrit.wikimedia.org/r/1136038

Change #1136038 merged by Scott French:

[operations/puppet@production] PageTriage: migrate updatePageTriageQueue-test2wiki

https://gerrit.wikimedia.org/r/1136038

Alright, mediawiki_job_pagetriage_cleanup_test2wiki has been migrated to mw-cron on k8s as a pilot. The next run will happen at 8:55 UTC tomorrow (Friday, April 18th), after which I'll verify that the job has succeeded.

Alright, the first run appears to have succeeded:

$ kubectl describe job pagetriage-cleanup-test2wiki-29082775
Name:             pagetriage-cleanup-test2wiki-29082775
[...]
Start Time:       Fri, 18 Apr 2025 08:55:00 +0000
Completed At:     Fri, 18 Apr 2025 08:55:06 +0000
Duration:         6s
Pods Statuses:    0 Active / 1 Succeeded / 0 Failed
[...]
$ kubectl logs pagetriage-cleanup-test2wiki-29082775-dshrn -c mediawiki-main-app
Started processing... 
cleanReviewedPagesAndUnusedNamespaces()... 
processed 1 
cleanRedirects()... 
processed 0 
Completed 
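
A side note on the Job name: a Kubernetes CronJob names each Job it spawns after the scheduled run time, expressed in minutes since the Unix epoch, so the numeric suffix can be checked against the expected schedule. The following is a minimal sketch, assuming GNU `date`; the helper name `job_scheduled_time` is hypothetical and not part of any tooling mentioned in this task.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the scheduled run time encoded in the name of a
# Job spawned by a CronJob.
# Assumes the naming convention <cronjob-name>-<minutes since Unix epoch>
# and GNU date (for `-d @SECONDS`).
job_scheduled_time() {
  local job_name="$1"
  local minutes="${job_name##*-}"   # keep only the part after the last hyphen
  date -u -d "@$((minutes * 60))" '+%Y-%m-%d %H:%M UTC'
}

job_scheduled_time pagetriage-cleanup-test2wiki-29082775
```

For the Job above this prints `2025-04-18 08:55 UTC`, which matches the reported start time.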

I'll follow up and begin migrating the remaining jobs when I return the week of the 28th, unless anyone else from ServiceOps new happens to do so in the interim.

Change #1139424 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] mw::maintenance: move remaining pagetriage jobs to k8s

https://gerrit.wikimedia.org/r/1139424

Change #1139424 merged by Hnowlan:

[operations/puppet@production] mw::maintenance: move remaining pagetriage jobs to k8s

https://gerrit.wikimedia.org/r/1139424

Change #1139923 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:mediawiki::maintenance::purge_loginnotify: migrate to k8s

https://gerrit.wikimedia.org/r/1139923

Change #1140266 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:mediawiki::maintenance::pageassessments: migrate to k8s

https://gerrit.wikimedia.org/r/1140266

Change #1139923 merged by Scott French:

[operations/puppet@production] P:mediawiki::maintenance::purge_loginnotify: migrate to k8s

https://gerrit.wikimedia.org/r/1139923

Change #1140542 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] Revert "P:mediawiki::maintenance::purge_loginnotify: migrate to k8s"

https://gerrit.wikimedia.org/r/1140542

Change #1140542 merged by Scott French:

[operations/puppet@production] Revert "P:mediawiki::maintenance::purge_loginnotify: migrate to k8s"

https://gerrit.wikimedia.org/r/1140542

For the record, https://gerrit.wikimedia.org/r/1139923 was reverted due to an issue with the rendered YAML, rather than an issue with the job itself. The YAML rendering issue should be fixed by https://gerrit.wikimedia.org/r/1140548.

I chatted with @MusikAnimal from Community-Tech earlier today, who confirmed there are no concerns with migrating the PageAssessments and LoginNotify jobs. However, it sounds like the already-migrated PageTriage jobs are actually owned by Moderator-Tools-Team. I'll fork those off to a separate task and follow up to update the alert routing.

Change #1141916 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:mediawiki::maintenance::purge_loginnotify: migrate to k8s

https://gerrit.wikimedia.org/r/1141916

Change #1141916 merged by Scott French:

[operations/puppet@production] P:mediawiki::maintenance::purge_loginnotify: migrate to k8s

https://gerrit.wikimedia.org/r/1141916

Change #1140266 merged by Scott French:

[operations/puppet@production] P:mediawiki::maintenance::pageassessments: migrate to k8s

https://gerrit.wikimedia.org/r/1140266

The LoginNotify and PageAssessments jobs have both been migrated. I'll follow up later today to confirm their first scheduled runs succeed (23:00 and 20:42 UTC respectively) before closing this out.

Change #1141946 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] mw::maintenance: update team for pagetriage jobs

https://gerrit.wikimedia.org/r/1141946

Scott_French claimed this task.
Scott_French updated the task description.

Both jobs have now had a successful first run:

$ kubectl describe jobs/pageassessments-cleanup-29107962
Name:             pageassessments-cleanup-29107962
[...]
Start Time:       Mon, 05 May 2025 20:42:00 +0000
Completed At:     Mon, 05 May 2025 20:42:08 +0000
Duration:         8s
Pods Statuses:    0 Active / 1 Succeeded / 0 Failed
[...]
$ kubectl logs jobs/pageassessments-cleanup-29107962 mediawiki-main-app
enwiki Projects before purge: 3522
enwiki Purging unused projects from page_assessments_projects...
enwiki Done.
enwiki Projects after purge: 3520
enwikivoyage Projects before purge: 14
enwikivoyage Purging unused projects from page_assessments_projects...
enwikivoyage Done.
enwikivoyage Projects after purge: 14
testwiki Projects before purge: 7
testwiki Purging unused projects from page_assessments_projects...
testwiki Done.
testwiki Projects after purge: 7
$ kubectl describe job/purge-loginnotify-29108100
Name:             purge-loginnotify-29108100
[...]
Start Time:       Mon, 05 May 2025 23:00:00 +0000
Completed At:     Mon, 05 May 2025 23:00:27 +0000
Duration:         27s
Pods Statuses:    0 Active / 1 Succeeded / 0 Failed
[...]
$ kubectl logs job/purge-loginnotify-29108100 -c mediawiki-main-app
$ # empty as expected
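
The success checks above read the `Pods Statuses` line by eye. A rough sketch of automating that check follows, assuming `kubectl describe job` output in the format shown in this task; the function name `job_succeeded` is hypothetical. It reads from stdin, so it can be tried without cluster access.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 only if `kubectl describe job` output on stdin
# reports no active pods, at least one succeeded pod, and no failed pods.
job_succeeded() {
  awk '
    /^Pods Statuses:/ {
      # Fields: "Pods" "Statuses:" <n> "Active" "/" <n> "Succeeded" "/" <n> "Failed"
      found = 1
      ok = ($3 == 0 && $6 >= 1 && $9 == 0)
    }
    END { exit (found && ok) ? 0 : 1 }
  '
}

# Sample line copied from the output above.
printf 'Pods Statuses:    0 Active / 1 Succeeded / 0 Failed\n' | job_succeeded && echo "job succeeded"
```

Against a live cluster, the same function could be fed directly, for example `kubectl describe job/purge-loginnotify-29108100 | job_succeeded`.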

Remaining follow-up to fix notifications for PageTriage jobs will be tracked in T393395: Migrate moderator-tools jobs to mw-cron.

Change #1141946 merged by Scott French:

[operations/puppet@production] mw::maintenance: update team for pagetriage jobs

https://gerrit.wikimedia.org/r/1141946