
Review/update wikitech-static syncing after wikitech moves to Kubernetes
Closed, Resolved (Public)

Description

Wikitech-static is documented here: https://wikitech.wikimedia.org/wiki/Wikitech-static

Right now it syncs via a cron running on the wikitech host (cloudweb*). That job, or something similar, will need to be implemented someplace to consume the new k8s-hosted wikitech and provide it to wikitech-static.

Alternatively, we could entirely re-engineer wikitech-static to be a static site that does some kind of recursive pull from public wikitech. If we coupled that to T304688 (moving wikitech-static to a new VM) then this could be done at any time, ideally prior to the k8s migration. Unfortunately I haven't figured out a way to make such a static site searchable.
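One way the "recursive pull" idea could be approached is to enumerate pages through the MediaWiki Action API (`list=allpages`) instead of crawling HTML. The sketch below only illustrates that assumption: the helper names are hypothetical, the endpoint path is assumed, the code makes no network requests itself, and it does nothing for the search problem mentioned above.

```python
from typing import List, Optional, Tuple

# Assumed endpoint path for public wikitech; not confirmed in this task.
API_URL = "https://wikitech.wikimedia.org/w/api.php"


def allpages_params(apcontinue: Optional[str] = None, limit: int = 500) -> dict:
    """Build the query parameters for one batch of list=allpages."""
    params = {
        "action": "query",
        "list": "allpages",
        "aplimit": str(limit),
        "format": "json",
    }
    # The continuation token from the previous batch, if any.
    if apcontinue is not None:
        params["apcontinue"] = apcontinue
    return params


def parse_allpages(response: dict) -> Tuple[List[str], Optional[str]]:
    """Return (page titles, next apcontinue token or None) from one response."""
    pages = response.get("query", {}).get("allpages", [])
    titles = [page["title"] for page in pages]
    token = response.get("continue", {}).get("apcontinue")
    return titles, token
```

A caller would request `API_URL` with these parameters in a loop, feeding each returned token back into `allpages_params`, until the token is `None`.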

UPDATE 3 Oct 2024

After switching Wikitech to k8s, we had to move faster to ensure wikitech-static remains up to date.

other dumps

Our dumps/snapshot infra already supports systemd timers that run dump-related scripts for a number of miscellaneous services.
Those dumps in turn are publicly served under dumps.wikimedia.org/other/. It makes sense to make wikitech's dump yet-another-other dump.

Why?
  • wikitech's dumps are no longer a snowflake; we are reusing processes and code that already exist and work
But dumps 1.0 will be deprecated!
  • That is OK: the same solution that will work for all the "other" dumps will work for wikitech's dump as well.
Where?

https://dumps.wikimedia.org/other/wikitech/

Event Timeline

cc'ing @Dzahn because he's done some wikitech-static maintenance in the past and might be interested in this. Just a shot in the dark!

jijiki removed jijiki as the assignee of this task.Sep 5 2024, 1:56 PM

Fwiw, the sync is now broken since https://wikitech.wikimedia.org/dumps/ is no longer served on Kubernetes.

Thanks @taavi for pointing it out, we'll try to find a temporary bandaid to keep dumps running, and then re-design it

Change #1077684 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] WIP: Migrate wikitech dumps to snapshot servers

https://gerrit.wikimedia.org/r/1077684

Change #1077684 merged by Ladsgroup:

[operations/puppet@production] modules::snapshot: Migrate wikitech dumps to snapshot servers

https://gerrit.wikimedia.org/r/1077684

Change #1077733 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] wikitechdumps: Fix path to the exec file

https://gerrit.wikimedia.org/r/1077733

Change #1077733 merged by Ladsgroup:

[operations/puppet@production] wikitechdumps: Fix path to the exec file

https://gerrit.wikimedia.org/r/1077733

Change #1077736 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] wikitechdumps: Update the path to mwscript

https://gerrit.wikimedia.org/r/1077736

Change #1077736 merged by Ladsgroup:

[operations/puppet@production] wikitechdumps: Update the path to mwscript

https://gerrit.wikimedia.org/r/1077736

Change #1077741 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/wikitech-static@master] Update URL to wikitech dumps

https://gerrit.wikimedia.org/r/1077741

Change #1077744 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] wikitech: Get rid of the old mw-xml dumper file and cron

https://gerrit.wikimedia.org/r/1077744

jijiki changed the task status from Open to In Progress.Oct 3 2024, 4:01 PM
jijiki triaged this task as Unbreak Now! priority.
jijiki updated the task description.

Change #1077741 merged by Effie Mouzeli:

[operations/wikitech-static@master] Update URL to wikitech dumps

https://gerrit.wikimedia.org/r/1077741

Change #1077440 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] dumps: Stop fetching custom Wikitech dumps

https://gerrit.wikimedia.org/r/1077440

Change #1077440 merged by Ladsgroup:

[operations/puppet@production] dumps: Stop fetching custom Wikitech dumps

https://gerrit.wikimedia.org/r/1077440

Change #1077744 merged by Ladsgroup:

[operations/puppet@production] wikitech: Get rid of the old mw-xml dumper file and cron

https://gerrit.wikimedia.org/r/1077744

jijiki claimed this task.

@Ladsgroup and I believe this works alright; https://wikitech-static.wikimedia.org looks as expected. I will check tomorrow if there is anything else we may have missed, and mark it as done. Further code cleanups will be handled in T371378.

jijiki reopened this task as In Progress.Oct 3 2024, 4:20 PM
jijiki lowered the priority of this task from Unbreak Now! to Low.

Change #1078026 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] dumps: Cleanup absented resources

https://gerrit.wikimedia.org/r/1078026

Change #1078026 merged by Majavah:

[operations/puppet@production] dumps: Cleanup absented resources

https://gerrit.wikimedia.org/r/1078026

I noticed there's an alert firing, probably related to this work:

MWVERSION WARNING - wikitech-static.wikimedia.org is running MediaWiki 1.42.1, latest is MediaWiki 1.42.3 Consult https://wikitech.wikimedia.org/wiki/Wikitech-static for details.

It's because there were MW releases last week and no one has updated wikitech-static yet :)
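For illustration, the comparison behind a warning like this can be sketched as below. This is not the actual check's code, just a hypothetical helper that compares dotted version strings numerically rather than as text.

```python
def parse_version(version: str) -> tuple:
    """Turn '1.42.1' into (1, 42, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_outdated(running: str, latest: str) -> bool:
    """True when the running version is older than the latest release."""
    return parse_version(running) < parse_version(latest)
```

With the versions from the alert, `is_outdated("1.42.1", "1.42.3")` is true, and it goes back to false once wikitech-static is upgraded to 1.42.3.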

I noticed this alert today:

image.png (190×545 px, 28 KB)

I'm not sure if it is related to the work in this ticket.

Mentioned in SAL (#wikimedia-operations) [2024-10-29T12:42:44Z] <claime> Killed dead and stacked import-wikitech.sh processes on wikitech-static - T374114

Mentioned in SAL (#wikimedia-operations) [2024-10-29T12:43:06Z] <claime> Manually relaunched import-wikitech.sh on wikitech-static - T374114

The import process seems to stall out after a few hundred pages. I'll try to upgrade MediaWiki on wikitech-static and rerun the import. The disk is full; looking for what can be cleaned up.
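Since the disk turned out to be full, a pre-flight check before starting an import might surface this earlier. A minimal sketch, assuming a hypothetical helper name and a caller-chosen threshold; nothing like this is known to exist in import-wikitech.sh.

```python
import shutil


def has_free_space(path: str, min_free_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least min_free_bytes free."""
    return shutil.disk_usage(path).free >= min_free_bytes
```

An import wrapper could call `has_free_space("/srv", 5 * 1024**3)` (path and threshold are placeholders) and exit early with a clear message instead of stalling partway through.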

Mentioned in SAL (#wikimedia-operations) [2024-10-29T15:00:19Z] <claime> Running php maintenance/deleteArchivedFiles.php --delete on wikitech-static - T374114

Mentioned in SAL (#wikimedia-operations) [2024-10-29T15:08:33Z] <claime> Running find /srv/mediawiki/images/wikitech/archive -type f | xargs rm on wikitech-static - T374114 T348503

I am closing this in favour of T376400. If any other issues pop up, please create a new task, since the work described here is complete.

@Clement_Goubert cheers for sorting out the out-of-disk issue!