
Re-imaged mw app servers can end up with missing l10n cache for old versions of MW needed for rollback
Closed, Declined, Public

Description

Seen during the rollback today.

Various re-imaged codfw servers (T245757) are throwing errors about a missing l10n cache. They presumably ran scap pull when all wikis were already on .28, so the .27 cache was never built on those servers. When the train was rolled back to .27, they had no l10n cache ready for it.

There was no obvious user-facing impact (this time!), but Icinga is obviously unhappy. If this happened on app servers in the DC currently serving MW, it could be a much less happy place.

[22:35:36] <Reedy> we should make nostalgiawiki always be a version behind
[22:35:41] <Reedy> then we can be nostalgic for "last week"

Event Timeline

Reedy updated the task description.

It sounds like sync-wikiversions should check that the l10n cache exists on each app server for any version being newly added (i.e. not already in use) before actually syncing.
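A minimal sketch of that pre-sync check, run locally on a single host. It assumes a wikiversions mapping of dbname to branch directory (e.g. "php-1.35.0-wmf.27"; the version numbers are illustrative) and a hypothetical on-host layout where a branch's cache lives under <branch>/cache/l10n/ with an English CDB file as a sentinel. The names newly_added_versions, missing_l10n and check_before_sync are hypothetical helpers, not part of scap, and fanning the check out to every app server is left out.

```python
import os
import tempfile
from typing import Dict, Iterable, List, Set

# Hypothetical on-host layout used only for this sketch:
#   <deploy_root>/<branch>/cache/l10n/l10n_cache-en.cdb
# The real scap/MediaWiki paths may differ.
L10N_SENTINEL = os.path.join("cache", "l10n", "l10n_cache-en.cdb")


def newly_added_versions(current: Dict[str, str], proposed: Dict[str, str]) -> Set[str]:
    """Branches the proposed wikiversions map uses that the current map does not."""
    return set(proposed.values()) - set(current.values())


def missing_l10n(deploy_root: str, versions: Iterable[str]) -> List[str]:
    """Branches whose l10n cache sentinel file is absent under deploy_root."""
    return sorted(
        v for v in versions
        if not os.path.isfile(os.path.join(deploy_root, v, L10N_SENTINEL))
    )


def check_before_sync(deploy_root: str, current: Dict[str, str],
                      proposed: Dict[str, str]) -> None:
    """Refuse to sync if a branch being switched to has no l10n cache on this host."""
    missing = missing_l10n(deploy_root, newly_added_versions(current, proposed))
    if missing:
        raise RuntimeError("l10n cache missing for: " + ", ".join(missing))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Simulate a re-imaged host that pulled code while every wiki was on wmf.28,
        # so only the wmf.28 branch has a cache.
        cache_dir = os.path.join(root, "php-1.35.0-wmf.28", "cache", "l10n")
        os.makedirs(cache_dir)
        open(os.path.join(cache_dir, "l10n_cache-en.cdb"), "w").close()

        current = {"enwiki": "php-1.35.0-wmf.28"}
        rollback = {"enwiki": "php-1.35.0-wmf.27"}
        try:
            check_before_sync(root, current, rollback)
        except RuntimeError as err:
            print(err)  # l10n cache missing for: php-1.35.0-wmf.27
```

Only versions that are newly referenced are checked, so a routine sync that keeps every wiki on already-deployed branches never trips the guard; a rollback to a branch this host never built does.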

On the reimage side, if scap had a command to rebuild the l10n cache for the N-1 version (the rollback target), we could run it after reimaging.

Aklapper renamed this task from Re-imaged mw app servers can end up with missing l10n cache for old versons of MW needed for rollback to Re-imaged mw app servers can end up with missing l10n cache for old versions of MW needed for rollback. Jan 30 2021, 11:21 AM

This happened again yesterday during the rollback.

We proceeded with the wider work without fixing this task, so I'll remove it as a blocker.

thcipriani subscribed.

The move to mw-on-k8s should obviate the need for this work.