Deploy versioned maps support to production
Closed, Resolved | Public | 3 Estimated Story Points

Description

Preparation:

Planned deployment date: 10 May 2022

Event Timeline

Change 786289 had a related patch set uploaded (by WMDE-Fisch; author: Awight):

[operations/mediawiki-config@master] Watch for mapdata cache misses in production

https://gerrit.wikimedia.org/r/786289

awight set the point value for this task to 3.
awight added subscribers: Jgiannelos, MSantos.
awight set Due Date to May 9 2022, 10:00 PM. (Apr 29 2022, 11:30 AM)
awight renamed this task from Deploy versioned maps to production to Deploy versioned maps support to production. (May 2 2022, 11:35 AM)

Change 788347 had a related patch set uploaded (by Awight; author: Awight):

[operations/mediawiki-config@master] Enable versioned maps everywhere

https://gerrit.wikimedia.org/r/788347

Change 786289 merged by jenkins-bot:

[operations/mediawiki-config@master] Watch for mapdata cache misses in production

https://gerrit.wikimedia.org/r/786289

Mentioned in SAL (#wikimedia-operations) [2022-05-10T07:48:24Z] <awight@deploy1002> Synchronized wmf-config: Config: [[gerrit:786289|Watch for mapdata cache misses in production (T304813 T300712)]] (duration: 00m 50s)

Change 788347 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable versioned maps everywhere

https://gerrit.wikimedia.org/r/788347

We discovered that cached old revisions won't be updated until they expire.

Mentioned in SAL (#wikimedia-operations) [2022-05-10T13:31:22Z] <awight@deploy1002> Synchronized wmf-config: Config: [[gerrit:788347|Enable versioned maps everywhere (T300712)]] (duration: 00m 50s)

For future reference, it might be easier to reason about and less risky, to (temporarily) change the severity of a message in source code rather than the configuration of an entire channel, especially when that involves the level debug as that makes it very easy to accidentally introduce high-traffic noise into a hot code path that then floods production Logstash. Such a message could come from our future selves after we forget, or from an unsuspecting person who adds it relatively soon, not expecting that level to be enabled for production traffic.

Having said that, in this particular case the kartographer channel was not yet enabled in production at all, so the patch would have been very much the same: the channel would just have been set to info or warning instead of debug, and the source changed to call warning() for the message in question. I suspect this approach would be easier to scale over time and less likely to be forgotten and then lead to unexpected impact. This is an unsolicited viewpoint that I hope is of some use. I recognise that you probably would not have forgotten to raise the level from "debug" to something higher within a few weeks, and that probably no one would have introduced such a message before then.
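To illustrate the trade-off described above, here is a minimal sketch using Python's standard logging module as a stand-in. It is not MediaWiki's logging API or the actual Kartographer code; the logger names, message texts, and helper functions are all hypothetical. It compares lowering a whole channel to debug against keeping the channel at warning and raising only the one message of interest.

```python
import logging


class ListHandler(logging.Handler):
    """Collects records in memory so the effect of each option can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}:{record.getMessage()}")


def request_with_debug_message(logger):
    # Hypothetical hot code path: the message we care about is logged at debug.
    logger.debug("mapdata cache miss")
    logger.debug("unrelated per-request detail")


def request_with_warning_message(logger):
    # Same path, but only the message we care about is raised to warning.
    logger.warning("mapdata cache miss")
    logger.debug("unrelated per-request detail")


def capture(channel_level, code_path):
    """Run code_path against a channel set to channel_level; return what was emitted."""
    logger = logging.getLogger(f"kartographer-demo-{channel_level}")
    logger.setLevel(channel_level)
    logger.propagate = False  # keep the demo isolated from the root logger
    handler = ListHandler()
    logger.addHandler(handler)
    code_path(logger)
    logger.removeHandler(handler)
    return handler.messages


# Option A: open the whole channel at debug. Every debug message gets through.
option_a = capture(logging.DEBUG, request_with_debug_message)

# Option B: keep the channel at warning and change the one message in source.
option_b = capture(logging.WARNING, request_with_warning_message)

print(option_a)
print(option_b)
```

In this sketch, option A also emits the unrelated debug line, which is the kind of noise that could grow unnoticed on a hot path, while option B emits only the single message being watched.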

For future reference, it might be easier to reason about and less risky, to (temporarily) change the severity of a message in source code rather than the configuration of an entire channel, especially when that involves the level debug as that makes it very easy to accidentally introduce high-traffic noise into a hot code path

Thanks, this is a good reminder. I'll create a task to disable debug logging to make it harder to overlook this important step.