Another way to catch problems like T112401
There is no practical means for querying a wiki's deployment group (there is no stable interface for .dblist files in operations/mediawiki-config.git), and the groups are not permanent enough to hard-code. So this is not feasible currently. Let's re-open this if and when we have a queryable deployment system.
What do you mean by "stable interface"? There's a group0.dblist, group1.dblist, and we could create a group2.dblist if we really needed. And there's PHP code to read and parse those files.
But more importantly, what alternative tools/solutions can we use to make sure regressions like T112401 don't happen again?
Yes, but we need the information on hafnium, in a very lightweight Python process that handles Navigation Timing packets in real time and forwards them to statsd/graphite.
A few options I see in the current infrastructure:
- Change the navtiming.py subscriber to maintain a copy of these dblist files, sync them from time to time (to disk and to memory), and have it do lookups for each packet based on the EventLogging 'wiki' field (see https://meta.wikimedia.org/wiki/Schema:EventCapsule).
- Export this information client-side as part of EventLogging (e.g. via a mw.config variable in the startup module somehow), and log it as part of the NavigationTiming packet.
- Change the core EventLogging server-side code to maintain a copy of the dblist files and augment all packets with this information as part of the EventCapsule.
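To make the first option more concrete, here is a rough sketch of what the lookup could look like in a Python subscriber. It is not taken from navtiming.py. The function names are hypothetical, the file format is assumed to be one database name per line with optional `#` comments, and treating any wiki that appears in neither group0 nor group1 as "group2" is an assumption, since no group2.dblist exists. Periodic syncing of the files is left out.

```python
def parse_dblist(lines):
    """Return the set of wiki database names found in dblist-style lines.

    Assumes one name per line; blank lines and '#' comments are skipped.
    """
    wikis = set()
    for line in lines:
        name = line.split('#', 1)[0].strip()
        if name:
            wikis.add(name)
    return wikis


def build_group_index(dblists):
    """Build a wiki -> group mapping from {group_name: iterable_of_lines}.

    If a wiki is listed in more than one group, the first group seen wins.
    """
    index = {}
    for group, lines in dblists.items():
        for wiki in parse_dblist(lines):
            index.setdefault(wiki, group)
    return index


def group_for_event(event, index, default='group2'):
    """Look up the deployment group using the event's 'wiki' field.

    The 'group2' fallback is an assumption, not something the
    dblist files themselves state.
    """
    return index.get(event.get('wiki'), default)


if __name__ == '__main__':
    # Illustrative contents only; real files would be read from a synced copy.
    index = build_group_index({
        'group0': ['# example comment', 'testwiki', ''],
        'group1': ['cawiki', 'hewiki'],
    })
    print(group_for_event({'wiki': 'cawiki'}, index))
```

Building the index once and doing a plain dict lookup per packet keeps the per-event cost small, which matters for a process that handles packets in real time. The index would still need to be rebuilt whenever the synced files change.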
Change 273988 had a related patch set uploaded (by Ori.livneh):
Add wgVersion to SaveTiming and NavigationTiming events
Change 273990 had a related patch set uploaded (by Ori.livneh):
Report save timing by MediaWiki version
Change 273988 merged by jenkins-bot:
Add wgVersion to SaveTiming and NavigationTiming events