Page MenuHomePhabricator

Changing dblist files requires mtime touch of InitialiseSettings.php
Open, NormalPublic

Description

I've had trouble yesterday (494218, 494707) and today (494806) with deploying dblists/commonsuploads.dblist during EU SWAT.

The patches restrict local uploads on mediawiki.org (T217523). There were no errors during scap deployment, just the local uploads were not disabled after deployment.

Yesterday, I've merged 494218 and pulled it to mwdebug1002, but it did not work, so I've reverted it with 494707.

Today, I've merged 494806 and pulled it to mwdebug1002, but it did not work again. I didn't think it would break anything so I've done a full deployment. It didn't work even then.

@Urbanecm and @MarcoAurelio suggested that touching wmf-config/InitialiseSettings.php would help, but I've rejected because it's not documented at SWAT deploys/Deployers.

I've decided to deploy a patch that was next in SWAT calendar (494225) that changes wmf-config/InitialiseSettings.php and see if then 494806 (dblists/commonsuploads.dblist) would work. To my surprise, 494806 (dblists/commonsuploads.dblist) worked after 494225 (wmf-config/InitialiseSettings.php) was deployed.

I'm not sure if that's a bug in Scap.

Details

Related Gerrit Patches:

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMar 7 2019, 1:10 PM
zeljkofilipin triaged this task as High priority.Mar 7 2019, 1:10 PM
zeljkofilipin updated the task description. (Show Details)
zeljkofilipin updated the task description. (Show Details)

My understanding is that the dblist is used on InitialiseSettings and/or CommonSettings, if you don't touch those the change doesn't get applied as some sort of 'caching' issue. Fact is that after touching InitialiseSettings the dblist patch merged previously started to work as expected. Thanks for filing this task.

hashar added a comment.Mar 7 2019, 2:03 PM

The dblist files are read and are part of the cached config. The relevant code (inline annotations are mine):

wmf-config/CommonSettings.php
// File in which we cache the wiki configuration
 188 $filename = "/tmp/mw-cache-$wmgVersionNumber/conf-$wgDBname";
 189 if ( defined( 'HHVM_VERSION' ) ) {
 190     $filename .= '-hhvm';
 191 }
 192 

// $globals would hold the values
 193 $globals = false;

// load from cache unless InitialiseSettings.php is more recent. Or to say otherwise, to dismiss the cache, InitialiseSettings.php has to be touched.
 194 if ( @filemtime( $filename ) >= filemtime( "$wmfConfigDir/InitialiseSettings.php" ) ) {
 195     $cacheRecord = @file_get_contents( $filename );
 196     if ( $cacheRecord !== false ) {
 197         $globals = unserialize( $cacheRecord );
 198     }
 199 }
 200 
 201 if ( !$globals ) {

// cache not loaded so process 
202     # Get configuration from SiteConfiguration object
 203     require "$wmfConfigDir/InitialiseSettings.php";
 204 

// And here we read the .dblist files
 205     # Collect all the dblist tags associated with this wiki
 206     $wikiTags = [];
 207     # When updating list please run ./docroot/noc/createTxtFileSymlinks.sh
 208     # Expand computed dblists with ./multiversion/bin/expanddblist
 209     foreach ( [
 210         'private',
...
 243         'pp_stage0',
 244         'pp_stage1',
 245         'cirrussearch-big-indices',
 246     ] as $tag ) {
 247         $dblist = MWWikiversions::readDbListFile( $tag );
 248         if ( in_array( $wgDBname, $dblist ) ) {
 249             $wikiTags[] = $tag;
 250         }
 251     }

Conclusion: the .dblist files are read and cached in the global wiki configuration. To invalidate them one has to touch InitialiseSettings.php and deploy it. Maybe scap can be made smart enough and automatically touch InitialiseSettings.php when it is asked to sync a file in ./dblists/.

thcipriani lowered the priority of this task from High to Normal.Mar 7 2019, 3:13 PM

Lowering priority since touching InitialiseSettings.php has been happening since T60618: [scap] Touch InitialiseSettings.php if syncing (circa 2014)

Maybe scap can be made smart enough and automatically touch InitialiseSettings.php when it is asked to sync a file in ./dblists/.

In fact, scap already touches InitialiseSettings.php at the end of every sync (https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/tools/scap/+/master/scap/tasks.py#408)

I can verify the code was executed in the logs:

Sync happened at 2019-03-07 12:34:

<zfilipin@deploy1001>	Synchronized dblists/commonsuploads.dblist: SWAT: [[gerrit:494806|Restrict local uploads on mediawiki.org, take 2 (T217523)]] (duration: 00m 56s)

The logs on mwdebug1002 should be representative:

2019-03-07T12:33:48 Running rsync command: `sudo -u mwdeploy -n -- /usr/bin/rsync --archive --delete-delay --delay-updates --compress --delete --exclude=**/cache/l10n/*.cdb --exclude=*.swp --no-perms --exclude=**/.git --include=/dblists --include=/dblists/commonsuploads.dblist --exclude=* deploy1001.eqiad.wmnet::common /srv/mediawiki`
2019-03-07T12:33:48 Touching /srv/mediawiki/wmf-config/InitialiseSettings.php

I investigated this previously: T181833#4179431 and I came up empty. I have a feeling there is a race condition here, but I can't see it.


Summary

We use second-accurate file mtime to purge a cache (which is problematic cf: https://apenwarr.ca/log/20181113), but to compound that problem, the purge is triggered via an app request. Meaning that you have an asynchronous process triggering a cache purge based on mtime.

At the end of scap we touch initialiseSettings.php. This often happens within the same second that the sync is complete (cf: logs above). This makes me suspect some race condition that I can't prove.

Example race condition

  1. sync InitialiseSettings.php and dblist/all.dblist
  2. sync finishes within the same second as touch
  3. app request comes in within the milli(/micro)second window after InitialiseSettings.php has synced, but before dblist/all.dblist has synced and before we touch InitialiseSettings.php after the sync is complete
  4. The cache is rebuilt with out-of-date dblist/all.dblist settings
  5. touch InitialiseSettings.php happens, but the mtime isn't updated because this happens within the same second as #2

What I can't figure out

Syncing dblists/commonsuploads.dblist in isolation as above isn't triggering the example race condition.

Here's my mental model of what's happening:

  1. sync dblists/commosuploads.dblist
  2. This should not have any effect on wmf-config/InitialiseSettings.php
  3. Sync completes
  4. Touch wmf-config/InitialiseSettings.php (which hasn't been modified by the last sync)
  5. Cache is invalidated in CommonSettings.php by incoming request

Some part of my understanding is wrong. The rsync command could be incorrect in some way to make my mental model wrong (i.e., we're not syncing dblist in isolation). There could be something else modifying the mtime of InitialiseSettings.php that I don't know about.

I think I have a theory, and I think it could explain how this keeps happening.

Here's a diagram:

TimeActionInitialiseSettings.php mtime/tmp/mw-cache-1.33.0-wmf.20/conf-commonswiki-hhvm mtimecomment
0-00Starting from time 0, the cache is up-to-date with InitialiseSettings.php
1scap sync-file README10README sync triggers an update to InitialiseSettings.php mtime
2GET / commons.wikimedia.org12Incoming request the next second creates a new cache file with a new mtime
2scap sync-file dblist/commons.dblist22Syncing of the new dblist updates the mtime of InitialiseSettings.php
2GET / commons.wikimedia.org22Incoming request DOES NOT update the cache since the mtime is the same on the cache file and the IS.php file and the cache file

How do we fix this? We could set the mtime way in the future in scap, I guess. This system is really hard to reason about as it is and I'd be happier with a different solution.

How do we fix this? We could set the mtime way in the future in scap, I guess. This system is really hard to reason about as it is and I'd be happier with a different solution.

@hashar suggested in IRC we have scap generate the cache rather than touch InitialiseSettings.php and have the application generate the cache (the current situation). We'd have to do that for all databases for all active versions, which might be time consuming, but it would side-step this problem.

Is this cache still needed with php7? could we get rid of the cache? I don't know who can answer that...

hashar added a comment.Mar 7 2019, 7:49 PM

From git log the filemtime has been there since at least 2011 / MediaWiki 1.17. We had php 5.2 at that time Special:Version in June 2011.

So yeah it is probably way faster to just reprocess it now thanks to hhvm/php7.0? But we would need a feature flag and some way to measure the difference with and without filemtime and the serialized configs. Maybe Performance-Team team could be interested in tackling/helping on that front?

hashar added a comment.Mar 7 2019, 8:34 PM

I am tempted to just merge this as a duplicate of the old task T181833 (and maybe update its description with Tyler comments above).

For the usefulness of filemtime and the serialized cache of the configuration, that predates our time by years. It has been there since at least June 2011 when we ran php 5.2. Supposedly nowadays with HHVM / PHP 7.0 parsing the php file might not be much of a problem anywere. And might well be faster since the PHP is read once and held in a bytecode cache, saving PHP from having to read file and unserialize. But I think that is a subject for another task.

Krinkle added a subscriber: Krinkle.EditedMay 13 2019, 9:07 PM

I think the simpler solution would be to do the same as what we generally to avoid this problem in production code, e.g. in MediaWiki core. Which is to have the verification mechanism part of the value instead of as some fragile indirect side-effect of the value.

In other words, either

  1. Make the fragile side-effect not a side effect and not fragile, by having the PHP code in wmf-config create the cache with the explicit mtime set to the value it was creating it for.
  2. Or, store the target mtime in the cache value and verify it on retrieval. Thus removing use of filemtime for the cache file.

I think the simpler solution would be to do the same as what we generally to avoid this problem in production code, e.g. in MediaWiki core. Which is to have the verification mechanism part of the value instead of as some fragile indirect side-effect of the value.
In other words, either

  1. Make the fragile side-effect not a side effect and not fragile, by having the PHP code in wmf-config create the cache with the explicit mtime set to the value it was creating it for.

This is what I expected the code did; i.e., generate a /tmp cache file, set mtime on cache file to the mtime of IS.php, mv cachefile into place -- this would side-step the race-condition, I believe.

  1. Or, store the target mtime in the cache value and verify it on retrieval. Thus removing use of filemtime for the cache file.
Krinkle renamed this task from Problems deploying dblists/commonsuploads.dblist to Changing dblist files requires mtime touch of InitialiseSettings.php.Aug 6 2019, 11:43 AM

Change 528446 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] CommonSettings: Clean up wmf-config caching code [no-op]

https://gerrit.wikimedia.org/r/528446

Change 528447 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] CommonSettings: Store mtime inside wmf-config cache file

https://gerrit.wikimedia.org/r/528447

Change 528446 merged by jenkins-bot:
[operations/mediawiki-config@master] CommonSettings: Clean up wmf-config caching code [no-op]

https://gerrit.wikimedia.org/r/528446

Change 528447 merged by jenkins-bot:
[operations/mediawiki-config@master] CommonSettings: Store mtime inside wmf-config cache file

https://gerrit.wikimedia.org/r/528447

Mentioned in SAL (#wikimedia-operations) [2019-09-04T01:02:06Z] <krinkle@deploy1001> Synchronized wmf-config/CommonSettings.php: ed5297c10 / T217830 (duration: 00m 59s)