[scap] Deployment: Repository .git is synchronised fine, but is broken for submodules because of hardcoded gitdir link
Closed, ResolvedPublic

Description

The VisualEditor UI has an interface to check what version is currently deployed (similar to Special:Version), when asking around people seemed to assume that this has been disabled purposely on the cluster.

However this does not seem to be true.

For one, the configuration around is working perfectly fine.

Though the HEAD commit is retreived by MediaWiki's GitInfo class without shell (it just reads from the .git/HEAD file directly), I additionally verified that $wgGitBin (used for calcuating the commit dates with git show) has also not been disabled in production and works fine (tested on tin.eqiad.wmnet).

The .git directory itself is not excluded from our deployment scripts. Though partial syncs using sync-dir or sync-file will naturally not sync the .git directory, when using scap or sync-dir on an extension directory, it will be updated just fine.

And, in fact, [[Special:Version]] does show a git hash (maybe not the best one, but it does show something).

It doesn't work for extensions because the git data is stored in mediawiki-core (e.g. mediawiki-core/extensions/VisualEditor/.git is a placeholder file with a pointer to mediawiki-core/.git/modules/extensions/VisualEditor), and the pointer is hardcoded to the location of the working copy on tin, namely /a/common/php-1.22wmf16, which doesn't exist on apaches.

For example:

krinkle@mw1017:/apache/common/php-1.22wmf16/extensions/VisualEditor$ git show

fatal: Not a git repository: /a/common/php-1.22wmf16/.git/modules/extensions/VisualEditor

krinkle@mw1017:/apache/common/php-1.22wmf16/extensions/VisualEditor$ cat .git

gitdir: /a/common/php-1.22wmf16/.git/modules/extensions/VisualEditor

I'm not sure whether the resemblance of /a and /apache is a coincendence or whether one is intended to be a shortcut of the other. Either way, it seems fairly trivial to make this work.

I'm not sure what the semantic meaning is of these directories (/a/ seems to be an existing but unused directory on all hosts other than tin).

Depending on whether it is really empty we should probably just create a symlink from /a to /apache on all machines that have /apache (except for tin), or if /a is used for other stuff, put symlink inside and have one from /a/common to /apache/common (except for tin).


Version: wmf-deployment
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=36271

bzimport added a project: Deployment-Systems.Via ConduitNov 22 2014, 2:04 AM
bzimport set Reference to bz53972.
Krinkle created this task.Via LegacySep 10 2013, 12:09 AM
Reedy added a comment.Via ConduitSep 10 2013, 1:11 AM

(In reply to comment #0)

I'm not sure what the semantic meaning is of these directories (/a/ seems to
be
an existing but unused directory on all hosts other than tin).

fluorine

bd808 added a comment.Via ConduitFeb 13 2014, 11:32 PM

(In reply to Krinkle from comment #0)

I'm not sure what the semantic meaning is of these directories (/a/ seems to
be an existing but unused directory on all hosts other than tin).

/a/common is the rsync server source directory on tin for scap. This directory is used to prepare the files that will be pushed out by scap to the members of the /etc/dsh/group/scap-proxies group and then subsequently to the rest of the cluster.

The misc::deployment::vars puppet class ensures that this directory is created on all hosts where the misc::deployment::scap_scripts puppet class is applied. The misc::deployment::scap_scripts class is applied directly to arsenic, hume, terbium, and tin in site.pp. It's also applied indirectly on the snapshot* servers by the mediawiki::sync class.

I can't find any reason by grepping in operations/puppet that the general mw* nodes would have /a and/or /a/common created but /a/common exists as an empty directory on all of them that I randomly sampled. It seems like it should be possible to add some puppet config (maybe in role::applicationserver::webserver?) to ensure that the /a directory is created and then symlink /a/common to /usr/local/apache/common.

bd808 added a comment.Via ConduitFeb 14 2014, 9:05 PM
  • Bug 53070 has been marked as a duplicate of this bug. ***
Jdforrester-WMF added a comment.Via ConduitMar 20 2014, 9:38 PM
  • Bug 62760 has been marked as a duplicate of this bug. ***
bd808 added a comment.Via ConduitApr 22 2014, 6:36 PM

This bug affects the beta environment now that scap is being used to deploy code there. The ideas for creating symlinks to restore the scame directory structure that is used on the deploy host (deployment-bastion.eqiad.wfmlabs) make even less sense in beta. Here's an example .git file from an extension there:

gitdir: /mnt/srv/scap-stage-dir/php-master/extensions/.git/modules/MultimediaViewer
Jdforrester-WMF added a comment.Via ConduitApr 22 2014, 7:37 PM

Could we just have scap re-write the .git files (or have them be relative and not absolute in the first place)?

bd808 added a comment.Via ConduitApr 22 2014, 8:09 PM

(In reply to James Forrester from comment #6)

Could we just have scap re-write the .git files (or have them be relative
and not absolute in the first place)?

Having them be relative in the first place would really be nicest. Do we have a git guru who could look into that?

I had the same idea about rewriting the .git files as a scap step. It shouldn't be too hard to do. Because of the way that scap works right now we'd need to do this on each host as a post-update step similarly to the way that the l10n JSON intermediate files are recompiled into CDB files.

bd808 added a comment.Via ConduitApr 22 2014, 9:08 PM

(In reply to Bryan Davis from comment #2)

I can't find any reason by grepping in operations/puppet that the general
mw* nodes would have /a and/or /a/common created but /a/common exists as an
empty directory on all of them that I randomly sampled.

I eventually found the puppet code that creates /a/common as an empty directory on many hosts. The misc::deployment::vars class which in included from the misc::deployment::scap_scripts class creates the directory if it doesn't exist [0].

[0]: https://github.com/wikimedia/operations-puppet/blob/c5b8fa0631d3b28b6ca062e313b98e80d7325d80/manifests/misc/deployment.pp#L368-L374

bd808 added a comment.Via ConduitApr 28 2014, 8:15 PM

I'm working on a fix for this problem by adding a step to scap that precomputes the information needed by GitInfo.php to json files that can be synced to the apaches. This would fix the path change issue and should slightly reduce runtime cost of Special:Version and other pages that use GitInfo. It would also allow us to drop syncing of the .git directories all together if we wanted.

gerritbot added a comment.Via ConduitApr 29 2014, 9:27 PM

Change 130498 had a related patch set uploaded by BryanDavis:
Support precomputed data in GitInfo

https://gerrit.wikimedia.org/r/130498

gerritbot added a comment.Via ConduitApr 30 2014, 4:40 AM

Change 130560 had a related patch set uploaded by BryanDavis:
Precompute data for GitInfo

https://gerrit.wikimedia.org/r/130560

gerritbot added a comment.Via ConduitMay 5 2014, 11:56 PM

Change 130498 merged by jenkins-bot:
Support precomputed data in GitInfo

https://gerrit.wikimedia.org/r/130498

gerritbot added a comment.Via ConduitMay 6 2014, 6:07 PM

Change 130560 merged by jenkins-bot:
Precompute data for GitInfo

https://gerrit.wikimedia.org/r/130560

bd808 added a comment.Via ConduitMay 6 2014, 9:02 PM

Patches are merged, but not sufficient. I didn't look into the value of $wgCacheDirectory in operations/mediawiki-config.git. In beta and the prod cluster we set the cache directory to a location outside of $IP rather than $IP/cache as I expected/coded the scap side to this change for.

I'll either need to add a new global to set/change the location where GitInfo looks for the cache or add yet another scap step to copy the generated files around to adjust for this.

bd808 added a comment.Via ConduitMay 6 2014, 11:50 PM

I put a temporary hack in place on the beta nodes that run MediaWiki (apache0[12], jobrunner01, and videoscaler01) by symlinking $wgCacheDirectory/gitinfo to $IP/cache/gitinfo. This should make the git version data visible on the Special:Version page the next time that the MW-Core hash changes (memcache cache buster).

deployment-apache01:~
bd808$ mwscript eval.php --wiki=enwiki
> var_dump( $gi = new GitInfo( "$IP/extensions/VisualEditor" ) );
object(GitInfo)#113 (3) {
  ["basedir":protected]=>
  NULL
  ["cacheFile":protected]=>
  string(62) "/tmp/mw-cache-master/gitinfo/info-extensions-VisualEditor.json"
  ["cache":protected]=>
  array(6) {
    ["head"]=>
    string(40) "0496bd84506927039958cc683a35a5e3b5da2584"
    ["remoteURL"]=>
    string(70) "https://gerrit.wikimedia.org/r/p/mediawiki/extensions/VisualEditor.git"
    ["branch"]=>
    string(40) "0496bd84506927039958cc683a35a5e3b5da2584"
    ["headCommitDate"]=>
    string(10) "1399417572"
    ["headSHA1"]=>
    string(40) "0496bd84506927039958cc683a35a5e3b5da2584"
    ["@directory"]=>
    string(44) "/a/common/php-master/extensions/VisualEditor"
  }
}
bd808 added a comment.Via ConduitMay 7 2014, 1:42 AM

Still not seeing data in Special:Version. There must be something that I'm not understanding about the environment. I can get the info via eval.php as shown above.

bd808 added a comment.Via ConduitMay 19 2014, 7:52 PM

I'm looking into this again and still can't figure out why http://en.wikipedia.beta.wmflabs.org/wiki/Special:Version isn't displaying the git SHA1 for extensions now.

Debugging using mwscript eval.php --wiki=enwiki on deployment-apache01 and deployment-apache02 shows that GitInfo can read the precomputed git information there:

> $wgDebugLogGroups = array(); $wgDebugLogFile = 'php://stdout';
> $gitinfo = new GitInfo($IP); $coreId = $gitinfo->getHeadSHA1();
> $cache = wfGetCache( CACHE_ANYTHING );
> $path = "$IP/extensions/VisualEditor/VisualEditor.php";
> $memcKey = wfMemcKey( 'specialversion-ext-version-text', $path, $coreId );
> echo $memcKey;
enwiki:specialversion-ext-version-text:/srv/common-local/php-master/extensions/VisualEditor/VisualEditor.php:49952a405014c89b239da3bcaea158c47faf8251
> var_dump( $cache->get( $memcKey ) );
enwiki-bca38539: 649.4035  25.8M  [memcached] get(enwiki:specialversion-ext-version-text:/srv/common-local/php-master/extensions/VisualEditor/VisualEditor.php:49952a405014c89b239da3bcaea158c47faf8251)
enwiki-bca38539: 649.4047  25.8M  [memcached] result: NOT FOUND
bool(false)

> $gitinfo = new GitInfo( dirname( $path ) );
> echo $gitinfo->getHeadSHA1();
fefd3a5d72c118993cc555f0a53a561f58a5fd19
> echo $gitinfo->getHeadViewUrl();
https://git.wikimedia.org/commit/mediawiki%2Fextensions%2FVisualEditor.git/fefd3a5d72c118993cc555f0a53a561f58a5fd19
> echo $gitinfo->getHeadCommitDate();
1400528579

I've verified that apache is running as the user "apache" on these nodes and that apache can read the *info.json files. The output of eval.php above shows that the php code is functional when executed from the cli. At this point I'm unable to understand why this output differs from SpecialVersion::getCreditsForExtension(). I'm sure I'm missing something in my analysis but I'm at a loss for what it is.

bd808 added a comment.Via ConduitJun 26 2014, 12:40 AM

So when I echo $IP from mweval in beta I get /srv/common-local/php-master but apparently inside Apache $IP is /usr/local/apache/common-local/php-master. The latter is a symlink to former. This difference is causing GitInfo::getCacheFilePath() to compute the wrong path for the cache file. I'll spend some more time looking into why $IP isn't what I (and the paths inside $wgExtensionCredits) think it should be.

bd808 added a comment.Via ConduitJun 26 2014, 5:52 PM

(In reply to Bryan Davis from comment #18)

I'll spend some more time looking into why $IP isn't what I
(and the paths inside $wgExtensionCredits) think it should be.

Today I learned that WebStart.php clears $IP and then recreates it from the first of the MW_INSTALL_PATH environment variable, realpath( '.' ) or dirname( __DIR__ ). In beta, the realpath branch wins and sets $IP to '/srv/common-local/php-master'.

gerritbot added a comment.Via ConduitJun 26 2014, 7:23 PM

Change 142319 had a related patch set uploaded by BryanDavis:
Fix GitInfo cache file path computation and storage location

https://gerrit.wikimedia.org/r/142319

gerritbot added a comment.Via ConduitJun 27 2014, 10:27 AM

Change 142320 had a related patch set uploaded by Krinkle:
Set wgGitInfoCacheDirectory to point to scap managed location

https://gerrit.wikimedia.org/r/142320

gerritbot added a comment.Via ConduitJul 8 2014, 4:42 PM

Change 142319 merged by jenkins-bot:
Fix GitInfo cache file path computation and storage location

https://gerrit.wikimedia.org/r/142319

Jdforrester-WMF added a comment.Via ConduitJul 8 2014, 5:42 PM

Yay, thanks Bryan.

gerritbot added a comment.Via ConduitJul 8 2014, 6:22 PM

Change 142320 merged by jenkins-bot:
Set wgGitInfoCacheDirectory to point to scap managed location

https://gerrit.wikimedia.org/r/142320

greg moved this task to Done on the Deployment-Systems workboard.Via WebJan 8 2015, 5:39 PM

Add Comment

Column Prototype
This is a very early prototype of a persistent column. It is not expected to work yet, and leaving it open will activate other new features which will break things. Press "\" (backslash) on your keyboard to close it now.