
scap 3.7.4-2 is broken
Closed, Resolved (Public)

Description

tl;dr: scap 3.7.4-2 was released without quilt patches applied.

Scap 3.7.4-1 was deployed on 2017-12-11

On 2017-12-14 @demon noticed that scap was building a git directory in /srv/mediawiki/. That functionality is behind a feature flag on the release branch of scap (https://github.com/wikimedia/scap/blob/release/scap/main.py#L334). We figured out that 3.7.4-1 had been cut from the master branch and not the release branch. I reopened T182347, and that was resolved this morning (2017-12-14) with the deploy of 3.7.4-2.

At 14:37 UTC @Legoktm discovered that targets were using /srv/deployment/scap/scap/bin/scap as the path to scap. This is a configuration value that is overridden via a quilt patch (https://github.com/wikimedia/scap/blob/release/debian/patches/change-bin-dir.patch), which evidently did not get applied in 3.7.4-2.


[14:37:11] <legoktm> 22:36:59 ['/srv/deployment/scap/scap/bin/scap', 'pull', '--no-update-l10n', '--include', 'php-1.31.0-wmf.12', '--include', 'php-1.31.0-wmf.12/extensions', '--include', 'php-1.31.0-wmf.12/extensions/LoginNotify', '--include', 'php-1.31.0-wmf.12/extensions/LoginNotify/includes', '--include', 'php-1.31.0-wmf.12/extensions/LoginNotify/includes/LoginNotify.php', 'tin.eqiad.wmnet', 'naos.codfw.wmnet', 'tin.eqiad.wmnet'] on mw1264.eqiad.wmnet returned [127]: bash: /srv/deployment/scap/scap/bin/scap: No such file or directory
[14:37:33] <no_justification> Ummmmm.
[14:37:33] <legoktm> 22:36:59 ['/srv/deployment/scap/scap/bin/scap', 'pull-master', 'tin.eqiad.wmnet'] on naos.codfw.wmnet returned [127]: Could not chdir to home directory /var/lib/mwdeploy: No such file or directory
[14:37:33] <legoktm> bash: /srv/deployment/scap/scap/bin/scap: No such file or directory
[14:37:35] <no_justification> thcipriani: ^
[14:37:53] <thcipriani> also ummm
[14:38:01] <legoktm> all of sync masters and canaries failed
[14:38:13] <no_justification>  /srv/deployment/scap/* shouldn't exist anymore in prod....
[14:38:28] <no_justification> I wonder if the patch file didn't get applied on deb package build?
[14:38:29] <legoktm> the command I ran was
[14:38:30] <legoktm> legoktm@tin:/srv/mediawiki-staging$ scap sync-file php-1.31.0-wmf.12/extensions/LoginNotify/includes/LoginNotify.php "Use extension registry to check for CheckUser to be installed - T182867"
[14:38:31] <stashbot> T182867: "Login to Wikidata as QuickStatementsBot from a computer you have not recently used" - https://phabricator.wikimedia.org/T182867
[14:39:14] <thcipriani> yeah, if the quilt patches didn't get applied in the debian package...
[14:40:24] <legoktm> was a new version deployed?
[14:40:43] <thcipriani> this morning, yeah
[14:41:00] <legoktm> was it tested afterwards? :p
[14:41:29] <thcipriani> no :(
[14:41:49] <thcipriani> so /usr/lib/python2.7/dist-packages/scap/config.py shows the wrong path to scap. That's applied via quilt.
[14:42:05] <thcipriani> it should have been a redeploy of the same version that was running.
[14:42:24] <thcipriani> except with a config flag that was missing.
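
The post-deploy test that was skipped could have been as small as checking that the configured bin_dir really contains an executable scap. A minimal sketch, assuming Python 3; the function name scap_binary_usable is hypothetical and not part of scap, and the two paths are the ones quoted in this task:

```python
import os


def scap_binary_usable(bin_dir, name="scap"):
    """Return True if bin_dir contains an executable file called `name`.

    Hypothetical helper for a post-install smoke test; not part of scap.
    """
    candidate = os.path.join(bin_dir, name)
    return os.path.isfile(candidate) and os.access(candidate, os.X_OK)


if __name__ == "__main__":
    # The packaged default and the unpatched path seen in the failing sync.
    for bin_dir in ("/usr/bin", "/srv/deployment/scap/scap/bin"):
        print(bin_dir, scap_binary_usable(bin_dir))
```

Run on a target after installing the package, a False result for the configured path would have flagged the missing quilt patch before the first sync-file.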

Event Timeline

Legoktm triaged this task as Unbreak Now! priority. Dec 15 2017, 10:47 PM
Legoktm created this task.
thcipriani updated the task description. Dec 15 2017, 11:43 PM
thcipriani added a subscriber: demon.
thcipriani renamed this task from "scap is broken bash: /srv/deployment/scap/scap/bin/scap: No such file or directory" to "scap 3.7.4-2 is broken". Dec 15 2017, 11:46 PM
thcipriani updated the task description.

Mentioned in SAL (#wikimedia-operations) [2017-12-15T23:55:30Z] <mutante> downgrading scap from 3.7.4-2 to 3.7.4-1 where it is installed - cumin -b 10 -s 5 'R:Package = scap' 'if dpkg -l scap | grep "3.7.4.2" && file /var/cache/apt/archives/scap_3.7.4-1_all.deb; then puppet agent --disable; apt-get remove --yes -q scap ; dpkg -i /var/cache/apt/archives/scap_3.7.4-1_all.deb ; fi' targeting 478 hosts (T183046)

Change 398606 had a related patch set uploaded (by Chad; owner: Chad):
[operations/puppet@production] scap: Set bin_dir globally to /usr/bin

https://gerrit.wikimedia.org/r/398606

Mentioned in SAL (#wikimedia-operations) [2017-12-16T00:10:40Z] <mutante> no more scap 3.7.4-2 found across 'R:Package = scap' (T183046)

Mentioned in SAL (#wikimedia-operations) [2017-12-16T01:49:14Z] <mutante> reimported scap 3.7.4-1 into APT (jessie-wikimedia) after fixing md5/sha sums in .dsc and .changes files to match orig.tar.gz | copied it from jessie-wikimedia to trusty and stretch-wikimedia. all distributions downgraded to 3.7.4-1 (T183046)

Dzahn lowered the priority of this task from Unbreak Now! to High. Dec 16 2017, 2:15 AM
Dzahn added a subscriber: Dzahn.

The whole cluster has been downgraded to 3.7.4-1, checked with cumin.

The broken version has been removed from APT:

[install1002:~] $ sudo -E reprepro ls scap
scap | 3.7.4-1 |  trusty-wikimedia | amd64, i386, source
scap | 3.7.4-1 |  jessie-wikimedia | amd64, i386, source
scap | 3.7.4-1 | stretch-wikimedia | amd64, i386, source

Therefore lowering priority from UBN.

akosiaris added a subscriber: akosiaris. Edited Dec 18 2017, 11:36 AM

So, I'll add my considerable amount of cents to this task in order to provide insight into what happened, see where things went south, and figure out what we need to do to avoid it in the future.

Just as a reminder, I released 3.7.4-1 from master. I shouldn't have, but a) the build went fine, b) it seemed entirely natural, and c) there was no documentation that I could readily find saying that builds should be made from the release branch. The only documentation about how to build the software is in https://wikitech.wikimedia.org/wiki/Scap3#Debian_package and it's very terse.

Then on Friday I tried releasing from the release branch, with an extra patch specified in D916 that just bumped the debian/changelog file. And it would not build. The error was

dpkg-source: info: local changes detected, the modified files are:
 scap/bin/scap
 scap/scap/config.py

The command used was GIT_PBUILDER_AUTOCONF=no DIST=jessie gbp buildpackage --git-pbuilder -us -uc -S

The "local" changes, when inspected, made no sense to me. The diff reported by dpkg-source was

--- scap-3.7.4.orig/bin/scap
+++ scap-3.7.4/bin/scap
@@ -1,4 +1,4 @@
-#!/usr/bin/python2
+#!/usr/bin/env python2
 # -*- coding: utf-8 -*-
 """
     scap
--- scap-3.7.4.orig/scap/config.py
+++ scap-3.7.4/scap/config.py
@@ -31,7 +31,7 @@ import scap.utils as utils
 
 
 DEFAULT_CONFIG = {
-    'bin_dir': (str, '/usr/bin'),
+    'bin_dir': (str, '/srv/deployment/scap/scap/bin'),
     'canary_dashboard_url': (
         str,
         'https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e'

The source is generated by git-buildpackage via git archive (and I verified twice that it was correct). I could not find the issue despite spending more than an hour fighting with it (I did find it today with a rested and clearer head). At that point I made the mistake of building the software in a way that effectively ignored the patches in the debian/patches directory (which, to be honest, are kind of weird patches) and uploading it, causing the problems this task is about.

Now to the root cause of the issue. What I have now realized is that .gitignore ignores the quilt patch state directory .pc, meaning that git status shows no differences and git clean -fd does not remove it, which causes subsequent builds that include a branch change to fail. That makes it possible to reproduce what I ran into on Friday:

  1. git clone the repo
  2. Build the master branch (or any other branch)
  3. Checkout the release branch
  4. Try to rebuild
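
Because the leftover .pc directory is invisible to git status, a pre-build guard has to look at the filesystem directly. A minimal sketch, assuming Python 3 and that it is run against the top of the checkout before the build starts; both function names are hypothetical and nothing here is part of scap or git-buildpackage:

```python
import os


def find_stale_quilt_state(worktree):
    """Return the path of a leftover quilt state directory (.pc), or None."""
    pc_dir = os.path.join(worktree, ".pc")
    return pc_dir if os.path.isdir(pc_dir) else None


def assert_clean_for_build(worktree):
    """Refuse to build while quilt state from an earlier build is present."""
    pc_dir = find_stale_quilt_state(worktree)
    if pc_dir is not None:
        raise RuntimeError(
            "stale quilt state found at %s; clean the work tree before "
            "building from a different branch" % pc_dir
        )
```

The guard only detects the condition and leaves the clean-up to the person building. Note that git clean needs the additional -x flag before it will remove ignored paths such as .pc.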

So, actionables as far as I can tell.

  • Bump to 3.7.4-3 to facilitate upgrades (we could force rebuild 3.7.4-2 and purge the caching proxy, but let's play it by the book)
  • Drop the .pc directory from .gitignore
  • Clearly document the building process (and preferably next to the code - e.g. README.rst or a debian/README)
  • Perhaps document what the various git branches are for. It would possibly be helpful to other users of the software
  • Re-evaluate the patches in debian/patches. I, for one, see no reason for a patch that changes bin_dir from a clearly WMF-specific path to what should be considered the "normal" path. I also have to ask why bin_dir needs to be configurable in the first place rather than detected.
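
On the last point, detecting bin_dir instead of configuring it could look roughly like the following. This is a sketch only, assuming Python 3 (shutil.which is not available in Python 2.7) and assuming that scap is installed in the same location on every host, which may not hold; detect_bin_dir and its parameters are hypothetical and not scap's API:

```python
import os
import shutil


def detect_bin_dir(program="scap", fallback="/usr/bin", path=None):
    """Return the directory holding `program`, looked up on the search path.

    `path` is passed through to shutil.which; None means use $PATH.
    Falls back to `fallback` when the program cannot be found.
    """
    found = shutil.which(program, path=path)
    if found is None:
        return fallback
    return os.path.dirname(os.path.abspath(found))
```

The lookup happens on the machine running the code, so it says nothing about where scap lives on a remote target. That is one possible reason to keep an explicit setting.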

Agreed about building 3.7.4-3 for the new upload.
As for bin_dir, have you seen this? https://gerrit.wikimedia.org/r/#/c/398606/

Change 398606 merged by Alexandros Kosiaris:
[operations/puppet@production] scap: Set bin_dir globally to /usr/bin

https://gerrit.wikimedia.org/r/398606

The first two actionables are in D919.

I've also merged the bin_dir configuration change above, but that's just a safeguard; as per the comment in the change, "Ideally, we want to remove this old configuration entirely".

Change 398822 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] Bump scap to 3.7.4-3

https://gerrit.wikimedia.org/r/398822

Mentioned in SAL (#wikimedia-operations) [2017-12-18T12:20:15Z] <akosiaris> build scap 3.7.4-3 and upload to jessie-wikimedia, stretch-wikimedia, trusty-wikimedia. T183046, T182347

The control file seems to be wrong. It should match https://phabricator.wikimedia.org/source/scap/browse/master/debian/control, where python-semver was moved from Depends to Suggests so that the package will install on trusty hosts. But somehow that's not what's in the build. Copy-pasting what's in the deb package after extraction:

Package: scap
Version: 3.7.4-3
Architecture: all
Maintainer: Wikimedia Foundation Release Engineering <releng@wikimedia.org>
Installed-Size: 497
Depends: python, python-configparser, python-jinja2, python-psutil, python-pygments, python-requests, python-semver, python-six, python:any (<< 2.8), python:any (>=
 2.7.5-5~), python-yaml, git, bash-completion, python-conftool
Suggests: git-fat, php5-cli | php-cli | hhvm

Indeed. That's an artifact of trusty vs jessie builds (I used jessie), where python:Depends gets calculated differently. I've rebuilt once more, based on trusty, and updated the repo.
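
A dependency mismatch like the one above can be caught before upload by inspecting the control fields of the built package. A minimal sketch, assuming Python 3 and that the control stanza has already been extracted as text (for example with dpkg-deb -f); parse_control and package_names are hypothetical helpers, and the sample is abbreviated from the stanza quoted in this task:

```python
SAMPLE_CONTROL = """\
Package: scap
Version: 3.7.4-3
Depends: python, python-configparser, python-jinja2, python-psutil, python-pygments, python-requests, python-semver, python-six, python:any (<< 2.8), python:any (>=
 2.7.5-5~), python-yaml, git, bash-completion, python-conftool
Suggests: git-fat, php5-cli | php-cli | hhvm
"""


def parse_control(text):
    """Parse one control stanza into a dict, joining continuation lines."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and current is not None:
            # Continuation of the previous field.
            fields[current] += " " + line.strip()
        else:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


def package_names(relation_field):
    """Return package names from a Depends-style field, without versions."""
    names = []
    for entry in relation_field.split(","):
        for alternative in entry.split("|"):
            name = alternative.split("(")[0].strip()
            if name:
                names.append(name)
    return names


if __name__ == "__main__":
    fields = parse_control(SAMPLE_CONTROL)
    # For the jessie-built package quoted above this prints True.
    print("python-semver" in package_names(fields["Depends"]))
```

A build script could fail the upload for trusty whenever python-semver still shows up in Depends.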

The documentation on the prep side for packages is at https://wikitech.wikimedia.org/wiki/How_to_deploy_code/Scap, which should probably be combined with https://wikitech.wikimedia.org/wiki/Scap3#Debian_package for use both on Wikitech and in the repo.

Change 398853 had a related patch set uploaded (by Thcipriani; owner: Thcipriani):
[operations/puppet@production] Scap: bump version to 3.7.4-3

https://gerrit.wikimedia.org/r/398853

Change 398822 merged by Alexandros Kosiaris:
[operations/puppet@production] Bump scap to 3.7.4-3

https://gerrit.wikimedia.org/r/398822

Change 398853 abandoned by Thcipriani:
Scap: bump version to 3.7.4-3

Reason:
Ie42e4463183fc91165e6c6fd093f8b384bc6776f

https://gerrit.wikimedia.org/r/398853

thcipriani closed this task as Resolved. Jan 4 2018, 4:26 PM
thcipriani assigned this task to akosiaris.

This one was resolved with the release of scap 3.7.4-3