Page MenuHomePhabricator

Jenkins rejects all patches to operations/puppet/tox.ini
Closed, ResolvedPublic

Description

I just added a trival no-op patch to tox.ini, and jenkins marked it -1: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/585058/ This is cramping my style as I have a different patch on progress that requires a tox change.

19:55:33 ________________________ VarnishRlsTest.testIfNoneMatch ________________________
19:55:33 
19:55:33 self = <varnish_test.VarnishRlsTest testMethod=testIfNoneMatch>
19:55:33 
19:55:33     def testIfNoneMatch(self):
19:55:33         s = self.store.get_samples('varnish_resourceloader_inm')
19:55:33 >       self.assertIn(('', 1), s)
19:55:33 E       AssertionError: ('', 1) not found in [('', 0)]
19:55:33 
19:55:33 modules/mtail/files/test/varnish_test.py:28: AssertionError
19:55:33 ___________________________ VarnishRlsTest.testResp ____________________________
19:55:33 
19:55:33 self = <varnish_test.VarnishRlsTest testMethod=testResp>
19:55:33 
19:55:33     def testResp(self):
19:55:33 >       s = self.store.get_samples('varnish_resourceloader_resp')
19:55:33 
19:55:33 modules/mtail/files/test/varnish_test.py:31: 
19:55:33 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
19:55:33 
19:55:33 self = <mtail_store.MtailMetricStore object at 0x7f1d8834bd90>
19:55:33 name = 'varnish_resourceloader_resp'
19:55:33 
19:55:33     def get_samples(self, name):
19:55:33         """Return all samples for metric name as a list of samples.
19:55:33            Each sample is in this form: ("k1=v1,k2=v2", value)"""
19:55:33         samples = []
19:55:33         if name not in self._store:
19:55:33             raise ValueError('metric %s not found in store' % name)
19:55:33 >       for metric in self._store[name][0]['LabelValues']:
19:55:33 E       KeyError: 'LabelValues'
19:55:33 
19:55:33 modules/mtail/files/test/mtail_store.py:47: KeyError
19:55:33 ===================== 2 failed, 28 passed in 3.00 seconds ======================
19:55:33 ERROR: InvocationError for command '/srv/workspace/puppet/.tox/mtail/bin/pytest modules/mtail/files' (exited with code 1)

Event Timeline

I've looked at this for quite a while but I can't make any history version of "pytest modules/mtail/files" work locally no matter how far I go back in git history. So either I'm misunderstanding the context in which it's supposed to work, or it never worked. I see evidence that a change to tox.ini worked just a few weeks ago, much more recently than anything in mtail has changed. My only guess is that somehow @ema 's recent Varnish changes broke this and those changes were somehow not subject to this test.

The problem is most probably we're installing a default mtail from buster, instead of the specific version we're using in production, which these tests rely on.

CI has switched that job from Stretch to Buster with:

4e85d6332 zuul: [operations/puppet] Add buster as experimental
21b6dbfad zuul: [operations/puppet] Use buster instead of stretch image
9e5962476 jjb: Drop operations-puppet-tests-stretch-docker, no longer used

We can restore the Stretch job and keep the Buster one in experimental for now. I would assume the Buster based job got tried (by commenting check experimental) but the test suite does not run all the tests, so the experimental job would have missed the mtail related issue.

`
For mtail specially, we can craft a second job that only run the mtail suite and that would only trigger when fies under modules/mtail are touched.

Change 585239 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Revert "jjb: Drop operations-puppet-tests-stretch-docker, no longer used"

https://gerrit.wikimedia.org/r/585239

Change 585239 merged by jenkins-bot:
[integration/config@master] Revert "jjb: Drop operations-puppet-tests-stretch-docker, no longer used"

https://gerrit.wikimedia.org/r/585239

Change 585241 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Revert "zuul: [operations/puppet] Use buster instead of stretch image"

https://gerrit.wikimedia.org/r/585241

Change 585241 merged by jenkins-bot:
[integration/config@master] Revert "zuul: [operations/puppet] Use buster instead of stretch image"

https://gerrit.wikimedia.org/r/585241

hashar triaged this task as Medium priority.Apr 1 2020, 2:42 PM

I have rolled back to the Stretch based job. It passed just fine on a dummy change to tox.ini: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/585058/

Left to figure out: how can we have some tests to be run on a specific distribution? We can have job/container for each of Stretch and Buster, but I have no clue how we can select tests to run based on the distro.

Thanks! The actual patch I was worried about passes CI now, so I'm happy :)

Jdforrester-WMF raised the priority of this task from Medium to Unbreak Now!.Apr 1 2020, 3:02 PM

I worry this cure is worse than the symptom. The move to buster was required for rOPUPd70a7f24f262: systemd: Replace the Datetime regex with a call to systemd-analyze. to pass, so presumably all patches touching that area are now unmergeable instead.

We'll need to split the job urgently.

the test suite does not run all the tests

This feels like a bad choice. Let's revisit that, later.

I worry this cure is worse than the symptom. The move to buster was required for rOPUPd70a7f24f262: systemd: Replace the Datetime regex with a call to systemd-analyze. to pass, so presumably all patches touching that area are now unmergeable instead.

Based on the review comments at https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/584020/ the job failed due to lack of systemd in the CI container. So at least that one is easily solvable.

Change 585261 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] docker: operations-puppet back to Stretch

https://gerrit.wikimedia.org/r/585261

Change 585267 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] jjb: switch operations/puppet to 0.6.3

https://gerrit.wikimedia.org/r/585267

Change 585270 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[integration/config@master] operations-puppet: pin a specific mtail version

https://gerrit.wikimedia.org/r/585270

Change 585261 abandoned by Hashar:
docker: operations-puppet back to Stretch

Reason:
We install a specific mtail version in Buster instead: https://gerrit.wikimedia.org/r/#/c/integration/config/ /585270/

https://gerrit.wikimedia.org/r/585261

Change 585267 abandoned by Hashar:
jjb: switch operations/puppet to 0.6.3

https://gerrit.wikimedia.org/r/585267

Change 585270 merged by jenkins-bot:
[integration/config@master] operations-puppet: pin a specific mtail version

https://gerrit.wikimedia.org/r/585270

Change 585274 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] jjb: update operations-puppet for mtail

https://gerrit.wikimedia.org/r/585274

Change 585275 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Switch operations/puppet to Buster

https://gerrit.wikimedia.org/r/585275

Change 585274 merged by jenkins-bot:
[integration/config@master] jjb: update operations-puppet for mtail

https://gerrit.wikimedia.org/r/585274

Change 585275 merged by jenkins-bot:
[integration/config@master] Switch operations/puppet to Buster

https://gerrit.wikimedia.org/r/585275

hashar claimed this task.

We are back to using Buster for CI.

Stretch does not have a systemd that supports systemd-analyze calendar so we definitely needed Buster.

mtail has been pinned by @Joe via https://gerrit.wikimedia.org/r/#/c/integration/config/+/585270/2/dockerfiles/operations-puppet/Dockerfile.template , allowing us to upgrade to Buster.