train-deploy-notes Jenkins job fails in conjunction with branch.py
Closed, Resolved, Public

Description

While deploying T233864, I noticed that the train-deploy-notes job currently fails.

Per @mmodell, this is a known issue with branching by way of branch.py. Additionally, we seem to have deprecated a config file that the script relies on.

I created MediaWiki_1.35/wmf.16/Changelog manually using:

12:33:00 brennen@metaphor:~/release/make-deploy-notes (master m0/u1) ❂ python3 makedeploynotes.py 1.35.0-wmf.15 1.35.0-wmf.16

...after moving release/make-wmf-branch/config.json-deprecated to release/make-wmf-branch/config.json and deleting comments atop the file.
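
The manual "delete comments atop the file" step could be scripted roughly as below. This is only a sketch: the comment prefixes (`#` and `//`), the function name, and the sample content are assumptions for illustration, since the actual header of config.json-deprecated is not shown in this task.

```python
import json


def load_json_skipping_header_comments(text, comment_prefixes=("#", "//")):
    """Parse JSON after dropping blank/comment lines that precede the data.

    The prefixes are an assumption; adjust them to whatever the real
    header comments in the deprecated config file look like.
    """
    lines = text.splitlines()
    start = 0
    # Advance past blank lines and comment lines at the top of the file.
    while start < len(lines):
        stripped = lines[start].strip()
        if stripped and not stripped.startswith(comment_prefixes):
            break
        start += 1
    return json.loads("\n".join(lines[start:]))


# Hypothetical file content, not the real config.
sample = """# DEPRECATED: hypothetical header comment
# second comment line
{"example_key": ["a", "b"]}
"""
config = load_json_skipping_header_comments(sample)
print(config)
```

Only leading comment lines are skipped; anything after the first data line is handed to the JSON parser unchanged.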

Event Timeline

thcipriani triaged this task as Medium priority.
thcipriani subscribed.

Assigning to @mmodell for the moment since he's been driving forward train automation. He and I will likely pair here since the deploy-notes job is a beast we paired to create in the first place.

Change 571849 had a related patch set uploaded (by Dduvall; owner: Dduvall):
[integration/config@master] train-deploy-notes: Run when wgVersion is changed from -alpha

https://gerrit.wikimedia.org/r/571849

Change 571849 merged by jenkins-bot:
[integration/config@master] train-deploy-notes: Run when wgVersion is changed from -alpha

https://gerrit.wikimedia.org/r/571849

Change 572496 had a related patch set uploaded (by Dduvall; owner: Dduvall):
[integration/config@master] train-deploy-notes: Skip initialization of mediawiki/core submodules

https://gerrit.wikimedia.org/r/572496

Change 572496 merged by jenkins-bot:
[integration/config@master] train-deploy-notes: Skip initialization of mediawiki/core submodules

https://gerrit.wikimedia.org/r/572496

It looks like the job filter introduced to zuul/layout.yaml in https://gerrit.wikimedia.org/r/c/integration/config/+/571849 is not working. It was meant to limit scheduling of the job to changes that touch includes/DefaultSettings.php on wmf/* branches.

From the zuul debug logs:

2020-02-18 18:23:37,379 DEBUG zuul.IndependentPipelineManager: Starting queue processor: postmerge
2020-02-18 18:23:37,380 DEBUG zuul.IndependentPipelineManager: Checking for changes needed by <Change 0x7fae44aaa590 572931,1>:
2020-02-18 18:23:37,380 DEBUG zuul.IndependentPipelineManager:   No changes needed
2020-02-18 18:23:37,380 DEBUG zuul.IndependentPipelineManager: Preparing ref for: <Change 0x7fae44aaa590 572931,1>
2020-02-18 18:23:37,380 INFO zuul.IndependentPipelineManager: Change <Change 0x7fae44aaa590 572931,1> depends on changes []
2020-02-18 18:23:37,380 DEBUG zuul.MergeClient: Submitting job <gear.Job 0x7fae2c2d28d0 handle: None name: merger:merge unique: 8e2b885cf1ad4c92b1c903203806de24> with data {'items': [{'oldrev': None, 'newrev': None, 'refspec': u'refs/changes/31/572931/1', 'merge_mode': 2, 'number': '572931', 'connection_name': 'gerrit', 'project': 'mediawiki/core', 'url': [redacted], 'branch': u'wmf/1.35.0-wmf.20', 'patchset': 1, 'ref': 'Z519642dbd2bc430cbb826041285360dd'}]}
2020-02-18 18:23:37,381 DEBUG zuul.IndependentPipelineManager: Reporting change <Change 0x7fae44aaa590 572931,1>
2020-02-18 18:23:37,382 DEBUG zuul.IndependentPipelineManager: No jobs for change <Change 0x7fae44aaa590 572931,1>
2020-02-18 18:23:37,382 DEBUG zuul.IndependentPipelineManager: Removing change <Change 0x7fae44aaa590 572931,1> from queue
2020-02-18 18:23:37,382 DEBUG zuul.IndependentPipelineManager: Finished queue processor: postmerge (changed: True)
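
For anyone checking this again later, a small script can pull the "No jobs for change" entries out of a Zuul debug log. This is a sketch that assumes the line format matches the excerpt above; the function name is made up for illustration.

```python
import re

# Matches e.g. "No jobs for change <Change 0x7fae44aaa590 572931,1>"
NO_JOBS = re.compile(r"No jobs for change <Change 0x[0-9a-f]+ (\d+),(\d+)>")


def changes_without_jobs(log_lines):
    """Return (change number, patchset) pairs that Zuul scheduled no jobs for."""
    found = []
    for line in log_lines:
        match = NO_JOBS.search(line)
        if match:
            found.append((int(match.group(1)), int(match.group(2))))
    return found


# Two lines copied from the excerpt above.
log = [
    "2020-02-18 18:23:37,381 DEBUG zuul.IndependentPipelineManager: "
    "Reporting change <Change 0x7fae44aaa590 572931,1>",
    "2020-02-18 18:23:37,382 DEBUG zuul.IndependentPipelineManager: "
    "No jobs for change <Change 0x7fae44aaa590 572931,1>",
]
print(changes_without_jobs(log))
```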

Stealing this from @mmodell since I broke the job (or hit a Zuul bug, rather) with the recent Zuul layout changes.

With @mmodell's latest change (https://gerrit.wikimedia.org/r/575575), I can now trigger the job:

[thcipriani@contint1001 ~]$ zuul enqueue --trigger gerrit --pipeline post --project mediawiki/core --change 576439,1

That caused this build to trigger: https://integration.wikimedia.org/ci/job/train-deploy-notes/4190/console

Of course that build is failing :\

While I can run the clone operation locally on contint1001, I can't run the same operation in a Docker container:

[thcipriani@contint1001 ~]$ docker run --rm -it --user root --entrypoint /bin/bash docker-registry.wikimedia.org/releng/ci-src-setup-simple:0.2.1
...
root@95015c10eb0d:/# nc -vz contint1001.wikimedia.org -w 1 80
nc: connect to contint1001.wikimedia.org port 80 (tcp) timed out: Operation now in progress
nc: connect to contint1001.wikimedia.org port 80 (tcp) failed: Cannot assign requested address
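
If nc is not available in an image, roughly the same reachability check can be done from Python. This is a minimal sketch: the `port_reachable` helper is hypothetical, and the demo connects to a throwaway local listener rather than to contint1001 so that it runs anywhere.

```python
import socket


def port_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution failures.
        return False


# Demo against a local listener on an OS-assigned port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
demo_port = server.getsockname()[1]
reachable_while_listening = port_reachable("127.0.0.1", demo_port)
server.close()
print(reachable_while_listening)
```

Inside the container, the equivalent of the nc call above would be `port_reachable("contint1001.wikimedia.org", 80)`.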

Can containers in CI run with --network=host?

Change 582111 had a related patch set uploaded (by Thcipriani; owner: Thcipriani):
[integration/config@master] train-deploy-notes: Get rid of regex anchor

https://gerrit.wikimedia.org/r/582111

Change 582111 merged by jenkins-bot:
[integration/config@master] train-deploy-notes: Get rid of regex anchor

https://gerrit.wikimedia.org/r/582111

Mentioned in SAL (#wikimedia-releng) [2020-03-20T22:22:31Z] <James_F> Zuul: Unanchor the train-deploy-notes job regex filter 582111 T243330
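
For context on why unanchoring a filter can matter, here is a generic illustration of how a leading `^` changes what a regex matches. The patterns and the `refs/heads/` prefixed value are hypothetical; the actual filter from zuul/layout.yaml is not quoted in this task, so treat this as a sketch of the general failure mode rather than the exact bug.

```python
import re

# Hypothetical patterns; the real job filter is not shown in this task.
anchored = re.compile(r"^wmf/")
unanchored = re.compile(r"wmf/")

# The first value is the branch name from the debug log above; the second
# assumes the same branch reported as a full ref (an assumption).
candidates = ["wmf/1.35.0-wmf.20", "refs/heads/wmf/1.35.0-wmf.20"]

for value in candidates:
    # search() scans the whole string, so only the explicit ^ restricts it.
    print(value, bool(anchored.search(value)), bool(unanchored.search(value)))
```

The anchored pattern only matches when the value starts with `wmf/`, while the unanchored one matches wherever `wmf/` appears, at the cost of being less strict.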