mwext-codehealth-master-non-voting erase doc.wikimedia.org coverage report
Closed, Duplicate · Public · Bug Report

Description

Some extensions' code coverage reports fail with Empty directory!

See, e.g., https://doc.wikimedia.org/cover-extensions/GlobalWatchlist/ (intermittently)

The reason is the postmerge job mwext-codehealth-master-non-voting, which publishes to doc.wikimedia.org although it only generates a single file, clover.xml. That causes the HTML report generated by mwext-phpunit-coverage-docker-publish to be erased.

Event Timeline

Restricted Application added a subscriber: Aklapper.
DannyS712 changed the subtype of this task from "Task" to "Bug Report".

On doc1001 /srv/docroot/org/wikimedia/doc/cover-extensions/GlobalWatchlist/ only has the clover.xml file.

Some coverage seems to have been generated:

php -d zend_extension=xdebug.so /workspace/src/tests/phpunit/phpunit.php --testsuite extensions --coverage-clover /workspace/log/clover.xml --coverage-html /workspace/cover --log-junit /workspace/log/junit.xml /workspace/src/extensions/GlobalWatchlist/tests/phpunit
Using PHP 7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1
PHPUnit 8.5.8 by Sebastian Bergmann and contributors.

...................................                               35 / 35 (100%)

Time: 4.1 seconds, Memory: 48.25 MB

OK (35 tests, 72 assertions)

Generating code coverage report in Clover XML format ... done [24 ms]

Generating code coverage report in HTML format ... done [21 ms]

And there is definitely an index.html file:

+ test -f /workspace/cover/index.html
+ '[' -s /workspace/log/clover.xml ']'
+ cp /workspace/log/clover.xml /workspace/cover/clover.xml

The publish-to-doc1001 seems to use the proper paths:

Fetching from:
- Instance...: 172.16.7.210
- Workspace..: /srv/jenkins/workspace/workspace/mwext-phpunit-coverage-docker-publish
- Subdir.....: cover
+ rsync --archive --compress '--rsh=/usr/bin/ssh -a -T -o ConnectTimeout=6 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no' jenkins-deploy@172.16.7.210:/srv/jenkins/workspace/workspace/mwext-phpunit-coverage-docker-publish/cover/. .

Creating remote directory cover-extensions/GlobalWatchlist
sending incremental file list

sent 99 bytes  received 26 bytes  250.00 bytes/sec
total size is 0  speedup is 0.00
Publishing ...
+ rsync --archive --compress --delete-after . rsync://doc1001.eqiad.wmnet/doc/cover-extensions/GlobalWatchlist

Published at https://doc.wikimedia.org/cover-extensions/GlobalWatchlist/

But it does not seem to send much information?!

I ran the job again after adding some debug statements:

+             find . -name cover
+             find . -name index.html

And it worked fine this time: https://doc.wikimedia.org/cover-extensions/GlobalWatchlist/

Everything looks fine as far as I can tell, at least in how the coverage is generated, the directory paths, and the logic. The publish-to-doc1001 job runs on the contint1001 / contint2001 machines and, to ensure a clean state, uses the Jenkins workspace cleanup plugin, which shows up in the console log:

[WS-CLEANUP] Deleting project workspace...
[WS-CLEANUP] Deferred wipeout is used...

I don't know what that deferred deletion is, but it is possible the deletion ends up happening after we have fetched the files from the CI agents and before we push them to doc1001.

Change 617097 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] jjb: enhance rsync output in publish-to-doc1001

https://gerrit.wikimedia.org/r/617097

The HTML coverage was there sometimes when I loaded the page, but often the CSS would fail to load. And then everything would vanish as well.

One of the rsync logs (from my IRC scrollback) had:

23:35:40 sent 101 bytes  received 26 bytes  84.67 bytes/sec
23:35:40 total size is 0  speedup is 0.00

which was suspicious to me - how could the total size be 0? Or am I misinterpreting the rsync output?

> The HTML coverage was there sometimes when I loaded the page, but often the CSS would fail to load. And then everything would vanish as well.

Maybe because some files were served by the frontend cache even though the files were no longer in the document root.

> One of the rsync logs (from my IRC scrollback) had:
>
> 23:35:40 sent 101 bytes  received 26 bytes  84.67 bytes/sec
> 23:35:40 total size is 0  speedup is 0.00
>
> which was suspicious to me - how could the total size be 0? Or am I misinterpreting the rsync output?

That output is generated by an rsync command which syncs an empty directory in order to create it at the destination, and it is run with --verbose:

echo "Creating remote directory ${WMF_CI_PUB_DEST}"
tmpdir=$(mktemp -d)
(
   cd "$tmpdir"
   mkdir -p "${WMF_CI_PUB_DEST}"
   # Sync that empty dir WITHOUT deletion and with relative. That
   # creates the directories at the destination.
   #
   # We do not --archive which preserves date, time or permissions.
   # The base directory (such as ./cover) might have been populated by
   # puppet and thus owned by a different user than rsyncd.
         vvvvvvvvv
   rsync --verbose --recursive --relative "${WMF_CI_PUB_DEST}" rsync://doc1001.eqiad.wmnet/doc/
         ^^^^^^^^^
)
rm -R "$tmpdir"

I am dropping that parameter with https://gerrit.wikimedia.org/r/617097, which also makes the rsync commands that fetch and push a bit more verbose (with --stats).

Change 617097 merged by jenkins-bot:
[integration/config@master] jjb: enhance rsync output in publish-to-doc1001

https://gerrit.wikimedia.org/r/617097

hashar triaged this task as Medium priority. Jul 30 2020, 10:02 AM

https://integration.wikimedia.org/ci/job/publish-to-doc1001/ now has some more details. Gotta wait until we catch a build that empties the coverage report and investigate when that occurs :-(

I regenerated the MobileFrontend coverage report just fine.

The one for MachineVision only has a clover.xml which states it got generated at 1596063569 which is Wednesday, July 29, 2020 10:59:29 PM
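The timestamp conversion can be double-checked from a shell. This is a minimal sketch assuming GNU date (the -d @EPOCH form; BSD/macOS date uses -r EPOCH instead). The sample line imitates the usual Clover header with a generated attribute and is an assumption, not a copy of the actual file.

```shell
#!/bin/sh
# Assumes GNU date. The sample XML line is illustrative, not taken from doc1001.
sample='<coverage generated="1596063569">'

# Pull the epoch out of the generated="..." attribute.
epoch=$(printf '%s\n' "$sample" | sed -n 's/.*generated="\([0-9][0-9]*\)".*/\1/p')

# Render it in UTC with English day and month names.
generated_at=$(LC_ALL=C date -u -d "@$epoch" '+%A, %B %d, %Y %I:%M:%S %p')
echo "$epoch -> $generated_at UTC"
```

Read as UTC, the epoch corresponds to Wednesday, July 29, 2020 10:59:29 PM, which agrees with the time quoted above.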

That would match the change https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MachineVision/+/616921 which ran A SINGLE JOB, https://integration.wikimedia.org/ci/job/mwext-codehealth-master-non-voting/, and that job also rsyncs to the cover directory without generating any HTML report.


If one looks at the latest merged change for GlobalWatchlist, https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GlobalWatchlist/+/617307, it ran:

mwext-phpunit-coverage-docker-publish: has clover.xml and the HTML report
mwext-codehealth-master-non-voting: only has clover.xml

Given mwext-codehealth-master-non-voting has no HTML report, if it is executed last it wipes the HTML report from the published coverage directory. There should be only ONE job publishing the coverage. The codehealth one should not publish to doc.wikimedia.org.

hashar renamed this task from "Some extensions' code coverage reports fail with `Empty directory!`" to "mwext-codehealth-master-non-voting erase doc.wikimedia.org coverage report". Jul 30 2020, 10:32 AM
hashar updated the task description.

Why is a "non-voting" job running in postmerge? Seems like it makes more sense to have the normal phpunit-coverage job take care of publishing since it's already working and let codehealth just do whatever it needs and not publish.

Change 617453 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Avoid duplicate codecoverage report

https://gerrit.wikimedia.org/r/617453

I looked at a build of mwext-codehealth-master-non-voting for EntitySchema and it runs:

+ php -d zend_extension=xdebug.so vendor/bin/phpunit --testsuite=extensions:unit --exclude-group Dump,Broken,ParserFuzz,Stub --coverage-clover /workspace/log/clover.xml --log-junit /workspace/log/junit.xml
PHPUnit 8.5.8 by Sebastian Bergmann and contributors.

.........................                                         25 / 25 (100%)

Time: 942 ms, Memory: 20.00 MB

OK (25 tests, 34 assertions)

Generating code coverage report in Clover XML format ... done [577 ms]
+ set -e
+ '[' -f /workspace/log/junit.xml ']'

So it seems codehealth does not generate the HTML report at all for some reason. I guess I will make it stop publishing the coverage report, as Kunal suggested above.

Change 617453 abandoned by Hashar:
[integration/config@master] Avoid duplicate codecoverage report

Reason:
Will do it the other way around, namely stop publishing from the codehealth job (essentially reverting https://gerrit.wikimedia.org/r/c/integration/config/+/514016 ).

https://gerrit.wikimedia.org/r/617453

Change 619370 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Revert "Publish coverage reports from extension-codehealth"

https://gerrit.wikimedia.org/r/619370

Time flies so fast.

The change https://gerrit.wikimedia.org/r/619370 was doing too many things at once. I made it simpler: it now just causes the job to stop invoking the publish step and leaves everything else as is.

Change 619370 merged by jenkins-bot:
[integration/config@master] Revert "Publish coverage reports from extension-codehealth"

https://gerrit.wikimedia.org/r/619370