
Migrate language-screenshots-VisualEditor off of Nodepool to Docker containers
Closed, Resolved, Public

Description

We have to migrate the job https://integration.wikimedia.org/ci/job/language-screenshots-VisualEditor/ off of Nodepool and run it in Docker containers instead.

It is a matrix job that populates the axes to be built from YAML files in mediawiki/extensions/VisualEditor. Then for each combination it does something like:

npm install
node_modules/.bin/grunt screenshots-all
bundle install
bundle exec upload

The CI Docker container expects to run an npm script or a rake task. So the first step is to add an npm script for 'grunt screenshots-all' and a rake task for 'upload'.

Event Timeline

Change 416954 had a related patch set uploaded (by Hashar; owner: Hashar):
[mediawiki/extensions/VisualEditor@master] build: npm/rake entry point for screenshots upload

https://gerrit.wikimedia.org/r/416954

Change 416958 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Migrate VisualEditor screenshot job to Docker

https://gerrit.wikimedia.org/r/416958

I have also filed a couple of unrelated issues upstream: https://github.com/amire80/commons_upload/issues

It is unlikely that either @Amire80 or I will have the time to work on commons_upload. It is more likely that I will rewrite it in Node (T139747: Port script that uploads visual editor screenshots to javascript).

Eventually I added a couple of pull requests:

File duplicate causes an exception instead of being skipped https://github.com/amire80/commons_upload/issues/10

Uploading xxx.png messages progress are output buffered https://github.com/amire80/commons_upload/issues/11

You are welcome :] We will release a new version of commons_upload which would then handle file duplicates. I guess from there we will be able to run the job more often.

The trivial VE patch https://gerrit.wikimedia.org/r/#/c/416954/ is pending, but it is going to be merged eventually. Then we can switch the job to Docker.

Change 416954 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] build: npm/rake entry point for screenshots upload

https://gerrit.wikimedia.org/r/416954

Change 417739 had a related patch set uploaded (by Zfilipin; owner: Zfilipin):
[mediawiki/extensions/VisualEditor@master] build: Use new version of commons_upload Ruby gem for language screenshots

https://gerrit.wikimedia.org/r/417739

Change 417739 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] build: Use new version of commons_upload Ruby gem for language screenshots

https://gerrit.wikimedia.org/r/417739

How is this possible!? 😢

04:01:36.534 Uploading ./screenshots/VisualEditor_insert_table-ceb.png ... rake aborted!
04:01:36.535 MediawikiApi::ApiError: A file with this name exists already in the shared file repository. If you still want to upload your file, please go back and use a new name. [[File:VisualEditor_insert_table-xh.png|thumb|center|VisualEditor_insert_table-xh.png]] (fileexists-shared-forbidden)

https://integration.wikimedia.org/ci/job/language-screenshots-VisualEditor/BROWSER=chrome,PLATFORM=Windows%2010,label=DebianJessieDocker/66/console

https://commons.wikimedia.org/wiki/File:VisualEditor_insert_table-xh.png

Looks like commons_upload only rescues the exception if mwerr.code is fileexists-no-change, but in this case it was fileexists-shared-forbidden.

rescue MediawikiApi::ApiError => mwerr
  raise mwerr if mwerr.code != 'fileexists-no-change'

https://github.com/amire80/commons_upload/commit/9067e90574a802e43902e395ac6345c117f2854b

The error message says the problem is with [[ https://commons.wikimedia.org/wiki/File:VisualEditor_insert_table-xh.png | File:VisualEditor_insert_table-xh.png ]] but it looks to me that the problem is with the previous file, [[ https://commons.wikimedia.org/wiki/File:VisualEditor_insert_table-ceb.png | File:VisualEditor_insert_table-ceb.png ]], since it used to be a redirect to File:VisualEditor insert table-xh.png. 🤔

I took a quick look at the ceb and xh screenshots, and it looks to me like those languages do not have many VisualEditor strings translated. They should be removed from the job anyway.

As for fileexists-shared-forbidden, we should probably just rescue that too and continue with the upload.
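
A minimal sketch of what rescuing both codes could look like. This is not the actual commons_upload code: ApiError below is only a stand-in for MediawikiApi::ApiError (assumed to expose nothing more than `code`), and SKIPPABLE_CODES and upload_or_skip are hypothetical names used for illustration.

```ruby
# Stand-in for MediawikiApi::ApiError so the sketch runs without the gem.
class ApiError < StandardError
  attr_reader :code

  def initialize(code, message = code)
    super(message)
    @code = code
  end
end

# Error codes treated as "the file is already there", so the upload is skipped.
# (Hypothetical list; only these two codes appear in this task.)
SKIPPABLE_CODES = %w[fileexists-no-change fileexists-shared-forbidden].freeze

# Yields to the block that performs the real upload.
# Returns :uploaded on success, :skipped for a skippable error code,
# and re-raises any other ApiError.
def upload_or_skip(file_name)
  yield file_name
  :uploaded
rescue ApiError => mwerr
  raise mwerr unless SKIPPABLE_CODES.include?(mwerr.code)

  warn "Skipping #{file_name}: #{mwerr.code}"
  :skipped
end
```

Keeping the skippable codes in one list means a further code can be added later without touching the rescue logic.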

Change 419777 had a related patch set uploaded (by Zfilipin; owner: Zfilipin):
[mediawiki/extensions/VisualEditor@master] build: Use commons_upload v1.2.1 Ruby gem for language screenshots

https://gerrit.wikimedia.org/r/419777

Change 419777 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] build: Use commons_upload v1.2.1 Ruby gem for language screenshots

https://gerrit.wikimedia.org/r/419777

I have updated the Ruby gem, merged the change and re-run the Jenkins job, but now it fails 😢

00:00:02.597 FATAL: Command "git clean -fdx" returned status code 1:
00:00:02.597 stdout: Removing cache/bundle
00:00:02.597 Removing cache/_locks
00:00:02.597 Removing log/
00:00:02.597 Skipping repository src/
00:00:02.597 
00:00:02.597 stderr: warning: failed to remove cache/normalize-range/0.1.2/package/package.json
00:00:02.597 warning: failed to remove cache/normalize-range/0.1.2/package.tgz

https://integration.wikimedia.org/ci/job/language-screenshots-VisualEditor/BROWSER=chrome,PLATFORM=Windows%2010,label=DebianJessieDocker/67/consoleFull

After 6 hours and almost 27 minutes 😢

06:26:54.915 MediawikiApi::ApiError: The database has been automatically locked while the slave database servers catch up to the master (readonly)

https://integration.wikimedia.org/ci/job/language-screenshots-VisualEditor/BROWSER=chrome,PLATFORM=Windows%2010,label=DebianJessieDocker/69/console

I'm not even sure how that is possible, since the tool should just report the error and continue with the next file. I remember planning to test it, but I guess I forgot to do it. 🤦‍♂️
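
For reference, the intended "report the error and continue with the next file" behaviour could be sketched roughly as below. This is an illustration only, not how commons_upload is implemented; upload_each is a hypothetical helper, and the block stands in for whatever performs the real upload.

```ruby
# Calls the block once per file. A failure is reported and recorded,
# and the loop then moves on to the next file instead of aborting the run.
# Returns a hash mapping each failed file name to the exception it raised.
def upload_each(files)
  failures = {}
  files.each do |file|
    begin
      yield file
    rescue StandardError => e
      warn "Failed to upload #{file}: #{e.message}"
      failures[file] = e
    end
  end
  failures
end
```

The caller can then decide at the end whether any recorded failure should make the build fail, rather than losing hours of uploads to a single transient error.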

Change 416958 merged by jenkins-bot:
[integration/config@master] Migrate VisualEditor screenshot job to Docker

https://gerrit.wikimedia.org/r/416958

There are certainly things to polish in the screenshot system, such as better handling of errors on upload. At least the job now relies on Docker and cleans up its own workspace at the end of the build.