The catalog of error codes for this epic is evolving here: https://www.mediawiki.org/wiki/Extension:FileImporter/Errors
I would actually object to this: imagine your change has caused multiple test failures that you weren't able to predict in your dev environment (because you didn't have all extensions installed or your environment is otherwise different from our CI). You'll have to amend your PR with one fix at a time and push it just to see what explodes next.
Sat, Jun 15
- Using --stop-on-failure for PHPUnit (and whatever the equivalent is for QUnit and Selenium) in gate-and-submit might be interesting as a means to speed up the entire cycle when a flaky build happens.
Fri, Jun 14
FWIW, WMF is slowly moving away from statsd in favor of Prometheus. I'm not sure what the MediaWiki plan is. @fgiunchedi
Notes on next implementation steps:
- Our exceptions should all respond to getCode, with a string constant naming the error type (see the sketch after this list).
- Take recoverable errors out of the planning phase error count and report them separately.
- 'FileImporter.import.result.exception' is missing two errors, "bad edit token" and "bad import hash". Move the import submit stats logging up to the exception handler to catch these.
- Drop the "plan" vs. "submit" distinction; this doesn't seem to matter.
- Report time taken to fail?
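To make the first and third bullets concrete, here's a minimal sketch. The class name, statsd key, and surrounding variables are illustrative, not the actual FileImporter code; note that \Exception::getCode() is final in PHP, so the string constant has to travel through the untyped $code property.

```php
class ImportException extends \Exception {

	public function __construct( string $message, string $errorCode ) {
		parent::__construct( $message );
		// \Exception::getCode() is final and can't be overridden, but
		// the underlying $code property is untyped, so a string
		// constant naming the error type fits fine (PDOException uses
		// the same trick for SQLSTATE strings).
		$this->code = $errorCode;
	}
}

// Moving the submit stats logging up to one exception handler catches
// every failure path, including "bad edit token" and "bad import hash".
// $stats stands in for MediaWiki's statsd data factory.
try {
	$this->importer->import( $importPlan );
} catch ( ImportException $e ) {
	$stats->increment( 'FileImporter.import.result.exception.' . $e->getCode() );
	throw $e;
}
```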
Notes from discussion:
- Definitely go ahead with creating a MediaWiki page to document what the buckets are and what specific errors are in each one.
- Link to our documentation from the "i" information popup on the failure graph, and summarize to help users with interpretation.
- Minimum granularity for this task is to split out unrecoverable from recoverable errors, and to expose exact counts for AbuseFilter matches.
It will be live in around half an hour everywhere; sometime after that, please check that you can get to the hosts you expect.
To say it out loud: it looks like liuggio/statsd-php-client is no longer maintained. A question about releases (https://github.com/liuggio/statsd-php-client/issues/55) has been sitting for 8 months, and the last commit was in 2016. We could fork it and take over maintenance, or start with a new library. Either way, I suggest that we wrap whatever we use with our own interface, so that swapping out libraries is easier in the future; we're currently basing our interface hierarchy directly on the liuggio code.
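For illustration, the wrapper could be as thin as this (all names are made up; only the adapter would know about the underlying library):

```php
// Call sites depend on this interface alone, never on the library.
interface StatsSender {
	public function increment( string $key ): void;
	public function timing( string $key, float $seconds ): void;
}

// One adapter per backend. Swapping libraries later means writing a
// new adapter, not touching every call site.
class LiuggioStatsSender implements StatsSender {

	public function increment( string $key ): void {
		// delegate to the liuggio client/factory here
	}

	public function timing( string $key, float $seconds ): void {
		// delegate to the liuggio client/factory here
	}
}
```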
Thu, Jun 13
The Grafana dashboard shows all these errors now, and sums correctly over the past 24 hours. The next step is to distinguish recoverable from unrecoverable errors during the planning stage. Then we can provide more granularity where desired; it's probably best if @Lea_WMDE looks over the dashboard to help scope where we need this granularity.
I'm getting a little confused, so will leave some breadcrumbs about where various error stats come from in the current code:
- Permission and user block errors when first opening the special page are tallied as MediaWiki.FileImporter.specialPage.execute.fail.*, which should be used with care because it almost overlaps with the ...fail.plan.* below.
- Errors when building the ImportPlan report one MediaWiki.FileImporter.specialPage.execute.fail.plan.total and one MediaWiki.FileImporter.specialPage.execute.fail.plan.byType.* (sketched after this list).
- Importer::import, called when action=submit, will record a MediaWiki.FileImporter.import.result.exception.
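For reference, the second bullet boils down to something like this; this is a simplified paraphrase rather than the real code path, and MediaWiki's statsd factory prepends the configured "MediaWiki." prefix automatically:

```php
// $this->planner->buildPlan() is a stand-in for the real call site.
$stats = MediaWikiServices::getInstance()->getStatsdDataFactory();
try {
	$importPlan = $this->planner->buildPlan( $request );
} catch ( ImportException $e ) {
	// One total counter plus one per-type counter for each failure.
	$stats->increment( 'FileImporter.specialPage.execute.fail.plan.total' );
	$stats->increment( 'FileImporter.specialPage.execute.fail.plan.byType.' . $e->getCode() );
	throw $e;
}
```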
This should be ready to go, just waiting for config deployment which won't happen until at least next week (June 17th). We should sign up for a SWAT deployment once the new calendar is posted.
I'm having second thoughts about this request, because I no longer see how I'll be useful in this role. Building and publishing images is a straightforward process; the burdensome work is writing and testing patches in integration-config/dockerfiles (something I can do without privileges) and bumping image versions, which is just part of deployment.
Woohoo, nice work!
Wed, Jun 12
The "recheck bot" is an interesting twist--I like the idea of automatically rechecking if there was an external error e.g. network glitch, but it makes me uncomfortable to think about rechecking due to flapping tests. What would be nicer is a fine-grained mask for interpreting test results, basically a way to quickly flag certain tests as broken at the CI level, without having to edit and merge code. This would let us provisionally V+2 patches only affected by flappers.
The patch works as intended now, after only two minor changes: I had to change the action=edit request to a POST, and we needed an intermediate call to obtain a CSRF token. This approach will be fine!
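For posterity, the shape of the fix as a standalone sketch; session/cookie handling is omitted, and the page name and edit text are just examples:

```php
$api = 'https://test.wikimedia.beta.wmflabs.org/w/api.php';

// Step 1: the intermediate call, fetching a CSRF token.
$response = json_decode( file_get_contents(
	$api . '?action=query&meta=tokens&type=csrf&format=json'
), true );
$token = $response['query']['tokens']['csrftoken'];

// Step 2: action=edit must be a POST, with the token in the body.
$context = stream_context_create( [ 'http' => [
	'method' => 'POST',
	'header' => 'Content-Type: application/x-www-form-urlencoded',
	'content' => http_build_query( [
		'action' => 'edit',
		'title' => 'Project:Sandbox',
		'appendtext' => 'test',
		'token' => $token,
		'format' => 'json',
	] ),
] ] );
$result = json_decode( file_get_contents( $api, false, $context ), true );
```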
@Daimona Hi, thanks for the offer! The docs for FileImporter start here, but long story short: we're importing File pages, including their entire revision history, with commonswiki as the target in the Wikimedia use case. We call AbuseFilter on each revision by manually invoking the EditFilterMergedContent hook, roughly as sketched below.
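A simplified sketch of the manual invocation, with illustrative variable names (the real FileImporter code differs):

```php
// $targetTitle, $revisionContent, $summary and $user come from the
// import plan. Returning false from any handler (e.g. AbuseFilter's)
// rejects the revision, and $status carries the error for the user.
$status = Status::newGood();
$context = new DerivativeContext( RequestContext::getMain() );
$context->setTitle( $targetTitle );
$context->setUser( $user );

$allowed = Hooks::run( 'EditFilterMergedContent', [
	$context,
	$revisionContent, // a Content object built from the source revision
	$status,
	$summary,
	$user,
	false // $minoredit
] );
```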
I used the test beta wiki, moving this file from de.wikipedia.org there: https://test.wikimedia.beta.wmflabs.org/wiki/File:Rajamangala_University_of_Technology_Rattanakosin_Salaya_Campus_Stadium.jpg
What I received was the message we show when we do not know what the template is, but I expected to be offered Vorlage:NowCommons, because that is what the Wikidata item has on record for de.wikipedia.org.
Tue, Jun 11
The local Vagrant environment with the centralauth role enabled is behaving well; I'll use it to debug our proof-of-concept patch tomorrow.
I've partially reverted my edits and only removed Information -> Template this time.
@Pikne Thanks for the quick bug report! I'll revert my bad edits right away.
This was more like 2 story points.
Mon, Jun 10
I can confirm the work that's already been done on this ticket. As Thiemo pointed out, the phpunit-suite-edit script is responsible for constructing the coverage whitelist, and for the Translate extension none of these directories exist.
T211702, T211703, and this task all seem slightly at odds with each other. If we clone everything before running quibble, then we lose the opportunity to optimize for the fast-fail on linting. If CI runs npm and composer test, then quibble shouldn't be doing the cloning, etc. We need to refine this plan before implementing, IMHO.
Sun, Jun 9
I'm imagining we might specify the process dependency graph in terms of abstract job milestones: rather than coupling tasks directly, e.g. (repo_npm_test && repo_composer_test -> clone dependencies), we would say (repo_npm_test && repo_composer_test -> REPO_TESTS_DONE) and (REPO_TESTS_DONE -> clone dependencies -> ALL_TESTS_READY), then (ALL_TESTS_READY -> ext_skin_composer_test), etc.
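Spelled out as data, the same example might look like this (purely illustrative; uppercase names are the abstract milestones that decouple producers from consumers):

```php
// Each entry maps a step or milestone to the nodes it unblocks.
$graph = [
	'repo_npm_test'      => [ 'REPO_TESTS_DONE' ],
	'repo_composer_test' => [ 'REPO_TESTS_DONE' ],
	'REPO_TESTS_DONE'    => [ 'clone_dependencies' ],
	'clone_dependencies' => [ 'ALL_TESTS_READY' ],
	'ALL_TESTS_READY'    => [ 'ext_skin_composer_test' ],
];
```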
I wanted to give this a try, as a way to work through some questions that I've been stewing on regarding my Command object refactor. How to represent the higher-level stages? My suspicion is that stages are just a special case of dependency relationships between the fine-grained steps, and there should actually be several stages. One especially quirky point is that I don't want to plan the entire job until after cloning repos, because we can't analyze whether certain steps (e.g. npm run selenium-test) make any sense until the code is present. Here are some thoughts about the stages generally (bigger scope than this task):
- Clone ZUUL_PROJECT
- Analyze what steps can be taken to test this repo (look for composer.json, package.json...) and plan the first stage.
- First test stage (T221702), all steps in parallel:
  - composer test
  - npm install && npm test
- Clone all dependencies
- Analyze what can be tested (e.g. look for tests/selenium in each repo) and plan the second stage.
- Second test stage, some steps can run in parallel
Sat, Jun 8
Fri, Jun 7
There are at least two easy speedups for npm install. The first is documented in subtask T225330 (Commit package-lock.json files everywhere), which just became possible thanks to the npm 10 upgrade. The second is --prefer-offline, which only fetches packages when they are missing from the local cache. We should experiment with both.
Thu, Jun 6
I like this proposal, but I'm concerned that non-voting tests will become meaningless, wasting CI resources for little value and discouraging developers from writing browser tests in the future. Here are a few alternative approaches we might consider:
- Make it easier to run browser tests locally, especially the full gate-and-submit suite, which helps find unwanted interactions between extensions. Just as we shouldn't be pushing patches without linting and running unit tests locally, we should also be able to run these more complex tests. Quibble is a big step towards this goal; we might just be a small amount of extra glue away from a simple, local "test everything".
- Run CI browser tests in parallel. (Task already filed?)
- Better social conventions around flapping tests. I'm pretty sure most of us were being inconvenienced by the same flaky tests. Rather than recheck, maybe the first response should be to mark the test as broken. This would be aided by a hit list of the top offending tests, or other monitoring. (T224673, T225193, T225162)
- An hourly or daily regression suite capable of bisecting or similar, to identify which specific patches broke the build even after merge. This is less than ideal, since it lets us deploy broken code.