Page MenuHomePhabricator

L10n-bot should not force-merge / override Jenkins (breaks the build)
Closed, DeclinedPublic

Description

l10n-bot self-force-merging sometimes breaks mediawiki/core master. Example from today: https://gerrit.wikimedia.org/r/#/c/194725/

Yes, it sucks that our tests depend on the values of localisation messages in random languages, but that's the situation. This has happened in the past and will happen again in the future, and it's very annoying every time.

Please make the bot just vote C+2, without V+2, and let the tests run. If a commit fails the tests, anyone can amend it and fix up.

Related Objects

StatusSubtypeAssignedTask
DeclinedNone
ResolvedLegoktm
ResolvedLegoktm
OpenNone
ResolvedJeroenDeDauw
Resolved Gilles
Resolved santhosh
DuplicateNone
Resolvedkaldari
ResolvedFlorian
ResolvedUmherirrender
ResolvedHe7d3r
Resolvedhashar
DeclinedNone
ResolvedLegoktm
ResolvedKrinkle
ResolvedKrinkle
ResolvedKrinkle
ResolvedNone
Resolvedhashar
DuplicateNone
ResolvedKrenair
ResolvedKrenair
ResolvedPRODUCTION ERRORUmherirrender
DuplicateNone
Resolvednikitavbv
ResolvedMihir-thakkar
ResolvedMtDu
Resolvednikitavbv
Resolvednikitavbv
ResolvedReedy
ResolvedMtDu
ResolvedYaron_Koren
ResolvedAnomie
ResolvedMtDu
ResolvedReedy
Resolveddivadsn
Resolveddivadsn
ResolvedDatGuy
ResolvedDatGuy
DeclinedDatGuy
Resolvedhashar
ResolvedUmherirrender
ResolvedDatGuy
ResolvedUmherirrender
ResolvedDatGuy
ResolvedDatGuy
ResolvedUmherirrender
ResolvedDatGuy
ResolvedDatGuy
ResolvedUmherirrender
ResolvedDatGuy
ResolvedUmherirrender
DeclinedNone
ResolvedDatGuy
ResolvedDatGuy
Resolvednikitavbv
ResolvedDatGuy
ResolvedPaladox
ResolvedUmherirrender
OpenNone
Declinedhashar
Declinedhashar
Declinedhashar
ResolvedAndrew
Resolvedhashar
Declinedhashar
ResolvedKrinkle
Resolvedhashar
ResolvedDzahn
Resolvedhashar
Resolvedhashar
ResolvedAndrew

Event Timeline

matmarex raised the priority of this task from to Needs Triage.
matmarex updated the task description. (Show Details)
matmarex subscribed.

Change 194990 had a related patch set uploaded (by Legoktm):
Don't ignore l10n-bot

https://gerrit.wikimedia.org/r/194990

Change 194992 had a related patch set uploaded (by Legoktm):
Don't self-merge l10n-bot commits, only CR 2

https://gerrit.wikimedia.org/r/194992

I fully agree we should make it run the tests.

I'm concerned about the non-trivial stress this might put on the continuous integration system, though. As they typically sync these overnight all at once. That's a lot of jobs run at once and may clog the queue for several hours, if not more.

I'd recommend we first gather statistics of how many Gerrit repos this currently, roughly, affects.

For reference, in https://gerrit.wikimedia.org/r/#/c/102636/ @hashar blacklisted l10n-bot.

As for how many repos this affects, the following is how many merged commits l10n-bot uploads per day since mid-November:

12014-11-12: 36
22014-11-13: 53
32014-11-14: 62
42014-11-15: 77
52014-11-16: 63
62014-11-17: 26
72014-11-18: 28
82014-11-19: 62
92014-11-20: 1
102014-11-21: 60
112014-11-22: 63
122014-11-23: 44
132014-11-24: 34
142014-11-25: 43
152014-11-26: 76
162014-11-27: 2
172014-11-29: 61
182014-11-30: 79
192014-12-01: 39
202014-12-02: 48
212014-12-03: 48
222014-12-04: 26
232014-12-05: 37
242014-12-06: 32
252014-12-07: 39
262014-12-08: 49
272014-12-09: 29
282014-12-10: 40
292014-12-11: 87
302014-12-13: 118
312014-12-14: 63
322014-12-15: 27
332014-12-16: 42
342014-12-17: 44
352014-12-18: 69
362014-12-19: 33
372014-12-20: 75
382014-12-21: 4
392014-12-22: 67
402014-12-23: 42
412014-12-25: 2
422014-12-26: 47
432014-12-27: 39
442014-12-28: 33
452014-12-30: 57
462015-01-02: 71
472015-01-03: 46
482015-01-04: 48
492015-01-05: 33
502015-01-06: 43
512015-01-07: 48
522015-01-08: 54
532015-01-09: 66
542015-01-10: 31
552015-01-11: 31
562015-01-12: 48
572015-01-13: 30
582015-01-14: 62
592015-01-15: 58
602015-01-16: 34
612015-01-17: 5
622015-01-18: 73
632015-01-19: 60
642015-01-20: 58
652015-01-21: 31
662015-01-22: 52
672015-01-23: 72
682015-01-24: 3
692015-01-25: 86
702015-01-26: 44
712015-01-27: 35
722015-01-28: 36
732015-01-29: 97
742015-01-30: 48
752015-02-02: 73
762015-02-03: 41
772015-02-04: 45
782015-02-05: 26
792015-02-06: 45
802015-02-07: 40
812015-02-08: 49
822015-02-09: 26
832015-02-10: 27
842015-02-11: 48
852015-02-12: 48
862015-02-13: 45
872015-02-15: 2
882015-02-16: 69
892015-02-17: 44
902015-02-18: 66
912015-02-19: 66
922015-02-20: 37
932015-02-21: 60
942015-02-22: 29
952015-02-23: 57
962015-02-24: 60
972015-02-25: 63
982015-02-26: 75
992015-02-27: 49
1002015-02-28: 66
1012015-03-01: 68
1022015-03-02: 55
1032015-03-04: 1
1042015-03-05: 108
1052015-03-06: 3

It might be easier to just revert 02364f56df90ca9e64a3d63fd81c52643e4f9e97 : Raimond used to act on test failures IIRC, as L10n-bot is always run and checked manually by Raimond when he has some spare time during the day (thanks Raimond!).

L10n-bot was blacklisted because at some point some devs working in PST found that every day around lunch time (?) all merging and testing would be delayed. Rather than the daily count, perhaps the hourly peak is more interesting. Adding more capacity, or a dedicated queue, would be nice.

Nemo_bis triaged this task as Medium priority.Mar 7 2015, 9:02 AM

Would it help to change the time when I do the export from translatewiki.net? Currently I export and submit nearly every day between 19:00 and 22:00 UTC. I could change this to the European morning, this means between 8:00 and 12:00 UTC. In this case I suggest the time of the Localization cron job should be changed to 13:00 UTC. This way the jobs do not conflict with working time in California. But maybe more with the Wikidata team in Berlin?

Hm.. If it's automated and could be any time of day, then perhaps between 03:00 and 06:00 UTC. That would match 04:00 - 07:00 CET (UTC+1; Amsterdam, Berlin), and 17:00 - 20:00 PST (UTC -8, San Francisco).

Or between 16:00 and 19:00 UTC. That would match 17:00 - 20:00 CET and 06:00 - 09:00 PST.

I haven't looked back at the l10n-bot configuration in Zuul, but what could be done is:

  • blacklist bot from test / check
  • make sure the bot reacts to gate-and-submit
  • have the bote vote CR+2 and not merge

This way when patches are sent they do not trigger any tests, on +2 they get the test and are magically merged by Zuul if the tests pass.

Note that the extra time spent in CI (~ 5 minutes) is enough to occasionally cause race conditions where, after the l10n patch is pushed to Gerrit, another commit already approved and being tested gets merged first. If that patch also changes i18n files, I wonder what the process is from there for @Raymond. Does it require manual work? Or would that update just fail and the next nightly import/export covers the gap?

The export is not fully automated, no.

Or between 16:00 and 19:00 UTC. That would match 17:00 - 20:00 CET and 06:00 - 09:00 PST.

Because there is manual work to do 03:00-06:00 UTC is impossible, 16:00-19:00 UTC would be possible for most of the days (depends on my personal schedule/working level).

Note that the extra time spent in CI (~ 5 minutes) is enough to occasionally cause race conditions where, after the l10n patch is pushed to Gerrit, another commit already approved and being tested gets merged first. If that patch also changes i18n files, I wonder what the process is from there for @Raymond. Does it require manual work? Or would that update just fail and the next nightly import/export covers the gap?

Such conflict happens from time to time now too. When I sync the exports back with translatewiki production repo after the merges I identify such conflicts and if necessary I revert the export patch set.

I am fine having the l10n update changes be pushed between 19:00 and 22:00 UTC, though I am not sure why they need to occur daily :-D

As I wrote above we should skip running tests from check* / test Zuul pipelines, have l10n-bot only vote CR+2 and let gate-and-submit run the jobs and merge the change for us. @Legoktm change https://gerrit.wikimedia.org/r/#/c/194990/ is on this track.

@Raymond can you prepare a patch that makes l10n-bot only vote CR+2 instead of CR+2/V+2/submit ? We can them pair together one of the european evening to update both tools at the same time and test the behavior together. Any day would work for me after 9pm CET.

@hashar I will prepare tomorrow (Wednesday) such a patch set.

Change 194992 merged by jenkins-bot:
Don't self-merge l10n-bot commits, only CR 2

https://gerrit.wikimedia.org/r/194992

Change 194990 merged by jenkins-bot:
Don't ignore l10n-bot in gate-and-submit pipeline

https://gerrit.wikimedia.org/r/194990

@Raymond it seems the wednesday run used CR+2 and got changes merged via successful tests. Is there anything else to accomplish?

Related to this task: Since the change I get emails from watched gerrit projects about submitted l10n changes. This wasn't before (except changes manually approved).
Is there a way to ignore L10n-bot again (particularly changes owned by l10n and submitted by jenkins)?

With https://gerrit.wikimedia.org/r/#/c/198184/, every extension and skin should now have jenkins running basic phplint and merging changes.

Update: l10n-bot is currently being ignored again because jenkins machines were running out of disk space when extensions that hadn't had tests ever triggered before were suddenly cloning mediawiki/core a bunch of times. Adding relevant blockers which we need to resolve first.

It might be easier to just revert 02364f56df90ca9e64a3d63fd81c52643e4f9e97 : Raimond used to act on test failures IIRC, as L10n-bot is always run and checked manually by Raimond when he has some spare time during the day (thanks Raimond!).

I think the test confirmed my hypothesis. L10n-bot should still C+2 and V+2, but tests can be run alongside (capacity permitting, perhaps in a special queue) to file bugs in the process.

I am kind of lost with past week events. It seems the current situation is l10n-bot is now being ignored completely because the mediawiki extensions jobs fill the disk with copy of MediaWiki core.

I am thus marking this bug as being blocked by T93703: reduce copies of mediawiki/core in workspaces.

Krinkle renamed this task from l10n-bot self-force-merging sometimes breaks mediawiki/core master to L10n-bot should not force-merge / override Jenkins (breaks the build).Jul 8 2017, 2:26 AM
Krinkle subscribed.

Yes, it sucks that our tests depend on the values of localisation messages in random languages, but that's the situation.

How hard would it be for those tests to force lang qqx which has a stable output?

It probably wouldn't be hard, but it would be tedious, since you'd probably have to update the expected values in each test from English text to the message keys.

We can just fix them as we find, I guess. Eventually it will get rare enough to not to bother people. I.e. when was the last time this happened?

I am not advocating that we skip tests for l10n updates... however that would risk patches getting stuck in gerrit and perhaps nobody even noticing. It should not be the responsibility of translatewiki.net people to fix those things.

If someone can find automatically tests which rely on translations (btw. we do not export English) we can make those as good first task and/or GCI tasks.

Change 462460 had a related patch set uploaded (by Hashar; owner: Reedy):
[translatewiki@master] Disable automatic V+2/submit for l10n-bot

https://gerrit.wikimedia.org/r/462460

Change 462460 merged by jenkins-bot:
[translatewiki@master] Disable automatic V+2/submit for l10n-bot

https://gerrit.wikimedia.org/r/462460

chasemp claimed this task.
chasemp subscribed.

tentatively resolving this as part of T205563

Surely this is not fixed, because just yesterday, L10n-bot broke the build with https://gerrit.wikimedia.org/r/c/mediawiki/core/+/465210 (I fixed it in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/465299).

L10n-bot is only triggering the special i18n checker thing right now, we still need to run the full test suite against it.

Thanks, I misunderstood the scope here. Leaving open :)

Urbanecm subscribed.

L10n-bot is only triggering the special i18n checker thing right now, we still need to run the full test suite against it.

I was thinking _why_ dummy job was used in all repositories. I guess this is something that is managed by integration/config.

Suggesting to decline because:

  • I haven't heard many complaints of i18n updates breaking tests anymore
  • The servers could melt under the increased testing load (we generate hundreds of patches and push them all at once)
  • Nobody aint got time to manually merge all the patches failing to merge due to intermittent test failures
  • GitLab

Hmm, fine, with 2 occurrences in 8 years this is probably not worth spending time to fix. I must have thought that the problems would be more common when reporting this. Humans break the build way more often than that.