I just noticed this also makes cleaning up spam very difficult. Since some jobs don't get run, the entries remain in the recent changes list. Nuke does not realize that some pages have already been deleted so it keeps offering the same pages. The recent changes list also keeps showing pages that have already been deleted as new pages.
My error log was just flooded with almost a thousand lines just because I deleted a handful of pages. 100% reproducible for me on master.
There has been no change to AppleFFS in Translate, so there is no stringsdict support yet and this is not resolved.
Fix now deployed and confirmed both in WMF and translatewiki.net.
MessageGroups::expandWildcards has been optimized (we should do a new profiling run to check the current status). The duplicate get() issue has not yet been fixed.
Fri, May 26
Wed, May 24
Tue, May 23
It is not trivial, but I suspect it would actually be helpful and better in the long run.
I'll state the perhaps obvious: the other option is to lower the export threshold for core even further. If we really want to go the way of encouraging translators, we could lower the export threshold much further and guard the visibility of those languages with $wgShowLanguagesWithMinimalLocalisation, which would be disabled by default but could be enabled for Incubator and so on.
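A minimal sketch of the two-level gating described above, assuming the threshold is a completion ratio of core messages. $wgShowLanguagesWithMinimalLocalisation is only the setting proposed in the comment; `LanguageStats`, `exported`, `visible`, and the threshold values are hypothetical names for illustration, not Translate or core APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LanguageStats:
    code: str
    translated: int  # translated core messages (hypothetical stat)
    total: int      # total core messages

    @property
    def ratio(self) -> float:
        return self.translated / self.total if self.total else 0.0


def exported(stats: List[LanguageStats], export_threshold: float) -> List[str]:
    """Languages whose translations would be exported to core at all."""
    return [s.code for s in stats if s.ratio >= export_threshold]


def visible(stats: List[LanguageStats], export_threshold: float,
            normal_threshold: float, show_minimal: bool) -> List[str]:
    """Languages offered to users.

    Languages below the normal threshold are exported but only shown when the
    (hypothetical) show_minimal flag, standing in for
    $wgShowLanguagesWithMinimalLocalisation, is enabled.
    """
    return [
        s.code for s in stats
        if s.ratio >= export_threshold and (show_minimal or s.ratio >= normal_threshold)
    ]
```

With a very low export threshold, a wiki like Incubator could enable the flag and see the minimally localised languages, while other wikis keep the current visible set.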
But jquery.uls.grid was already enabled for mobile when we fixed Special:Translate on the mobile site. I don't remember when I last checked the main page; maybe that part works now.
I believe the main bottleneck is the understanding and execution of our language approval criteria. As far as I can see, patch creation and merging haven't been a problem; at most it takes a few days to go through the process. Documenting the criteria better, simplifying them where possible, and letting people know upfront what information is required would, imho, be a good starting point.
This might be caused by my recent refactorings, or at least they have made it much more common. I hope to investigate this soon.
Do you mean the characters in the page content, or in the page title?
I would say the expected behavior is that no exceptions get thrown :D How can we find out what those exceptions are and fix them?
This was first reported on Saturday, but it could have started earlier as well.
Sun, May 21
I did some research on what limit would be suitable. I found that we have quite a lot of legitimate blobs over three megabytes, some even over five megabytes. One of them was a huge table. It is not possible to set a hard limit that avoids false positives while still being useful. Doing this would require changes in the UI as well (to disallow translation of huge sections) or some logic to compare source and target blob sizes. The latter would be complicated, since we don't necessarily have the source and target blobs around to compare, so I would lean towards dropping this task.
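For illustration, a comparison-based check instead of a hard limit could look roughly like the sketch below. The function name, the 3x ratio, and the 64 KiB floor are all hypothetical values chosen for the example, not anything CX implements; the `None` branch reflects the complication that the source blob is not always available.

```python
from typing import Optional


def is_suspicious_blob(
    source_size: Optional[int],
    target_size: int,
    max_ratio: float = 3.0,        # hypothetical: target may be up to 3x the source
    min_bytes: int = 64 * 1024,    # hypothetical: never flag blobs under 64 KiB
) -> bool:
    """Return True if a translated section blob looks abnormally large."""
    if target_size < min_bytes:
        # Small blobs are never flagged, whatever their ratio.
        return False
    if not source_size:
        # Source blob unavailable (or empty): we cannot judge, so do not flag
        # rather than risk a false positive.
        return False
    return target_size / source_size > max_ratio
```

A huge but legitimate table (large source, similarly large target) passes, which is exactly the case a hard size limit would get wrong.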
Sat, May 20
Fri, May 19
Wed, May 17
We are back online, with some slowness. We have identified multiple actionables based on this experience.
Around 1300Z our main server became unreachable first, and a few minutes later our secondary server went unreachable too. At 1343Z our secondary server came back up, but the primary did not. We can access the console via our control panel. First we saw HHVM failing to start, but after disabling it we noticed that networking does not come up. Running dhclient -v directly shows DHCPDISCOVERs going out but no replies coming back. The configuration of our two servers is pretty much the same, so we filed a support request with our hosting provider.
Tue, May 16
You can click on "Training modules: Keeping events safe" to see and access the list of pages it contains. Is this sufficient for your task?
Sat, May 13
Your patch does not touch the TUX interface at all, only the legacy interface, so I don't think it fixes this bug.
Thu, May 11
Since I could not figure out how to access the old wikitext editor easily anymore, another workaround is to make the edit in the source editor but switch to the visual editor before saving.
Wed, May 10
I didn't see anything in the release notes or recent code changes. There are no recent changes in TranslateSandbox, so I suspect a core change, though I guess the fix needs to be done in Translate.
Tue, May 9
Makes sense. I think S is the only shortcut that needs to be reassigned when the documentation editor is open.
Fri, May 5
Since it is hard to detect whether LU is working, we could add one dummy message key to MediaWiki core that contains the timestamp of the last export in a given language. Then one could just look at the timestamp (and compare it to what is in git if necessary).
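A minimal sketch of the marker idea, assuming the export writes the marker into each language's message file and that a MediaWiki-style 14-digit timestamp is used. The key name `export-timestamp-marker` and both helper functions are hypothetical; the comment does not name the key.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Hypothetical key name; the proposal only asks for "one dummy message key".
EXPORT_MARKER_KEY = "export-timestamp-marker"
TS_FORMAT = "%Y%m%d%H%M%S"  # 14-digit timestamp, e.g. 20170505120000


def add_export_marker(messages: Dict[str, str], now: datetime) -> Dict[str, str]:
    """Return a copy of the exported messages with the export timestamp added."""
    out = dict(messages)
    out[EXPORT_MARKER_KEY] = now.astimezone(timezone.utc).strftime(TS_FORMAT)
    return out


def export_age(messages: Dict[str, str], now: datetime) -> Optional[timedelta]:
    """How long ago the last export happened, or None if there is no marker."""
    value = messages.get(EXPORT_MARKER_KEY)
    if value is None:
        return None
    exported_at = datetime.strptime(value, TS_FORMAT).replace(tzinfo=timezone.utc)
    return now - exported_at
```

If the age seen on a wiki keeps growing while new exports land in git, the update mechanism is the likely culprit rather than the export.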
Is a system that works without tags already being worked on? Is there a timeline?
Thu, May 4
Wed, May 3
It's not $wgInvalidRedirectTargets because it still shows as a redirect (which the pages listed in there don't do), and I don't see that option customised anywhere.
Delaying it until the save button is clicked would be counterproductive, as the editor is hidden when the button is clicked. The translator would not see the notice.
Tue, May 2
Cross-linking my reply in the wiki: https://www.mediawiki.org/w/index.php?title=Topic:Tp48n8pqrvakvug6#flow-post-tprzuha1buolqpfj
Not directly related, but we are currently not using all of the data that we log. Logging of some events could be removed after reviewing what is needed.
Apr 28 2017
I filed T164050: Enforce blob size limits for draft save API as additional follow-up.
Santhosh found and fixed T163105: CX template editor's own HTML may end up in the published as an HTML blob by investigating the slow queries in P5316. This fits well with the observation of increased database traffic during the incident.
Apr 27 2017
I logged in yesterday with "Remember me" and today I had to log in again, so it doesn't seem to be fixed.
@Krinkle It doesn't fit with MLEB's release policy. Ideally we should wait until 1.30 is released, but at least until 1.29 is released. Alternatively, make the inclusion of the shim conditional.
Apr 26 2017
There seems to be a complication, perhaps an unnecessary one: the timestamp is part of the unique index. I believe this was done so that we could store the history in the future. But for now we are not keeping history, and instead make extra effort to find an existing row (if any) and either replace its contents or insert a new row.
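If the timestamp were dropped from the unique index, the find-then-replace-or-insert logic could collapse into a single upsert. The sketch below uses SQLite (3.24 or newer) only so it runs standalone; on MySQL the equivalent would be `INSERT ... ON DUPLICATE KEY UPDATE`. Column names follow the query quoted elsewhere in this task, but `cxc_content`, the unprefixed table name, and the index layout are assumptions, not the actual CX schema.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("""
        CREATE TABLE cx_corpora (
            cxc_translation_id INTEGER NOT NULL,
            cxc_section_id TEXT NOT NULL,
            cxc_origin TEXT NOT NULL,
            cxc_timestamp TEXT NOT NULL,
            cxc_content TEXT NOT NULL  -- assumed column name
        )""")
    # Unique key WITHOUT the timestamp: one row per (translation, section, origin).
    conn.execute("""
        CREATE UNIQUE INDEX cx_corpora_unique
        ON cx_corpora (cxc_translation_id, cxc_section_id, cxc_origin)""")


def save_section(conn: sqlite3.Connection, translation_id: int, section_id: str,
                 origin: str, timestamp: str, content: str) -> None:
    """Insert the section, or overwrite the existing row in one statement."""
    conn.execute("""
        INSERT INTO cx_corpora
            (cxc_translation_id, cxc_section_id, cxc_origin, cxc_timestamp, cxc_content)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT (cxc_translation_id, cxc_section_id, cxc_origin)
        DO UPDATE SET cxc_timestamp = excluded.cxc_timestamp,
                      cxc_content = excluded.cxc_content
    """, (translation_id, section_id, origin, timestamp, content))
```

Saving the same section twice leaves a single row holding the newer timestamp and content, with no separate SELECT ... FOR UPDATE needed to locate it first.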
Apr 25 2017
Lowering priority as CX is currently back (we are monitoring and ready to disable though). Finding the root cause is important.
As far as I can recall, it was done this way because the wording "in other languages" does not make sense when there are no interwiki links. Fixing it might involve changing core and/or skins to do it server side.
Apr 21 2017
I captured a few (3) draft save requests on my wiki and replayed them in an endless loop. I did not see any deadlocks and queries were completing relatively quickly even under load.
Apr 20 2017
Unfortunately I did not capture any of the queries that were long-running during the outage. However, the FOR UPDATE queries are very simple and should complete fast:
EXPLAIN SELECT * FROM `bw_cx_corpora` WHERE cxc_translation_id = '194' AND cxc_section_id = 'mwCA' AND cxc_origin = 'user' ORDER BY cxc_timestamp DESC LIMIT 1 FOR UPDATE;