Tue, Dec 11
Tue, Dec 4
Mon, Nov 26
The specific error is related to database replication: "Replication wait failed: MySQL server has gone away". There are other errors on this banner that seem linked to rebuilding the Translate extension's MessageGroupStats index. See these Logstash entries.
Thanks, @MarcoAurelio! Indeed, this should be documented properly. The banner-protect right is used to protect translatable banner messages handled by the Translate extension. The goal is to ensure that translation managers can approve and publish translations for those messages without permission to modify other banner content.
Thanks @MarcoAurelio and @Seddon! It seems that this is the right way to go. If I understand correctly, the i18n aspect would be mostly of use outside the WMF, on wikis that use CentralNotice and are in languages other than English... Is that correct?
Wed, Nov 21
Tue, Nov 20
Mon, Nov 19
Oct 31 2018
Oct 26 2018
@Jdforrester-WMF Thanks!! :)
Oct 25 2018
Oct 23 2018
Oct 22 2018
Here's the Jupyter notebook with the queries used:
Hi! I've found a pretty convincing indication that the problem was with old ChoiceData being sent to browsers.
Oct 19 2018
Hey... of course, apologies for the delays!!!
Oct 18 2018
Oct 17 2018
What this isn't:
- Not a more general MediaWiki or database issue, outage, or any known problem on the cluster.
- Not a campaign or banner configuration issue.
- Not a general CentralNotice outage.
- Not a data pipeline issue.
Checked a couple more things:
Oct 16 2018
Oct 15 2018
Also just re-checked CentralNotice logs... I don't see any changes in any of them around 18:40 on 2018-10-09, which is when the mobile frFR banner finally went out (according to Druid/Turnilo, see above).
Oct 11 2018
Thanks much for this!!!! I debugged through the merged patch locally. Unfortunately... it looks like it might not fix the problem, since, at least locally, the mergeable updates are never merged. This is because each is added to DeferredUpdates within the execution of a different AtomicSectionUpdate. (See the Gerrit change for more details.)
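To illustrate the point about nesting, here is a toy sketch in Python. It is not MediaWiki code: the `Queue`, `ParentUpdate`, and `MergeableUpdate` classes are hypothetical stand-ins, and the sketch assumes a simplified model in which updates of the same type merge only when they land in the same queue, and anything enqueued while an update runs goes into a fresh sub-queue of its own.

```python
class MergeableUpdate:
    """Hypothetical update that can absorb another update of the same type."""

    def __init__(self, keys):
        self.keys = set(keys)

    def merge(self, other):
        self.keys |= other.keys

    def run(self, queue, log):
        # Record one entry per actual execution.
        log.append(sorted(self.keys))


class ParentUpdate:
    """Hypothetical update that enqueues a mergeable update while it runs."""

    def __init__(self, key):
        self.key = key

    def run(self, queue, log):
        queue.add(MergeableUpdate([self.key]))


class Queue:
    """Toy queue (assumption): merging only happens within a single queue."""

    def __init__(self):
        self.pending = []

    def add(self, update):
        if isinstance(update, MergeableUpdate):
            for existing in self.pending:
                if isinstance(existing, MergeableUpdate):
                    existing.merge(update)
                    return
        self.pending.append(update)

    def execute(self, log):
        while self.pending:
            update = self.pending.pop(0)
            # Updates added during this run go to a fresh sub-queue (assumption).
            sub_queue = Queue()
            update.run(sub_queue, log)
            sub_queue.execute(log)


def run_top_level(keys):
    """Enqueue all mergeable updates directly in one queue."""
    log = []
    queue = Queue()
    for key in keys:
        queue.add(MergeableUpdate([key]))
    queue.execute(log)
    return log


def run_nested(keys):
    """Enqueue each mergeable update from inside a different parent update."""
    log = []
    queue = Queue()
    for key in keys:
        queue.add(ParentUpdate(key))
    queue.execute(log)
    return log
```

Under these assumptions, `run_top_level(["a", "b", "c"])` executes a single merged update, while `run_nested(["a", "b", "c"])` executes three separate ones, because each mergeable update is alone in its own sub-queue and never meets a sibling to merge with.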
Oct 10 2018
Thanks!! Checked the data in Turnilo (formerly Pivot), and it shows the problem happened as described: https://bit.ly/2NzT6OW
@elukey so just to confirm, it's fine to go ahead and use the eventlogging_CentralNoticeImpression stream for this. Thanks so much again and apologies again for the delays in replying!!!
Oct 9 2018
Hi!!! Many apologies for the delay here...
Oct 3 2018
Oct 2 2018
Thanks @Pcoombe... That looks like a significant clue!! In the logs, the related server errors seem to come in clusters... That might be related to when campaigns and tests are running, though. There are other MessageCache issues now (see T203925), so we might want to look at them all together.
Sep 29 2018
Sep 28 2018
Currently, CentralNotice uses Revision::getContent() in the administration UI to show the content of a banner's translatable messages. So, as far as that UI is concerned, it's not essential to update MessageCache before the form returns. (Note: that part of the UI has been broken-ish for a while--see T72939.)
Sep 27 2018
Hi! Thanks so much, everyone!!
Sep 26 2018
Sep 25 2018
Hi @zeljkofilipin... Many apologies for not responding here, and thanks for the sample test!!!!! Can you say how urgent this is? Also, is the lack of update maybe related to recent failures on OSX for the legacy system? Thanks so much and apologies again!
This may well be a less severe symptom of the same problem that's causing T203925: Save times for changes to translation variable text in centralnotice paralysingly slow. (More details here: T203925#4609441.)
Ooops! I was mistaken... This is not necessary, since we currently don't get data (via either pipeline) from banner previews (makes much more sense, really).
Sep 24 2018
For contrast, the same mwrepl test on enwiki took about 2 seconds:
Looks like I've found the general area of code causing this. :)
Sep 21 2018
Sep 18 2018
Sep 17 2018
Sep 13 2018
@Gilles Apologies for the delay... Is it ok with you if this goes on next week's deployment train? (Also replied to your e-mail... Apologies again!!!)
Sep 12 2018
Hi! Some notes here:
- As far as I can tell, by requesting that CentralNotice administrators refrain from creating or editing banners with translatable messages, we're preventing any impact on cluster performance from the specific problem described in this bug (incredibly slow save times for banners with translatable messages).
- Nonetheless, we do need a careful review of CentralNotice-Translate interaction, to improve performance and code sanity in general, which includes this specific bug.
- The option suggested by @Nikerabbit is the immediate action we're pursuing, though that should also not be deployed until we're clear about its impact on all bits and pieces of this complex system.
- @Krenair, indeed we wouldn't disable CN-Translate integration without looking carefully at the impact, and, also, that doesn't look like the right option at this point :)
Sep 11 2018
@jcrespo @Nikerabbit Thanks!!! I think we could potentially look at disabling CentralNotice-Translate integration as a measure of last resort. (Note that we have asked CentralNotice administrators to not use translatable messages in banners for now, also.)
Sep 10 2018
Just to note, though the general slow-to-save banner issue is important, that problem, as far as I can tell, has a different cause from the very-very-slow-to-save problem with banners containing translatable messages...
Hi! Looked at this quickly:
- This does not seem to be affecting banner displays, as far as I can see.
- There are a lot of Database warnings related to translatable messages.