Maniphest T206562

Delay in French mobile banners showing up in Banner Allocation
Open, Needs Triage, Public

Assigned To

None

Authored By

	• TSkaff
	Oct 9 2018, 6:52 PM

Description

We put up French & English (France) mobile banners today at 18:00 UTC. The English (France) mobile banners showed up immediately in Banner Allocation, but the French (France) mobile banners took at least half an hour to appear. Would Andy have an idea?

Thanks,
Thea

Related Objects

Mentioned Here: T199073: Perform a datacenter switchover (2018-19 Q1)
T197276: turnilo x axis improperly labeled

Event Timeline

• TSkaff created this task · Oct 9 2018, 6:52 PM

Restricted Application added a subscriber: Aklapper · Oct 9 2018, 6:52 PM

• DStrine added projects: Fundraising-Backlog, MediaWiki-extensions-CentralNotice · Oct 9 2018, 6:58 PM

• DStrine removed DStrine as the assignee of this task · Oct 9 2018, 7:22 PM

• DStrine subscribed.

@TSkaff Were you able to verify that both banners were showing up on the site at the same time?

@DStrine In both campaigns, I set the launch time to 18:00 UTC. At 18:00 UTC I noticed that Banner Allocation worked for English but not for French, and I could use statler to pull data for English but got nothing for French.

• DStrine moved this task from Triage to Sprint +1 on the Fundraising-Backlog board · Oct 9 2018, 8:00 PM

• DStrine added a project: Fundraising Sprint They Live · Oct 10 2018, 4:48 PM

• DStrine moved this task from Sprint +1 to Current Sprint on the Fundraising-Backlog board · Oct 10 2018, 5:04 PM

Thanks!! Checked the data in Turnilo (formerly Pivot), and it shows the problem happened as described: https://bit.ly/2NzT6OW

(There's a problem with the labels on the x-axis of the graph... For correct times of events shown there, hover over the lines and look at the info box next to the cursor. See T197276.)

The English campaign started showing up for users around 18:00, but the French one only appeared around 18:40. The CN logs show both were turned on a bit before 18:00. Also, I don't see anything in the logs for changes in banner settings that might have caused this.

Definitely worth further investigation!!

Just checking stuff to eliminate possible causes: there's nothing odd in the server logs for that time, nor were any actions related to the datacenter switchover set for that day (see T199073).

Also, nothing stands out in Logstash for that time: search 1, search 2.

• AndyRussG moved this task from Backlog to Doing on the Fundraising Sprint They Live board · Oct 15 2018, 5:54 PM

Also just re-checked the CentralNotice logs... I don't see any changes in any of them around 18:40 on 2018-10-09, which is when the mobile frFR banner finally went out (according to Druid/Turnilo, see above).

• DStrine added a project: Fundraising Sprint USB stands for underhanded socket bureaucracy · Oct 16 2018, 8:29 PM

• AndyRussG moved this task from Backlog to Doing on the Fundraising Sprint USB stands for underhanded socket bureaucracy board · Oct 17 2018, 2:51 PM

Checked a couple more things:

Looked at various dashboards for events around this time, in case there were cluster issues that I might have missed in Logstash: general MySQL, database lag, ResourceLoader (RL is used to send choice data, that is, data about available campaigns and banners), and memcache (used to cache choice data). I don't see any potential explanations there.

General CN health: everything else in CentralNotice seems OK around this time. Status codes look normal, and no other campaigns appear to have been disrupted.

Summary

What this isn't:

Not a more general MediaWiki or database issue, outage, or any known problem on the cluster.
Not a campaign or banner configuration issue.
Not a general CentralNotice outage.
Not a data pipeline issue.

What this is:

At least one campaign began to be selected by clients about 40 minutes late.

There may have been other, as yet undetected, effects of whatever the underlying cause was.

So far, my best guess as to the cause is a bug in ChoiceData object caching. The object cache TTL for ChoiceData is 1 hour. There are also some changes in campaign and banner configuration a little over an hour before the missing campaign finally started to go out. So, maybe an old version of ChoiceData was being served from that cache, and the campaign was only included in ChoiceData when that version in the cache expired.
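To make the caching hypothesis concrete, here is a minimal Python sketch of a read-through object cache with the 1-hour TTL mentioned above, in which a settings change updates the source data but does not purge the cached entry. This is not CentralNotice code (the extension is PHP); `TTLCache`, `get_choice_data`, the cache key and the campaign label are all hypothetical. Under those assumptions, a change made 14 minutes after the entry was written stays invisible to clients until the entry expires 60 minutes after it was written.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# The 1-hour ChoiceData TTL mentioned in the task, in seconds.
CHOICE_DATA_TTL = 60 * 60


@dataclass
class TTLCache:
    """Hypothetical in-process stand-in for an object cache such as memcached."""
    clock: Callable[[], float] = time.time
    _store: Dict[str, Tuple[float, object]] = field(default_factory=dict)

    def get(self, key: str) -> Optional[object]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: object, ttl: float) -> None:
        self._store[key] = (self.clock() + ttl, value)


def get_choice_data(cache: TTLCache, key: str,
                    load: Callable[[], List[str]]) -> List[str]:
    """Read-through lookup: serve the cached copy if present, else rebuild it."""
    cached = cache.get(key)
    if cached is not None:
        return cached  # may be stale if the source changed without a purge
    fresh = load()
    cache.set(key, fresh, CHOICE_DATA_TTL)
    return fresh


class FakeClock:
    """Controllable clock so the simulation doesn't depend on wall time."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def simulate() -> Dict[int, List[str]]:
    """Return what clients would receive at a few minute offsets."""
    clock = FakeClock()
    cache = TTLCache(clock=clock)
    campaigns: List[str] = []     # hypothetical campaign list for one wiki
    key = "choicedata:fr:mobile"  # hypothetical cache key

    def load() -> List[str]:
        return list(campaigns)    # snapshot of the hypothetical "database" state

    seen = {0: get_choice_data(cache, key, load)}  # entry cached at t = 0
    clock.now = 14 * 60.0
    campaigns.append("fr-FR mobile")  # settings change at t = 14 min, no purge
    for minute in (14, 59, 60):
        clock.now = minute * 60.0
        seen[minute] = get_choice_data(cache, key, load)
    return seen
```

In this sketch the new campaign is missing at minutes 14 and 59 and appears only at minute 60, when the entry expires and is rebuilt. A common way to close this kind of gap is to purge or overwrite the cache key whenever settings are saved rather than relying on TTL expiry alone; whether CentralNotice's real invalidation path had such a gap is the open question here.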

• AndyRussG claimed this task · Oct 18 2018, 4:05 PM

Hi! I've found a pretty convincing indication that the problem was with old ChoiceData being sent to browsers.

Looking at HTTP response sizes for the requests that fetch ChoiceData for fr.m.wikipedia.org, we can see that the response sizes for requests loading only the ext.centralNotice.choiceData and jquery modules (bundled together) didn't change after the settings changes made around 17:54.

So an old ChoiceData was stuck, almost certainly in the object cache. (That would also explain the lack of updated data on the Banner Allocation page.)

Here's the Jupyter notebook with the queries used:

T206562.html (313 KB)
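The notebook's actual queries aren't reproduced here. As a rough, hypothetical sketch of the response-size check described above, the Python below compares the distinct response sizes of ChoiceData-only requests before and after a settings change. The `Request` record and its fields are assumptions, not the real webrequest schema. Response size is only a proxy: identical sizes don't prove identical content, but adding a campaign to ChoiceData would normally change the payload size.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable, List, Set

# Requests that load only these two modules (bundled together), as in the task.
CHOICE_DATA_ONLY: FrozenSet[str] = frozenset({"ext.centralNotice.choiceData", "jquery"})


@dataclass(frozen=True)
class Request:
    """Hypothetical, simplified web request record."""
    ts: datetime
    host: str
    modules: FrozenSet[str]
    response_size: int  # bytes


def distinct_sizes(requests: Iterable[Request], host: str,
                   start: datetime, end: datetime) -> Set[int]:
    """Distinct response sizes for ChoiceData-only requests in [start, end)."""
    return {
        r.response_size
        for r in requests
        if r.host == host and r.modules == CHOICE_DATA_ONLY and start <= r.ts < end
    }


def payload_changed(requests: Iterable[Request], host: str, window_start: datetime,
                    change_at: datetime, window_end: datetime) -> bool:
    """True if the set of response sizes after `change_at` differs from before."""
    reqs: List[Request] = list(requests)  # allow iterating twice
    before = distinct_sizes(reqs, host, window_start, change_at)
    after = distinct_sizes(reqs, host, change_at, window_end)
    return before != after
```

With the stuck-cache scenario described above, one would expect `payload_changed` to return False for fr.m.wikipedia.org with `change_at` at 17:54 and a window ending before about 18:40, and True once the window extends past that point.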

• DStrine moved this task from Current Sprint to Sprint +1 on the Fundraising-Backlog board · Oct 30 2018, 8:24 PM

• DStrine moved this task from Sprint +1 to Q3 2021-2022 on the Fundraising-Backlog board · May 6 2019, 9:41 PM

Removing task assignee due to inactivity, as this open task has been assigned to the same person for more than two years (see the emails sent to the task assignee on Oct 27 and Nov 23). Please assign this task to yourself again if you still realistically [plan to] work on this task; it would be welcome.
(See https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips on how to best manage your individual work in Phabricator.)

• DStrine added a project: FR-France · Mar 24 2021, 4:49 PM

• XenoRyet moved this task from Q3 2021-2022 to Unscheduled on the Fundraising-Backlog board · Mar 14 2023, 10:24 PM

	F26689700: response_sizes1.png
	Oct 22 2018, 4:37 AM
