Mobile Steward elections 2019 banner broken
Closed, Resolved (Public)

Description

See:

https://meta.wikimedia.org/wiki/Talk:CentralNotice#stewnoms_mobile_is_throwing_JavaScript_errors_on_mobile

https://meta.wikimedia.org/wiki/Special:CentralNoticeBanners/edit/stewnoms_mobile

https://meta.wikimedia.org/w/index.php?title=Special:CentralNotice&subaction=noticeDetail&notice=Steward+elections+2019

Banners sent out so widely should be checked carefully on desktop and mobile platforms. The desktop banner seems OK, but the mobile banner appears to have been showing for all logged-in mobile users, regardless of administrator status (edit: previously I thought it wasn't showing at all), and may have been breaking mobile editing for logged-in users. I've disabled the mobile banner for now.

Event Timeline

I've updated the mobile banner. I don't think it will cause errors now. I also added an alterImpressionData function to get correct data about whether or not the banner was displayed.

Before re-adding to the global campaign, please add the banner to a test campaign on aa.wikibooks to ensure that it's not causing errors, that it displays/hides correctly for administrators and non-administrators, and that the data about the banner displaying or hiding is correct. Thanks much!!!

Checks on desktop and mobile were carried out on my request as far as I can tell. I was told they were okay. I got reports of admins on mobile getting the banner and being able to edit. Thanks for the fix, but then we should also update the script for the desktop banners as well to use more accurate/modern syntax. Thank you.

Thanks for taking care of the banner problem. Errors haven't quite returned to normal levels. Is there a possibility that old banner code could be cached? Obviously this could be another error, so I will continue investigating today.

I'm a little alarmed that such a big bug could have been live for an entire weekend. Is there any way we can add some basic tests to banners to check that they render without error? Also, what is the best way to report these kinds of problems in future?

Hi! That campaign currently sets no banner going out to the mobile site. We have a quick (though slightly less reliable) way of querying the data, which shows that after the banner was removed, it was displayed no more than four times per hour. This could be due to JavaScript with the banner choices payload being cached longer than it should, somewhere outside our datacenters, so outside our control. I don't think that's enough to be the source of the errors you see? BTW at its peak, the banner wasn't being shown more often than 60 times per hour. See this Turnilo plot.

Edit: I goofed, these numbers are incorrect, apologies. See below. Still doesn't seem to account for the majority of the spike in errors, though.

A more certain way to check would be to look at banner loader webrequests via Hive. Please let me know if I should try that... but I'd expect the Turnilo data is correct.

I'm a little alarmed that such a big bug could have been live for an entire weekend. Is there any way we can add some basic tests to banners to check that they render without error? Also, what is the best way to report these kinds of problems in future?

Yes, I agree completely. :( See T214412 and T214410.

From IRC:

usually we see 90k errors an hour but since the campaign started we're getting 200-300k an hour

@Jdlrobson, the data from Turnilo seems to indicate that the banner was not the source of the errors. What requests does the mobile error come in on? Maybe I should dig in more on Hive...

A more certain way to check would be to look at banner loader webrequests via Hive.

Nice idea!

I spent the last 30 minutes looking at this some more. I think the remaining errors are not due to the banners but are genuine iOS Safari issues. I'm trying to get some more details about those!

Hi! That campaign currently sets no banner going out to the mobile site. We have a quick (though slightly less reliable) way of querying the data, which shows that after the banner was removed, it was displayed no more than four times per hour. This could be due to JavaScript with the banner choices payload being cached longer than it should, somewhere outside our datacenters, so outside our control. I don't think that's enough to be the source of the errors you see? BTW at its peak, the banner wasn't being shown more often than 60 times per hour. See this Turnilo plot.

Ooops! I forgot that this data set no longer corrects for the beacon sample rate, which was 1%. So, in fact, the banner was loaded around 6000 times per hour at its peak, and around 400 times per hour even after it was removed from the campaign. I'll get more precise numbers from Hive. Thanks!!
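The correction above is a division by the sample rate. A minimal sketch of that arithmetic, assuming the 1% beacon sample rate mentioned in this comment; the function name `estimate_total` is hypothetical and not part of any Wikimedia tooling:

```python
def estimate_total(sampled_count: int, sample_rate: float) -> int:
    """Scale a sampled event count up to an estimated total."""
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    return round(sampled_count / sample_rate)


# Assumption: 1% sampling, as stated in the comment above.
BEACON_SAMPLE_RATE = 0.01

# Uncorrected per-hour figures quoted earlier in the thread.
peak_per_hour = estimate_total(60, BEACON_SAMPLE_RATE)
after_removal_per_hour = estimate_total(4, BEACON_SAMPLE_RATE)

print(peak_per_hour, after_removal_per_hour)
```

These are estimates only: at a 1% sample, a raw count as small as four per hour carries a lot of relative noise.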

Jdlrobson claimed this task.

I think we can call this closed for this particular banner. Thanks for the swift response here! Looking forward to improvements relating to T214410!

Here is the data from Hive for actual calls to Special:BannerLoader to load and inject the banner into the page. The Hive query used follows.

I'm surprised that the requests are significantly lower than what we get from Turnilo...

dt             requests
2019-01-20T00  3168
2019-01-20T01  2916
2019-01-20T02  2886
2019-01-20T03  2831
2019-01-20T04  2853
2019-01-20T05  2843
2019-01-20T06  2706
2019-01-20T07  2940
2019-01-20T08  3252
2019-01-20T09  3414
2019-01-20T10  3688
2019-01-20T11  3748
2019-01-20T12  4010
2019-01-20T13  4336
2019-01-20T14  4343
2019-01-20T15  4586
2019-01-20T16  4593
2019-01-20T17  4507
2019-01-20T18  4498
2019-01-20T19  4458
2019-01-20T20  4378
2019-01-20T21  4195
2019-01-20T22  3864
2019-01-20T23  3168
2019-01-21T00  2953
2019-01-21T01  2775
2019-01-21T02  2686
2019-01-21T03  2865
2019-01-21T04  2734
2019-01-21T05  2748
2019-01-21T06  2882
2019-01-21T07  2889
2019-01-21T08  3086
2019-01-21T09  3007
2019-01-21T10  3063
2019-01-21T11  3330
2019-01-21T12  3609
2019-01-21T13  3910
2019-01-21T14  4033
2019-01-21T15  4168
2019-01-21T16  4298
2019-01-21T17  4309
2019-01-21T18  2560
2019-01-21T19  217
2019-01-23T00  46
2019-01-23T01  15
2019-01-23T02  20
2019-01-23T03  22
2019-01-23T04  40
2019-01-23T05  23
2019-01-23T06  32
2019-01-23T07  34
2019-01-23T08  34
2019-01-23T09  24
2019-01-21T20  172
2019-01-21T21  131
2019-01-21T22  133
2019-01-21T23  117
2019-01-23T10  27
2019-01-23T11  31
2019-01-23T12  28
2019-01-23T13  36
2019-01-23T14  29
2019-01-23T15  31
2019-01-23T16  38
2019-01-23T17  32
2019-01-23T18  25
2019-01-23T19  18
2019-01-23T20  23
2019-01-23T21  14
2019-01-23T22  14
2019-01-23T23  12
2019-01-22T00  92
2019-01-22T01  77
2019-01-22T02  81
2019-01-22T03  75
2019-01-22T04  78
2019-01-22T05  85
2019-01-22T06  83
2019-01-22T07  77
2019-01-22T08  73
2019-01-22T09  81
2019-01-22T10  63
2019-01-22T11  79
2019-01-22T12  77
2019-01-22T13  74
2019-01-22T14  69
2019-01-22T15  71
2019-01-22T16  78
2019-01-22T17  79
2019-01-22T18  55
2019-01-22T19  62
2019-01-22T20  38
2019-01-22T21  37
2019-01-22T22  41
2019-01-22T23  32

Hive query:

SELECT
  substr( dt, 1, 13 ) AS hour,
  count(*) AS requests
FROM
  wmf.webrequest
WHERE
  uri_host = 'meta.wikimedia.org' AND
  (
    uri_host LIKE '%Special:BannerLoader%' OR
    uri_query LIKE '%Special:BannerLoader%'
  ) AND
  uri_query LIKE '%stewnoms_mobile%' AND
  year = 2019 AND
  month = 1 AND
  day >= 20 AND
  day <= 23
GROUP BY
  substr( dt, 1, 13 );
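For readers less familiar with Hive, the hourly grouping in this query can be sketched in Python. This is only an illustration: it assumes `dt` is an ISO 8601 string such as `2019-01-20T00:05:12Z`, and the sample timestamps are made up.

```python
from collections import Counter


def hour_bucket(dt: str) -> str:
    # Hive's substr() is 1-indexed, so substr(dt, 1, 13) is dt[0:13] in Python.
    # For an ISO 8601 timestamp this keeps the date and the hour.
    return dt[:13]


# Hypothetical timestamps, for illustration only.
timestamps = [
    "2019-01-20T00:05:12Z",
    "2019-01-20T00:47:30Z",
    "2019-01-20T01:02:03Z",
]

requests_per_hour = Counter(hour_bucket(dt) for dt in timestamps)

# The query has no ORDER BY, so Hive may return the groups in any order;
# sorting here is a choice made by this sketch, not by the query.
for hour, requests in sorted(requests_per_hour.items()):
    print(hour, requests)
```

Because the bucket keys are zero-padded ISO strings, sorting them as plain text also sorts them chronologically.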

Just FYI, here's the task about the data discrepancy: T214600.

Did you intend to have a different field than uri_host in the OR clause looking for Special:BannerLoader? Looks like uri_host has to be exactly meta.wikimedia.org.

Ah ooops!! Yeah, that should be uri_path, in case for some reason we call the non-ugly URL format, though I don't think this will actually happen. I'll re-run the query!

I just re-ran the query, and the results are the same, as expected.
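A rough Python sketch of why the two versions of the filter agree on this data. The row dictionaries, URL shapes, and the `banner` parameter name are hypothetical, chosen only to show the two URL styles; the date conditions from the query are left out.

```python
def like_contains(value: str, needle: str) -> bool:
    # Approximates SQL `value LIKE '%needle%'` as a plain substring test.
    # Caveat: in LIKE, `_` matches any single character, so this is
    # slightly stricter than the real query for 'stewnoms_mobile'.
    return needle in value


def original_filter(row: dict) -> bool:
    # Mirrors the query as first posted: the first OR branch tests uri_host.
    return (
        row["uri_host"] == "meta.wikimedia.org"
        and (
            like_contains(row["uri_host"], "Special:BannerLoader")
            or like_contains(row["uri_query"], "Special:BannerLoader")
        )
        and like_contains(row["uri_query"], "stewnoms_mobile")
    )


def corrected_filter(row: dict) -> bool:
    # Same filter with the first OR branch testing uri_path instead.
    return (
        row["uri_host"] == "meta.wikimedia.org"
        and (
            like_contains(row["uri_path"], "Special:BannerLoader")
            or like_contains(row["uri_query"], "Special:BannerLoader")
        )
        and like_contains(row["uri_query"], "stewnoms_mobile")
    )


# Hypothetical request rows in the two URL styles.
index_php_row = {
    "uri_host": "meta.wikimedia.org",
    "uri_path": "/w/index.php",
    "uri_query": "?title=Special:BannerLoader&banner=stewnoms_mobile",
}
pretty_url_row = {
    "uri_host": "meta.wikimedia.org",
    "uri_path": "/wiki/Special:BannerLoader",
    "uri_query": "?banner=stewnoms_mobile",
}

for row in (index_php_row, pretty_url_row):
    print(row["uri_path"], original_filter(row), corrected_filter(row))
```

Since `uri_host` must equal `meta.wikimedia.org`, the original first branch can never match, and only the `uri_query` branch does any work. The two filters would differ only if banner loads used the `/wiki/Special:BannerLoader` path style, which is consistent with the re-run returning the same counts.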