The stack trace and the sudden surge of errors suggest that a CentralNotice banner is causing this. Not sure which one.
Currently, this accounts for 986 errors in 12 hours and is our third most frequent error (see attached screenshots).
Jdlrobson, Nov 4 2020, 8:29 PM
Attachments:
F33920641: Screen Shot 2020-11-20 at 9.30.28 AM.png (Nov 20 2020, 5:30 PM)
F32431960: Screen Shot 2020-11-04 at 12.29.06 PM.png (Nov 4 2020, 8:29 PM)
F32431947: Screen Shot 2020-11-04 at 12.26.49 PM.png (Nov 4 2020, 8:29 PM)
I blame https://de.wikipedia.org/w/index.php?title=A&banner=B20_WMDE_Test_09_ctrl&uselang=de&force=1 - the other banner that's running is https://de.wikipedia.org/w/index.php?title=A&banner=wikipedia_asian_month_2020&uselang=de&force=1 and that seems to be less complicated.
Timing coincides with campaign C20_WMDE_Test_09 going up, so yes it's probably one or both of B20_WMDE_Test_09_ctrl or B20_WMDE_Test_09_var. Pinging WMDE-Fundraising-Tech as owners of those banners.
WMF Fundraising saw a similar issue before in T264366#6510591 caused by corrupt Firefox profiles (https://stackoverflow.com/questions/18877643/error-in-local-storage-ns-error-file-corrupted-firefox/26371494), and all of these errors also appear to be in Firefox. We fixed it by wrapping the use of localStorage in a try/catch. Alternatively I believe the mw.storage module can handle this.
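For illustration, the try/catch approach mentioned above could look roughly like the following. This is only a minimal sketch, not the actual WMF or WMDE banner code: `createSafeStorage` and `bannerStorage` are hypothetical names, and the storage object is passed in through a getter because a broken profile may throw as soon as `localStorage` is touched.

```javascript
// Hypothetical helper (not from the banner code base): wraps every storage
// access in try/catch so that a corrupt profile cannot break the banner.
function createSafeStorage(getStorage) {
    return {
        // Returns the stored string, or null if the key is missing
        // or the storage backend throws.
        get(key) {
            try {
                return getStorage().getItem(key);
            } catch (e) {
                // e.g. NS_ERROR_FILE_CORRUPTED in Firefox
                return null;
            }
        },
        // Returns true if the value was written, false if storage threw.
        set(key, value) {
            try {
                getStorage().setItem(key, String(value));
                return true;
            } catch (e) {
                return false;
            }
        }
    };
}

// Assumed browser usage; the getter defers the localStorage lookup so that
// an exception on access is caught as well:
// const bannerStorage = createSafeStorage(() => window.localStorage);
// bannerStorage.set('bannerClosed', '1');
```

With a wrapper like this, the banner would treat unreadable storage the same as an empty one, at the cost of silently losing whatever state it meant to persist.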
There were 2,429 errors in the last 24 hours. For now I am treating any newly introduced error that exceeds 1,500 occurrences per 24 hours as UBN.
If it's a banner, though, it doesn't block the train.
Bump.
This has been UBN for almost 2 weeks now. @Pcoombe, is there someone we can ping directly to ensure this gets attention?
It seems that @Pcoombe's comment regarding corrupt Firefox profiles points in the right direction.
I could not reproduce this using any specific browser functionality, but I can confirm that it only happens in Firefox. This error has been reported to Logstash since Sep 23rd, at a much lower rate during earlier banner tests, probably because of the traffic limit. There is a spike in early October which coincides with the Wikipedia Challenge banners.
Note: currently we are not capturing all errors for Chrome due to a bug in the instrumentation (T266517), so it is very likely the numbers are much higher and do include Chrome.
I've created a patch for our localStorage access to guard against corruption in Firefox: https://github.com/wmde/fundraising-banners/pull/466