Page MenuHomePhabricator

EL alarms raw/validated 20160926
Closed, ResolvedPublic

Event Timeline

Since today's at 6:30 am this is the split of client parsing errors, see below schema names:

1 "MultimediaViewerDuration"

   1 "NavigationTiming"
   1 "PageD	cp1067.eqiad.wmnet	42439819	2016-09-26T12
   2 "CentralNoticeBannerHistory"
   2 "MobileWikiAppShareAFact"
   2 "PageMove"
   2 "ServerSideAccountCreation"
   5 "MobileWikiAppEdit"
   7 "MediaViewer"
  10 "MultimediaViewerNetworkPerformance"
  20 "Search"
  21 "Popups"
  28 "PageDeletion"
  38 "Edit"
  56 "MobileWikiAppCreateAccount"
  97 "MobileWikiAppLogin"
 204  "TranslationRecommendationAPIRequests"
 614 "MobileWikiAppFindInPage"
4648 "ChangesListFilters"

On prior couple days:

 2 "EchoInteraction"
    2 "PrefUpdate"
    3 "MobileWikiAppShareAFact"
    4 "Search"
    5 "NavigationTiming"
    5 "WikipediaPortal"
    6 "MobileAppUploadAttempts"
    9 "ServerSideAccountCreation"
   16 "MultimediaViewerNetworkPerformance"
   17 "PageMove"
   22 "CentralNoticeBannerHistory"
   39 "MobileWikiAppEdit"
   57 "Popups"
   88 "Edit"
   96 "PageDeletion"
  207 "MobileWikiAppCreateAccount"
  265 "PageContentSaveComplete"
  455 "MobileWikiAppLogin"
  656  "TranslationRecommendationAPIRequests"
 2789 "MobileWikiAppFindInPage"
14801 "ChangesListFilters"

Day before:

2 "EchoInteraction"
    2 "NavigationTiming"
    3 "Search"
    4 "MobileAppUploadAttempts"
    4 "ServerSideAccountCreation"
    6 "MultimediaViewerNetworkPerformance"
   11 "CentralNoticeBannerHistory"
   17 "PageMove"
   23 "Popups"
   26 "MobileWikiAppEdit"
   29 "Edit"
   35 "PageDeletion"
  105 "MobileWikiAppCreateAccount"
  155 "PageContentSaveComplete"
  253 "MobileWikiAppLogin"
  299  "TranslationRecommendationAPIRequests"
 1324 "MobileWikiAppFindInPage"
 7726 "ChangesListFilters"

2 days before:

   2 "EchoInteraction"
   2 "MediaViewer"
   2 "ServerSideAccountCreation"
   3 "NavigationTiming"
   7 "MobileWikiAppEdit"
  10 "MultimediaViewerNetworkPerformance"
  13 "CentralNoticeBannerHistory"
  27 "Popups"
  32 "PageContentSaveComplete"
  34 "Edit"
  52 "PageDeletion"
 105 "MobileWikiAppCreateAccount"
 232 "MobileWikiAppLogin"
 399  "TranslationRecommendationAPIRequests"
1237 "MobileWikiAppFindInPage"
7966 "ChangesListFilters"

3 days before:

   3 "PageMove"
   7 "UploadWizardTutorialActions"
  11 "CentralNoticeBannerHistory"
  12 "MultimediaViewerNetworkPerformance"
  16 "MobileWikiAppEdit"
  17 "Popups"
  18 "Search"
  87 "PageContentSaveComplete"
  90 "Edit"
 113 "MobileWikiAppCreateAccount"
 113 "PageDeletion"
 229 "MobileWikiAppLogin"
 328  "TranslationRecommendationAPIRequests"
1262 "MobileWikiAppFindInPage"
2691 "ChangesListFilters"

4 days before:

2 "WikimediaBlogVisit"
   4 "PageMove"
   5 "MediaViewer"
   8 "CentralNoticeBannerHistory"
  10 "Popups"
  12 "MultimediaViewerNetworkPerformance"
  12 "Search"
  24 "MobileWikiAppEdit"
  30 "ChangesListFilters"
  30 "MobileWikiAppSavedPages"
  37 "PageDeletion"
  41 "PageContentSaveComplete"
  47 "Edit"
  60 "MobileWikiAppCreateAccount"
 117 "MobileWikiAppReadingAction"
 216 "MobileWikiAppLogin"
 361  "TranslationRecommendationAPIRequests"
1331 "MobileWikiAppFindInPage"

So, ofending party is defintely ChangesListFilters

Pinging moriel on this ticket.

@MSchottlender-WMF There is a pretty big number of events for event logging schema: https://meta.wikimedia.org/wiki/Schema:ChangesListFilters

That are not validating.

An example below:

Unable to validate: ?{"event":{"pagename":"Recentchanges","namespace":null,"invert":false,"associated":false,"hideminor":false,"hidebots":true,"hideanons":false,"hideliu":false,"hidepatrolled":false,"hidemyself":false,"hidecategorization":true,"tagfilter":null},"schema":"ChangesListFilters","revision":15876023,"clientValidated":false,"wiki":"nowikimedia","webHost":"no.wikimedia.org","userAgent":"Apple-PubSub/65.28"}; cp1066.eqiad.wmnet 42402900 2016-09-26T07:01:42 -

Error is:

"MediaWiki/1.28.0-wmf.20" (None is not of type 'integer')

Do take a look at your code that is sending events, you can easily test your changes in vagrant or beta labs to make sure events validate

https://wikitech.wikimedia.org/wiki/Analytics/EventLogging/TestingOnBetaCluster

https://www.mediawiki.org/wiki/Extension:EventLogging#Using_mediawiki-vagrant

cc @kaldari

@Mooeypoo: Looks like whatever is trying to use the ChangesListFilters EventLogging schema is failing to pass the page namespace consistently.

Okay, I was under the impression that since I tagged the namespace value as 'optional', the logger will ignore it if it's null - but apparently that's not the case.
Added another check to the condition to make sure there's a value (not null) before sending it to the logger. This should solve things.

Change 312936 had a related patch set uploaded (by Mooeypoo):
Ignore null values when sending to EventLogging

https://gerrit.wikimedia.org/r/312936

Change 312936 merged by jenkins-bot:
Ignore null values when sending to EventLogging

https://gerrit.wikimedia.org/r/312936

no "clientValidated":false" was found in vagrant logs/eventlogging.log and in betalbas: /srv/log/eventlogging/all-events.log

We are still getting tons of events not valid from the ChangesListFilters schema, seems that there are several validation failures besides the ones fixed:

Examples:

2016-09-29 11:02:59,829 [13401] (MainThread) Unable to validate: ?{"event":{"pagename":"Recentchanges","namespace":null,"invert":false,"associated":false,"hideminor":false,"hidebots":true,"hideanons":false,"hideliu":false,"hidepatrolled":false,"hidemyself":false,"hidecategorization":true,"tagfilter":"reverting\u0020anti-vandal\u0020bot"},"schema":"ChangesListFilters","revision":15876023,"clientValidated":false,"wiki":"enwiki","webHost":"en.wikipedia.org","userAgent":"Mozilla/5.0\u0020(Windows\u0020NT\u002010.0;\u0020Win64;\u0020x64)\u0020AppleWebKit/537.36\u0020(KHTML,\u0020like\u0020Gecko)\u0020Chrome/53.0.2785.116\u0020Safari/537.36"}; cp1053.eqiad.wmnet 42352846 2016-09-29T11:02:58 - "MediaWiki/1.28.0-wmf.20" (None is not of type 'integer')

2016-09-29 11:05:03,297 [13401] (MainThread) Unable to validate: ?{"event":{"pagename":"Recentchanges","namespace":null,"invert":false,"associated":false,"hideminor":false,"hidebots":true,"hideanons":false,"hideliu":false,"hidepatrolled":false,"hidemyself":false,"hidecategorization":true,"tagfilter":null},"schema":"ChangesListFilters","revision":15876023,"clientValidated":false,"wiki":"enwiki","webHost":"en.wikipedia.org","userAgent":"Apple-PubSub/28"}; cp1052.eqiad.wmnet 44598288 2016-09-29T11:05:03 - "MediaWiki/1.28.0-wmf.20" (None is not of type 'integer')

quick reminder: User Agents and browsing activity together constitute sensitive data, it should not be shared publicly. @Nuria's comment above should be redacted.

The fix that was submitted (and that will make sure there's no 'null' values anywhere, particularly in namespace, which is the issue here) is not in production yet because of the train stop issues. IIRC we also couldn't swat it up because of this.

These errors should go away in the next train when the patch is finally deployed.

quick reminder: User Agents and browsing activity together constitute sensitive data, it should not be shared publicly. @Nuria's comment above should be redacted.

I'm not sure how to do that. Maybe @Aklapper can help redact that information?

@Nuria The patch has been merged, but could not be deployed last week due to the ops offsite. As the WMF-deploy tag on this task says, the patch will roll out with the October 4-6 (i.e. this week's) deployment train. If this needs to be fixed faster, tell me and I'll SWAT the patch.

@Catrope: nah, it doesn't. My mistake, I was under the impression that patch was deployed but forgot about last week no deployment

Re-checked vagrant logs/eventlogging.log and in betalbas: /srv/log/eventlogging/client-side-events.log

The below grep results show that there are records about schema.*ChangesListFilter, but no clientValidate.*false

etonkovidova@deployment-eventlogging03:/srv/log/eventlogging$ grep 'Recentchanges.*schema.*ChangesListFilters.*clientValidate.*true' client-side-events.log | wc 
     11      11    6331

etonkovidova@deployment-eventlogging03:/srv/log/eventlogging$ grep 'Recentchanges.*schema.*ChangesListFilters.*clientValidate.*false' client-side-events.log | wc 
      0       0       0

@Etonkovidova: I'm wondering, is the code deployed on beta the latest one with the patch that fixes the problem? Or are you saying the code that's generating errors in production is working on beta?

@Millmetric - confirmed with @Catrope: the code will be deployed on Thu

Right, @Etonkovidova, that's fine and no rush. But I mean the code in beta where you found no errors, is that the old or new code? If it's the old code and it's working well in beta, it means something else might be wrong. Otherwise all is well.

@Millmetric the fix for validation works in beta, as far as I could see. I will monitor betalabs for errors, at least for today.

Right, @Etonkovidova, that's fine and no rush. But I mean the code in beta where you found no errors, is that the old or new code? If it's the old code and it's working well in beta, it means something else might be wrong. Otherwise all is well.

The new code is in beta labs and on group0 wikis. It'll go to group1 wikis today and all wikis tomorrow.