Flip blacklist for MySQL eventlogging consumer to be a whitelist of allowed schemas
Closed, Resolved · Public · 5 Story Points

Description

Flip blacklist for MySQL eventlogging consumer to be a whitelist of allowed schemas.

We continually run into problems with the MySQL processor in EventLogging.
(example: https://phabricator.wikimedia.org/T203592)

Going forward, all events should go to Hadoop by default, and only a whitelist of events should appear in MySQL. We can build the whitelist by looking at the topics in Kafka (eventlogging-valid-mixed) from which data is persisted to MySQL.
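
The flip described above can be sketched roughly as follows. This is only an illustration, not the EventLogging implementation: the function name `should_insert_into_mysql`, the sample whitelist contents, and the assumption that each event is a dict with a top-level `schema` key are all made up for the example.

```python
# Hypothetical whitelist for illustration; the real list would be built
# from the schemas observed in eventlogging-valid-mixed.
MYSQL_SCHEMA_WHITELIST = frozenset({"NavigationTiming", "Edit"})


def should_insert_into_mysql(event, whitelist=MYSQL_SCHEMA_WHITELIST):
    """Return True only if the event's schema is explicitly allowed.

    Assumes each event is a dict with a top-level "schema" key.
    Events with a missing or unlisted schema are rejected, so new
    schemas stay out of MySQL unless someone adds them on purpose.
    """
    return event.get("schema") in whitelist


events = [
    {"schema": "NavigationTiming", "event": {}},
    {"schema": "SomeNewHighVolumeSchema", "event": {}},
    {"event": {}},  # malformed: no schema at all
]
allowed = [e for e in events if should_insert_into_mysql(e)]
```

With a blacklist, the second event would have reached MySQL until someone noticed and blocked it; with a whitelist it is excluded by default.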

Event Timeline

Nuria created this task. Sep 5 2018, 6:20 PM
Restricted Application added a subscriber: Aklapper. Sep 5 2018, 6:20 PM
fdans triaged this task as High priority. Sep 6 2018, 4:46 PM
fdans moved this task from Incoming to Operational Excellence on the Analytics board.
Nuria assigned this task to Ottomata. Sep 10 2018, 7:42 PM

Let's message analytics@ list when we get this work started.

Alright! First we need a list of active schemas that are not blacklisted. Those will all go to the eventlogging-valid-mixed topic.

For fun, I decided to try and compute this using Spark Structured Streaming. My code is here:

https://gist.github.com/ottomata/c6411c9872e80bce4c4c33ed6bee9b42

This is running in a spark2-shell in a screen on stat1004. About every 60 seconds, the up-to-date counts of events per schema are overwritten in the otto.eventlogging_valid_mixed_schema_counts Hive table. Pretty cool!
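
For readers without access to the gist, the idea of keeping running per-schema counts and periodically overwriting a full snapshot can be sketched in plain Python. This is not the Spark Structured Streaming code linked above. It is a stand-in under stated assumptions: events arrive as JSON lines, each carries a top-level `schema` string, and the helper names `update_counts` and `snapshot` are hypothetical.

```python
import json
from collections import Counter


def update_counts(counts, raw_lines):
    """Add one micro-batch of JSON-encoded events to the running counts.

    Lines that are not valid JSON, or that lack a string "schema"
    field, are skipped rather than counted.
    """
    for line in raw_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        schema = event.get("schema") if isinstance(event, dict) else None
        if isinstance(schema, str):
            counts[schema] += 1
    return counts


def snapshot(counts):
    """Return the complete current table, highest count first.

    Each snapshot is meant to replace the previous one entirely,
    mirroring the "overwrite the table every interval" behaviour.
    """
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


counts = Counter()
update_counts(counts, ['{"schema": "Edit"}', '{"schema": "NavigationTiming"}'])
update_counts(counts, ['{"schema": "Edit"}', "not json"])
table = snapshot(counts)
```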

Nuria added a comment. Edited · Sep 11 2018, 1:14 PM

Super fun and all, but I think all this info is in the MySQL consumer logs, which we have for the past month.

That's true, it would be! :o It would also be in the eventlogging-valid-mixed files.

Looking in the consumer logs as you suggest was very easy (but less fun :p). Between Aug 26 06:50:58 (our oldest consumer log) and now, there were 110 unique schemas inserted into MySQL:

449543 NavigationTiming
292450 ChangesListFilters
103120 TestSearchSatisfaction2
 98352 Edit
 48843 MultimediaViewerNetworkPerformance
 47503 MobileWikiAppSearch
 45858 MobileWikiAppEdit
 43098 EchoInteraction
 40041 MobileWikiAppFeed
 37937 MobileWikiAppiOSFeed
 32685 MobileWikiAppReadingLists
 29008 ReadingDepth
 28940 WikipediaPortal
 28201 MobileWikiAppShareAFact
 27460 MobileWikiAppSessions
 25482 MobileWikiAppToCInteraction
 21644 MobileWikiAppLogin
 19751 UniversalLanguageSelector
 19482 MobileWikiAppLinkPreview
 18714 MobileWikiAppArticleSuggestions
 17048 ServerSideAccountCreation
 16801 MobileWikiAppiOSSessions
 16666 MobileWikiAppDailyStats
 16606 MobileWikiAppTabs
 16099 MobileWikiAppPageScroll
 15224 MultimediaViewerDuration
 13077 MobileWebSearch
 12501 MobileWikiAppiOSReadingLists
 11960 MobileWikiAppCreateAccount
 11928 GettingStartedRedirectImpression
 11710 MobileWikiAppiOSUserHistory
 10985 MobileWikiAppOnThisDay
 10525 ContentTranslationCTA
 10261 WikidataCompletionSearchClicks
 10022 MobileWebMainMenuClickTracking
  9574 MobileWikiAppIntents
  9308 MobileWikiAppMediaGallery
  9259 MobileWikiAppRandomizer
  9121 QuickSurveyInitiation
  8981 UploadWizardStep
  8477 WikimediaBlogVisit
  8063 CitationUsage
  7921 MobileWikiAppFeedConfigure
  7902 MediaViewer
  7513 MobileWikiAppAppearanceSettings
  7371 MultimediaViewerAttribution
  7224 EchoMail
  7038 ContentTranslation
  6619 MobileWikiAppFindInPage
  5571 SearchSatisfactionErrors
  5438 MobileWikiAppLanguageSearching
  5194 MobileWikiAppInstallReferrer
  4962 CentralAuth
  4918 MultimediaViewerDimensions
  4801 UploadWizardErrorFlowEvent
  4774 MediaWikiPingback
  4743 MobileWikiAppiOSLoginAction
  4578 GuidedTourGuiderHidden
  4564 MobileWikiAppProtectedEditAttempt
  4526 UploadWizardTutorialActions
  4510 MobileWikiAppLanguageSettings
  4486 MobileWikiAppSavedPages
  4401 GuidedTourButtonClick
  4370 GeoFeatures
  4367 InputDeviceDynamics
  4320 EditorActivation
  4276 GuidedTourGuiderImpression
  3832 Kartographer
  3661 EditConflict
  3655 MobileWikiAppNavMenu
  3653 WMDEBannerEvents
  3630 ChangesListFilterGrouping
  2705 SaveTiming
  2689 GuidedTourExited
  2526 TranslationRecommendationAPIRequests
  2397 PrefUpdate
  1998 AdvancedSearchRequest
  1765 QuickSurveysResponses
  1741 LandingPageImpression
  1716 UploadWizardUploadFlowEvent
  1487 MobileWikiAppiOSSettingAction
   982 UploadWizardFlowEvent
   902 ChangesListHighlights
   896 GuidedTourExternalLinkActivation
   759 TwoColConflictConflict
   723 EUCCVisit
   517 MobileWikiAppWiktionaryPopup
   495 MobileWikiAppLangSelect
   457 MobileWikiAppStuffHappens
   411 ContentTranslationSuggestion
   410 FlowReplies
   317 EUCCStats
   304 WMDEBannerSizeIssue
   242 UploadWizardExceptionFlowEvent
   197 ContentTranslationError
   156 WikipediaZeroUsage
    64 MobileWikiAppWidgets
    36 MobileWikiAppOnboarding
    35 MultimediaViewerVersusPageFilePerformance
    35 MobileWebUIClickTracking
    28 TranslationRecommendationUIRequests
    24 MobileWikiAppOfflineLibrary
    19 ChangesListClickTracking
    17 ExternalLinksChange
    14 MobileAppLoginAttempts
    10 MobileAppCategorizationAttempts
    10 GuidedTourInternalLinkActivation
     9 MobileAppUploadAttempts
     6 RelatedArticles
     1 TranslationRecommendationUserAction
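
A listing like the one above can be turned into a whitelist mechanically. The sketch below is illustrative only: the helper names `parse_schema_counts` and `build_whitelist` are hypothetical, the output is a plain Python list rather than the actual puppet configuration format, and the sample uses just three rows from the listing. The `exclude` argument reflects that a schema such as ReadingDepth may be left out deliberately even though it appears in the logs.

```python
def parse_schema_counts(text):
    """Parse lines of the form '<count> <SchemaName>' into (name, count) pairs.

    Blank lines and lines that do not match that shape are ignored.
    """
    pairs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            pairs.append((parts[1], int(parts[0])))
    return pairs


def build_whitelist(pairs, exclude=()):
    """Return a sorted list of schema names, minus any explicitly excluded."""
    excluded = set(exclude)
    return sorted(name for name, _ in pairs if name not in excluded)


# Three rows copied from the listing above, as sample input.
sample = """
449543 NavigationTiming
 98352 Edit
 29008 ReadingDepth
"""
whitelist = build_whitelist(parse_schema_counts(sample), exclude=["ReadingDepth"])
```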

Change 459807 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Whitelist EventLogging schemas for ingestion into MySQL

https://gerrit.wikimedia.org/r/459807

Ottomata set the point value for this task to 5.

Change 459815 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[eventlogging@master] Support loading plugins in eventlogging-processor

https://gerrit.wikimedia.org/r/459815

Change 459815 merged by Ottomata:
[eventlogging@master] Support loading plugins in eventlogging-processor

https://gerrit.wikimedia.org/r/459815

Mentioned in SAL (#wikimedia-operations) [2018-09-12T13:11:05Z] <otto@deploy1001> Started deploy [eventlogging/analytics@5c6fab6]: Support loading plugins in eventlogging-processor - T203596

Mentioned in SAL (#wikimedia-operations) [2018-09-12T13:11:13Z] <otto@deploy1001> Finished deploy [eventlogging/analytics@5c6fab6]: Support loading plugins in eventlogging-processor - T203596 (duration: 00m 05s)

Mentioned in SAL (#wikimedia-analytics) [2018-09-12T13:11:18Z] <ottomata> otto@deploy1001 Started deploy [eventlogging/analytics@5c6fab6]: Support loading plugins in eventlogging-processor - T203596

Change 459807 merged by Ottomata:
[operations/puppet@production] Whitelist EventLogging schemas for ingestion into MySQL

https://gerrit.wikimedia.org/r/459807

Ottomata moved this task from In Progress to Done on the Analytics-Kanban board. Sep 12 2018, 1:32 PM
Tbayer renamed this task from "Flip blacklist for MySQL eventlogging consumer to be a whilelist of allowed schemas" to "Flip blacklist for MySQL eventlogging consumer to be a whitelist of allowed schemas". Sep 12 2018, 4:43 PM
Tbayer updated the task description.

BTW, ReadingDepth should be blacklisted/de-whitelisted from MySQL too, as we are planning to increase its event rate in an upcoming experiment (T200792). Let me know in case a separate ticket should be filed for that.

(And the new schema we will be introducing with that experiment, PageIssues, may similarly have an event rate that is too high for MySQL, but I assume that because it won't go into production this week, it will already be covered by the new whitelisting setup that is the topic of this task.)

Naw, I'll just do it as part of this one, thanks.

Change 460059 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Remove ReadingDepth from EventLogging MySQL whitelist

https://gerrit.wikimedia.org/r/460059

Change 460059 merged by Ottomata:
[operations/puppet@production] Remove ReadingDepth from EventLogging MySQL whitelist

https://gerrit.wikimedia.org/r/460059

Nice. Something we still need to figure out is how to test events in beta for most schemas. I added some docs on how to consume from the EventLogging Kafka topic directly, since from now on all-events.log will only contain the whitelisted schemas.

Nuria closed this task as Resolved. Sep 14 2018, 1:02 PM