
Archive data on eventlogging MySQL to analytics replica before decommissioning
Closed, ResolvedPublic

Description

We are looking to decommission the database host that currently serves as the EventLogging master once we phase out the MySQL consumer. There is data on that host that does not exist in Hadoop, so we should probably export all tables and load them into one of the machines that make up the analytics replica cluster.

Event Timeline

@Nuria can you add more info to the description? What is the idea?

I asked @Marostegui for some info today, and we might need to do some work to make this happen.

My assumption is that we only want to keep a read-only replica of what we currently have on db1107:log and store it as an archive on one of the dbstore nodes.

Set of things to solve:

  1. the log database uses TokuDB, and the Data Persistence team is trying to stop supporting it. The dbstore nodes might not support it as they are currently configured. We could move to InnoDB, but TokuDB was chosen in the first place for its compression, so the next point would get even worse.
  2. size is a problem: the log /srv/sqldata dir is 1.7TB on db1107 and 2.1TB on db1108. The only place where we could store one of these monsters is dbstore1005, but that would leave the host with a few hundred GB of free space in the first case and probably close to none in the second. Both options are too risky, so we'd need to drop data if possible (maybe unused things?)
  3. as a follow-up to the above point, we'd need to decide whether to keep db1108's log db or db1107's, since they are different.

db1108 also has the staging database, which might explain the different sizes.

Our best approach here is probably to explore what needs to be kept from the log database and what can be dropped. There is a lot in there, and I am sure much of it isn't really used anymore.
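A quick way to get a per-table breakdown would be something like this (just a sketch; sizes from information_schema are only approximate for TokuDB):

sudo mysql -e "SELECT table_name, ROUND((data_length + index_length)/1024/1024/1024, 1) AS size_gb FROM information_schema.tables WHERE table_schema = 'log' ORDER BY (data_length + index_length) DESC LIMIT 30;"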

One question I have is whether you guys want to keep that database online (as in: it could be queryable) or just keep it as a backup in case it needs to be restored for some emergency/very concrete case.
If the latter, we can explore how much disk space a mysqldump would take and where to store it. We'd also need to plan where it would be restored, since we'd need a host (or be ready to provision one) with enough disk space, a MySQL installation with TokuDB support, etc. Managing expectations would also be important: a restore can take days.
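To get a rough idea of the compressed size without keeping a file around, something like the following could be run (just a sketch; it still has to read the whole dataset once):

# rough estimate of compressed dump size, discarding the output (sketch)
sudo mysqldump --all-databases --skip-lock-tables --quick | gzip -c | wc -c
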
What are the odds of someone needing to query the log database once it has been removed from the current hosts?

One question I have is whether you guys want to keep that database online (as in: it could be queryable) or just keep it as a backup in case it needs to be restored for some emergency/very concrete case.

It should be queryable, as there is data there that predates Hadoop and, while used sparingly, needs to be accessible. That said, the last ~1.5 years of data (approximately, I can check the exact dates) on MySQL is also in Hadoop, so before moving anything we can drop all data that exists in both storages; that should reduce the size quite a bit.
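Per table, the cleanup would look something like this (just a sketch: the table name and the cutoff date are placeholders until I check the exact dates against what is in Hadoop):

# drop rows that already exist in Hadoop (sketch; table name and cutoff are placeholders)
sudo mysql log -e "DELETE FROM SomeSchema_12345 WHERE timestamp >= '20180401000000';"
# in practice this would need batching (e.g. LIMIT + loop) to avoid one huge transaction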

@Nuria, perhaps sqooping this into Hive wouldn't be too hard?

It is actually a lot of work to sqoop this data, as it is not doable in bulk: every table needs special mappings. I did 4 or 5 tables some quarters back and it was a lot of menial work. In many instances it *seemed* that tables had been properly imported, but in some columns the data was mangled; every column of every table would need to be queried to make sure the imports are correct.

I'd rather drop the data that also exists in Hadoop, thus reducing the data size, and leave these tables available to be queried in MySQL.

Hm, that sounds harder than I thought. Why can't sqoop map the MySQL schemas?

Sqoop can map column types, but there are gotchas: columns with '.' in the name need to be quoted, and many columns need an explicit utf8 conversion or cast (see the example command below). I am not saying the sqoops cannot be scripted, they totally can, but after each one we need to verify that the data in every column made it over as expected. It seems like less work to just drop the data we don't need and archive the rest in MySQL; if you disagree and think we can script all these table sqoops and export them to Hadoop without that much work, do let me know.

time sudo -u hdfs sqoop import \
  --as-avrodatafile \
  --password-file '/user/hdfs/mysql-analytics-research-client-pw.txt' \
  --username research \
  --connect jdbc:mysql://analytics-store.eqiad.wmnet/log \
  --query 'select id,uuid,convert(timestamp using utf8) timestamp,convert(webHost using utf8) webhost,wiki,cast(event_isAPI as Integer) event_isAPI,cast(event_isMobile as Integer) event_isMobile,cast(event_revisionId as Integer) event_revisionId from PageContentSaveComplete_5588433 where $CONDITIONS' \
  --map-column-java event_isAPI=Integer,event_isMobile=Integer,event_revisionId=Integer \
  --target-dir /tmp/PageContentSaveCompleteAvro \
  --split-by id
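
And even when a run looks fine, each table has to be checked afterwards. A minimal sanity check would be something like the following (row counts only, which would not even catch the mangled-column cases; the Hive database/table names here are made up, assuming the Avro output gets loaded into a Hive table):

# compare row counts between the MySQL source and the imported data (sketch; plus whatever credentials apply)
mysql -h analytics-store.eqiad.wmnet log -e "SELECT COUNT(*) FROM PageContentSaveComplete_5588433;"
sudo -u hdfs hive -e "SELECT COUNT(*) FROM some_archive_db.pagecontentsavecomplete_5588433;"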

I think we should explore what Nuria suggested, trying to drop as much as possible before deciding where to archive this data (either MySQL or Hadoop), as archiving data we won't need is a waste of time and resources. So either way, we'd need to drop that data.
Maybe once it's been cleaned up, we can resume the discussion about whether to use MySQL (if we are within disk space margins) or Hadoop?

Some data:

elukey@db1107:/srv/sqldata$ ls _* | cut -d "_" -f 2 | sort | uniq -c
      5
   2198 log
     16 otto
      2 test
elukey@db1108:/srv/sqldata$ ls _* | cut -d "_" -f 2 | sort | uniq -c
      5
   2174 log
     11 staging

elukey@db1108:/srv/sqldata$ du -hs _staging* | sort -h
16K	_staging_aharoni_test_001_main_2522e85a_1_1d_B_0.tokudb
16K	_staging_dartar_clean_patrol_log_status_229efcf_1_1d.tokudb
16K	_staging_dartar_pages_reviewed_status_229f052_1_1d.tokudb
64K	_staging_aharoni_test_001_status_2522e7ec_1_1d.tokudb
64K	_staging_cx_abuse_filter_daily_status_25d15674_1_1d.tokudb
64K	_staging_leila_edits_copy_status_229f0f3_1_1d.tokudb
160K	_staging_cx_abuse_filter_daily_main_25d15677_1_1d_B_0.tokudb
2.9M	_staging_dartar_pages_reviewed_main_22c5ab6_2_1d_B_0.tokudb
59M	_staging_dartar_clean_patrol_log_main_22c55bd_2_1d_B_0.tokudb
172M	_staging_leila_edits_copy_key_user_id_22c8cf7_2_1d_B_0.tokudb
648M	_staging_leila_edits_copy_main_22c8cf7_2_1d_B_1.tokudb

The staging db seems relatively small, so the log db itself must differ between db1107 and db1108.

Using cut -d "_" and printing the first 3 fields of filenames:
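(roughly something like: ls _log_* | cut -d "_" -f 1-3 | sort | uniq -c | sort -n)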

  • db1107
   4 _log_ArticleCreationWorkflow
   4 _log_ChangesListFilterGrouping
   4 _log_MobileOptionsTracking
   4 _log_PageCreation
   4 _log_SearchSatisfactionErrors
   4 _log_TestSearchSatisfaction2
   5 _log_CentralNoticeImpression
   5 _log_ChangesListFilters
   5 _log_ContentTranslation
   5 _log_ContentTranslationAbuseFilter
   5 _log_ContentTranslationSuggestion
   5 _log_EUCCStats
   5 _log_EUCCVisit
   5 _log_InputDeviceDynamics
   5 _log_MobileWikiAppLangSelect
   5 _log_MobileWikiAppLanguageSettings
   5 _log_MobileWikiAppNotificationInteraction
   5 _log_QuickSurveysResponses
   5 _log_ReadingDepth
   5 _log_Test
   5 _log_UniversalLanguageSelector
   5 _log_UploadWizardErrorFlowEvent
   5 _log_UploadWizardStep
   5 _log_WMDEBannerSizeIssue
   8 _log_HumanSearchRelevance
   9 _log_Print
  10 _log_Edit
  10 _log_LandingPageImpression
  10 _log_MobileWebMainMenuClickTracking
  10 _log_MobileWikiAppAppearanceSettings
  10 _log_MobileWikiAppArticleSuggestions
  10 _log_MobileWikiAppCreateAccount
  10 _log_MobileWikiAppFeed
  10 _log_MobileWikiAppFindInPage
  10 _log_MobileWikiAppInstallReferrer
  10 _log_MobileWikiAppIntents
  10 _log_MobileWikiAppLanguageSearching
  10 _log_MobileWikiAppLogin
  10 _log_MobileWikiAppMediaGallery
  10 _log_MobileWikiAppOfflineLibrary
  10 _log_MobileWikiAppPageScroll
  10 _log_MobileWikiAppProtectedEditAttempt
  10 _log_MobileWikiAppReadingLists
  10 _log_MobileWikiAppTabs
  10 _log_MobileWikiAppWiktionaryPopup
  10 _log_MobileWikiAppiOSFeed
  10 _log_MobileWikiAppiOSSettingAction
  10 _log_QuickSurveyInitiation
  10 _log_ServerSideAccountCreation
  10 _log_TwoColConflictConflict
  10 _log_WMDEBannerEvents
  15 _log_MobileWikiAppDailyStats
  15 _log_MobileWikiAppEdit
  15 _log_MobileWikiAppFeedConfigure
  15 _log_MobileWikiAppLinkPreview
  15 _log_MobileWikiAppOnThisDay
  15 _log_MobileWikiAppRandomizer
  15 _log_MobileWikiAppSessions
  15 _log_MobileWikiAppShareAFact
  15 _log_MobileWikiAppiOSLoginAction
  15 _log_MobileWikiAppiOSUserHistory
  20 _log_MobileWikiAppSearch
  20 _log_MobileWikiAppiOSReadingLists
  20 _log_MobileWikiAppiOSSessions
  20 _log_WikidataCompletionSearchClicks
  25 _log_AdvancedSearchRequest
  25 _log_CitationUsage
  25 _log_MobileWikiAppToCInteraction
  70 _log_mediawiki
  79 _log_NavigationTiming
1363 _log_sql
  • db1108
  2 _log_ChangesListHighlights
  3 _log_MobileWebWatching
  4 _log_ArticleCreationWorkflow
  4 _log_ChangesListClickTracking
  4 _log_ChangesListFilterGrouping
  4 _log_DiacriticsPoll
  4 _log_DiacriticsVisibility
  4 _log_EditorActivation
  4 _log_GeoFeatures
  4 _log_MediaViewer
  4 _log_MobileWebDiffClickTracking
  4 _log_MobileWebSearch
  4 _log_MobileWebUIClickTracking
  4 _log_MobileWebWatchlistClickTracking
  4 _log_MobileWikiAppNavMenu
  4 _log_PageContentSaveComplete
  4 _log_PageCreation
  4 _log_RecentChangesTopLinks
  4 _log_SearchSatisfactionErrors
  4 _log_TaskRecommendation
  4 _log_TaskRecommendationClick
  4 _log_TaskRecommendationImpression
  4 _log_TestEventDuplication
  4 _log_TrackedPageContentSaveComplete
  4 _log_VET135171
  5 _log_CentralNoticeImpression
  5 _log_ContentTranslationAbuseFilter
  5 _log_EUCCStats
  5 _log_EUCCVisit
  5 _log_GettingStartedNavbarNoArticle
  5 _log_InputDeviceDynamics
  5 _log_MobileWikiAppLanguageSettings
  5 _log_MobileWikiAppNotificationInteraction
  5 _log_Test
  5 _log_VisualEditorDOMRetrieved
  5 _log_VisualEditorDOMSaved
  5 _log_WMDEBannerSizeIssue
  6 _log_EchoPrefUpdate
  6 _log_GatherClicks
  7 _log_ContentTranslationError
  7 _log_ExternalLinksChange
  7 _log_MobileWebLanguageSwitcher
  7 _log_UploadWizardExceptionFlowEvent
  7 _log_UploadWizardFlowEvent
  7 _log_UploadWizardUploadFlowEvent
  8 _log_ContentTranslationCTA
  8 _log_HumanSearchRelevance
  8 _log_Kartographer
  8 _log_MediaWikiPingback
  8 _log_MobileWikiAppWidgets
  8 _log_MultimediaViewerAttribution
  8 _log_MultimediaViewerDimensions
  8 _log_MultimediaViewerDuration
  8 _log_MultimediaViewerNetworkPerformance
  8 _log_MultimediaViewerVersusPageFilePerformance
  8 _log_TaskRecommendationLightbulbClick
  8 _log_TranslationRecommendationUIRequests
  8 _log_TranslationRecommendationUserAction
  8 _log_WikipediaZeroUsage
  9 _log_CentralAuth
  9 _log_EchoMail
  9 _log_EditConflict
  9 _log_MobileAppCategorizationAttempts
  9 _log_MobileAppLoginAttempts
  9 _log_MobileAppShareAttempts
  9 _log_MobileAppUploadAttempts
  9 _log_MobileWikiAppLangSelect
  9 _log_MobileWikiAppStuffHappens
  9 _log_NewEditorEdit
  9 _log_PrefUpdate
  9 _log_Print
  9 _log_QuickSurveysResponses
  9 _log_UploadWizardTutorialActions
  9 _log_WikimediaBlogVisit
 10 _log_GuidedTour
 10 _log_LandingPageImpression
 10 _log_MobileLeftNavbarEditCTA
 10 _log_MobileWebCta
 10 _log_MobileWebInfobox
 10 _log_MobileWikiAppLanguageSearching
 10 _log_MobileWikiAppOfflineLibrary
 10 _log_MobileWikiAppiOSFeed
 10 _log_MobileWikiAppiOSSettingAction
 10 _log_TwoColConflictConflict
 10 _log_WMDEBannerEvents
 11 _log_MobileWebWikiGrokError
 12 _log_ContentTranslation
 12 _log_MobileWebWikiGrok
 12 _log_MobileWebWikiGrokResponse
 12 _log_RelatedArticles
 12 _log_TranslationRecommendationAPIRequests
 12 _log_UploadWizardErrorFlowEvent
 12 _log_UploadWizardStep
 12 _log_WikipediaPortal
 13 _log_ContentTranslationSuggestion
 13 _log_MobileOptionsTracking
 13 _log_MobileWikiAppSavedPages
 13 _log_ReadingDepth
 14 _log_GuidedTourButtonClick
 14 _log_GuidedTourExited
 14 _log_GuidedTourExternalLinkActivation
 14 _log_GuidedTourGuiderHidden
 14 _log_GuidedTourGuiderImpression
 14 _log_GuidedTourInternalLinkActivation
 14 _log_MobileWikiAppFindInPage
 14 _log_MobileWikiAppInstallReferrer
 14 _log_MobileWikiAppOnboarding
 14 _log_MobileWikiAppPageScroll
 14 _log_MobileWikiAppTabs
 14 _log_QuickSurveyInitiation
 14 _log_UniversalLanguageSelector
 15 _log_MobileWikiAppFeedConfigure
 15 _log_MobileWikiAppOnThisDay
 15 _log_MobileWikiAppRandomizer
 15 _log_MobileWikiAppiOSLoginAction
 15 _log_MobileWikiAppiOSUserHistory
 16 _log_FlowReplies
 16 _log_SaveTiming
 17 _log_MobileWikiAppIntents
 17 _log_MobileWikiAppWiktionaryPopup
 18 _log_MobileWikiAppReadingLists
 19 _log_MobileWikiAppDailyStats
 19 _log_MobileWikiAppProtectedEditAttempt
 20 _log_EchoInteraction
 20 _log_MobileWikiAppiOSReadingLists
 20 _log_MobileWikiAppiOSSessions
 20 _log_WikidataCompletionSearchClicks
 21 _log_MobileWebMainMenuClickTracking
 22 _log_MobileWikiAppAppearanceSettings
 22 _log_MobileWikiAppFeed
 24 _log_GettingStartedRedirectImpression
 25 _log_AdvancedSearchRequest
 25 _log_CitationUsage
 25 _log_MobileWebUploads
 26 _log_GettingStartedOnRedirect
 26 _log_MobileWikiAppMediaGallery
 28 _log_MobileWikiAppCreateAccount
 28 _log_MobileWikiAppLogin
 29 _log_ChangesListFilters
 31 _log_MobileWikiAppLinkPreview
 31 _log_MobileWikiAppShareAFact
 32 _log_Popups
 35 _log_MobileWikiAppSessions
 36 _log_ServerSideAccountCreation
 37 _log_MobileWikiAppArticleSuggestions
 40 _log_Edit
 40 _log_TestSearchSatisfaction2
 44 _log_MobileWikiAppSearch
 46 _log_MobileWikiAppToCInteraction
 47 _log_MobileWikiAppEdit
 50 _log_MobileWebEditing
 66 _log_mediawiki
152 _log_NavigationTiming

Only db1107 has _log_sql_aff* files, adding up to 507G; they are not on db1108. No idea what they are about.
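If needed, the TokuDB file map could tell us which tables they belong to; something like this (a sketch, assuming the TokuDB information_schema plugin is loaded on db1107):

# map on-disk .tokudb files back to table/dictionary names (sketch)
sudo mysql -e "SELECT dictionary_name, internal_file_name FROM information_schema.TokuDB_file_map WHERE internal_file_name LIKE '%log_sql_aff%';"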

If we check the 35 largest table files for both databases:

elukey@db1107:/srv/sqldata$ du -hsc _log*.tokudb | sort -h | tail -n 35
8.2G	_log_Print_17630514_key_ix_Print_17630514_uuid_197665d3d_3_1d_B_0.tokudb
8.5G	_log_sql_aff_4482858_key_ix_Popups_16364296_uuid_194806e54_3_1d.tokudb
8.8G	_log_sql_aff_4482858_main_19480624f_2_1d.tokudb
8.9G	_log_MobileWikiAppSessions_18948969_key_ix_MobileWikiAppSessions_18948969_timestamp_21e63dabe_3_1d_P_0.tokudb
8.9G	_log_TestSearchSatisfaction2_16909631_main_15a74b98f_2_1b.tokudb
9.1G	_log_MobileWikiAppLinkPreview_18531254_key_ix_MobileWikiAppLinkPreview_18531254_timestamp_1ef8ccc22_3_1d_P_0.tokudb
9.5G	_log_mediawiki_page_create_2_main_1612e43c8_2_1b.tokudb
9.6G	_log_sql_aff_4482858_main_194805985_2_1d.tokudb
9.8G	_log_NavigationTiming_18988839_main_223f7f50d_2_1d.tokudb
10G	_log_MobileWikiAppShareAFact_18144594_main_1c5c03a57_2_1d.tokudb
11G	_log_Edit_17541122_main_18ea7a08e_2_1d.tokudb
11G	_log_MobileWikiAppToCInteraction_19044853_main_228f9913d_2_1d.tokudb
11G	_log_sql_aff_4482858_main_1948035d5_2_1d.tokudb
11G	_log_sql_aff_4482858_main_194804364_2_1d.tokudb
12G	_log_MobileWikiAppSessions_18115099_main_1c5d5d4e4_2_1d.tokudb
12G	_log_sql_aff_4482858_main_194804cae_2_1d.tokudb
13G	_log_sql_aff_4482858_main_194806adc_2_1d.tokudb
15G	_log_ChangesListFilters_17629781_key_ix_ChangesListFilters_17629781_uuid_19762269a_3_1d_B_0.tokudb
15G	_log_sql_aff_4482858_main_1948023c8_2_1d.tokudb
16G	_log_CitationUsage_18359729_main_1e2d5abbc_2_1d.tokudb
18G	_log_CitationUsage_18810892_main_215d81072_2_1d.tokudb
23G	_log_MobileWikiAppDailyStats_18115101_main_1c5c25328_2_1d.tokudb
23G	_log_QuickSurveyInitiation_18397507_main_1ebce0f89_2_1d.tokudb
24G	_log_ChangesListFilters_17629781_main_197622695_2_1d.tokudb
24G	_log_sql_aff_4482858_main_194808080_2_1d.tokudb
25G	_log_sql_aff_4482858_key_ix_MediaViewer_10867062_uuid_194802877_3_1d.tokudb
26G	_log_Print_17630514_main_197665cb7_2_1d.tokudb
30G	_log_sql_aff_4482858_main_194806e54_2_1d.tokudb
32G	_log_sql_aff_4482858_main_194807792_2_1d.tokudb
45G	_log_sql_aff_4482858_main_194802877_2_1d.tokudb
57G	_log_MobileWikiAppLinkPreview_18531254_key_ix_MobileWikiAppLinkPreview_18531254_uuid_1ef8ccc1e_3_1d_B_0.tokudb
57G	_log_MobileWikiAppSessions_18948969_key_ix_MobileWikiAppSessions_18948969_uuid_21e63daba_3_1d_B_0.tokudb
203G	_log_MobileWikiAppLinkPreview_18531254_main_1ef8ccc19_2_1d.tokudb
223G	_log_MobileWikiAppSessions_18948969_main_21e63dab7_2_1d.tokudb
elukey@db1108:/srv/sqldata$ du -hsc _log*.tokudb | sort -h | tail -n 35
9.4G	_log_NavigationTiming_18988839_main_32f05193_2_1d.tokudb
9.7G	_log_MobileWikiAppArticleSuggestions_10590869_15423246_main_3fe3_2_1d_B_0.tokudb
9.7G	_log_MobileWikiAppShareAFact_18144594_main_15c554f8_2_1d.tokudb
9.7G	_log_NavigationTiming_10785754_main_4698_2_1d_B_0.tokudb
9.8G	_log_Edit_11448630_key_ix_Edit_11448630_uuid_5036_2_1d_B_1.tokudb
11G	_log_Edit_17541122_main_6af6fb0_2_1d.tokudb
12G	_log_MobileWikiAppLinkPreview_15730939_main_417b_2_1d_B_0.tokudb
12G	_log_MobileWikiAppSessions_18115099_main_15cbbec5_2_1d.tokudb
14G	_log_MobileWikiAppToCInteraction_8461467_main_5033_2_1d_B_0.tokudb
14G	_log_RelatedArticles_16352530_main_49c1_2_1d_B_0.tokudb
15G	_log_ChangesListFilters_17629781_key_ix_ChangesListFilters_17629781_uuid_8e30138_3_1d.tokudb
16G	_log_MobileWikiAppSessions_18948969_key_ix_MobileWikiAppSessions_18948969_uuid_31425d76_3_1d.tokudb
17G	_log_MobileWikiAppMediaGallery_10923135_15423246_main_41ce_2_1d_B_0.tokudb
17G	_log_NavigationTiming_15485142_15423246_main_46b3_2_1d_B_0.tokudb
18G	_log_CitationUsage_18810892_main_2eaf26df_2_1d.tokudb
18G	_log_Edit_13457736_main_3cde_2_1d_B_0.tokudb
19G	_log_CitationUsage_18359729_main_1eb5b5f4_2_1d.tokudb
20G	_log_Popups_15906495_main_487f_2_1d_B_0.tokudb
21G	_log_MobileWebEditing_8599025_main_3e4f_2_1d_B_0.tokudb
22G	_log_MobileWikiAppDailyStats_18115101_main_15c5f25a_2_1d.tokudb
22G	_log_QuickSurveyInitiation_18397507_main_217e3cd1_2_1d.tokudb
23G	_log_MobileWikiAppSearch_10641988_15423246_main_502f_2_1d_B_0.tokudb
24G	_log_ChangesListFilters_17629781_main_8e30138_2_1d.tokudb
24G	_log_MediaViewer_10867062_key_ix_MediaViewer_10867062_uuid_5026_2_1d_B_1.tokudb
26G	_log_Print_17630514_main_8e40bb4_2_1d.tokudb
26G	_log_ReadingDepth_16325045_main_4970_2_1d_B_0.tokudb
28G	_log_MobileWikiAppArticleSuggestions_11448426_15423246_main_3ff8_2_1d_B_0.tokudb
31G	_log_MobileWikiAppLinkPreview_18531254_key_ix_MobileWikiAppLinkPreview_18531254_uuid_22ad86b3_3_1d.tokudb
32G	_log_Edit_11448630_main_5036_2_1d_B_0.tokudb
34G	_log_QuickSurveyInitiation_15278946_main_5029_2_1d_B_0.tokudb
36G	_log_Popups_16364296_main_48b5_2_1d_B_0.tokudb
48G	_log_MediaViewer_10867062_main_5026_2_1d_B_0.tokudb
57G	_log_MobileWikiAppSessions_18948969_main_31425d76_2_1d.tokudb
98G	_log_MobileWikiAppLinkPreview_18531254_main_22ad86b3_2_1d.tokudb

If there are tables that we can drop because they relate to old/unwanted EL schemas, let's do it; it should give us some space back. After that, I'd take a mysqldump of both databases and archive it in Hadoop, just to be safe. Then we will be able to decide what to do, and I can see the following macro-question to answer:

Which database should be moved to the dbstore nodes after the archive step (since they are not the same)? I would say db1108, since it is the replica that Analytics users have been querying all this time, not db1107.

There is also another option we could consider to avoid spending a ton of time on this. Since we want to allow people to query the log database for historical purposes, it seems a waste of human resources to move it from one db host to another (reducing the space available for other important databases like the wiki replicas). We could simply stop pushing data to db1107, set it to spare so the Data Persistence team can repurpose it, and keep db1108 untouched. Our users would still have a "known" place to query for historical data, we would not have to tell them how to migrate their scripts, etc., and we would not spend time migrating databases (which would involve Manuel as well).

We could then decide with our users whether db1108 is still needed in, say, a year's time (by then we'll have 2.5y of data in Hadoop and hopefully everybody will have migrated away from SQL).

Just 2c :)

s/mysqldump/mydumper/

Regarding the machines, I would suggest either decommissioning both or keeping both. Keeping a service with zero redundancy is undesirable, even with low or no usage. There is space on the backup hosts for offline long-term archiving of important data, although not for large datasets.

Yes, I am aware, but in case of a big disaster (like the host breaking completely) we could work out an alternative starting from the db dumps taken beforehand. If one of the three dbstore nodes goes down indefinitely now, we'll have to either order hardware or ask for a spare replacement; I see the same problem, just in a different context. Given that the past 1.5y of EventLogging data is already in Hadoop and people are using it, I don't see a huge problem in keeping only one db host.

I am fine with this approach (keeping just db1108) as long as expectations are managed with stakeholders. A host can fail at any time, so the data on that single host should be considered volatile; a plan B should be available (like the one you suggest), and recovery times should be communicated to stakeholders as well.

We appreciate the effort you guys are making to decommission these hosts and the service! :-)

Given T234826, the next steps for this task in my opinion are:

  1. stop producing events from eventlogging to db1107
  2. stop/disable replication between db1107/db1108 (see the sketch after this list)
  3. sanitize the data in the log databases (I mean the past 90d) before archiving - @mforns to triple-check that my thoughts are correct
  4. disable sanitization timer
  5. archive both db1107 and db1108 on hadoop
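
For step 2, on db1108 (the replica) it should boil down to something like this (just a sketch, to be double checked with the Data Persistence team):

# stop replication from db1107 and discard the master coordinates (sketch)
sudo mysql -e "STOP SLAVE; RESET SLAVE ALL;"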

archive both db1107 and db1108 on hadoop

Sounds good; this last step is about doing a one-time backup, right?

We also need to update docs about how stakeholders can access the data through db1108 going forward.

Checking in on this.

If there are tables that we can drop because related to old/not-wanted EL schemas, let's do it

How do we find this out?

In this case we decided to drop the data covered by the linked subtasks; the rest will remain on the host, which from now on will act as a backup for the other MySQL storage we use for Hadoop or Superset.

To be clear, EL MySQL is served by two hosts (master and replica). We are releasing just one of them to the DBA team and keeping the other.

Ahhh ok right. Right.

sanitize data in the log databases

This shouldn't be necessary, right? IIUC, data should be sanitized already?

archive both db1107 and db1108 on hadoop

Do we need to archive db1107 data? Analysts never query it, everything they have access to is on db1108.

I'd like to do T232349: Disable production EventLogging analytics MySQL consumers, but I think first we need to hear back about T212367: Start refining ChangesListHighlights events, yes?

Ahhh ok right. Right.

sanitize data in the log databases

This shouldn't be necessary, right? IIUC, data should be sanitized already?

I need to check with @mforns, but my understanding is that the last 90d of events are not sanitized yet.

archive both db1107 and db1108 on hadoop

Do we need to archive db1107 data? Analysts never query it, everything they have access to is on db1108.

OK for me to back up only db1108 :)

I'd like to do T232349: Disable production EventLogging analytics MySQL consumers, but I think first we need to hear back about T212367: Start refining ChangesListHighlights events, yes?

Seems so yes!

In T231858#5612967, @Ottomata wrote:
Ahhh ok right. Right.

sanitize data in the log databases

This shouldn't be necessary, right? IIUC, data should be sanitized already?

I need to check with @mforns, but my understanding is that the last 90d of events are not sanitized yet.

Yes, in MySQL the last 90 days of data are not sanitized yet.
We can try to run the sanitization script on the remaining data, I guess.

I have created a task about re-running sanitization: T236818: Rerun sanitization before archiving eventlogging mysql data

Sanitization is needed for archival of data but not to disable the consumers.

@Ottomata @Nuria
Another solution is archiving only the sanitized data.
I don't think having the last 90 days of data in the backup is critical, no?

You are right that it is not critical, but deleting it selectively from all tables is probably more work than sanitizing it.

Sanitization done!

The remaining step seems to be to take a mysql dump of both log databases (on db1107 and db1108) and then we are done!

Mysqldumping both hosts now:

sudo mysqldump --all-databases --skip-lock-tables --quick | gzip > mysqldump-$(hostname)-$(date +%F).sql.gz

db1107:

real    1258m49.602s
user    1875m25.784s
sys     54m34.796s
696G Nov 14 15:50 mysqldump-db1107-2019-11-13.sql.gz

db1108:

real    1277m7.727s
user    1933m36.412s
sys     56m37.912s

680G Nov 14 16:09 mysqldump-db1108-2019-11-13.sql.gz

Copying to stat1007 and then into hdfs...
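For the HDFS part it will be something along these lines (just a sketch; the destination path here is made up):

# from stat1007, once the dump has been copied over (sketch)
hdfs dfs -mkdir -p /wmf/data/archive/eventlogging_mysql
hdfs dfs -put mysqldump-db1108-2019-11-13.sql.gz /wmf/data/archive/eventlogging_mysql/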

If you don't plan to recover the data and it is purely for archival purposes, that is ok. However, I strongly suggest using mydumper in the future: a single-threaded recovery of this dump would take around 5 days, and the format makes a partial recovery very difficult. This is precisely why we wrap backup_mariadb.py and recover_dump.py, so that sane defaults are used. Taking the backup would also have been 5-10 times faster.
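For reference, the equivalent with mydumper/myloader would be roughly as follows (a sketch; options and paths here are only illustrative, not what backup_mariadb.py actually passes):

# parallel, per-table, compressed dump (sketch)
sudo mydumper --database log --threads 8 --compress --outputdir /srv/backups/log-$(date +%F)
# parallel restore of that dump (sketch)
sudo myloader --directory /srv/backups/log-2019-11-13 --threads 8 --overwrite-tables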

Ah! Good to know. Will do next time. Or...I could dump again, the DBs are still up. Do you think I should?

I closed this, but I think there is an open question of whether the dump needs to happen again. cc @jcrespo

ping @Ottomata as to whether this issue can be closed

I think we can close!