We are looking to decommission the database host that currently serves as the master once we disable the MySQL consumer. There is data on the host that does not exist on Hadoop; we should probably export all tables and load them into one of the machines that make up the analytics replica cluster.
Description
Status | Assigned | Task
---|---|---
Resolved | Ottomata | T159170 Sunset MySQL data store for eventlogging
Resolved | Ottomata | T231858 Archive data on eventlogging MySQL to analytics replica before decommissioning
Resolved | elukey | T233891 Drop NavigationTiming data entirely from mysql storage?
Resolved | elukey | T233892 Drop page create event data on mysql
Resolved | elukey | T233893 Drop CitationUsage data on mysql
Resolved | elukey | T236818 Rerun sanitization before archiving eventlogging mysql data
Event Timeline
I asked @Marostegui for some info today, and we might need to do some work to make this happen.
My assumption is that we only want to keep a read-only replica of what we currently have on db1107:log and store it as archive on one of the dbstore nodes.
Set of things to solve:
- the log database is TokuDB, and the Data Persistence team is trying to phase out support for it. The dbstore nodes might not support it as they are currently configured. We could move to InnoDB, but Toku was chosen in the first place for its data compression features, so that would make the following point even worse.
- size is a problem: the log /srv/sqldata dir is 1.7TB on db1107 and 2.1TB on db1108. The only place where we could store one of these monsters is dbstore1005, but it would leave the host with a few hundred GB of free space in the former case and probably close to none in the latter. Both cases are too risky, so we'd need to drop data if possible (maybe unused things?)
- as a follow-up to the above point, we need to verify whether we want to keep db1108's log db or db1107's, since they are different.
db1108 also has the staging database, which might be the reason for the different sizes.
Probably exploring what needs to be kept from the log database and what can be dropped would be our best approach here. There are lots of things in there and I am sure lots of them aren't really used anymore.
One question I have is whether you guys want to keep that database online (as in: it could be queryable) or just keep it as a backup in case it needs to be restored for some emergency/very concrete case.
If the latter, we can explore how much disk space a mysqldump would take, and where to store it. We'd also need to plan where it would be restored to, as we'd need to have a host (or be ready to have one) with enough disk space, a MySQL installation with support for TokuDB, etc. Managing expectations would also be important: a restore can take days.
What are the odds of someone needing to query the log database once it has been removed from the current hosts?
One question I have is whether you guys want to keep that database online (as in: it could be queryable) or just keep it as a backup in case it needs to be restored for some emergency/very concrete case.
It should be queryable, as there is data there that predates Hadoop and, while used sparingly, needs to be accessible. Now, data from the last 1.5 years (approx., I can check for exact dates) on MySQL is also in Hadoop, so before moving we can drop all data that exists in both storages; that should reduce the size a bunch.
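A minimal sketch of what dropping the duplicated range could look like, generating one DELETE per table older than a cutoff. The cutoff timestamp and table names here are placeholders (the real list would come from checking what actually exists in Hadoop):

```shell
# Hypothetical sketch: generate DELETE statements for rows that are
# already in Hadoop (everything on/after an assumed cutoff date).
# The cutoff and table names below are placeholders, not from this task.
CUTOFF=20180401000000
stmts=$(for t in NavigationTiming_18988839 CitationUsage_18810892; do
  echo "DELETE FROM log.${t} WHERE timestamp >= '${CUTOFF}';"
done)
echo "$stmts"
```

The generated statements could then be reviewed and fed to the mysql client in batches.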
It is actually a lot of work to sqoop this data, as it is not doable in bulk; every table needs special mappings. I did 4-5 tables some quarters back and it was a lot of menial work. In many instances it *seemed* that tables had been properly imported, but in some columns the data was mangled; every column of every table would need to be queried to make sure the imports are correct.
I'd rather drop the data that also exists in Hadoop, thus reducing the data size, and leave these tables available to be queried in MySQL.
Sqoop can map column types, but there are gotchas: column types with '.' need to be quoted, and many columns need an explicit utf8 conversion or cast (see example command below). I am not saying these sqoops cannot be scripted; they totally can be. But after each one we need to verify that the data for all columns made it across as expected. It seems like less work to just drop the data we don't need and archive the rest in MySQL; if you disagree and think we can script all these table sqoops and export them to Hadoop without that much work, do let me know.
time sudo -u hdfs sqoop import \
  --as-avrodatafile \
  --password-file '/user/hdfs/mysql-analytics-research-client-pw.txt' \
  --username research \
  --connect jdbc:mysql://analytics-store.eqiad.wmnet/log \
  --query 'select id,uuid,convert(timestamp using utf8) timestamp,convert(webHost using utf8) webhost,wiki,cast(event_isAPI as Integer) event_isAPI,cast(event_isMobile as Integer) event_isMobile,cast(event_revisionId as Integer) event_revisionId from PageContentSaveComplete_5588433 where $CONDITIONS' \
  --map-column-java event_isAPI=Integer,event_isMobile=Integer,event_revisionId=Integer \
  --target-dir /tmp/PageContentSaveCompleteAvro \
  --split-by id
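The per-column verification mentioned above could be partly mechanized. A hedged sketch that emits one spot-check query per column, to be run against both the MySQL table and its sqooped copy (the table and column names are just the ones from the command above; the cardinality check is only one possible comparison):

```shell
# Hypothetical sketch: emit per-column distinct counts so the MySQL table
# can be compared against its sqooped copy in Hadoop. Columns taken from
# the sqoop command above; this is not an official verification tool.
table=PageContentSaveComplete_5588433
checks=$(for col in id uuid timestamp webHost wiki; do
  echo "SELECT COUNT(DISTINCT ${col}) AS ${col}_cardinality FROM ${table};"
done)
echo "$checks"
```

Matching counts on both sides would not prove the import is perfect, but mismatches would flag mangled columns quickly.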
I think we should explore what Nuria suggested, trying to drop as much as possible before deciding where to archive this data (either MySQL or Hadoop), as archiving data we won't need is a waste of time and resources. So either way, we'd need to drop that data.
Maybe once it's been cleaned up, we can resume the discussion whether MySQL (if we are within disk space margins) or Hadoop?
Some data:
elukey@db1107:/srv/sqldata$ ls _* | cut -d "_" -f 2 | sort | uniq -c
   5
2198 log
  16 otto
   2 test
elukey@db1108:/srv/sqldata$ ls _* | cut -d "_" -f 2 | sort | uniq -c
   5
2174 log
  11 staging
elukey@db1108:/srv/sqldata$ du -hs _staging* | sort -h
16K   _staging_aharoni_test_001_main_2522e85a_1_1d_B_0.tokudb
16K   _staging_dartar_clean_patrol_log_status_229efcf_1_1d.tokudb
16K   _staging_dartar_pages_reviewed_status_229f052_1_1d.tokudb
64K   _staging_aharoni_test_001_status_2522e7ec_1_1d.tokudb
64K   _staging_cx_abuse_filter_daily_status_25d15674_1_1d.tokudb
64K   _staging_leila_edits_copy_status_229f0f3_1_1d.tokudb
160K  _staging_cx_abuse_filter_daily_main_25d15677_1_1d_B_0.tokudb
2.9M  _staging_dartar_pages_reviewed_main_22c5ab6_2_1d_B_0.tokudb
59M   _staging_dartar_clean_patrol_log_main_22c55bd_2_1d_B_0.tokudb
172M  _staging_leila_edits_copy_key_user_id_22c8cf7_2_1d_B_0.tokudb
648M  _staging_leila_edits_copy_main_22c8cf7_2_1d_B_1.tokudb
The staging db seems relatively small, so the log db itself differs between db1107 and db1108.
Using cut -d "_" and printing the first 3 fields of filenames:
- db1107
4 _log_ArticleCreationWorkflow 4 _log_ChangesListFilterGrouping 4 _log_MobileOptionsTracking 4 _log_PageCreation 4 _log_SearchSatisfactionErrors 4 _log_TestSearchSatisfaction2 5 _log_CentralNoticeImpression 5 _log_ChangesListFilters 5 _log_ContentTranslation 5 _log_ContentTranslationAbuseFilter 5 _log_ContentTranslationSuggestion 5 _log_EUCCStats 5 _log_EUCCVisit 5 _log_InputDeviceDynamics 5 _log_MobileWikiAppLangSelect 5 _log_MobileWikiAppLanguageSettings 5 _log_MobileWikiAppNotificationInteraction 5 _log_QuickSurveysResponses 5 _log_ReadingDepth 5 _log_Test 5 _log_UniversalLanguageSelector 5 _log_UploadWizardErrorFlowEvent 5 _log_UploadWizardStep 5 _log_WMDEBannerSizeIssue 8 _log_HumanSearchRelevance 9 _log_Print 10 _log_Edit 10 _log_LandingPageImpression 10 _log_MobileWebMainMenuClickTracking 10 _log_MobileWikiAppAppearanceSettings 10 _log_MobileWikiAppArticleSuggestions 10 _log_MobileWikiAppCreateAccount 10 _log_MobileWikiAppFeed 10 _log_MobileWikiAppFindInPage 10 _log_MobileWikiAppInstallReferrer 10 _log_MobileWikiAppIntents 10 _log_MobileWikiAppLanguageSearching 10 _log_MobileWikiAppLogin 10 _log_MobileWikiAppMediaGallery 10 _log_MobileWikiAppOfflineLibrary 10 _log_MobileWikiAppPageScroll 10 _log_MobileWikiAppProtectedEditAttempt 10 _log_MobileWikiAppReadingLists 10 _log_MobileWikiAppTabs 10 _log_MobileWikiAppWiktionaryPopup 10 _log_MobileWikiAppiOSFeed 10 _log_MobileWikiAppiOSSettingAction 10 _log_QuickSurveyInitiation 10 _log_ServerSideAccountCreation 10 _log_TwoColConflictConflict 10 _log_WMDEBannerEvents 15 _log_MobileWikiAppDailyStats 15 _log_MobileWikiAppEdit 15 _log_MobileWikiAppFeedConfigure 15 _log_MobileWikiAppLinkPreview 15 _log_MobileWikiAppOnThisDay 15 _log_MobileWikiAppRandomizer 15 _log_MobileWikiAppSessions 15 _log_MobileWikiAppShareAFact 15 _log_MobileWikiAppiOSLoginAction 15 _log_MobileWikiAppiOSUserHistory 20 _log_MobileWikiAppSearch 20 _log_MobileWikiAppiOSReadingLists 20 _log_MobileWikiAppiOSSessions 20 _log_WikidataCompletionSearchClicks 
25 _log_AdvancedSearchRequest 25 _log_CitationUsage 25 _log_MobileWikiAppToCInteraction 70 _log_mediawiki 79 _log_NavigationTiming 1363 _log_sql
- db1108
2 _log_ChangesListHighlights 3 _log_MobileWebWatching 4 _log_ArticleCreationWorkflow 4 _log_ChangesListClickTracking 4 _log_ChangesListFilterGrouping 4 _log_DiacriticsPoll 4 _log_DiacriticsVisibility 4 _log_EditorActivation 4 _log_GeoFeatures 4 _log_MediaViewer 4 _log_MobileWebDiffClickTracking 4 _log_MobileWebSearch 4 _log_MobileWebUIClickTracking 4 _log_MobileWebWatchlistClickTracking 4 _log_MobileWikiAppNavMenu 4 _log_PageContentSaveComplete 4 _log_PageCreation 4 _log_RecentChangesTopLinks 4 _log_SearchSatisfactionErrors 4 _log_TaskRecommendation 4 _log_TaskRecommendationClick 4 _log_TaskRecommendationImpression 4 _log_TestEventDuplication 4 _log_TrackedPageContentSaveComplete 4 _log_VET135171 5 _log_CentralNoticeImpression 5 _log_ContentTranslationAbuseFilter 5 _log_EUCCStats 5 _log_EUCCVisit 5 _log_GettingStartedNavbarNoArticle 5 _log_InputDeviceDynamics 5 _log_MobileWikiAppLanguageSettings 5 _log_MobileWikiAppNotificationInteraction 5 _log_Test 5 _log_VisualEditorDOMRetrieved 5 _log_VisualEditorDOMSaved 5 _log_WMDEBannerSizeIssue 6 _log_EchoPrefUpdate 6 _log_GatherClicks 7 _log_ContentTranslationError 7 _log_ExternalLinksChange 7 _log_MobileWebLanguageSwitcher 7 _log_UploadWizardExceptionFlowEvent 7 _log_UploadWizardFlowEvent 7 _log_UploadWizardUploadFlowEvent 8 _log_ContentTranslationCTA 8 _log_HumanSearchRelevance 8 _log_Kartographer 8 _log_MediaWikiPingback 8 _log_MobileWikiAppWidgets 8 _log_MultimediaViewerAttribution 8 _log_MultimediaViewerDimensions 8 _log_MultimediaViewerDuration 8 _log_MultimediaViewerNetworkPerformance 8 _log_MultimediaViewerVersusPageFilePerformance 8 _log_TaskRecommendationLightbulbClick 8 _log_TranslationRecommendationUIRequests 8 _log_TranslationRecommendationUserAction 8 _log_WikipediaZeroUsage 9 _log_CentralAuth 9 _log_EchoMail 9 _log_EditConflict 9 _log_MobileAppCategorizationAttempts 9 _log_MobileAppLoginAttempts 9 _log_MobileAppShareAttempts 9 _log_MobileAppUploadAttempts 9 _log_MobileWikiAppLangSelect 9 
_log_MobileWikiAppStuffHappens 9 _log_NewEditorEdit 9 _log_PrefUpdate 9 _log_Print 9 _log_QuickSurveysResponses 9 _log_UploadWizardTutorialActions 9 _log_WikimediaBlogVisit 10 _log_GuidedTour 10 _log_LandingPageImpression 10 _log_MobileLeftNavbarEditCTA 10 _log_MobileWebCta 10 _log_MobileWebInfobox 10 _log_MobileWikiAppLanguageSearching 10 _log_MobileWikiAppOfflineLibrary 10 _log_MobileWikiAppiOSFeed 10 _log_MobileWikiAppiOSSettingAction 10 _log_TwoColConflictConflict 10 _log_WMDEBannerEvents 11 _log_MobileWebWikiGrokError 12 _log_ContentTranslation 12 _log_MobileWebWikiGrok 12 _log_MobileWebWikiGrokResponse 12 _log_RelatedArticles 12 _log_TranslationRecommendationAPIRequests 12 _log_UploadWizardErrorFlowEvent 12 _log_UploadWizardStep 12 _log_WikipediaPortal 13 _log_ContentTranslationSuggestion 13 _log_MobileOptionsTracking 13 _log_MobileWikiAppSavedPages 13 _log_ReadingDepth 14 _log_GuidedTourButtonClick 14 _log_GuidedTourExited 14 _log_GuidedTourExternalLinkActivation 14 _log_GuidedTourGuiderHidden 14 _log_GuidedTourGuiderImpression 14 _log_GuidedTourInternalLinkActivation 14 _log_MobileWikiAppFindInPage 14 _log_MobileWikiAppInstallReferrer 14 _log_MobileWikiAppOnboarding 14 _log_MobileWikiAppPageScroll 14 _log_MobileWikiAppTabs 14 _log_QuickSurveyInitiation 14 _log_UniversalLanguageSelector 15 _log_MobileWikiAppFeedConfigure 15 _log_MobileWikiAppOnThisDay 15 _log_MobileWikiAppRandomizer 15 _log_MobileWikiAppiOSLoginAction 15 _log_MobileWikiAppiOSUserHistory 16 _log_FlowReplies 16 _log_SaveTiming 17 _log_MobileWikiAppIntents 17 _log_MobileWikiAppWiktionaryPopup 18 _log_MobileWikiAppReadingLists 19 _log_MobileWikiAppDailyStats 19 _log_MobileWikiAppProtectedEditAttempt 20 _log_EchoInteraction 20 _log_MobileWikiAppiOSReadingLists 20 _log_MobileWikiAppiOSSessions 20 _log_WikidataCompletionSearchClicks 21 _log_MobileWebMainMenuClickTracking 22 _log_MobileWikiAppAppearanceSettings 22 _log_MobileWikiAppFeed 24 _log_GettingStartedRedirectImpression 25 
_log_AdvancedSearchRequest 25 _log_CitationUsage 25 _log_MobileWebUploads 26 _log_GettingStartedOnRedirect 26 _log_MobileWikiAppMediaGallery 28 _log_MobileWikiAppCreateAccount 28 _log_MobileWikiAppLogin 29 _log_ChangesListFilters 31 _log_MobileWikiAppLinkPreview 31 _log_MobileWikiAppShareAFact 32 _log_Popups 35 _log_MobileWikiAppSessions 36 _log_ServerSideAccountCreation 37 _log_MobileWikiAppArticleSuggestions 40 _log_Edit 40 _log_TestSearchSatisfaction2 44 _log_MobileWikiAppSearch 46 _log_MobileWikiAppToCInteraction 47 _log_MobileWikiAppEdit 50 _log_MobileWebEditing 66 _log_mediawiki 152 _log_NavigationTiming
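The counting pipeline described above (cut on "_" keeping the first three fields, then sort | uniq -c) was presumably something like the following. This is a reconstruction, shown here against a few synthetic filenames rather than the real /srv/sqldata contents:

```shell
# Reconstruction (not the exact command used on db1107/db1108):
# group tokudb files by the first three "_"-separated fields of their names,
# which yields one count per _log_<Schema> prefix.
tmp=$(mktemp -d)
cd "$tmp"
touch _log_Edit_17541122_main.tokudb \
      _log_Edit_17541122_status.tokudb \
      _log_Print_17630514_main.tokudb
ls _* | cut -d "_" -f 1-3 | sort | uniq -c | sort -n
```

Since the filenames start with "_", field 1 is empty and fields 1-3 reduce to the `_log_<Schema>` prefix, which is what the counts above are grouped by.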
Only on db1107 there are _log_sql_aff* files, totaling up to 507G, that are not on db1108. No idea what they are about.
If we check the top 35 largest tables for both databases:
elukey@db1107:/srv/sqldata$ du -hsc _log*.tokudb | sort -h | tail -n 35 8.2G _log_Print_17630514_key_ix_Print_17630514_uuid_197665d3d_3_1d_B_0.tokudb 8.5G _log_sql_aff_4482858_key_ix_Popups_16364296_uuid_194806e54_3_1d.tokudb 8.8G _log_sql_aff_4482858_main_19480624f_2_1d.tokudb 8.9G _log_MobileWikiAppSessions_18948969_key_ix_MobileWikiAppSessions_18948969_timestamp_21e63dabe_3_1d_P_0.tokudb 8.9G _log_TestSearchSatisfaction2_16909631_main_15a74b98f_2_1b.tokudb 9.1G _log_MobileWikiAppLinkPreview_18531254_key_ix_MobileWikiAppLinkPreview_18531254_timestamp_1ef8ccc22_3_1d_P_0.tokudb 9.5G _log_mediawiki_page_create_2_main_1612e43c8_2_1b.tokudb 9.6G _log_sql_aff_4482858_main_194805985_2_1d.tokudb 9.8G _log_NavigationTiming_18988839_main_223f7f50d_2_1d.tokudb 10G _log_MobileWikiAppShareAFact_18144594_main_1c5c03a57_2_1d.tokudb 11G _log_Edit_17541122_main_18ea7a08e_2_1d.tokudb 11G _log_MobileWikiAppToCInteraction_19044853_main_228f9913d_2_1d.tokudb 11G _log_sql_aff_4482858_main_1948035d5_2_1d.tokudb 11G _log_sql_aff_4482858_main_194804364_2_1d.tokudb 12G _log_MobileWikiAppSessions_18115099_main_1c5d5d4e4_2_1d.tokudb 12G _log_sql_aff_4482858_main_194804cae_2_1d.tokudb 13G _log_sql_aff_4482858_main_194806adc_2_1d.tokudb 15G _log_ChangesListFilters_17629781_key_ix_ChangesListFilters_17629781_uuid_19762269a_3_1d_B_0.tokudb 15G _log_sql_aff_4482858_main_1948023c8_2_1d.tokudb 16G _log_CitationUsage_18359729_main_1e2d5abbc_2_1d.tokudb 18G _log_CitationUsage_18810892_main_215d81072_2_1d.tokudb 23G _log_MobileWikiAppDailyStats_18115101_main_1c5c25328_2_1d.tokudb 23G _log_QuickSurveyInitiation_18397507_main_1ebce0f89_2_1d.tokudb 24G _log_ChangesListFilters_17629781_main_197622695_2_1d.tokudb 24G _log_sql_aff_4482858_main_194808080_2_1d.tokudb 25G _log_sql_aff_4482858_key_ix_MediaViewer_10867062_uuid_194802877_3_1d.tokudb 26G _log_Print_17630514_main_197665cb7_2_1d.tokudb 30G _log_sql_aff_4482858_main_194806e54_2_1d.tokudb 32G _log_sql_aff_4482858_main_194807792_2_1d.tokudb 45G 
_log_sql_aff_4482858_main_194802877_2_1d.tokudb 57G _log_MobileWikiAppLinkPreview_18531254_key_ix_MobileWikiAppLinkPreview_18531254_uuid_1ef8ccc1e_3_1d_B_0.tokudb 57G _log_MobileWikiAppSessions_18948969_key_ix_MobileWikiAppSessions_18948969_uuid_21e63daba_3_1d_B_0.tokudb 203G _log_MobileWikiAppLinkPreview_18531254_main_1ef8ccc19_2_1d.tokudb 223G _log_MobileWikiAppSessions_18948969_main_21e63dab7_2_1d.tokudb
elukey@db1108:/srv/sqldata$ du -hsc _log*.tokudb | sort -h | tail -n 35 9.4G _log_NavigationTiming_18988839_main_32f05193_2_1d.tokudb 9.7G _log_MobileWikiAppArticleSuggestions_10590869_15423246_main_3fe3_2_1d_B_0.tokudb 9.7G _log_MobileWikiAppShareAFact_18144594_main_15c554f8_2_1d.tokudb 9.7G _log_NavigationTiming_10785754_main_4698_2_1d_B_0.tokudb 9.8G _log_Edit_11448630_key_ix_Edit_11448630_uuid_5036_2_1d_B_1.tokudb 11G _log_Edit_17541122_main_6af6fb0_2_1d.tokudb 12G _log_MobileWikiAppLinkPreview_15730939_main_417b_2_1d_B_0.tokudb 12G _log_MobileWikiAppSessions_18115099_main_15cbbec5_2_1d.tokudb 14G _log_MobileWikiAppToCInteraction_8461467_main_5033_2_1d_B_0.tokudb 14G _log_RelatedArticles_16352530_main_49c1_2_1d_B_0.tokudb 15G _log_ChangesListFilters_17629781_key_ix_ChangesListFilters_17629781_uuid_8e30138_3_1d.tokudb 16G _log_MobileWikiAppSessions_18948969_key_ix_MobileWikiAppSessions_18948969_uuid_31425d76_3_1d.tokudb 17G _log_MobileWikiAppMediaGallery_10923135_15423246_main_41ce_2_1d_B_0.tokudb 17G _log_NavigationTiming_15485142_15423246_main_46b3_2_1d_B_0.tokudb 18G _log_CitationUsage_18810892_main_2eaf26df_2_1d.tokudb 18G _log_Edit_13457736_main_3cde_2_1d_B_0.tokudb 19G _log_CitationUsage_18359729_main_1eb5b5f4_2_1d.tokudb 20G _log_Popups_15906495_main_487f_2_1d_B_0.tokudb 21G _log_MobileWebEditing_8599025_main_3e4f_2_1d_B_0.tokudb 22G _log_MobileWikiAppDailyStats_18115101_main_15c5f25a_2_1d.tokudb 22G _log_QuickSurveyInitiation_18397507_main_217e3cd1_2_1d.tokudb 23G _log_MobileWikiAppSearch_10641988_15423246_main_502f_2_1d_B_0.tokudb 24G _log_ChangesListFilters_17629781_main_8e30138_2_1d.tokudb 24G _log_MediaViewer_10867062_key_ix_MediaViewer_10867062_uuid_5026_2_1d_B_1.tokudb 26G _log_Print_17630514_main_8e40bb4_2_1d.tokudb 26G _log_ReadingDepth_16325045_main_4970_2_1d_B_0.tokudb 28G _log_MobileWikiAppArticleSuggestions_11448426_15423246_main_3ff8_2_1d_B_0.tokudb 31G 
_log_MobileWikiAppLinkPreview_18531254_key_ix_MobileWikiAppLinkPreview_18531254_uuid_22ad86b3_3_1d.tokudb 32G _log_Edit_11448630_main_5036_2_1d_B_0.tokudb 34G _log_QuickSurveyInitiation_15278946_main_5029_2_1d_B_0.tokudb 36G _log_Popups_16364296_main_48b5_2_1d_B_0.tokudb 48G _log_MediaViewer_10867062_main_5026_2_1d_B_0.tokudb 57G _log_MobileWikiAppSessions_18948969_main_31425d76_2_1d.tokudb 98G _log_MobileWikiAppLinkPreview_18531254_main_22ad86b3_2_1d.tokudb
If there are tables that we can drop because they relate to old/unwanted EL schemas, let's do it; it should give us some space back. After that, I'd take a mysqldump of both databases and archive it in Hadoop, just to be sure. Then we will be able to decide what to do, and I can see the following macro-question to answer:
Which database should be moved to the dbstore nodes after the archive step (since they are not the same)? I would say db1108, since it is the replica that Analytics users have been using all this time, not db1107.
There is also another option we could consider to avoid spending a ton of time on this. Since we want to allow people to query the log database for historical purposes, it seems a waste of human resources to move it from one db host to another (reducing space available for other important databases like the wiki replicas). We could simply stop pushing data to db1107 and set it to spare, allowing the Data Persistence team to repurpose it, while keeping db1108 untouched. Our users will still have a "known" place to query for historical data, and we won't have to tell them how to migrate their scripts, etc. We also won't spend time migrating databases (involving Manuel as well).
We could then decide with our users if db1108 is still needed in, say, a year's time (since by that date we'll have 2.5y of data in Hadoop and hopefully everybody will have migrated away from SQL).
Just 2c :)
s/mysqldump/mydumper/
Regarding the machines, I would suggest either decommissioning both or keeping both. Keeping a service with redundancy of 0 is undesirable, even with low or no usage. There is space on the backup infrastructure for offline long-term archiving of important data, although not for large datasets.
Yes, I am aware, but in case of a big disaster (like the host completely breaking) we could work out an alternative starting from the db dumps taken beforehand. If one of the three dbstore nodes goes down indefinitely now, we'll have to either order hardware or ask for a spare replacement; I see the same problem, just in a different context. Given that the past 1.5y of eventlogging data is already in Hadoop and people are using it, I don't see a huge problem in keeping only one db host.
I am fine with this approach (keeping just db1108) as long as expectations are managed with stakeholders. A host can fail at any time, meaning that the data on that single host should be considered volatile, a plan B should be available (like the one you suggest), and recovery times should be communicated to stakeholders as well.
We appreciate the effort you guys are making to decommission these hosts and the service! :-)
Given T234826, the next steps for this task in my opinion are:
- stop producing events from eventlogging to db1107
- stop/disable replication between db1107/db1108
- sanitize data in the log databases (the past 90d, I mean) before archiving - @mforns to triple-check whether my thoughts are correct
- disable sanitization timer
- archive both db1107 and db1108 on hadoop
archive both db1107 and db1108 on hadoop
Sounds good. This last step is about doing a one-time backup, right?
We also need to update docs about how stakeholders can access the data through 1108 going forward.
Checking in on this.
If there are tables that we can drop because related to old/not-wanted EL schemas, let's do it
How do we find this out?
In this case we decided to drop the data for the linked subtasks; the rest will remain on the host, which from now on will act as a backup for the other MySQL storage we use in Hadoop or Superset.
To be clear, EL MySQL is served by two hosts (master and replica). We are releasing just one of them to the DBA team and keeping the other.
Ahhh ok right. Right.
sanitize data in the log databases
This shouldn't be necessary, right? IIUC, data should be sanitized already?
archive both db1107 and db1108 on hadoop
Do we need to archive db1107 data? Analysts never query it, everything they have access to is on db1108.
I'd like to do T232349: Disable production EventLogging analytics MySQL consumers, but I think first we need to hear back about T212367: Start refining ChangesListHighlights events, yes?
I need to check with @mforns but my understanding is that the last 90d of events are not sanitized yet..
archive both db1107 and db1108 on hadoop
Do we need to archive db1107 data? Analysts never query it, everything they have access to is on db1108.
Ok for me to backup only 1108 :)
I'd like to do T232349: Disable production EventLogging analytics MySQL consumers, but I think first we need to hear back about T212367: Start refining ChangesListHighlights events, yes?
Seems so yes!
In T231858#5612967, @Ottomata wrote:
Ahhh ok right. Right.
sanitize data in the log databases
This shouldn't be necessary, right? IIUC, data should be sanitized already?
I need to check with @mforns but my understanding is that the last 90d of events are not sanitized yet..
Yes, in MySQL the last 90 days of data are not sanitized yet.
We can try to run the sanitization script on the remaining data, I guess.
I have created a task about re-running sanitization: T236818: Rerun sanitization before archiving eventlogging mysql data
Sanitization is needed for archival of data but not to disable the consumers.
You are right that it is not critical, but deleting it selectively from all tables is probably more work than sanitizing it.
Sanitization done!
The remaining step seems to be to take a mysql dump of both log databases (on db1107 and db1108) and then we are done!
Mysqldumping both hosts now:
sudo mysqldump --all-databases --skip-lock-tables --quick | gzip > mysqldump-$(hostname)-$(date +%F).sql
db1107:
real 1258m49.602s
user 1875m25.784s
sys  54m34.796s
696G Nov 14 15:50 mysqldump-db1107-2019-11-13.sql.gz
db1108:
real 1277m7.727s
user 1933m36.412s
sys  56m37.912s
680G Nov 14 16:09 mysqldump-db1108-2019-11-13.sql.gz
Copying to stat1007 and then into hdfs...
If you don't plan to recover the data, and it is for archival purposes, that is ok. However, I strongly suggest using mydumper in the future, or a recovery on a single thread would take around 5 days, and it would make a partial recovery very difficult. That is precisely why we wrap backup_mariadb.py and recover_dump.py, so sane defaults are used. Taking the backup would also have been 5-10 times faster.
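For reference, a mydumper-based run might look like the following. This is a hedged sketch (database name, output directory, and thread count are assumptions, not values from this task); the commands are printed rather than executed here, since they need a live MySQL server and credentials:

```shell
# Sketch only: print the intended mydumper/myloader invocations.
# Flag names follow mydumper's CLI; all paths and values are placeholders.
cat <<'EOF'
mydumper --database log --outputdir /srv/backups/log-mydumper --threads 8 --compress
myloader --directory /srv/backups/log-mydumper --threads 8 --overwrite-tables
EOF
```

The practical advantages over a single mysqldump file are the per-table output files (enabling partial restores of just one table) and parallelism on both dump and load.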
Ah! Good to know. Will do next time. Or...I could dump again, the DBs are still up. Do you think I should?
I closed this, but I think there is an open question of whether the dump needs to happen again. cc @jcrespo