During the namenode outage a few weeks ago, and again during a failed namenode failover today, Gobblin seems to have lost data without reporting any error.
hdfs dfs -ls /wmf/data/raw/eventlogging_legacy/eventlogging_ContentTranslationCTA/year=2022/month=06/day=23/hour=12/
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Found 3 items
-rw-r-----   3 analytics analytics-privatedata-users          0 2022-06-23 13:12 /wmf/data/raw/eventlogging_legacy/eventlogging_ContentTranslationCTA/year=2022/month=06/day=23/hour=12/_IMPORTED
-rw-r-----   3 analytics analytics-privatedata-users      23154 2022-06-23 12:11 /wmf/data/raw/eventlogging_legacy/eventlogging_ContentTranslationCTA/year=2022/month=06/day=23/hour=12/part.task_eventlogging_legacy_1655986216453_179_1.txt.gz
-rw-r-----   3 analytics analytics-privatedata-users          0 2022-06-23 13:31 /wmf/data/raw/eventlogging_legacy/eventlogging_ContentTranslationCTA/year=2022/month=06/day=23/hour=12/part.task_eventlogging_legacy_1655989815701_0_0.txt.gz
The Gobblin job that was launched at 13:10 wrote a 0-length .gz file. This 0-length .gz file causes "Unexpected end of input stream" errors when Refine tries to decompress and ingest it.
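To find other affected partitions, one option is to scan `hdfs dfs -ls` output for `.gz` files with a size of 0. The snippet below is a rough sketch, not something that exists in our tooling: the function name `find_empty_gz` is made up for illustration, and it assumes the standard 8-column listing format shown above. Marker files such as `_IMPORTED` are also 0 bytes, so only paths ending in `.gz` are flagged.

```python
from typing import List

# Sample listing copied from the output above.
SAMPLE = """\
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Found 3 items
-rw-r-----   3 analytics analytics-privatedata-users          0 2022-06-23 13:12 /wmf/data/raw/eventlogging_legacy/eventlogging_ContentTranslationCTA/year=2022/month=06/day=23/hour=12/_IMPORTED
-rw-r-----   3 analytics analytics-privatedata-users      23154 2022-06-23 12:11 /wmf/data/raw/eventlogging_legacy/eventlogging_ContentTranslationCTA/year=2022/month=06/day=23/hour=12/part.task_eventlogging_legacy_1655986216453_179_1.txt.gz
-rw-r-----   3 analytics analytics-privatedata-users          0 2022-06-23 13:31 /wmf/data/raw/eventlogging_legacy/eventlogging_ContentTranslationCTA/year=2022/month=06/day=23/hour=12/part.task_eventlogging_legacy_1655989815701_0_0.txt.gz
"""


def find_empty_gz(listing: str) -> List[str]:
    """Return paths of 0-byte .gz files found in `hdfs dfs -ls` output.

    Hypothetical helper. Assumes each file line has 8 whitespace-separated
    columns: permissions, replication, owner, group, size, date, time, path.
    """
    empty = []
    for line in listing.splitlines():
        # maxsplit=7 keeps the path intact even if it contains spaces.
        fields = line.split(None, 7)
        # Skip banner lines ("Found N items", JVM notices) and directories.
        if len(fields) < 8 or not fields[0].startswith("-"):
            continue
        size, path = fields[4], fields[7]
        if size == "0" and path.endswith(".gz"):
            empty.append(path)
    return empty


if __name__ == "__main__":
    for path in find_empty_gz(SAMPLE):
        print(path)
```

In practice the listing would come from a recursive `hdfs dfs -ls -R` over the raw data directory rather than from a hardcoded sample.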
This time around, the failures are in 2022-06-23 hours 12 and 13 for the following legacy eventlogging topics:
# 'analytics'
eventlogging_MobileWikiAppInstallReferrer
eventlogging_MobileWikiAppWatchlist
eventlogging_ContentTranslationCTA
eventlogging_CentralAuth
eventlogging_MobileWikiAppEdit
eventlogging_ContentTranslationSuggestion

# 'legacy' (migrated to Event Platform)
eventlogging_ReferencePreviewsBaseline
eventlogging_HelpPanel
eventlogging_SuggestedTagsAction
Once we fix the gobblin data, the proper re-refine commands will be:
sudo -u analytics kerberos-run-command analytics refine_eventlogging_analytics --ignore_failure_flag=true --table_include_regex='mobilewikiappedit|contenttranslationsuggestion|centralauth|mobilewikiappinstallreferrer|contenttranslationcta|contenttranslationsuggestion|mobilewikiappwatchlist|centralauth' --since='2022-06-22T12:00:00.000Z' --until='2022-06-23T15:00:00.000Z'

sudo -u analytics kerberos-run-command analytics refine_eventlogging_legacy --ignore_failure_flag=true --table_include_regex='helppanel|referencepreviewsbaseline|suggestedtagsaction' --since='2022-06-22T12:00:00.000Z' --until='2022-06-23T15:00:00.000Z'
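The `--table_include_regex` values above appear to be the affected topic names with the `eventlogging_` prefix dropped, lowercased, and joined with `|`. If the topic list changes, a small helper could rebuild the regex. This is only a sketch: `table_regex` is a hypothetical name, and the topic-to-table mapping is an assumption inferred from the commands in this task, not taken from the Refine code.

```python
from typing import Iterable


def table_regex(topics: Iterable[str], prefix: str = "eventlogging_") -> str:
    """Build a --table_include_regex value from topic names.

    Assumption: the table name is the topic name without the prefix,
    lowercased. Duplicates are dropped while preserving input order.
    """
    names = []
    for topic in topics:
        name = topic[len(prefix):] if topic.startswith(prefix) else topic
        name = name.lower()
        if name not in names:
            names.append(name)
    return "|".join(names)


if __name__ == "__main__":
    legacy_topics = [
        "eventlogging_HelpPanel",
        "eventlogging_ReferencePreviewsBaseline",
        "eventlogging_SuggestedTagsAction",
    ]
    print(table_regex(legacy_topics))
```

Deduplicating is harmless for an alternation regex and keeps the resulting command shorter.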
Investigation into why gobblin failed and how to restore the data will go below in comments.