
Table Cleanup - Drop Unused tables
Closed, Resolved | Public | 3 Estimated Story Points

Event Timeline

EChetty triaged this task as High priority. (Nov 28 2022, 8:28 PM)
EChetty set the point value for this task to 3.
EChetty moved this task from To be discussed to Sprint 05-06 on the Data Pipelines board.
EChetty edited projects, added Data Pipelines (Sprint 05-06); removed Data Pipelines.
EChetty moved this task from Ready to Next Up on the Data Pipelines (Sprint 05-06) board.

I checked that those tables:

  • Are not active: they are not currently ingesting data (contenttranslation stopped on 1 Oct, flowreplies has had no data for the last 90 days, changeslistfiltergrouping has 1 hour of data on 24 Nov)
  • Are not in the event sanitization allow-list (the sanitization algorithm will not look for them when executing).
  • Are not listed as Event Platform streams in InitialiseSettings.php (in mediawiki-config).
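
The last two checks amount to searching two config files for the table names. A minimal sketch of that search, assuming local checkouts of the files; the helper name check_absent and the paths in the usage comment are hypothetical, not taken from this task:

```shell
#!/bin/sh
# Sketch: report whether each candidate name is mentioned in a config file.
# The file path is supplied by the caller; no real repository layout is assumed.

TABLES="flowreplies changeslistfiltergrouping contenttranslation"

# check_absent FILE NAME...
# Prints one line per name: "FOUND in" or "absent from" the given file.
# The match is case-insensitive because stream names may be capitalized differently.
check_absent() {
    ca_file="$1"
    shift
    for ca_name in "$@"; do
        if grep -qi -- "$ca_name" "$ca_file"; then
            echo "$ca_name: FOUND in $ca_file"
        else
            echo "$ca_name: absent from $ca_file"
        fi
    done
}

# Usage (paths are placeholders for wherever the files are checked out):
#   check_absent path/to/sanitization-allowlist.yaml $TABLES
#   check_absent path/to/InitialiseSettings.php $TABLES
```

Any "FOUND" line would mean the table is still referenced and should be looked at before dropping it.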

I think we can proceed with deletion.


The procedure for deletion should be:

# ssh to a machine with hdfs superuser kerberos credentials
ssh an-launcher1002.eqiad.wmnet

# drop hive tables
sudo -u hdfs hive
drop table event.flowreplies;
drop table event.changeslistfiltergrouping;
drop table event.contenttranslation;
exit;

# drop data
sudo -u hdfs hdfs dfs -rm -r /wmf/data/event/flowreplies
sudo -u hdfs hdfs dfs -rm -r /wmf/data/event/changeslistfiltergrouping
sudo -u hdfs hdfs dfs -rm -r /wmf/data/event/contenttranslation

Can someone review this and confirm, please? @JAllemandou @BTullis :-)
If OK, I will proceed and delete them!

Except for the missing kerberos-run-command, looks good to me :)
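
For reference, a sketch of the same procedure with the wrapper added. It assumes kerberos-run-command is invoked as "kerberos-run-command <user> <command...>" and that hive accepts a statement through -e; the DRY_RUN switch and the run and drop_all helpers are hypothetical additions so the commands can be reviewed before anything is deleted:

```shell
#!/bin/sh
# Sketch only: by default (DRY_RUN=1) this prints the commands, nothing is executed.
# Set DRY_RUN=0 on a host with the hdfs keytab to actually run them.

DRY_RUN="${DRY_RUN:-1}"
TABLES="flowreplies changeslistfiltergrouping contenttranslation"

# Print the command in dry-run mode, execute it otherwise.
# Note: the printed form does not show the quoting around multi-word arguments.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

drop_all() {
    # Drop the Hive table definitions first...
    for da_table in $TABLES; do
        run sudo -u hdfs kerberos-run-command hdfs hive -e "drop table event.$da_table;"
    done
    # ...then remove the underlying data from HDFS.
    for da_table in $TABLES; do
        run sudo -u hdfs kerberos-run-command hdfs hdfs dfs -rm -r "/wmf/data/event/$da_table"
    done
}

# Usage:
#   drop_all              # review the six printed commands
#   DRY_RUN=0 drop_all    # run them for real
```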