
Move wmf_dumps.wikitext_rc1 to the correct HDFS directory
Closed, Resolved | Public | 3 Estimated Story Points

Description

Just realized that wmf_dumps.wikitext_rc1 lives in the wrong HDFS directory:

$ hdfs dfs -ls /user/hive/warehouse/wmf_dumps.db
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Found 1 items
drwxrwx---   - analytics analytics-privatedata-users          0 2023-08-24 16:28 /user/hive/warehouse/wmf_dumps.db/wikitext_raw_rc1

Most production tables, however, live under /wmf/data. We did this correctly for wmf_dumps.wikitext_rc0:

$ hdfs dfs -ls /wmf/data/wmf_dumps
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Found 1 items
drwxrwxr-x   - analytics analytics-privatedata-users          0 2023-07-11 19:50 /wmf/data/wmf_dumps/wikitext_raw_rc0

In this ticket we should:

  • Iceberg unfortunately persists fully qualified filenames in its metadata. Figure out whether the community has developed a utility to mv tables.
  • If so, then use the utility and mv the table to the proper place.
  • If not, then let's copy its content over? Or declare a new wmf_dumps.wikitext_rc2 release candidate in the proper place?
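Because the metadata stores fully qualified paths, a plain hdfs dfs -mv would leave it pointing at the old directory. As a rough sketch of how one might check where a table thinks it lives: the two helpers below are hypothetical (not an existing tool), and they assume the table's metadata JSON carries a "location" key holding the table's base path.

```shell
# Sketch only: both helpers are hypothetical, and the "location" key is an
# assumption about the layout of the table metadata JSON.

# Read metadata JSON on stdin and print the first "location" value found.
iceberg_location() {
  grep -o '"location"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed -e 's/^"location"[[:space:]]*:[[:space:]]*"//' -e 's/"$//'
}

# Succeed if a location (with or without a scheme://authority part)
# sits under the given directory prefix.
is_under_prefix() {
  loc="$1"
  prefix="${2%/}"   # tolerate a trailing slash on the prefix
  case "$loc" in
    *://*) path="${loc#*://}"; path="/${path#*/}" ;;  # drop scheme and authority
    *)     path="$loc" ;;
  esac
  case "$path" in
    "$prefix"/*) return 0 ;;
    *)           return 1 ;;
  esac
}
```

One possible use would be to pipe the table's current metadata JSON file (for example via hdfs dfs -cat) into iceberg_location, then pass the result to is_under_prefix together with /wmf/data. A non-zero exit status would mean the table still needs relocating.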

Event Timeline

Moving is fine, let's not make a new RC until we have a new schema

xcollazo set the point value for this task to 3. Sep 19 2023, 1:45 PM

We need a new schema for T340863, so to solve this, I will make sure that the new table gets created in the proper LOCATION.

Will leave this ticket open for now so that we don't forget to clean up by deleting the wmf_dumps.wikitext_rc1 table.

Ran the following in Hive as the analytics user:

$ hostname -f
an-launcher1002.eqiad.wmnet
$ sudo -u analytics hive
hive (wmf_dumps)> DROP TABLE wmf_dumps.wikitext_raw_rc0;
OK
Time taken: 0.188 seconds
hive (wmf_dumps)> DROP TABLE wmf_dumps.wikitext_raw_rc1;
OK
Time taken: 0.129 seconds

Then to delete files for wikitext_raw_rc0:

$ sudo -u analytics hdfs dfs -ls /wmf/data/wmf_dumps
Found 2 items
drwxrwxr-x   - analytics analytics-privatedata-users          0 2023-07-11 19:50 /wmf/data/wmf_dumps/wikitext_raw_rc0
drwxrwxr-x   - analytics analytics-privatedata-users          0 2023-10-24 17:10 /wmf/data/wmf_dumps/wikitext_raw_rc2
$ sudo -u analytics hdfs dfs -rm -r -skipTrash /wmf/data/wmf_dumps/wikitext_raw_rc0
Deleted /wmf/data/wmf_dumps/wikitext_raw_rc0

And for wikitext_raw_rc1:

$ sudo -u analytics hdfs dfs -ls /user/hive/warehouse/wmf_dumps.db
Found 1 items
drwxrwx---   - analytics analytics-privatedata-users          0 2023-08-24 16:28 /user/hive/warehouse/wmf_dumps.db/wikitext_raw_rc1
$ sudo -u analytics hdfs dfs -rm -r -skipTrash /user/hive/warehouse/wmf_dumps.db/wikitext_raw_rc1
Deleted /user/hive/warehouse/wmf_dumps.db/wikitext_raw_rc1
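Since -skipTrash makes these deletes irreversible, a small guard in front of them may be worth having. The following is a minimal sketch, not something that was run for this ticket: safe_rm_cmd is a hypothetical helper that only prints the delete command, and only when the target is a direct child of the expected parent directory.

```shell
# Sketch only: safe_rm_cmd is a hypothetical helper, not an existing tool.
# It prints the delete command instead of running it, so it can be reviewed first.
safe_rm_cmd() {
  target="${1%/}"   # table directory to delete
  parent="${2%/}"   # directory that must directly contain it
  # Strip "<parent>/" from the front; if nothing is stripped, target is not under parent.
  name="${target#"$parent"/}"
  case "$name" in
    "$target"|""|.|..|*/*)
      echo "refusing: $target is not a direct child of $parent" >&2
      return 1
      ;;
  esac
  echo "hdfs dfs -rm -r -skipTrash $target"
}
```

For example, safe_rm_cmd /wmf/data/wmf_dumps/wikitext_raw_rc0 /wmf/data/wmf_dumps prints the same command used above, while passing /wmf/data/wmf_dumps itself as the target is refused, which would protect the neighbouring wikitext_raw_rc2 directory from a mistyped path.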