Move wmf_dumps.wikitext_rc1 to the correct HDFS directory
Open, Needs Triage · Public · 3 Estimated Story Points

Description

Just realized that wmf_dumps.wikitext_rc1 lives in the wrong HDFS directory:

$ hdfs dfs -ls /user/hive/warehouse/wmf_dumps.db
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Found 1 items
drwxrwx---   - analytics analytics-privatedata-users          0 2023-08-24 16:28 /user/hive/warehouse/wmf_dumps.db/wikitext_raw_rc1

Most production tables, however, live under /wmf/data. We did this correctly for wmf_dumps.wikitext_rc0:

$ hdfs dfs -ls /wmf/data/wmf_dumps
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Found 1 items
drwxrwxr-x   - analytics analytics-privatedata-users          0 2023-07-11 19:50 /wmf/data/wmf_dumps/wikitext_raw_rc0

In this ticket we should:

  • Iceberg unfortunately persists fully qualified file paths in its metadata, so a plain HDFS mv would leave the metadata pointing at the old location. Figure out whether the community has developed a utility to mv tables.
  • If so, then use the utility and mv the table to the proper place.
  • If not, then let's copy its content over? Or declare a new wmf_dumps.wikitext_rc2 release candidate in the proper place?
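
To illustrate why a plain mv is not enough, here is a minimal Python sketch of the prefix rewrite that every path recorded in the table metadata would need. This is only an illustration under assumptions, not an Iceberg utility: the helper name rewrite_prefix and the sample file name are hypothetical, the filesystem scheme and authority are omitted for brevity, and the two directories are taken from the listings above.

```python
# Directories taken from the hdfs listings in this ticket.
OLD_PREFIX = "/user/hive/warehouse/wmf_dumps.db/wikitext_raw_rc1"
NEW_PREFIX = "/wmf/data/wmf_dumps/wikitext_raw_rc1"


def rewrite_prefix(path: str, old: str = OLD_PREFIX, new: str = NEW_PREFIX) -> str:
    """Hypothetical helper: swap the old table location for the new one.

    Only paths that are the old location itself, or live underneath it,
    are rewritten; anything else is returned unchanged.
    """
    if path == old or path.startswith(old + "/"):
        return new + path[len(old):]
    return path


# Hypothetical data file name, for illustration only.
sample = OLD_PREFIX + "/data/example-file.parquet"
print(rewrite_prefix(sample))
```

Matching on the prefix followed by "/" avoids accidentally rewriting a sibling directory whose name merely starts with the same characters. Any real move would have to apply an equivalent rewrite to every metadata file, manifest list, and manifest, which is why a community utility (or a fresh release candidate) is preferable to doing this by hand.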