Here is an HDFS listing of the files in a partition created by Oozie:
aqu@stat1004:~$ hdfs dfs -ls -h /wmf/data/wmf/wikidata/entity/snapshot=2022-02-14 | head
Found 513 items
-rw-r----- 3 analytics analytics-privatedata-users 0 2022-02-25 01:58 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-14/_PARTITIONED
-rwxr-x--- 3 analytics analytics-privatedata-users 263.0 M 2022-02-25 01:55 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-14/part-00000-4b8ace4f-d908-4272-8dfe-7bda5ee03198.c000
-rwxr-x--- 3 analytics analytics-privatedata-users 266.1 M 2022-02-25 01:55 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-14/part-00001-4b8ace4f-d908-4272-8dfe-7bda5ee03198.c000
-rwxr-x--- 3 analytics analytics-privatedata-users 265.6 M 2022-02-25 01:55 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-14/part-00002-4b8ace4f-d908-4272-8dfe-7bda5ee03198.c000
-rwxr-x--- 3 analytics analytics-privatedata-users 263.9 M 2022-02-25 01:55 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-14/part-00003-4b8ace4f-d908-4272-8dfe-7bda5ee03198.c000
-rwxr-x--- 3 analytics analytics-privatedata-users 264.6 M 2022-02-25 01:54 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-14/part-00004-4b8ace4f-d908-4272-8dfe-7bda5ee03198.c000
-rwxr-x--- 3 analytics analytics-privatedata-users 265.2 M 2022-02-25 01:54 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-14/part-00005-4b8ace4f-d908-4272-8dfe-7bda5ee03198.c000
-rwxr-x--- 3 analytics analytics-privatedata-users 264.6 M 2022-02-25 01:54 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-14/part-00006-4b8ace4f-d908-4272-8dfe-7bda5ee03198.c000
-rwxr-x--- 3 analytics analytics-privatedata-users 264.3 M 2022-02-25 01:54 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-14/part-00007-4b8ace4f-d908-4272-8dfe-7bda5ee03198.c000
Here is the same kind of listing for a partition written by an Airflow-triggered Spark job:
aqu@stat1004:~$ hdfs dfs -ls -h /wmf/data/wmf/wikidata/entity/snapshot=2022-02-21 | head
Found 512 items
-rw-r----- 3 analytics hdfs 269.6 M 2022-03-07 11:00 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-21/part-00000-173e32e5-81fe-4374-83ec-380bcb12d107.c000
-rw-r----- 3 analytics hdfs 269.4 M 2022-03-07 11:00 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-21/part-00001-173e32e5-81fe-4374-83ec-380bcb12d107.c000
-rw-r----- 3 analytics hdfs 268.9 M 2022-03-07 11:00 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-21/part-00002-173e32e5-81fe-4374-83ec-380bcb12d107.c000
-rw-r----- 3 analytics hdfs 269.4 M 2022-03-07 11:00 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-21/part-00003-173e32e5-81fe-4374-83ec-380bcb12d107.c000
-rw-r----- 3 analytics hdfs 268.8 M 2022-03-07 11:00 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-21/part-00004-173e32e5-81fe-4374-83ec-380bcb12d107.c000
-rw-r----- 3 analytics hdfs 270.5 M 2022-03-07 11:00 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-21/part-00005-173e32e5-81fe-4374-83ec-380bcb12d107.c000
-rw-r----- 3 analytics hdfs 269.4 M 2022-03-07 11:00 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-21/part-00006-173e32e5-81fe-4374-83ec-380bcb12d107.c000
-rw-r----- 3 analytics hdfs 269.4 M 2022-03-07 11:00 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-21/part-00007-173e32e5-81fe-4374-83ec-380bcb12d107.c000
-rw-r----- 3 analytics hdfs 269.8 M 2022-03-07 11:00 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-21/part-00008-173e32e5-81fe-4374-83ec-380bcb12d107.c000
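You can see that the group owning the files is now hdfs instead of analytics-privatedata-users. With mode -rw-r-----, only the owner (analytics) and members of the hdfs group can read them, so I think the consequence is that wmf-internal-users can no longer access those files.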
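To check a whole partition rather than eyeballing the first ten lines, the listing can be scanned for files whose group is not the expected one. Below is a minimal Python sketch, assuming the output format of `hdfs dfs -ls -h` shown above; the function name `files_with_unexpected_group` and the regular expression are illustrative, not part of any existing tooling, and the sample lines are copied from the two listings.

```python
import re

# One line of `hdfs dfs -ls -h` output, as seen in the listings above:
# mode, replication, owner, group, size (one or two tokens), date, time, path.
LS_LINE = re.compile(
    r"^(?P<mode>[-dl][-rwxsStT]{9}[+.]?)\s+"
    r"(?P<replication>\S+)\s+"
    r"(?P<owner>\S+)\s+"
    r"(?P<group>\S+)\s+"
    r"(?P<size>.+?)\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<time>\d{2}:\d{2})\s+"
    r"(?P<path>\S.*)$"
)


def files_with_unexpected_group(listing, expected_group):
    """Return (path, group, mode) for every entry not owned by expected_group."""
    mismatches = []
    for line in listing.splitlines():
        match = LS_LINE.match(line.strip())
        if match is None:
            continue  # skips non-entry lines such as "Found 512 items"
        if match["group"] != expected_group:
            mismatches.append((match["path"], match["group"], match["mode"]))
    return mismatches


# Sample input: two lines from the Oozie partition, one from the Airflow one.
SAMPLE = """Found 3 items
-rw-r----- 3 analytics analytics-privatedata-users 0 2022-02-25 01:58 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-14/_PARTITIONED
-rwxr-x--- 3 analytics analytics-privatedata-users 263.0 M 2022-02-25 01:55 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-14/part-00000-4b8ace4f-d908-4272-8dfe-7bda5ee03198.c000
-rw-r----- 3 analytics hdfs 269.6 M 2022-03-07 11:00 /wmf/data/wmf/wikidata/entity/snapshot=2022-02-21/part-00000-173e32e5-81fe-4374-83ec-380bcb12d107.c000
"""

mismatches = files_with_unexpected_group(SAMPLE, "analytics-privatedata-users")
for path, group, mode in mismatches:
    print(f"{mode} {group} {path}")
```

In practice the output of `hdfs dfs -ls -h <partition path>` (without `| head`) would be passed in place of `SAMPLE`; only the entry written by the Airflow-triggered job should be reported.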
Original Slack thread: https://wikimedia.slack.com/archives/C02291Z9YQY/p1646402078889739