It has [recently been agreed](https://wikimedia.slack.com/archives/CSV483812/p1711748851120569?thread_ts=1711584717.581009&cid=CSV483812) [WMF internal link] that the `wmf_*` Data Lake databases will hold not only Data Platform Engineering–owned tables, but also "production-grade" tables owned by other teams.
However, this raises the question of where the data for those tables should be stored in HDFS. So far, some data has been placed in `/wmf/data` within directories corresponding to the database (e.g. `/wmf/data/wmf_readership`) while other data has been placed within `/wmf/data` within directories corresponding to the owning team (e.g. `/wmf/data/research`). There is no explicit guideline to do this.
Beyond the question of organization, permissions are an obstacle. Currently, most of these directories can only be written by the `analytics` system user, so if it is agreed that non-DPE-owned data belongs here, we need to decide how non-DPE users will get their tables created.
In my opinion, the simple and sensible answer is just to say:
1. yes, data for "production-grade" tables belongs in the directory `/wmf/data/{{database}}`
2. if you need a table created but don't have permissions, just ask someone from DPE to do it (this will not happen frequently, so the burden should be minimal).
An alternative (pointed out by @fkaelin) is to place data in directories corresponding to the owning team (while maintaining the team-agnostic organization of tables). This would make it very easy to set appropriate permissions. This system would go out of date eventually as team structures and ownership changed, but it would be possible (if somewhat complex) to move the underlying data accordingly with minimal impact on users.
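
To make the two layouts concrete, here is a minimal sketch of how each convention would map onto an HDFS directory. The function names are hypothetical and only illustrate the naming rule; the database and team names are the examples mentioned above.

```python
from posixpath import join

# Base directory discussed in this task.
BASE_DIR = "/wmf/data"


def dir_by_database(database: str) -> str:
    """Proposed convention: one directory per Data Lake database."""
    return join(BASE_DIR, database)


def dir_by_team(team: str) -> str:
    """Alternative convention: one directory per owning team."""
    return join(BASE_DIR, team)


if __name__ == "__main__":
    print(dir_by_database("wmf_readership"))
    print(dir_by_team("research"))
```

Under the first convention the path depends only on the database, so it stays stable when ownership changes. Under the second it depends on the team, which makes permissions easy to set but ties the path to the current team structure.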