These are 'statistics-privatedata-users' and 'analytics-privatedata-users'. 'statistics-users' and 'analytics-users' also exist. I'm not convinced that the non-privatedata groups are actually kept away from private data: statistics-users grants login on the statistics cruncher boxes (currently stat1003), where members of other groups (statistics-privatedata-users, statistics-admins, researchers) may have left private data readable by these users (e.g. readable by the default group wikidev, or world-readable).
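One way to check this concern on a given host is to walk a directory tree and flag files whose mode bits expose them. The following is only a minimal sketch, not an existing tool: the function name `find_exposed_files` and its parameters are hypothetical, and it inspects file mode bits only. It ignores ACLs and the permissions of parent directories, so a flagged file may still be unreachable in practice.

```python
import os
import stat


def find_exposed_files(root, group_gid=None):
    """Yield (path, reason) for regular files under `root` that are
    world-readable, or that belong to `group_gid` and are group-readable.

    Hypothetical helper for illustration; checks mode bits only.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is not stat-able; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # only regular files are of interest here
            if st.st_mode & stat.S_IROTH:
                yield path, "world-readable"
            elif (
                group_gid is not None
                and st.st_gid == group_gid
                and st.st_mode & stat.S_IRGRP
            ):
                yield path, "group-readable"
```

To look for files readable by the default group, the gid could be resolved with `grp.getgrnam("wikidev").gr_gid` from the standard library and passed as `group_gid`; which directories to scan is left to whoever runs it.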
Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | | None | T142815 Enhance account handling (meta bug) |
| Declined | | None | T149222 Reconsider/check naming of 'privatedata' shell groups compared to their theoretically non-sensitive counterparts |
Event Timeline
The '*private*' user groups here grant access to stat1002. Historically, stat1002 was used to host local file-based webrequest logs (and some other data), so getting access to that host was the only way to access that data. stat1003 has never hosted private webrequest logs, but some user accounts there (particularly the ones in the 'researchers' group) do have access to the MySQL analytics slaves, which may themselves contain some private data. So, you are right that the names may not be 100% accurate. They are named this way for historical reasons, and mostly indicate an ill-defined and convoluted hierarchy of privileged access:
- analytics-privatedata-users > analytics-users (file access in Hadoop)
- statistics-privatedata-users > statistics-users (stat1002 shell + analytics slaves vs just stat1003 shell)
- researchers > statistics-users (MySQL analytics slaves vs just stat1003 shell access)
Changing these group names now would be a huge headache. Our POSIX-group-based shell access isn't really enough to cover our use cases, so no matter what we do, we'd end up with a lot of confusion. We'd really need a better and finer-grained user permissions system. Perhaps LDAP-based access in prod would help, but I don't think making that happen is a priority in ops at all.