
Add user_central_id to mediawiki_history and mediawiki_history_reduced Hive tables
Open, Needs Triage, Public

Description

I'd like to request the addition of the user Central ID to the mediawiki_history table in Hive. It is not urgent, but I think it would greatly improve work for the many users who analyze that table. A couple of other people who frequently use the mediawiki_history table have also expressed interest in having this.

Currently, the mediawiki_history table has wiki_db and event_user_id, which together identify a unique user per wiki database, but there's no way to analyze users across wiki databases without joining on a Central ID from another table. The table that contains the Central ID is centralauth.localuser in MariaDB. Both tables are very large and live in separate systems (Hive and MariaDB), so the join is only practical for small subsets.
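To illustrate the kind of join being described, here is a minimal sketch that uses in-memory SQLite stand-ins for both tables. The localuser column names (lu_wiki, lu_local_id, lu_global_id), the event_entity column, and the sample rows are assumptions made for illustration, and the real tables live in Hive and MariaDB, so this is not how the production join would actually run.

```python
import sqlite3

# Tiny in-memory stand-ins; the real tables are in Hive and MariaDB respectively.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mediawiki_history (wiki_db TEXT, event_user_id INTEGER, event_entity TEXT);
-- Assumed localuser columns: lu_wiki, lu_local_id, lu_global_id.
CREATE TABLE localuser (lu_wiki TEXT, lu_local_id INTEGER, lu_global_id INTEGER);
""")

# Made-up sample rows: central user 1001 is active on two wikis.
conn.executemany("INSERT INTO mediawiki_history VALUES (?, ?, ?)", [
    ("enwiki", 10, "revision"),
    ("dewiki", 77, "revision"),
    ("frwiki", 5, "revision"),
])
conn.executemany("INSERT INTO localuser VALUES (?, ?, ?)", [
    ("enwiki", 10, 1001),
    ("dewiki", 77, 1001),
    ("frwiki", 5, 2002),
])


def events_per_central_user(conn):
    """Count events and distinct wikis per central ID via the (wiki, local id) join."""
    return conn.execute("""
        SELECT lu.lu_global_id AS user_central_id,
               COUNT(*) AS events,
               COUNT(DISTINCT mh.wiki_db) AS wikis
        FROM mediawiki_history AS mh
        JOIN localuser AS lu
          ON lu.lu_wiki = mh.wiki_db
         AND lu.lu_local_id = mh.event_user_id
        GROUP BY lu.lu_global_id
        ORDER BY lu.lu_global_id
    """).fetchall()


result = events_per_central_user(conn)
```

With a user_central_id column already present in mediawiki_history, the join step would drop out and the query would reduce to a GROUP BY on that column.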

For context:

Event Timeline

CC @lbowmaker

Downstream work (AQS & Dumps) could be left out of scope, but that would limit the value.

It would also be interesting to add this information to MediaWiki EventBus emitted events (if there are no PII concerns).

Could we add it to the standard performer / user entity modeling?

If the request in T389666 is implemented, it would cover the need behind this ticket, at least on my end. It would be a broader solution that can be used with other tables in addition to the mediawiki_history table, so I think it would be the better option.

Ottomata renamed this task from Add user Central ID to mediawiki_history table in Hive to Add user_central_id to mediawiki_history and mediawiki_history_reduced Hive tables. Oct 2 2025, 7:06 PM