To serve Global Editor Metrics, we need `user_central_id` in Druid mediawiki_history_reduced: {T406263}
As part of {T405039} we are going to incrementally update the Druid [[ https://wikitech.wikimedia.org/wiki/Data_Platform/Data_Lake/Edits/Mediawiki_history_reduced | mediawiki_history_reduced ]] dataset.
{T365648} will be done for monthly snapshots.
To update this dataset daily, we need an incremental datasource. Our options are:
- `event.mediawiki_page_change_v1`
- `mediawiki_content_history_v1`
{T403664} is done, so we could use `event.mediawiki_page_change_v1`.
However, `mediawiki_content_history_v1` is reconciled daily, so it will be more accurate. We'd prefer to use `mediawiki_content_history_v1`.
We could join `mediawiki_content_history_v1` with `event.mediawiki_page_change_v1` to look up the relevant `user_central_id`.
But it would be much better, and less work, if `mediawiki_content_history_v1` had `user_central_id` itself.
This field will be very useful for things other than Global Editor Metrics, so it makes sense to add this field to `mediawiki_content_history_v1`. Along the way, we should also add it to `mediawiki_content_current_v1`.
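For illustration, the join-based lookup mentioned above could look roughly like the sketch below. It uses plain Python dicts rather than the real tables. The field names (`wiki_id`, `revision_id`, `revision.rev_id`, `performer.user_central_id`) and the sample rows are assumptions made for the example, not confirmed schema, and the real job would presumably run as a query against the Data Lake tables rather than in memory.

```python
def build_central_id_lookup(page_change_events):
    """Map (wiki_id, rev_id) -> user_central_id from page_change-like events.

    Assumed event shape (hypothetical):
      {"wiki_id": ..., "revision": {"rev_id": ...},
       "performer": {"user_central_id": ...}}
    """
    lookup = {}
    for event in page_change_events:
        performer = event.get("performer") or {}
        revision = event.get("revision") or {}
        central_id = performer.get("user_central_id")
        rev_id = revision.get("rev_id")
        # Skip events that cannot contribute a usable mapping.
        if central_id is not None and rev_id is not None:
            lookup[(event["wiki_id"], rev_id)] = central_id
    return lookup


def add_user_central_id(history_rows, lookup):
    """Left join: keep every history row; user_central_id is None when unmatched."""
    return [
        {**row, "user_central_id": lookup.get((row["wiki_id"], row["revision_id"]))}
        for row in history_rows
    ]


# Made-up sample data, for illustration only.
events = [
    {"wiki_id": "enwiki", "revision": {"rev_id": 1001},
     "performer": {"user_central_id": 42}},
]
history = [
    {"wiki_id": "enwiki", "revision_id": 1001, "user_id": 7},
    {"wiki_id": "enwiki", "revision_id": 1002, "user_id": 8},
]

enriched = add_user_central_id(history, build_central_id_lookup(events))
```

The second sample row has no matching event, so its `user_central_id` stays empty. Every daily run would have to handle that kind of gap, which is part of why having the field directly on `mediawiki_content_history_v1` is less work.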
=== Done is
[] `user_central_id` added to `mediawiki_content_history_v1`: populated on an ongoing basis from `mediawiki.page_change.v1`, and backfilled from either `centralauth_localuser` or the MariaDB `centralauth.localuser` table.
[] `user_central_id` added to `mediawiki_content_current_v1` and backfilled.
===