Confirm whether the current definition of new active editors is based on global registration, and update the data glossary
Open, High, Public

Description

Our data glossary is not clear whether new active editors (see documentation on Data Hub) are defined by registration date GLOBALLY or within a specific project. This task is to clarify the current definition based on our code and make sure this is more explicit in the data glossary.

GDI has related requests & work:
https://phabricator.wikimedia.org/T310224
https://phabricator.wikimedia.org/T304995

As we begin to slice data further, we want to make sure we are maintaining consistency across definitions or specifying differences in terms.

Event Timeline

kzimmerman created this task.

Created a related task for the New Editors table, which is used to calculate new editor retention, and added the comments related to that table there: https://phabricator.wikimedia.org/T313622

Confirming: active_editors pulls user_name and the earliest registration date for that user name, min(event_user_creation_timestamp), from Mediawiki_history, so it is a global measure and not a local, per-wiki measure. An alternative would be to use user_first_edit_timestamp from the Mediawiki_user_history table, or to filter the Mediawiki_user_history table on start_timestamp is null, which @Milimetric says should select the first record in every user's history on each wiki. The latter seems unnecessary here. The former could be useful (see also T206803 and others) and is being tested by GDI.
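To make the global vs. local distinction concrete, here is a minimal Python sketch. The account dates, the wiki names, and the helper names (registered_in_month, global_registration, new_by_scope) are all hypothetical and only illustrate the idea; this is not the production code.

```python
from datetime import date

# Hypothetical per-wiki account creation dates for one global user name.
# A local account is typically created when the user first shows up on that wiki.
LOCAL_CREATION = {
    "enwiki": date(2020, 1, 15),
    "dewiki": date(2022, 7, 2),
}


def registered_in_month(registration, year, month):
    """True if the registration date falls in the given calendar month."""
    return (registration.year, registration.month) == (year, month)


def global_registration(local_creation):
    """Earliest creation date across all wikis (the 'global' registration)."""
    return min(local_creation.values())


def new_by_scope(local_creation, year, month):
    """Report whether the user counts as 'new' per wiki and globally."""
    result = {
        wiki: registered_in_month(created, year, month)
        for wiki, created in local_creation.items()
    }
    result["global"] = registered_in_month(
        global_registration(local_creation), year, month
    )
    return result


print(new_by_scope(LOCAL_CREATION, 2022, 7))
```

With these toy dates, the user looks new on dewiki in July 2022 but is not new under the global definition, because the earliest account dates from 2020.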

More:
Active editors (Active editors update code), which includes new active editors, pulls data from editor_month (Editor month update code).

Editor month pulls, among other fields, the global user_name, the first user_registration date, the local wiki_db, and the local user_id:
max(event_user_text) AS user_name, -- Some rows incorrectly have a null event_user_text (T218463)
min(event_user_creation_timestamp) AS user_registration,
wiki_db AS wiki,
event_user_id AS user_id
and it groups by month, wiki_db, event_user_id

Active editors pulls month, user_name, the sum of content_edits, the max of bot_by_group, and user_registration from the editor_month table, and groups by month, user_name.
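The two-stage rollup described above can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not the actual update code: the history table layout, the toy rows, the pre-counted content_edits column, and the function name build_active_editors are hypothetical, and taking MIN(user_registration) in the second stage is an assumption, since the comment above does not say which aggregate is applied to that column.

```python
import sqlite3

# Hypothetical toy rows standing in for Mediawiki_history, pre-counted per user
# and wiki: (month, wiki_db, event_user_id, event_user_text,
#            event_user_creation_timestamp, content_edits)
EVENTS = [
    ("2022-07", "enwiki", 1, "Alice", "2020-01-15", 3),
    ("2022-07", "dewiki", 7, "Alice", "2022-07-02", 4),
    ("2022-07", "enwiki", 2, "Bob", "2022-07-10", 6),
]


def build_active_editors(events):
    """Roll up history -> editor_month (per wiki) -> one row per month and user_name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history (month TEXT, wiki_db TEXT, event_user_id INTEGER, "
        "event_user_text TEXT, event_user_creation_timestamp TEXT, "
        "content_edits INTEGER)"
    )
    conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?, ?, ?)", events)

    # Stage 1: local grain, one row per month, wiki, and local user id.
    conn.execute(
        """
        CREATE TABLE editor_month AS
        SELECT month,
               wiki_db AS wiki,
               event_user_id AS user_id,
               MAX(event_user_text) AS user_name,
               MIN(event_user_creation_timestamp) AS user_registration,
               SUM(content_edits) AS content_edits
        FROM history
        GROUP BY month, wiki_db, event_user_id
        """
    )

    # Stage 2: global grain, grouping by user_name collapses the wikis.
    # MIN(user_registration) is an assumption made for this sketch.
    rows = conn.execute(
        """
        SELECT month,
               user_name,
               SUM(content_edits) AS content_edits,
               MIN(user_registration) AS user_registration
        FROM editor_month
        GROUP BY month, user_name
        ORDER BY month, user_name
        """
    ).fetchall()
    conn.close()
    return rows


for row in build_active_editors(EVENTS):
    print(row)
```

In this toy data, Alice's edits on two wikis are summed into one row and her registration resolves to the earlier 2020 date, which is what makes the resulting measure global instead of per wiki.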

There are a few other active editor supplementary queries which output data by different parameters.

mpopov moved this task from Doing to Needs Investigation on the Product-Analytics (Kanban) board.
mpopov removed a project: User-Iflorez.
mpopov added a subscriber: Iflorez.
Iflorez updated the task description.
Iflorez added a subscriber: mpopov.