Page MenuHomePhabricator

Update Movement Insights tables and movement metrics code to accomodate temporary users
Closed, ResolvedPublic

Description

The rollout of temporary users requires some changes from us. Now that Data Engineering has made the final data changes to adapt to temp accounts (T377293), we can update our pipelines accordingly.

Step 1: update and backfill the Airflow-based editor_month table

  • Make changes to SQL queries to add a new user_is_temporary field (we won't use the modified create query, but Data Engineering says it's standard practice to change it anyway)
  • Once the changes are merged, add a Git tag for the new version of the queries
  • Update the Airflow job to use the new version of the update query
  • Once the job changes are merged, deploy the changes and run a manual alter table to add the new field (make sure it's in the same position as in the update query since I'm pretty sure SQL inserts only follow the position, not the name)
  • Run the backfill script for the months when temporary editors have been deployed at any wiki
  • Check that the data is correct for the backfilled period

(No changes needed for new_editor since NOT event_user_is_created_by_system already excludes temp users.)

Step 2: update the movement metrics code.

  • Update active_editors.sql to use wmf_contributors.editor_month and filter out temp users with the new field rather than the regex workaround (since the two filters are equivalent, this will not require a backfill)
  • Update active_editors_region_type.sql as necessary
  • Update regional_active_editors.sql, re-enable the query, and backfill the missing data
  • Update new_editor_retention.sql to use wmf_contributors.new_editor

Details

Related Changes in GitLab:
TitleReferenceAuthorSource BranchDest Branch
Update queries to accomodate temporary users and perform backfillrepos/movement-insights/movement-metrics!30hghanifix_editors_by_wiki_typemain
Update queries to accomodate temporary users and perform backfillrepos/movement-insights/movement-metrics!29hghanitemporary_user_updatesmain
editor_month: switch to new query for temp usersrepos/data-engineering/airflow-dags!1091nshahquinn-wmfupdate_editor_monthmain
editor_month: switch to new query for temp usersrepos/data-engineering/airflow-dags!1090hghaniupdate_editor_monthmain
Update editor_month to flag temporary usersrepos/movement-insights/sql!7hghanieditor_month_temp_user_updatemain
Customize query in GitLab

Event Timeline

OSefu-WMF moved this task from Incoming to Backlog on the Movement-Insights board.
kostajh added subscribers: Niharika, kostajh.

The deadline is the introduction of temporary accounts on the first pilot wikis. As of August 2024, this is scheduled to happen in January 2025.

We are now looking at October 2024 (cc @Niharika)

The deadline is the introduction of temporary accounts on the first pilot wikis. As of August 2024, this is scheduled to happen in January 2025.

We are now looking at October 2024 (cc @Niharika)

@nshahquinn-wmf in your view, should the work in this task block deployments to pilot wikis?

The deadline is the introduction of temporary accounts on the first pilot wikis. As of August 2024, this is scheduled to happen in January 2025.

We are now looking at October 2024 (cc @Niharika)

@nshahquinn-wmf in your view, should the work in this task block deployments to pilot wikis?

@kostajh I'm just now seeing this while @nshahquinn-wmf is out on sabbatical. No this task and the work is not blocking deployment to pilot wikis. We're confident in a method described here (T374148) to account for the addition of temp account 'users' in our reporting (FYI @Hghani)

The deadline is the introduction of temporary accounts on the first pilot wikis. As of August 2024, this is scheduled to happen in January 2025.

We are now looking at October 2024 (cc @Niharika)

@nshahquinn-wmf in your view, should the work in this task block deployments to pilot wikis?

@kostajh I'm just now seeing this while @nshahquinn-wmf is out on sabbatical. No this task and the work is not blocking deployment to pilot wikis. We're confident in a method described here (T374148) to account for the addition of temp account 'users' in our reporting (FYI @Hghani)

Sounds good, thanks.

nshahquinn-wmf lowered the priority of this task from High to Medium.Feb 5 2025, 2:08 AM
Hghani updated the task description. (Show Details)