The rollout of temporary users requires some changes from us. Now that Data Engineering has made the final data changes to adapt to temp accounts (T377293), we can update our pipelines accordingly.
Step 1: update and backfill the Airflow-based editor_month table
- Make changes to SQL queries to add a new user_is_temporary field (we won't use the modified create query, but Data Engineering says it's standard practice to change it anyway)
- Once the changes are merged, add a Git tag for the new version of the queries
- Update the Airflow job to use the new version of the update query
- Once the job changes are merged, deploy the changes and run a manual alter table to add the new field (make sure it's in the same position as in the update query since I'm pretty sure SQL inserts only follow the position, not the name)
- Run the backfill script for the months when temporary editors have been deployed at any wiki
- Check that the data is correct for the backfilled period
(No changes needed for new_editor since NOT event_user_is_created_by_system already excludes temp users.)
Step 2: update the movement metrics code.
- Update active_editors.sql to use wmf_contributors.editor_month and filter out temp users with the new field rather than the regex workaround (since the two filters are equivalent, this will not require a backfill)
- Update active_editors_region_type.sql as necessary
- Update regional_active_editors.sql, re-enable the query, and backfill the missing data
- Update new_editor_retention.sql to use wmf_contributors.new_editor