
[Update Pipeline] edit_hourly
Closed, Resolved | Public | 5 Estimated Story Points

Description

  • Dependencies
  • HQL logic that needs to change
    • Since this logic just copies is_anonymous from its source table, MediaWiki History, our task here is probably to copy the two new fields: is_temporary and is_permanent as well. This means everything downstream will change.
  • HQL table creation scripts that need to change
    • Just the edit_hourly table here, rest in downstream section below
  • Deployment plan script
    • <<plan steps>>
  • Airflow DAG that schedules the HQL logic
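
To make the HQL change described above concrete, here is a minimal Python sketch of the idea: the aggregation passes is_temporary and is_permanent through alongside is_anonymous and groups on all three. This is not the actual edit_hourly query. The row keys "hour" and "wiki", the output key "edit_count", and the function name aggregate_hourly are hypothetical stand-ins, and the flags are assumed to be plain booleans.

```python
from collections import Counter

# The three user-type flags named in the task; copied through unchanged.
USER_FLAGS = ("is_anonymous", "is_temporary", "is_permanent")


def aggregate_hourly(rows):
    """Count edits per (hour, wiki, user-type flags).

    `rows` is an iterable of dicts. The keys "hour" and "wiki" and the
    output key "edit_count" are hypothetical, not the real table schema.
    Flags are assumed to be booleans (no NULL handling in this sketch).
    """
    counts = Counter()
    for row in rows:
        # Group on the time bucket, the wiki, and every user-type flag.
        key = (row["hour"], row["wiki"]) + tuple(row[f] for f in USER_FLAGS)
        counts[key] += 1

    result = []
    for key, n in sorted(counts.items()):
        out = {"hour": key[0], "wiki": key[1], "edit_count": n}
        out.update(zip(USER_FLAGS, key[2:]))
        result.append(out)
    return result
```

Because the new flags become part of the grouping key, every downstream consumer sees more rows per hour than before, which is why the downstream pipelines need review.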

Testing notes

List of affected downstream pipelines that we discover

Vetting notes

  • Run old data through the new code and compare: the output should be identical to the old job's results.
  • Run new data through the new code and confirm it yields the expected results. Use a list of wikis where temp accounts are deployed for this.
    • check logged-in users
    • check temp accounts
    • check anonymous accounts
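
The first vetting step can be sketched as follows. This is one possible way to do the comparison, under the assumption that "identical" means the totals match once the two new fields are summed away. The measure name "edit_count" and the helper names collapse and diff_results are hypothetical.

```python
from collections import Counter

# Fields added by the new code; dropped before comparing with old output.
NEW_FIELDS = ("is_temporary", "is_permanent")


def collapse(rows, drop=NEW_FIELDS, measure="edit_count"):
    """Sum `measure` over all remaining dimensions after dropping `drop`."""
    totals = Counter()
    for row in rows:
        # Build a hashable key from every dimension that is kept.
        key = tuple(sorted(
            (k, v) for k, v in row.items() if k not in drop and k != measure
        ))
        totals[key] += row[measure]
    return totals


def diff_results(old_rows, new_rows):
    """Return {key: (old_total, new_total)} for every key that differs."""
    old, new = collapse(old_rows), collapse(new_rows)
    return {
        k: (old.get(k, 0), new.get(k, 0))
        for k in old.keys() | new.keys()
        if old.get(k, 0) != new.get(k, 0)
    }
```

An empty result from diff_results(old_rows, new_rows) means the new code reproduces the old totals; any entry points at a dimension combination worth inspecting.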

Once vetting is complete, deploy according to the deployment script in the related task.

Details

Related Changes in Gerrit:
Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
T377767 Create Edit Hourly New DAG | repos/data-engineering/airflow-dags!901 | jebe | T377767-Create-Test-Dag-for-Edit-Hourly | main

Event Timeline

Change #1084873 had a related patch set uploaded (by Jennifer Ebe; author: Jennifer Ebe):

[analytics/refinery@master] Add New Edit Hourly HQL for Temp Account Change

https://gerrit.wikimedia.org/r/1084873

@JEbe-WMF Hi! I saw you created the patch with a new query and a new DAG, as if we were going for running a track of pipelines parallel to production.
I thought that strategy was discarded, and that we would go for regular modifications to existing queries and DAGs, but maybe I misunderstood it...? 🙏

+1 to keeping changes in the same files. We did discuss the parallel track, so I'm sorry if that was confusing. We decided to keep it simple, as testing was going to be involved no matter how we went about it.

Milimetric set the point value for this task to 5.

Change #1084873 merged by Snwachukwu:

[analytics/refinery@master] Add New Edit Hourly HQL for Temp Account Change

https://gerrit.wikimedia.org/r/1084873

Mass-closing lingering open tasks which have been for months in the "Done" column on DPE Temporary Accounts sprint 1. Please set task status to "resolved" once a task is done.