[reportupdater] consider not requiring date as a first column of query/script results
Closed, Declined, Public

Description

Currently, reportupdater queries and scripts are required to return the date of the results as their first column.
This is a leftover from generate.py; today reportupdater no longer looks at that column.
Queries and scripts return data *for a single data point*. Hence they do not need to return the date, because RU already knows it.

However, removing that requirement means all queries and scripts that currently use RU will need changes...
Not sure how to do the transition. Maybe make RU automatically recognize that a date column is present and ignore it?
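
One way the automatic recognition could look, as a rough sketch only: try to parse the first value of a result row as a date and drop it if that succeeds. The function name `strip_leading_date` and the `%Y-%m-%d` format are assumptions for illustration, not taken from the reportupdater code.

```python
from datetime import datetime


def strip_leading_date(row, date_format="%Y-%m-%d"):
    """Return the row without its first cell if that cell parses as a date.

    Hypothetical helper: the name and the default date format are assumptions.
    """
    if not row:
        return row
    try:
        datetime.strptime(str(row[0]), date_format)
    except ValueError:
        # First cell is not a date, so the row is already in the new format.
        return row
    return row[1:]
```

The caveat with value-based detection is that a report whose first real column happens to contain date-like strings would lose that column, so it is only safe if no such report exists.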

Event Timeline

mforns renamed this task from [reportupdater] consider not requiring date as a first colum of wuery/script results to [reportupdater] consider not requiring date as a first colum of query/script results. Apr 26 2018, 3:17 PM
mforns added a parent task: T193167: reportupdater TLC.
fdans triaged this task as Unbreak Now! priority. Apr 26 2018, 4:31 PM
fdans moved this task from Incoming to Smart Tools for Better Data on the Analytics board.
fdans lowered the priority of this task from Unbreak Now! to Needs Triage. Apr 26 2018, 4:31 PM
mforns triaged this task as Unbreak Now! priority. Apr 26 2018, 4:32 PM
mforns lowered the priority of this task from Unbreak Now! to Needs Triage. Apr 26 2018, 4:34 PM
mforns triaged this task as Medium priority. Apr 30 2018, 4:50 PM
mforns raised the priority of this task from Medium to Needs Triage. Mar 25 2019, 5:33 PM
mforns triaged this task as Medium priority.
mforns lowered the priority of this task from Medium to Low. Dec 9 2019, 5:47 PM

Since we have control over all jobs using this tool, I think we can move quickly with the migration. It's still nice to include a soft cutover, in case of rollback, etc. My thought is:

  • Phase 1: RU checks whether the first output column header is date. If so, it strips that column's value during import.
  • Phase 2: Once RU with the soft migration has been deployed for a few days without errors, we remove the date column from all reports.
  • Phase 3: At our leisure, remove the soft migration code from RU.
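
A minimal sketch of the Phase 1 check, assuming results arrive as a header list plus a list of rows. `strip_date_column` is a hypothetical name and this is not the code in the related patch set.

```python
def strip_date_column(header, rows):
    """Drop the first column when its header is 'date' (soft migration).

    Hypothetical helper: results without a leading date column pass
    through unchanged, so old and new reports can coexist.
    """
    if header and header[0].strip().lower() == "date":
        return header[1:], [row[1:] for row in rows]
    return header, rows
```

Because the check is keyed on the header alone, Phase 3 would amount to deleting this function and its call site once no report emits the column.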

Another detail to mention: the output writer currently includes a date column, and I believe that removing it would cause the header change detection to invalidate old results. Maybe we add a date column back into the results before writing?
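
To illustrate the idea of adding the date back before writing, here is a sketch under the assumption that RU holds the date of the data point as a `datetime.date` and that output files use a `date` header with `%Y-%m-%d` values. The function name and both of those details are assumptions, not confirmed from the writer code.

```python
from datetime import date


def with_date_column(header, rows, report_date, date_format="%Y-%m-%d"):
    """Prepend a date column so written output keeps its existing header.

    Hypothetical helper: report_date is the date RU already knows for
    this data point; the header name and format are assumed.
    """
    date_str = report_date.strftime(date_format)
    new_header = ["date"] + list(header)
    new_rows = [[date_str] + list(row) for row in rows]
    return new_header, new_rows
```

For example, `with_date_column(["views"], [["5"]], date(2018, 4, 26))` yields a header of `["date", "views"]`, which would leave the header of existing output files unchanged.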

Change 667159 had a related patch set uploaded (by Awight; owner: Awight):
[analytics/reportupdater@master] [WIP] The date column should be optional

https://gerrit.wikimedia.org/r/667159