
Populate aqs with legacy page-counts
Closed, Resolved | Public | 13 Estimated Story Points

Description

To do:

  • Move data on files to Hadoop (make sure we have an easy-to-access backup): /wmf/data/legacy/project-counts
  • Aggregate/transform format
  • Create workflow (no coordinator needed) to load into AQS so we have a repeatable process
    • Load AQS from Hadoop data:
  • Evangelize that this is available

Path:
/metrics/legacy/pagecounts

Let's make sure to include mobile pagecounts too

Event Timeline

Likely to involve some Hive to transform the data format into something that can be loaded into AQS.
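
As a rough illustration of the aggregate/transform step, here is a minimal Python sketch that rolls hourly counts up to daily totals per project. It is not the Hive or Spark code used for this task. The input layout (`<domain_abbrev> <YYYYMMDDHH> <count>`), the sample values, and the function names `parse_line` and `aggregate_daily` are all hypothetical, chosen only to show the shape of the transformation.

```python
from collections import defaultdict


def parse_line(line):
    """Parse one record in the ASSUMED layout '<domain_abbrev> <YYYYMMDDHH> <count>'."""
    abbrev, hour, count = line.split()
    # Lowercase the abbreviation so 'EN' and 'en' land in the same bucket.
    return abbrev.lower(), hour, int(count)


def aggregate_daily(lines):
    """Sum hourly counts into one total per (domain_abbrev, YYYYMMDD)."""
    totals = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        abbrev, hour, count = parse_line(line)
        totals[(abbrev, hour[:8])] += count  # first 8 characters are the day
    return dict(totals)


# Made-up sample data; 'en.m' stands in for a mobile project abbreviation.
sample = [
    "en 2015010100 10",
    "EN 2015010101 5",
    "en.m 2015010100 3",
    "de 2015010200 7",
]
daily = aggregate_daily(sample)
```

The result is a plain dictionary keyed by project and day, which a loading job could then serialize into whatever row format the target store expects.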

Create an AQS loading job; it should be productionized so we can rerun it easily.

Nuria updated the task description.
Nuria renamed this task from Populate aqs with legacy pageviews on new endpoint to Populate aqs with legacy pageviews. Jan 26 2017, 5:12 PM
Nuria renamed this task from Populate aqs with legacy pageviews to Populate reportcard with legacy pageviews. Jan 26 2017, 5:20 PM
Nuria renamed this task from Populate reportcard with legacy pageviews to Populate aqs with legacy pageviews.
Nuria updated the task description.
Nuria set the point value for this task to 8.
Nuria changed the point value for this task from 8 to 13. Jan 26 2017, 5:26 PM
Nuria updated the task description.

Change 337593 had a related patch set uploaded (by Mforns):
Add spark job to aggregate historical projectviews

https://gerrit.wikimedia.org/r/337593

Change 339421 had a related patch set uploaded (by Mforns):
Add oozie workflow to load projectcounts to AQS

https://gerrit.wikimedia.org/r/339421

Change 337593 merged by jenkins-bot:
[analytics/refinery/source] Add spark job to aggregate historical projectviews

https://gerrit.wikimedia.org/r/337593

Nuria renamed this task from Populate aqs with legacy pageviews to Populate aqs with legacy page-counts. Mar 22 2017, 8:16 PM

Change 344665 had a related patch set uploaded (by Mforns):
[analytics/refinery/source@master] Lowercase domain abbreviations in projectcounts aggregation

https://gerrit.wikimedia.org/r/344665

Change 344665 merged by jenkins-bot:
[analytics/refinery/source@master] Lowercase domain abbreviations in projectcounts aggregation

https://gerrit.wikimedia.org/r/344665

Change 344914 had a related patch set uploaded (by Mforns):
[analytics/refinery@master] Fix domain_abbrev_map job to disambiguate wikimedia projects

https://gerrit.wikimedia.org/r/344914

Change 339421 merged by Mforns:
[analytics/refinery@master] Add oozie workflow to load projectcounts to AQS

https://gerrit.wikimedia.org/r/339421

Change 344914 merged by Mforns:
[analytics/refinery@master] Fix domain_abbrev_map job to disambiguate wikimedia projects

https://gerrit.wikimedia.org/r/344914