
Make job to backfill data from mediacounts into mediarequests tables in cassandra so as to have historical mediarequest data
Closed, ResolvedPublic

Description

Backfill data from mediacounts into mediarequests tables in cassandra so as to have historical mediarequest data

This data will not include the referrer dimension, so we will not know which project made the original request.

Event Timeline

Milimetric moved this task from Incoming to Analytics Query Service on the Analytics board.
Nuria renamed this task from "Backfill data from mediacounts into mediarequests tables in cassandra so as to have historical mediarequest data" to "Make job to backfill data from mediacounts into mediarequests tables in cassandra so as to have historical mediarequest data". Oct 30 2019, 4:21 PM

Given the many issues we have seen recently with Cassandra not being able to keep up with loading, let's stop loading manually and instead write a job that loads about 20 days at a time, at the time of day when Cassandra has more resources (so it does not coincide with pageview loading). Assigning to @fdans
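A minimal sketch of how such a job could split a backfill range into ~20-day batches, assuming the loader accepts an inclusive start/end date per run. The function name `backfill_batches` is hypothetical; scheduling each batch for a low-load time of day is left to whatever scheduler runs the job.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def backfill_batches(
    start: date, end: date, batch_days: int = 20
) -> Iterator[Tuple[date, date]]:
    """Yield consecutive inclusive (first_day, last_day) batches covering [start, end].

    Hypothetical helper: each batch spans at most `batch_days` days, and the
    last batch is truncated so it never goes past `end`.
    """
    if batch_days < 1:
        raise ValueError("batch_days must be positive")
    current = start
    while current <= end:
        batch_end = min(current + timedelta(days=batch_days - 1), end)
        yield current, batch_end
        current = batch_end + timedelta(days=1)
```

For example, `list(backfill_batches(date(2015, 1, 1), date(2015, 3, 1)))` gives three batches: Jan 1 to Jan 20, Jan 21 to Feb 9, and Feb 10 to Mar 1.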

Per-file jobs that failed during backfilling:

  • 17 Jul 2015
  • 5 Mar 2015
  • 20 Jan 2015

Restarting these before rerunning backfilling.

Once these are complete, the backfilled range will be Jan 1st 2015 to Sep 9th 2015

I think we should start backfilling from 2019 backwards so as to have a "continuous" dataset. For a live API that (in theory) can be queried, that probably makes more sense than having data ranges in 2015 and 2019 with nothing in between.


+1. Let's fix the missing days and start backfilling backward.
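A minimal sketch of the agreed order, assuming the job takes inclusive date ranges: rerun the failed single days first, then cover the remaining gap from the newest day backwards in ~20-day batches, so that each completed batch extends a continuous dataset. `backfill_plan` is a hypothetical name. Given the backfilled range stated above (Jan 1st to Sep 9th 2015), `oldest` would be Sep 10th 2015. `newest` (the last day before the existing mediarequests data) is not given in this task, so it is left as a parameter.

```python
from datetime import date, timedelta
from typing import List, Sequence, Tuple

DateRange = Tuple[date, date]


def backfill_plan(
    failed_days: Sequence[date], newest: date, oldest: date, batch_days: int = 20
) -> List[DateRange]:
    """Return inclusive (first_day, last_day) ranges in the order they should be loaded.

    Hypothetical helper: failed single days come first (oldest first), followed
    by batches of at most `batch_days` days walking from `newest` back to `oldest`.
    """
    if batch_days < 1:
        raise ValueError("batch_days must be positive")
    plan: List[DateRange] = [(day, day) for day in sorted(failed_days)]
    current = newest
    while current >= oldest:
        # Never step past the oldest day that still needs backfilling.
        batch_start = max(current - timedelta(days=batch_days - 1), oldest)
        plan.append((batch_start, current))
        current = batch_start - timedelta(days=1)
    return plan
```

With the failed days listed above and an illustrative gap of Dec 1st to Dec 31st 2015, the plan is Jan 20, Mar 5 and Jul 17 as single-day reruns, followed by Dec 12 to Dec 31 and then Dec 1 to Dec 11.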