
CentralNotice: Consolidate typical analytics queries (for example, for impression rates) into a library
Closed, Resolved · Public · 4 Story Points

Authored By AndyRussG, Oct 24 2017

Description

We've been copy-pasting these queries for too long. A bit of work on this will go a long way toward making many future queries easier to write and more reliable.

Now that we have Druid, many types of queries can run much faster, too.
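As a rough illustration of the kind of helper such a library might consolidate, here is a minimal Python sketch that builds a Druid native `timeseries` query for impression counts, instead of hand-copying the JSON for every analysis. The function name, the datasource name, the `campaign` dimension and the `count` metric are hypothetical placeholders, not the actual CentralNotice schema or the library's actual code.

```python
import json
from typing import Optional


def impressions_timeseries_query(
    datasource: str,
    interval: str,
    granularity: str = "hour",
    campaign: Optional[str] = None,
) -> dict:
    """Build a Druid native timeseries query that sums impressions.

    Hypothetical sketch: the "campaign" dimension and the "count" metric
    are placeholders, not the real CentralNotice schema.
    """
    query = {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": granularity,
        "intervals": [interval],  # ISO-8601 interval, e.g. "start/end"
        "aggregations": [
            # Sum the (assumed) pre-aggregated "count" metric per time bucket.
            {"type": "longSum", "name": "impressions", "fieldName": "count"}
        ],
    }
    if campaign is not None:
        # Restrict results to a single campaign with a selector filter.
        query["filter"] = {
            "type": "selector",
            "dimension": "campaign",
            "value": campaign,
        }
    return query


if __name__ == "__main__":
    q = impressions_timeseries_query(
        "banner_impressions",  # hypothetical datasource name
        "2017-10-01/2017-10-02",
        campaign="example_campaign",  # hypothetical campaign name
    )
    # JSON body that could be POSTed to a Druid broker's /druid/v2/ endpoint.
    print(json.dumps(q, indent=2))
```

Keeping query construction in one function means a fix to a filter or aggregation is made once rather than in every copied notebook.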

A small library like this would also help create a feed of data for monitoring and visualization tools (see T124132).

Event Timeline

Restricted Application added a subscriber: Aklapper. · Oct 24 2017, 4:52 PM

Here's the sort of situation where a library for this will come in handy: T177653

Here it is (temporary GitHub repo).

AndyRussG changed the point value for this task from 2 to 4. · Jan 10 2018, 6:09 PM

Out of interest, where did this task originate? Between Turnilo (Pivot's successor), Superset and the usual Hive queries, it seems we are pretty well covered for now in terms of our ability to review impression data. I'm concerned that this is coming at the expense of Clone Campaign (which seems to have stalled), and is therefore preventing work on campaign fallback, which, with the switch to bundle-style campaigns, is becoming a more mission-critical requirement.

Ejegg added a subscriber: Ejegg. · Dec 13 2018, 1:07 AM

The code looks internally consistent and well designed, definitely would give it a +1. I'm still getting up to speed on the pydruid and matplotlib libraries, so I can't comment on the usage of those libs yet.

Ejegg added a comment. · Dec 14 2018, 4:30 PM

OK, this all looks great! And that pandas library is pretty impressive too. Nice use of the left join / merge in the RatesQuery. Let's request a wmf-hosted repo!
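The left join mentioned above is the core of an impression-rate calculation: every (hour, country) row that has pageviews must survive, even when no impressions were recorded for it. A minimal pandas sketch of that pattern follows; the `impression_rates` function and its column names are hypothetical and are not the actual `RatesQuery` implementation.

```python
import pandas as pd


def impression_rates(pageviews: pd.DataFrame, impressions: pd.DataFrame) -> pd.DataFrame:
    """Left-join impression counts onto pageview counts and compute a rate.

    Hypothetical sketch: both frames are assumed to share the key columns
    "hour" and "country", plus a "pageviews" or "impressions" count column.
    """
    # how="left" keeps every pageview row, even with no matching impressions.
    merged = pageviews.merge(impressions, on=["hour", "country"], how="left")
    # Unmatched rows come back as NaN; treat them as zero impressions.
    merged["impressions"] = merged["impressions"].fillna(0).astype(int)
    merged["rate"] = merged["impressions"] / merged["pageviews"]
    return merged


pageviews = pd.DataFrame(
    {"hour": [0, 0, 1], "country": ["FR", "US", "FR"], "pageviews": [100, 200, 50]}
)
impressions = pd.DataFrame(
    {"hour": [0, 1], "country": ["FR", "FR"], "impressions": [25, 10]}
)
print(impression_rates(pageviews, impressions))
```

An inner join would silently drop the US row here, so the overall rate would come out as 35/150 instead of 35/350; the left join plus `fillna(0)` keeps that row with a rate of 0.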

New Gerrit repo is here!

AndyRussG closed this task as Resolved. · Jan 22 2019, 9:49 PM