
Add ability to query for legacy pageviews for projects
Closed, ResolvedPublic

Description

For this we will use the upcoming API that will serve legacy pageviews data (pre-July 2015). In the interface we need to somehow indicate this data is based on a different pageviews definition and shouldn't be directly compared to the new data.

Event Timeline

MusikAnimal renamed this task from Query stats.grok.se for data older than July 2015 to Add ability to query for legacy pageviews. Mar 8 2017, 8:00 PM
MusikAnimal updated the task description.
Nuria renamed this task from Add ability to query for legacy pageviews to Add ability to query for legacy pageviews for projects. Mar 16 2017, 4:25 PM
Nuria subscribed.

We will be modifying pageviews.js (https://www.npmjs.com/package/pageviews) to add the ability to query the new AQS endpoint. See https://phabricator.wikimedia.org/T160655

Change 345197 had a related patch set uploaded (by Nuria):
[analytics/dashiki@master] Bump up pageviews.js to version that supports pagecounts

https://gerrit.wikimedia.org/r/345197

Change 345197 abandoned by Nuria:
Bump up pageviews.js to version that supports pagecounts

Reason:
not needed

https://gerrit.wikimedia.org/r/345197

@Nuria Sorry for the confusion... this ticket was for me to add support in Pageviews Analysis for the new endpoint (which does not use pageviews.js). I don't mind sharing the same ticket, but don't close it until my work is done :) I could stop adding Pageviews-API tickets as parent tasks, which I think is what's causing the confusion, but it's nice for followers of this ticket to get notified when the requisite work is done. Any advice is most welcome, I don't mean to disrupt your workflow.

Or actually, maybe you just thought I was using pageviews.js, in which case your updates here are very helpful. If that is the case, let me simply say thank you :) I should probably be using pageviews.js anyway; the nifty helpers would take a lot of cruft out of my code.

@MusikAnimal I see, I agree with not adding Pageview_API tag to tags that identify your backlog, otherwise it is going to be confusing. That is the tag we use for features to be added to pageview API and it pings analytics by default.

And +1000 to using the pageviews.js client.

@MusikAnimal I see, I agree with not adding Pageview_API tag to tags that identify your backlog, otherwise it is going to be confusing. That is the tag we use for features to be added to pageview API and it pings analytics by default.

Well, I'm not using the tag; rather, I'm setting your tickets with the Pageviews-API tag (or related tags) as parent tasks of mine. That means it shows up on your Task Graph, which indeed must be confusing for you all. I'm actually subscribed to your tickets too, along with pretty much every other Pageviews-related ticket you guys create :) This means I will at least get notified when your work is done, which is the important part, so moving forward I won't add any Analytics-related tickets as related tasks. Cheers and thanks for all you do! Very excited to work with the legacy data.

@MusikAnimal we still need to triple-vet the data and fix the docs; we will send an announcement when all this work is done.

@MusikAnimal we still need to triple-vet the data and fix the docs; we will send an announcement when all this work is done.

https://lists.wikimedia.org/pipermail/wikitech-l/2017-April/087936.html ?

I have a branch cut to add this to Siteviews, but there are still some major challenges. This is because people want the pagecounts alongside the pageviews.

  • For older data, some options don't apply, such as "metric" (pageviews, unique devices) or "platform" (mobile app), etc.
    • How do we make this clear to the user without overcomplicating the UI?
    • Maybe allow them to attempt to change the options, but if they are not applicable for the selected date range, just show a message and revert the options back to the last valid value?
    • Should we disable or hide those options altogether for date ranges where they don't apply? Won't they be confused why they aren't there for certain date ranges?
    • What if they are overlapping old data with new data? Do we let them show "mobile app" views for the new data, but show "mobile web" for the older?
  • The All time date range option currently gives you ~630 days of data. With the legacy data included, we're bringing that up to ~3,400 days of data.
    • This obviously is too much to show on a chart.
    • Do we default to monthly date type? If so that gives us January 2008 to the previous month, which means it won't truly be "all time" since we're missing the partial data for December 2007 and that for the current month.
  • How do we convey that the old data shouldn't be taken to heart when comparing side by side with the new data?

If you have any ideas please share them, but I'm starting to realize maybe this just isn't doable. Instead, we can introduce "Pagecounts" as a new "Metric" (just like Analytics did with the API). Then only the applicable options and date range restrictions are shown. That seems to be the easiest and most sensible way to do it... not sure why I didn't think of that...? People will not be perfectly happy, since they want to see everything on one chart, but they'll just have to live with it.
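Editor's sketch of the "separate metric" idea above: each metric carries its own list of applicable platforms and its own date bounds, and a request that falls outside them is reverted to a valid value with a message. This is not the code used in Siteviews. The names `MetricConfig`, `METRICS`, and `resolve_options` are hypothetical, and the platform names and exact date bounds are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MetricConfig:
    """Options that apply to one metric (hypothetical structure)."""
    platforms: Tuple[str, ...]   # first entry is used as the fallback
    earliest: date               # first day with data
    latest: Optional[date]       # None means data is still being collected


# Assumed values for illustration only; the real tool may differ.
METRICS = {
    "pageviews": MetricConfig(
        ("all-access", "desktop", "mobile-app", "mobile-web"),
        date(2015, 7, 1), None),
    "pagecounts": MetricConfig(
        ("all-sites", "desktop-site", "mobile-site"),
        date(2008, 1, 1), date(2016, 7, 31)),
}


def resolve_options(metric: str, platform: str, start: date, end: date,
                    today: date) -> Tuple[str, date, date, List[str]]:
    """Return a valid (platform, start, end) for the metric, plus user messages."""
    config = METRICS[metric]
    messages: List[str] = []

    # Revert a platform that does not exist for this metric.
    if platform not in config.platforms:
        fallback = config.platforms[0]
        messages.append(
            f"Platform '{platform}' does not apply to {metric}; using '{fallback}'.")
        platform = fallback

    # Clamp the requested range to the dates the metric covers.
    latest = config.latest or today
    new_start = min(max(start, config.earliest), latest)
    new_end = max(min(end, latest), config.earliest)
    if (new_start, new_end) != (start, end):
        messages.append(
            f"{metric} data is only available from {config.earliest} to {latest}.")

    return platform, new_start, new_end, messages
```

With these assumed values, asking for "mobile-app" pagecounts from December 2007 to April 2017 would come back as "all-sites" for January 2008 through July 2016, along with two messages explaining the changes.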

I have a branch cut to add this to Siteviews, but there are still some major challenges. This is because people want the pagecounts alongside the pageviews.

I caution against doing that. There are huge differences between those two metrics, and the breakdowns do not match at all, as you have noticed.

How do we make this clear to the user without overcomplicating the UI?

I think you need a designer to work on this issue. It is not trivial, but mixing both metrics will cause major confusion, especially for breakdowns.

Instead, we can introduce "Pagecounts" as a new "Metric" (just like Analytics did with the API)

+1, it IS a different metric.

MusikAnimal claimed this task.
MusikAnimal moved this task from In Development to Done on the Tool-Pageviews board.

Deployed! Per the above, pagecounts are available as a separate metric, rather than being placed side by side with the modern pageviews. Hopefully this won't be too confusing for people. I've added a little ? link next to "Metric" that goes to the FAQ.

Example: http://tools.wmflabs.org/siteviews/?platform=all-sites&source=pagecounts&start=2008-01&end=2016-07&sites=fr.wikipedia.org|de.wikipedia.org
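Editor's sketch of how a link like the one above can be assembled. The base URL and the parameter names (`platform`, `source`, `start`, `end`, `sites`) are taken from the example link itself; the helper name `siteviews_url` is hypothetical, and nothing here is confirmed beyond what that single link shows.

```python
from typing import Iterable
from urllib.parse import urlencode

# Base URL as it appears in the example link.
SITEVIEWS_BASE = "http://tools.wmflabs.org/siteviews/"


def siteviews_url(sites: Iterable[str], start: str, end: str,
                  source: str = "pagecounts",
                  platform: str = "all-sites") -> str:
    """Build a Siteviews link; months are given as YYYY-MM strings."""
    params = {
        "platform": platform,
        "source": source,
        "start": start,
        "end": end,
        # Multiple sites are joined with a pipe, as in the example link.
        "sites": "|".join(sites),
    }
    # safe="|" keeps the pipe unescaped so the result matches the example.
    return SITEVIEWS_BASE + "?" + urlencode(params, safe="|")


link = siteviews_url(["fr.wikipedia.org", "de.wikipedia.org"],
                     start="2008-01", end="2016-07")
print(link)
```

Called as shown, this reproduces the example link character for character.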

I will create tickets for adding pagecounts to the other tools once the API endpoints for those are available.

As always, thanks a bajillion to the Analytics team for being so awesome!

Super thanks for the fast turnaround, @MusikAnimal.