
Add option to pageviews tool to show stats for all redirects to a page as well
Closed, Duplicate | Public | 8 Estimated Story Points

Description

See https://github.com/MusikAnimal/pageviews/issues/9.

There is an API we can use for retrieving all the redirects: https://www.mediawiki.org/wiki/API:Redirects.
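As a rough illustration of what a redirects lookup could look like, here is a minimal sketch. `buildRedirectsUrl` and `extractRedirectTitles` are hypothetical helper names, not part of the pageviews tool; the query parameters (`prop=redirects`, `rdlimit`, `formatversion=2`) are standard MediaWiki API options. The sketch ignores continuation, so a page with more redirects than one response holds would need follow-up requests.

```javascript
// Hypothetical helper: build the API URL that lists redirects to `title`.
// `project` is a wiki domain such as 'en.wikipedia.org'.
function buildRedirectsUrl(project, title) {
  const params = new URLSearchParams({
    action: 'query',
    prop: 'redirects',
    titles: title,
    rdlimit: 'max',
    format: 'json',
    formatversion: '2', // returns `pages` as an array
    origin: '*',        // allows anonymous cross-origin requests from a browser
  });
  return `https://${project}/w/api.php?${params.toString()}`;
}

// Hypothetical helper: pull redirect titles out of a parsed JSON response.
function extractRedirectTitles(response) {
  const pages = (response.query && response.query.pages) || [];
  return pages.flatMap(page => (page.redirects || []).map(r => r.title));
}
```

The returned titles could then each be passed to the pageviews API alongside the target page itself.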

Event Timeline

kaldari moved this task from New & TBD Tickets to Needs Discussion on the Community-Tech board.
DannyH set the point value for this task to 8. (Mar 1 2016, 6:41 PM)
DannyH moved this task from Needs Discussion to Up Next (June 3-21) on the Community-Tech board.

I'll add my thoughts here as well: There will have to be some performance checks in place as this could be expensive.

Let's assume a page has 50 redirects, which I believe is rare. This is fine when querying for a single page, but we currently support up to 10 at a time. That is potentially 500 unthrottled requests to the pageviews API. The API warns not to make more than 500 requests per second, but that's quite liberal in my opinion and we could limit it to less than that.
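To make the arithmetic concrete, here is a small sketch. It assumes one pageviews request per title (the page itself plus each of its redirects); `estimateRequests` is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper: estimate the number of pageviews requests for a query.
// `redirectCounts` holds the number of redirects for each requested page.
// Assumption: one request per title, i.e. each page plus each of its redirects.
function estimateRequests(redirectCounts) {
  return redirectCounts.reduce((total, count) => total + count + 1, 0);
}

// Ten pages with 50 redirects each: 500 redirect requests plus 10 for the pages.
console.log(estimateRequests(new Array(10).fill(50))); // 510
```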

I recommend we implement a general solution to request throttling – not just when the user has chosen to include redirects. The updateChart function should be broken out into other functions, where we could use a combination of JavaScript promises and recursion with setTimeout to intelligently throttle requests.
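A minimal sketch of the promises-plus-recursive-`setTimeout` idea might look like the following. `runThrottled` is a hypothetical helper, not existing code in the tool. It runs the requests one at a time and waits `delayMs` after each one finishes before starting the next, which is only one of several possible throttling strategies.

```javascript
// Hypothetical helper: run `requestFns` (functions returning a value or a
// promise) sequentially, pausing `delayMs` between consecutive requests.
// Resolves with the results in order; rejects on the first failure.
function runThrottled(requestFns, delayMs) {
  return new Promise((resolve, reject) => {
    const results = [];

    function next(index) {
      if (index >= requestFns.length) {
        resolve(results);
        return;
      }
      Promise.resolve()
        .then(() => requestFns[index]())
        .then(result => {
          results.push(result);
          if (index + 1 < requestFns.length) {
            // Recurse via setTimeout so the next request starts after the delay.
            setTimeout(() => next(index + 1), delayMs);
          } else {
            resolve(results);
          }
        })
        .catch(reject);
    }

    next(0);
  });
}
```

Each entry in `requestFns` could wrap a single API call, for example `() => fetch(url).then(res => res.json())`, so the same helper would apply whether or not redirects are included.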

This specific task requires a change to the view (probably just a checkbox or dropdown floated right of the Pages label), so T128103 should be considered a blocker, as that branch includes a complete PHP reworking of the views.

@MusikAnimal: I finally wrote up some code to handle this, but I'm having second thoughts about it. Even for the simple test case of Cat and Dog, it has to fire off 99 API requests! This just isn't a sane (or performant) approach and I think we need to request that the pageviews API support handling redirects on its end, rather than spamming it with hundreds of separate requests that have to be throttled.

@kaldari Wow! That is a hefty operation for a simple two-page query. I imagine users are going to turn on the redirects option and leave it on, then of course play around with date ranges, platforms, etc. We'll be wearing the API out for sure.

I'm all for at least asking the Analytics team (or services team?) to give it some consideration. It is a fine feature for data analysis so seemingly in the best interest of the analytics team and not just our users.

I also wonder whether the view count of the target page is incremented when you get redirected to it. Let's ask that question too. I bet it is, and if so we can explain that in our FAQ. That still doesn't address the issue of wanting to know the view count of a page at its old location(s), as opposed to a redirect that was always a redirect. For that, see #26. We could query the page move log, and collect pageview data for each date range the page was under each respective title. This will likely be just as complicated to implement, but it should involve much lower API consumption.

@MusikAnimal: From T121912, it sounds like it currently only counts the view for the redirect page itself, not the target.

For motivation, see also https://mako.cc/academic/hill_shaw-consider_the_redirect.pdf (the last two pages, demonstrating how ignoring pageviews to redirects has distorted some previous research).

We could still go about this, but we'd have to throttle requests like we do for Langviews and Massviews. I don't want to introduce this into the main Pageviews app, and yet another separate app (Redirectviews?) seems like overkill. As @kaldari said, the issue is best addressed on the API side and not on the client.

But if there's enough demand for it, we could look into some interface changes for Pageviews, such that the "throttling" and progress bar UI stuff only happens if the redirects option is checked. What I fear is that users will always keep that option on, which is going to deteriorate the user experience and could put an unnecessary load on the API. Perhaps we could force the user to check the redirects option every time, so that it would hopefully only be used when truly needed.

I still favor fixing this on the API side (even if that takes longer).

Closing as declined since we've collectively decided querying for all redirects isn't going to work for us, especially when the API might support this with T121912.

Instead, I'm going to try to work on T141332, which will query the move log of a given page and include pageviews from redirects created as a result of the move. This seems to be the most common use case: a user moves a page, checks Pageviews Analysis, and sees that, when given the new title, pageviews from before the time of the move are not shown.
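The move-log idea can be sketched as splitting the requested date range by the title the page had at the time. This is only an illustration under stated assumptions: `titleRangesFromMoves` and the shape of the `moves` array are hypothetical, and fetching and normalizing the move log itself is left out.

```javascript
// Hypothetical helper: split [start, end] into one range per title.
// Assumed input: `moves` sorted oldest first, each entry shaped like
// { date: '2016-03-10', from: 'Old title', to: 'New title' }.
// Dates are ISO YYYY-MM-DD strings, so plain string comparison orders them.
// The day of a move appears in both the old and the new title's range.
function titleRangesFromMoves(currentTitle, moves, start, end) {
  const ranges = [];
  let rangeStart = start;

  for (const move of moves) {
    if (move.date <= rangeStart) continue; // moved before the window began
    if (move.date > end) {
      // The page still had the old title when the window ended.
      ranges.push({ title: move.from, start: rangeStart, end });
      return ranges;
    }
    ranges.push({ title: move.from, start: rangeStart, end: move.date });
    rangeStart = move.date;
  }

  ranges.push({ title: currentTitle, start: rangeStart, end });
  return ranges;
}
```

With this approach the number of pageviews requests would grow with the number of moves inside the date range rather than with the total number of redirects.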