
Track page views by page ID rather than title (handles moved pages)
Closed, Declined · Public

Description

If a page is moved, the API does not move the data to the new title. The data should move with the page, the same way the revision history does.

Event Timeline

Restricted Application added a subscriber: Aklapper.

@MusikAnimal, I think you are wrong. T121912 talks about redirects as aliases; I'm talking, on the contrary, about deleted pages.

Ideally this will be fixed in the API itself (T121912), but we might offer a solution in Pageviews Analysis (T141332). This is a bit challenging, so I can't say if or when we'll have a client-side solution. In the meantime you can simply add the old title and compare them side by side, assuming the old title is a redirect. When searching for redirects, make sure you have the "Search method" set to "Autocompletion including redirects" in the "Settings". You could also try Redirect Views.

Again, @MusikAnimal, the old title does not exist any more. Just as the history moves to the new name, so should the view statistics. I can't check both, because the old one does not exist anywhere.

@IKhitron Sorry I submitted my comment before yours showed up. There are two issues here: The first you're saying is that the stats should move with the page. T121912 should effectively resolve that, even though the old page no longer exists. Assuming you are using Pageviews Analysis, your second problem is something I can help with, though. The API does return any data available, even if the page does not exist anymore. Pageviews Analysis however will error out saying it is unknown, since it also queries for other data about the page, which it is unable to do if it is deleted. Let me create a ticket for that, and we'll leave this one merged to T121912 if that's OK, since the broader problem you are reporting deals with the API itself.
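To illustrate the point that the raw API is queried purely by title, here is a minimal sketch of building a per-article request. It assumes the public Wikimedia REST endpoint `metrics/pageviews/per-article`; the helper name `per_article_url` and the example title are invented for illustration, and no request is actually sent.

```python
from urllib.parse import quote

# Assumed base path of the public per-article pageviews endpoint.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project, title, start, end,
                    access="all-access", agent="user", granularity="daily"):
    """Build a per-article pageviews URL for a title (hypothetical helper).

    The title is only a string key here: nothing checks whether the page
    still exists, was moved, or was deleted.
    """
    # Titles use underscores instead of spaces and are fully percent-encoded,
    # so a "/" in a subpage title does not become a path separator.
    article = quote(title.replace(" ", "_"), safe="")
    return "/".join([BASE, project, access, agent, article,
                     granularity, start, end])


print(per_article_url("en.wikipedia.org", "Old title/subpage",
                      "20170101", "20170131"))
```

Because the request carries only a title and a date range, views recorded under an old title stay under that title after a move.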

Ah, I'm sorry, I see what you are saying now. T121912 indeed is about redirects, my mistake, but it's the same premise since the common issue people have is resolving them after a page move. I think T141332 is what you want, assuming you are using Pageviews Analysis and not the raw API or PageViewInfo, which were all tagged. Please confirm?

Maybe, @MusikAnimal, but you're still talking about redirects (#9) over there. I'm talking about changing the API source. The API, or the PageView, should not need to be aware of the move, and no performance problem should be involved. The view statistics should be moved during the page move, the same way the edit history is. This must be changed at the core engine level.

Well, T141332 is about querying the page move log to get the data for older locations, which should accomplish what you're asking. This is basically a workaround for the issue you describe, though :) Anyway, I'm not sure about moving the data along with the title in the API layer. I can speculate ways that this could be done but the Analytics team will be able to tell you more. I'm re-opening this and removing the unrelated tags. Sorry for the confusion!
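The workaround described here, looking up a page's earlier titles and combining their data, might look roughly like the sketch below. The list of titles stands in for what the move log would return, and `combined_views` and all the numbers are made up for illustration; this is not how Pageviews Analysis implements it.

```python
from collections import Counter


def combined_views(titles, views_by_title):
    """Sum daily views over every title a page has had (hypothetical helper)."""
    total = Counter()
    for title in titles:
        # Titles with no recorded data simply contribute nothing.
        total.update(views_by_title.get(title, {}))
    return dict(total)


# Made-up data: the page was moved from "Old name" to "New name" on 2017-01-03.
views_by_title = {
    "Old name": {"2017-01-01": 10, "2017-01-02": 12},
    "New name": {"2017-01-03": 9, "2017-01-04": 11},
}

print(combined_views(["Old name", "New name"], views_by_title))
```

This only stitches the series together on the client; the stored data is still keyed by title.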

MusikAnimal renamed this task from PageViews ignores moved pages to Pageviews API ignores moved pages. · Feb 26 2017, 1:49 AM

I will also share my thoughts on this: I assume since the Pageviews API works independently of MediaWiki it is by design going by title and ignoring any of the wiki's logs. The only possible solution I can think of is storing the data by page ID rather than title in the database. So when I request pageviews for a page, it queries the MediaWiki database to find the page ID and returns that data. This means data doesn't have to be updated when the page is moved, only we'd need to query for the ID of the title on every API request. That could have performance implications, and it still closely ties together the Pageviews API and MediaWiki, which I'm not sure is straightforward or perhaps even desirable.
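A toy sketch of that idea, with counts stored by page ID and a title lookup on every request, is shown below. The `PageviewStore` class and its in-memory dictionaries are purely illustrative assumptions; the `title_to_id` mapping stands in for the MediaWiki database lookup.

```python
class PageviewStore:
    """Toy model: view counts keyed by page ID, titles resolved per request."""

    def __init__(self):
        self.title_to_id = {}  # stands in for the MediaWiki title -> ID lookup
        self.views_by_id = {}  # page ID -> total views

    def create(self, page_id, title):
        self.title_to_id[title] = page_id
        self.views_by_id[page_id] = 0

    def record_view(self, title):
        self.views_by_id[self.title_to_id[title]] += 1

    def move(self, old_title, new_title):
        # Only the title mapping changes; the stored counts are untouched.
        self.title_to_id[new_title] = self.title_to_id.pop(old_title)

    def views(self, title):
        # The extra lookup on every request is the performance concern.
        return self.views_by_id[self.title_to_id[title]]


store = PageviewStore()
store.create(42, "Old name")
store.record_view("Old name")
store.record_view("Old name")
store.move("Old name", "New name")
store.record_view("New name")
print(store.views("New name"))
```

In this model the views recorded before the move are returned under the new title, and the old title no longer resolves to anything.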

This is better solved by serving metrics by page ID instead of title. Once we have the MediaWiki edit history data available on an API, page ID and title history can be tracked, and thus pageviews can be computed accounting for moves.
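As a rough sketch of what computing pageviews from a title history could involve: each title-keyed record is attributed to whichever page ID held that title on that day, which also covers a title being reused by a different page later. The row layouts, the function name `views_by_page_id`, and the data are assumptions for illustration, not the Analytics team's schema.

```python
from collections import defaultdict


def views_by_page_id(title_history, pageviews):
    """Attribute title-keyed views to page IDs (hypothetical helper).

    title_history: (page_id, title, first_day, last_day) rows, inclusive,
                   with ISO dates so plain string comparison orders them.
    pageviews:     (title, day, views) rows.
    """
    totals = defaultdict(int)
    for title, day, views in pageviews:
        for page_id, hist_title, first_day, last_day in title_history:
            if hist_title == title and first_day <= day <= last_day:
                totals[page_id] += views
                break  # at most one page holds a title on a given day
    return dict(totals)


# Made-up history: page 7 was moved on 2017-01-03, and a new page 8
# was later created at the old title.
title_history = [
    (7, "Old name", "2017-01-01", "2017-01-02"),
    (7, "New name", "2017-01-03", "9999-12-31"),
    (8, "Old name", "2017-01-05", "9999-12-31"),
]
pageviews = [
    ("Old name", "2017-01-01", 10),
    ("New name", "2017-01-03", 9),
    ("Old name", "2017-01-06", 4),
]

print(views_by_page_id(title_history, pageviews))
```

A linear scan over the history is enough for a sketch; a real pipeline would presumably do this as a join over much larger tables.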

Halfak renamed this task from Pageviews API ignores moved pages to Track page views by page ID rather than title (handles moved pages). · Apr 24 2017, 3:28 PM
Halfak added a subscriber: Hall1467.
Milimetric triaged this task as Medium priority. · May 8 2017, 2:33 PM

Thanks for the context, @Pine. This issue is in our backlog, meaning it's behind all of our other priorities. It has very little chance of being picked up by itself. We do have the data now to group pageviews by page ID, and to figure out all the names that a particular page had over its history. So either someone could pick this up at a hackathon and make a workable prototype of collecting pageviews by title, or someone has to argue that this should be a higher priority for our team to pick it up.

Milimetric raised the priority of this task from Medium to High. · Jun 5 2020, 3:54 PM
Milimetric moved this task from Backlog (Later) to Analytics Query Service on the Analytics board.

Elevating priority per discussion around T251777#6119752

@EChetty is there an explanation for why this task was declined? As far as I know, it's still not possible to query the pageviews APIs with page IDs, but speaking personally at least, that would be very useful functionality if it's feasible to add that support.