
Massviews: bug with pagepile adding extra Wikipedia: on GLAM cultural partnership project pages
Open, Normal, Public, 2 Story Points

Description

Massviews is throwing errors on pagepile 3058:

https://tools.wmflabs.org/massviews/?platform=all-access&agent=user&source=pagepile&target=3058&range=latest-20&sort=views&direction=1

It says "Error querying Pageviews API - Not found."

The problem may be that it's adding an extra Wikipedia: to the start of the page title.

Example:

If you click on the link in Massviews for
https://en.wikipedia.org/wiki/Wikipedia:Culture/New_York_Public_Library
it gives you
https://en.wikipedia.org/wiki/Wikipedia:Wikipedia:Culture/New_York_Public_Library

Event Timeline

DannyH created this task. May 16 2016, 7:49 PM
Restricted Application added subscribers: Zppix, Aklapper. May 16 2016, 7:49 PM
MusikAnimal added a comment. (edited) May 16 2016, 8:10 PM

I noticed this too. Indeed, it's the Project: prefix we get from PagePile that's evaluating to an additional Wikipedia: namespace prefix. I think this is an issue on the PagePile side, or with the users who are entering the pages, but we should be able to work around it either way.

My assumption is that Project: gets evaluated to whatever the project namespace is for a given wiki. For enwiki we can simply check whether Wikipedia: is already there and, if so, remove Project:; for other wikis, though, we may not know what to remove. If we need to, we can make an initial query to the "siteinfo" API endpoint to get the name of the project namespace.
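A minimal sketch of the siteinfo lookup mentioned above, assuming the standard MediaWiki Action API (`action=query&meta=siteinfo&siprop=namespaces`) and its default JSON layout, where each namespace's localized name sits under the `"*"` key. The project namespace always has ID 4 (canonical name `Project`). The helper names `siteinfo_url` and `project_namespace_name` are hypothetical, and the sample response is trimmed by hand rather than captured from a live wiki; no network call is made here.

```python
import json
from urllib.parse import urlencode

# Namespace 4 is the project namespace on every MediaWiki wiki.
PROJECT_NS_ID = "4"


def siteinfo_url(host):
    """Build a siteinfo query URL that lists the namespaces of a wiki."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "namespaces",
        "format": "json",
    }
    return f"https://{host}/w/api.php?{urlencode(params)}"


def project_namespace_name(siteinfo):
    """Return the localized name of the project namespace (ID 4).

    Assumes the default (formatversion=1) response layout, in which the
    localized name is stored under the "*" key.
    """
    return siteinfo["query"]["namespaces"][PROJECT_NS_ID]["*"]


# Hand-trimmed example of the enwiki entry for namespace 4.
sample = json.loads("""
{"query": {"namespaces": {
  "4": {"id": 4, "case": "first-letter", "canonical": "Project", "*": "Wikipedia"}
}}}
""")

print(siteinfo_url("en.wikipedia.org"))
print(project_namespace_name(sample))  # Wikipedia
```

On another wiki the same lookup would return that wiki's own project namespace name, which is exactly the piece of information that is missing when only enwiki is special-cased.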

kaldari added a subscriber: Magnus. (edited) May 17 2016, 5:42 PM

@Magnus: Is this something that you could fix on your end?

@Magnus, any thoughts on this?

Looks like it's not just Project: but other namespaces as well:
http://tools.wmflabs.org/pagepile/api.php?action=get_data&id=3052

Sadads awarded a token. Jul 5 2016, 8:48 PM
Sadads added a project: GLAM-Tech.

I'm going to attempt to fix this in Massviews. It's a bit nasty, but we can use the API to determine what Project: will evaluate to, then chop off any duplicates. E.g. Project:Wikipedia:GLAM will evaluate to Wikipedia:Wikipedia:GLAM, so we know to remove the duplicate Wikipedia:. Meanwhile, it's possible people will use Project: without adding in Wikipedia:, in which case the evaluated Wikipedia: would be retained. I think that makes sense. This solution would not cause anything to break should Magnus fix it on his end.
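The "evaluate, then chop off duplicates" approach described above can be sketched as follows. This is a hypothetical helper, not the actual Massviews code. It assumes `project_ns` is the localized project namespace name already obtained from the API (e.g. "Wikipedia" on enwiki), and it only matches the exact-case prefix `Project:`, even though MediaWiki itself treats namespace prefixes case-insensitively.

```python
def normalize_project_title(title, project_ns):
    """Evaluate a leading Project: prefix, then drop a doubled namespace.

    project_ns is the localized project namespace name, e.g. "Wikipedia".
    Hypothetical helper for illustration; exact-case matching only.
    """
    prefix = "Project:"
    local = project_ns + ":"

    # Step 1: evaluate Project: the way the wiki would.
    # "Project:Wikipedia:GLAM" -> "Wikipedia:Wikipedia:GLAM"
    # "Project:GLAM"           -> "Wikipedia:GLAM"
    if title.startswith(prefix):
        title = local + title[len(prefix):]

    # Step 2: chop off a duplicated local prefix.
    # "Wikipedia:Wikipedia:GLAM" -> "Wikipedia:GLAM"
    doubled = local + local
    if title.startswith(doubled):
        title = title[len(local):]

    return title


print(normalize_project_title("Project:Wikipedia:GLAM", "Wikipedia"))  # Wikipedia:GLAM
print(normalize_project_title("Project:GLAM", "Wikipedia"))            # Wikipedia:GLAM
```

Titles that never carried `Project:` pass through untouched unless they already contain the doubled prefix. As noted above, this keeps working even if the input is fixed upstream.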

Sounds great! That's a bit hacky, but it would make a huge difference for those of us working in metaspaces.

MusikAnimal set the point value for this task to 2.

I have fixed this for the English Wikipedia only. It's actually a fair amount of work to query the siteinfo API and get the namespace, etc., and this issue is mostly an edge case. I'm leaving this open for now because the issue still exists for other wikis; I just don't think we should fix it on our end.

I'm actually now routinely fetching siteinfo for other features in Massviews, so it's possible to implement a cross-wiki solution. However, it is still a bit of work, and as I said, this should be fixed in PagePile.

@Sadads, could you confirm whether this workaround will suffice?

  • Create a page on your wiki (sandbox, for instance)
  • Add a list of wikilinks to the pages you want pageviews data on, in any format
  • Use the "Wikilinks" source in Massviews and put in the URL to the wiki page

Or were you all by chance using another tool to generate the list of pages? If so, the above solution may not be so favourable, given you'd have to manually remove the extraneous namespaces. Let me know, and I will reevaluate whether adding a cross-wiki hack is worthwhile. Note also that you could use a text editor to find and remove all instances of Project:.
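If the list of pages comes from another tool, the find-and-remove step and the sandbox page from the workaround can be combined in a few lines. This is a hypothetical sketch: it assumes every `Project:` entry already carries the local prefix (as in `Project:Wikipedia:...`), so a bare `Project:GLAM` would become a mainspace link `[[GLAM]]`. The bulleted `* [[...]]` format is just one choice, since the workaround accepts wikilinks in any format.

```python
def wikilinks_page(titles, strip_prefix="Project:"):
    """Build sandbox wikitext with one wikilink per title.

    Removes a leading strip_prefix from each title, mimicking a text-editor
    find-and-remove of "Project:". Hypothetical helper for illustration.
    """
    lines = []
    for title in titles:
        if title.startswith(strip_prefix):
            title = title[len(strip_prefix):]
        lines.append(f"* [[{title}]]")
    return "\n".join(lines)


titles = [
    "Project:Wikipedia:Culture/New_York_Public_Library",
    "Project:Wikipedia:GLAM",
]
print(wikilinks_page(titles))
# * [[Wikipedia:Culture/New_York_Public_Library]]
# * [[Wikipedia:GLAM]]
```

The printed wikitext can be pasted into a sandbox page, whose URL then goes into the "Wikilinks" source in Massviews.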

Yeah, I think I can hack around this for the time being: use the PagePile to create a report.

Cheers,

Alex

T156993 may be interesting in this context.