
WMF pageview API (404 error) when requesting statistics for more than around 1000 files on GLAMorgan
Open, Needs Triage, Public

Description

The API is blocking page view requests on GLAMorgan once a request covers more than around 1000 files. This means that partner organisations can't see the impact of releasing their content to Wikimedia if they make more than 1000 files available.

Here's an example where I get the error for files from UNESCO:

https://tools.wmflabs.org/glamtools/glamorgan.html?&category=Media_files_produced_by_UNESCO&depth=12&month=last

Event Timeline

Restricted Application added a project: Analytics. Sep 9 2016, 1:49 PM
Restricted Application added a subscriber: Aklapper.
Sadads added subscribers: MusikAnimal, Sadads.

@MusikAnimal do you have a sense of whether there is a limit on the API, or on the computing behind it, that would prevent opening up the tool in this way?

Nuria added a subscriber: Nuria. Sep 12 2016, 3:37 PM

@Sadads: I think you need to work on this with @MusikAnimal regarding the "number of requests" that the tool can make.

The link you sent requests data for outreach.wikimedia, a project that, as you know, is not supported by the pageview API yet. Hence the 404 error code.

Nuria moved this task from Incoming to Radar on the Analytics board. Sep 12 2016, 3:37 PM

@Nuria This isn't primarily an outreach.wikimedia problem: most of the files used for GLAMorgan end up on other community projects, and the tool is hitting a ceiling of some sort on the number of requests it runs. I wonder whether it has to do with requesting stats for individual pages rather than in batches.

When I spoke to @Magnus before, he said this was the problem causing his tools to return false results:

https://www.mediawiki.org/wiki/HyperSwitch/errors/request_rate_exceeded

Nuria added a comment. Sep 16 2016, 2:55 PM

@Sadads:
Throttling requests is not a mistake; it is done on purpose, as the API cannot support arbitrary traffic. In your request above you are also requesting files for which pageviews are not counted, and thus you are receiving 404s. The API stores pageviews for wiki pages but not for files.

@Nuria Is there a way round this throttling specifically for these tools? I use these tools as part of my WMF grant-funded project and to report to WMF on what I've done.

Hey, I didn't author this tool (not sure if that's why I was pinged). When I tried it out, the only failed requests were for outreach wikis, Wikimania, etc., which are not supported by the API (see T130249). If throttling is an issue here, I'm happy to share with @Magnus some solutions I used for Pageviews. Regards

Nuria added a comment. Sep 17 2016, 3:51 AM

@Mrjohncummings Again, the requests that are erroring with 404s are failing because the pageview API does not hold data for those wikis (example: outreach).

@Nuria Does this error include Wikidata? I ran a larger request with 16,000 files and I'm getting a very large number of pages (7953) that could not be loaded due to the error:

https://tools.wmflabs.org/glamtools/glamorgan.html?&category=Media_files_produced_by_UNESCO&depth=12&month=last

Nuria added a comment. Sep 19 2016, 4:24 PM

@Mrjohncummings: we are talking past each other. Let me re-explain: the tool is running into several issues:

  1. It requests data for wikis that are not supported by the pageview API (outreach).
  2. It might be sending too many requests to the pageview API and thus running into the limits that the API sets for clients.

These issues are something you need to work on with the tool owner. Some projects we have upcoming might fix 1) and mitigate 2), but neither is happening until some weeks from now.
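
For whoever picks this up on the tool side, here is a rough sketch of how a client could deal with the two issues listed above: skip wikis the pageview API does not cover, and space out requests while backing off when throttled. It is not GLAMorgan's actual code. The names `collect_pageviews`, `UNSUPPORTED_PROJECTS` and the `fetch` callback are hypothetical, the interval and retry values are arbitrary, and it assumes the rate-limit error arrives as HTTP 429.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

Page = Tuple[str, str]  # (project, title)

# Hypothetical list: the thread only names outreach as an unsupported wiki.
UNSUPPORTED_PROJECTS = {"outreach.wikimedia"}


def collect_pageviews(
    pages: Iterable[Page],
    fetch: Callable[[str, str], Tuple[int, int]],
    min_interval: float = 0.1,   # assumed pause between requests, in seconds
    max_retries: int = 3,        # assumed retry budget for throttled requests
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Dict[Page, int], List[Page]]:
    """Return (views per page, pages skipped or without data).

    `fetch(project, title)` is a caller-supplied function that performs the
    actual API request and returns (http_status, view_count).
    """
    views: Dict[Page, int] = {}
    skipped: List[Page] = []
    for project, title in pages:
        # Issue 1: do not ask the API about wikis it holds no data for.
        if project in UNSUPPORTED_PROJECTS:
            skipped.append((project, title))
            continue
        # Issue 2: on a throttling response (assumed 429), wait with
        # exponential backoff and retry a limited number of times.
        for attempt in range(max_retries + 1):
            status, count = fetch(project, title)
            if status == 429 and attempt < max_retries:
                sleep(min_interval * 2 ** (attempt + 1))
                continue
            break
        if status == 200:
            views[(project, title)] = count
        else:
            # 404 (no data for this page) or still throttled after retries.
            skipped.append((project, title))
        # Space out consecutive requests to stay under the client limit.
        sleep(min_interval)
    return views, skipped
```

Passing `fetch` and `sleep` in as parameters keeps the throttling logic separate from the HTTP layer, so it can be tried out without hitting the API. Reporting the skipped pages separately would also make it clearer to users which files failed because of unsupported wikis rather than throttling.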