
Non existing article is one of the most viewed according to the data returned by the /metrics/pageviews/top/ API
Closed, Resolved | Public | 3 Estimated Story Points

Description

There has been a nonexistent article among the most viewed articles of the Basque-language Wikipedia since the 8th of October.

The article in question is "Teratologia": https://eu.wikipedia.org/wiki/Teratologia

The API lists it as one of the most viewed articles, which seems quite strange to me and is probably a bug:
https://wikimedia.org/api/rest_v1/metrics/pageviews/top/eu.wikipedia/all-access/2016/10/07
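For anyone who wants to reproduce the check, here is a minimal Python sketch. It assumes the decoded response has the shape `{"items": [{"articles": [...]}]}` with `article`, `views` and `rank` fields on each entry. The helper names `top_url`, `find_article` and `fetch_top`, and the User-Agent string, are made up for illustration.

```python
import json
from urllib.request import Request, urlopen

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/top"


def top_url(project, access, year, month, day):
    """Build the URL for one day's top-viewed list (hypothetical helper)."""
    return f"{BASE}/{project}/{access}/{year:04d}/{month:02d}/{day:02d}"


def find_article(payload, title):
    """Return the entry for `title` in a decoded response, or None.

    Assumes the payload looks like {"items": [{"articles": [...]}]}.
    """
    for item in payload.get("items", []):
        for entry in item.get("articles", []):
            if entry.get("article") == title:
                return entry
    return None


def fetch_top(url):
    """Fetch and decode the list; this call needs network access."""
    req = Request(url, headers={"User-Agent": "pageviews-check-example"})
    with urlopen(req) as resp:
        return json.load(resp)
```

Calling `find_article(fetch_top(top_url("eu.wikipedia", "all-access", 2016, 10, 7)), "Teratologia")` should return the entry for the article if it appears in that day's list, and `None` otherwise.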

I found this because I built a website that uses this API's data. The site is in Basque, but you can see how the page views of the "Teratologia" article evolved in this graph:
http://wikidosia.aldatsa.eus/joerak?artikuluak=Teratologia

[Figure: teratologia-page-views-evolution.png]

It has zero page views until the 8th of October, as expected for a nonexistent article, but after that it shows a high number of page views. Isn't that strange?

Event Timeline

Asier_Iturralde_Sarasola renamed this task from Non existing article in the data returned by the /metrics/pageviews/top/ API to Non existing article is one of the most viewed according to the data returned by the /metrics/pageviews/top/ API. Oct 26 2016, 11:01 AM

This issue is caused by the usual hidden bot traffic. I am linking this task as a subtask of T138207.

Hive query:

-- Total page views per user agent for 'Teratologia' on eu.wikipedia, October 2016
SELECT
  user_agent_map,
  SUM(view_count) AS views
FROM wmf.pageview_hourly
WHERE year = 2016
  AND month = 10
  AND project = 'eu.wikipedia'
  AND page_title = 'Teratologia'
GROUP BY
  user_agent_map;

Result: [Windows XP,-,-,6,IE,Other,-,11307]

JAllemandou moved this task from Backlog (Later) to Wikistats on the Analytics board.
JAllemandou subscribed.

In this case I think MediaWiki might be returning 200 for a nonexistent page, correct? That would explain why we count these as pageviews.

Milimetric triaged this task as Medium priority. May 8 2017, 2:28 PM
Nuria moved this task from Next Up to In Progress on the Analytics-Kanban board.
Nuria unsubscribed.
Nuria set the point value for this task to 3. Jul 11 2017, 6:42 PM

I am sorry we did not look at this earlier. The page was created on December 7th, so pageviews before then are likely from edit/preview requests (we had a bug that counted those). See: https://phabricator.wikimedia.org/T156628

The UA responsible for those pageviews is: {"browser_major":"6","os_family":"Windows XP","os_major":"-","device_family":"Other","browser_family":"IE","os_minor":"-","wmf_app_version":"-"}, which accounts for 19000 pageviews on that page. That points to bot traffic and probably a bogus UA (hard to say). That doesn't mean the traffic is not real; it is, but it is automated. Our top computation is really not reliable as a measure of "human" traffic, because it is affected by traffic from bots that do not self-report as such. This is something we hope to solve in the upcoming year.
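As a rough illustration of why a single dominant user agent points to automation, the sketch below computes the share of a page's views that comes from its most frequent UA. This is only a heuristic written for this comment, not how the pageview pipeline classifies traffic; the function names and the 0.9 threshold are assumptions.

```python
def top_agent_share(views_by_agent):
    """Return (agent, share) for the user agent with the most views."""
    total = sum(views_by_agent.values())
    if total == 0:
        return None, 0.0
    agent = max(views_by_agent, key=views_by_agent.get)
    return agent, views_by_agent[agent] / total


def looks_automated(views_by_agent, threshold=0.9):
    """Heuristic flag: one agent accounts for at least `threshold` of views.

    The default threshold is an arbitrary assumption, not a tuned value.
    """
    _, share = top_agent_share(views_by_agent)
    return share >= threshold
```

Fed the single row from the Hive result above, for example `looks_automated({"IE 6 / Windows XP": 11307})`, the share is 1.0 and the page is flagged. A page read by many different browsers would fall well below the threshold.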