
Non existing article is one of the most viewed according to the data returned by the /metrics/pageviews/top/ API
Closed, Resolved · Public · 3 Story Points

Description

Since the 8th of October, a non-existent article has appeared among the most viewed articles of the Basque-language Wikipedia.

The article in question is "Teratologia": https://eu.wikipedia.org/wiki/Teratologia

The API lists it among the most viewed articles, which seems quite strange and is probably a bug:
https://wikimedia.org/api/rest_v1/metrics/pageviews/top/eu.wikipedia/all-access/2016/10/07

I found this because I built a website that uses this API's data. The site is in Basque, but the chart there shows how the page views of the "Teratologia" article evolved:
http://wikidosia.aldatsa.eus/joerak?artikuluak=Teratologia


As expected for a non-existent article, it has zero page views until the 8th of October, but from then on it shows a high number of page views. Isn't that strange?
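The check described above could be reproduced with a short script. The following is a minimal sketch, not an official client: it builds the URL for the /metrics/pageviews/top/ endpoint in the same form as the link above and looks up one title in the response. It assumes the response JSON has the shape `items[0].articles[...]` with `article`, `views` and `rank` keys; the helper names and the User-Agent string are hypothetical.

```python
import json
from datetime import date
from urllib.request import Request, urlopen

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/top"


def top_url(project: str, day: date, access: str = "all-access") -> str:
    """Build the URL of the daily top-articles list for one project."""
    return f"{BASE}/{project}/{access}/{day:%Y/%m/%d}"


def find_article(payload: dict, title: str):
    """Return the entry for `title` in a top-articles payload, or None.

    Assumes the payload looks like {"items": [{"articles": [...]}]}.
    """
    for item in payload.get("items", []):
        for entry in item.get("articles", []):
            if entry.get("article") == title:
                return entry
    return None


def fetch_top(project: str, day: date, access: str = "all-access") -> dict:
    """Download the list over the network (not called in this sketch)."""
    # The User-Agent value is a placeholder; use one that identifies your tool.
    req = Request(top_url(project, day, access),
                  headers={"User-Agent": "pageview-check-sketch/0.1"})
    with urlopen(req) as resp:
        return json.load(resp)


print(top_url("eu.wikipedia", date(2016, 10, 7)))
```

Calling `find_article(fetch_top("eu.wikipedia", date(2016, 10, 7)), "Teratologia")` would then return the article's entry if it is in the list for that day, and `None` otherwise.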

Event Timeline

Restricted Application added a subscriber: Aklapper. · Oct 26 2016, 10:59 AM
Asier_Iturralde_Sarasola renamed this task from "Non existing article in the data returned by the /metrics/pageviews/top/ API" to "Non existing article is one of the most viewed according to the data returned by the /metrics/pageviews/top/ API". · Oct 26 2016, 11:01 AM
Restricted Application added a project: Analytics. · Oct 31 2016, 11:08 AM
Nuria edited projects, added Analytics-Kanban; removed Analytics. · Oct 31 2016, 3:33 PM
Restricted Application added a project: Analytics. · Oct 31 2016, 3:33 PM
Nuria moved this task from Incoming to Backlog (Later) on the Analytics board. · Oct 31 2016, 3:33 PM
JAllemandou moved this task from Next Up to In Progress on the Analytics-Kanban board.
Restricted Application added a project: Analytics. · Nov 2 2016, 8:14 PM

This issue is caused by the usual hidden bot traffic. I am linking this task as a subtask of T138207.

Hive query:

SELECT
  user_agent_map,
  SUM(view_count)
FROM wmf.pageview_hourly
WHERE year = 2016
  AND month = 10
  AND project = 'eu.wikipedia'
  AND page_title = 'Teratologia'
GROUP BY
  user_agent_map;

Result: [Windows XP,-,-,6,IE,Other,-,11307]

JAllemandou removed JAllemandou as the assignee of this task. · Nov 2 2016, 8:18 PM
JAllemandou moved this task from Backlog (Later) to Wikistats Production on the Analytics board.
JAllemandou added a subscriber: JAllemandou.
Nuria added a subscriber: Nuria. · Nov 2 2016, 9:33 PM

In this case I think MediaWiki might be returning a 200 for a non-existent page, correct? That would explain why we are counting these as pageviews.

Nuria moved this task from Wikistats Production to Dashiki on the Analytics board. · Mar 13 2017, 5:02 PM
Milimetric triaged this task as Normal priority. · May 8 2017, 2:28 PM
Nuria edited projects, added Analytics-Kanban; removed Analytics. · Jul 3 2017, 5:00 PM
Nuria moved this task from In Progress to Next Up on the Analytics-Kanban board. · Jul 5 2017, 6:48 PM
Nuria claimed this task. · Jul 11 2017, 6:38 PM
Nuria moved this task from Next Up to In Progress on the Analytics-Kanban board.
Nuria removed a subscriber: Nuria.
Nuria set the point value for this task to 3. · Jul 11 2017, 6:42 PM
Nuria added a comment. · Jul 11 2017, 7:57 PM

I am sorry we did not look at this earlier. The page was created on December 7th, so pageviews before then are likely from edit/preview requests (we had a bug that counted those). See: https://phabricator.wikimedia.org/T156628

The UA responsible for those pageviews is: {"browser_major":"6","os_family":"Windows XP","os_major":"-","device_family":"Other","browser_family":"IE","os_minor":"-","wmf_app_version":"-"}, which accounts for 19000 pageviews on that page. That points to bot traffic and probably a bogus UA (hard to say). That doesn't mean the traffic is not real; it is, but it is automated. Our top computation is really not reliable as a measure of "human" traffic, because it is affected by bots that do not self-report as such. This is something we hope to solve in the upcoming year.
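To illustrate why a single user agent dominating a page's views points to automation, here is a minimal sketch that measures how concentrated the views are in one user_agent_map. It is only an illustration: the 0.9 threshold and the function names are assumptions, not how the Analytics pipeline detects bots, and the second sample row is made up for the example.

```python
from collections import Counter


def ua_key(ua_map: dict) -> tuple:
    """Turn a user_agent_map dict into a hashable, order-independent key."""
    return tuple(sorted(ua_map.items()))


def dominant_ua_share(rows):
    """Return (ua_map, share) for the user agent with the most views.

    `rows` is an iterable of (user_agent_map, view_count) pairs, such as
    the output of a GROUP BY user_agent_map query.
    """
    totals = Counter()
    for ua_map, views in rows:
        totals[ua_key(ua_map)] += views
    total = sum(totals.values())
    if total == 0:
        return None, 0.0
    key, views = totals.most_common(1)[0]
    return dict(key), views / total


def looks_automated(rows, threshold: float = 0.9) -> bool:
    """Hypothetical heuristic: one UA accounts for at least `threshold` of views."""
    _, share = dominant_ua_share(rows)
    return share >= threshold


rows = [
    # UA and count quoted in the comment above.
    ({"browser_major": "6", "os_family": "Windows XP", "os_major": "-",
      "device_family": "Other", "browser_family": "IE", "os_minor": "-",
      "wmf_app_version": "-"}, 19000),
    # Made-up row standing in for ordinary traffic.
    ({"browser_family": "Firefox", "os_family": "Linux"}, 40),
]
print(looks_automated(rows))
```

A concentration check like this cannot tell a bogus UA from a genuine one; it only flags pages whose traffic is unusually uniform.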

Nuria moved this task from In Progress to Done on the Analytics-Kanban board. · Jul 11 2017, 7:57 PM
Nuria closed this task as Resolved. · Jul 12 2017, 7:18 PM