
"Venuše (planeta)" on cs.wp has surprisingly high numbers in Pageviews Analysis (and also Topviews Analysis)
Closed, Resolved, Public

Description

According to Turnillo's data, the page is repeatedly visited by a rather specific bot. This can easily be fixed by adding that bot's UA to https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Webrequest.java#L116.
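A minimal sketch of what user-agent-based filtering can look like, assuming a single case-insensitive regex of bot markers. The pattern and the names `BotUaFilter` and `isBotUserAgent` are hypothetical and are not the actual contents of `Webrequest.java`. Only the `MeetingRoomApp` token comes from this task, where change 586057 adds it to the bot regex.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of UA-based bot filtering. The pattern below is
 * illustrative only; it is NOT the actual regex in Webrequest.java.
 */
public class BotUaFilter {
    // Hypothetical alternation of generic bot markers, plus the
    // "MeetingRoomApp" token mentioned in this task.
    private static final Pattern BOT_UA_PATTERN = Pattern.compile(
            "(bot|crawler|spider|MeetingRoomApp)",
            Pattern.CASE_INSENSITIVE);

    /** Returns true when the user agent contains any known bot marker. */
    public static boolean isBotUserAgent(String userAgent) {
        if (userAgent == null || userAgent.isEmpty()) {
            return false; // no UA to classify
        }
        // find() matches the marker anywhere in the UA string
        return BOT_UA_PATTERN.matcher(userAgent).find();
    }

    public static void main(String[] args) {
        System.out.println(isBotUserAgent("MeetingRoomApp/1.0"));                        // true
        System.out.println(isBotUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")); // false
    }
}
```

A static UA list like this is cheap to apply but only catches clients that identify themselves, which is why each new offender needs its own patch.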


Hello, the article Venuše (planeta) has extremely high view counts; see the analysis. Could you filter out these suspiciously high views?

Event Timeline

Patriccck updated the task description.

Could you avoid this suspiciously high views?

@Patriccck: What is the problem to solve? Just because you don't like something does not mean that someone else should hide it. :)
If there are actual signs of manipulation (I don't see any provided here), see T123442. Thanks.

Aklapper renamed this task from "Venuše (planeta) in Pageviews Analysis (and also Topviews Analysis)" to "Venuše (planeta)" on cs.wp has surprisingly high numbers in Pageviews Analysis (and also Topviews Analysis). (Dec 1 2019, 1:17 PM)

I was wondering whether you could block it.

MusikAnimal subscribed.

This is not an issue with Tool-Pageviews but with the underlying data. Unfortunately these "false positives" are common, but as I understand it the Analytics team is working to improve bot detection.

I'm declining this task as I see nothing actionable here.

Urbanecm added a project: Analytics.
Urbanecm updated the task description.

Change 586057 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[analytics/refinery/source@master] Add MeetingRoomApp to the bot regex

https://gerrit.wikimedia.org/r/586057

The bot detection running in shadow mode should now be able to detect this case; pinging @JAllemandou to verify.

In the meantime, I still think it is correct to add the UA to the regex. +1

Change 586057 merged by Nuria:
[analytics/refinery/source@master] Add MeetingRoomApp to the bot regex

https://gerrit.wikimedia.org/r/586057

I confirm that our automated-traffic detection heuristic (in shadow mode for now) removes the page from the top list (checked on two days).
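The heuristic itself is not described in this task. As a rough, hypothetical illustration of how traffic-pattern detection can catch a case like this without relying on a UA list, the sketch below flags a page when a single user agent supplies most of its views. The class `UaConcentrationHeuristic`, the thresholds `minTopShare` and `minViews`, and all numbers are assumptions for illustration; this is not Wikimedia's actual automated-traffic detection.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical UA-concentration heuristic. NOT the detection Wikimedia
 * runs; it only illustrates flagging a page dominated by one client.
 */
public class UaConcentrationHeuristic {

    /** Aggregated views of one page from one user agent (hypothetical schema). */
    public static final class PageUaViews {
        public final String page;
        public final String userAgent;
        public final long views;

        public PageUaViews(String page, String userAgent, long views) {
            this.page = page;
            this.userAgent = userAgent;
            this.views = views;
        }
    }

    /**
     * Returns true when the page has at least minViews in total and a single
     * user agent supplies at least minTopShare (0..1) of those views.
     */
    public static boolean looksAutomated(List<PageUaViews> rows, String page,
                                         double minTopShare, long minViews) {
        Map<String, Long> viewsByUa = new HashMap<>();
        long total = 0;
        for (PageUaViews row : rows) {
            if (!row.page.equals(page)) {
                continue; // only aggregate rows for the requested page
            }
            viewsByUa.merge(row.userAgent, row.views, Long::sum);
            total += row.views;
        }
        if (total < minViews) {
            return false; // too little traffic to judge
        }
        long topUaViews = viewsByUa.values().stream()
                .mapToLong(Long::longValue).max().orElse(0L);
        return (double) topUaViews / total >= minTopShare;
    }

    public static void main(String[] args) {
        // Made-up numbers for illustration; not real pageview data.
        List<PageUaViews> rows = List.of(
                new PageUaViews("Page A", "MeetingRoomApp/1.0", 9000),
                new PageUaViews("Page A", "Mozilla/5.0 (X11; Linux x86_64)", 1000),
                new PageUaViews("Page B", "Mozilla/5.0 (X11; Linux x86_64)", 600),
                new PageUaViews("Page B", "Mozilla/5.0 (Macintosh)", 400));
        System.out.println(looksAutomated(rows, "Page A", 0.8, 1000)); // true
        System.out.println(looksAutomated(rows, "Page B", 0.8, 1000)); // false
    }
}
```

Unlike a UA regex, a rule of this shape does not need to know the client's name in advance, though real detection would also have to weigh other signals to avoid flagging genuinely popular pages.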

Milimetric subscribed.

\o/ for our new detection :)

Milimetric triaged this task as Medium priority. (Apr 6 2020, 4:15 PM)