Manipulation of pageview statistics German Wikipedia
Closed, ResolvedPublic

Assigned To
Authored By
Superbass
Sep 16 2019, 11:43 AM
Referenced Files
F31092769: Screenshot.png
Nov 16 2019, 8:28 AM
Description

For the past month, we have been seeing unusual results in the pageview statistics of the German Wikipedia. A moderately well-known musician and two of his projects rank high in the Top 10, which is hard to explain: [[de:Tobias Sammet]], [[de:Avantasia]], [[de:Edguy]]

This is likely to be bot-based manipulation. The statistics are displayed in the mobile apps, so they could be used to promote something.

The articles could be excluded from the statistics; perhaps there are also countermeasures that directly target those bots. Alternatively, the list of most frequently viewed articles could be removed from the app to counter this manipulation.

Event Timeline

We also had this issue before; see [[de:Anthocyane]] and [[de:Diskussion:Anthocyane#Aufrufzahlen]].

Hi @Superbass, thanks for taking the time to report this and welcome to Wikimedia Phabricator!

Removing wikipedia.de as this does not seem to be about wikipedia.de but de.wikipedia.org (or MobileFrontend code) and resetting assignee as per https://www.mediawiki.org/wiki/How_to_report_a_bug

Please also provide a link to the "pageview statistics of the German Wikipedia" to make sure everybody is talking about the same thing. Thanks.

I removed [[Formelsammlung Trigonometrie]] from Topviews as an obvious false positive, though I realize that wasn't reported in this task.

I am not sure about the three musician pages. The mobile/desktop ratio seems normal, so confirming that this is fake traffic will require further investigation of private data. This is very tedious, but I can try to look into it when I have the time.
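
The mobile/desktop comparison used as a first check here can be sketched as follows. This is a minimal illustration only, assuming per-article view counts split by platform are already at hand; `expected_share` and `tolerance` are hypothetical parameters, not values anyone in this task has used.

```python
def mobile_share(mobile_views: int, desktop_views: int) -> float:
    """Fraction of an article's views that came from mobile (web + app)."""
    total = mobile_views + desktop_views
    if total == 0:
        raise ValueError("no views to compare")
    return mobile_views / total


def ratio_is_suspicious(
    mobile_views: int,
    desktop_views: int,
    expected_share: float,
    tolerance: float,
) -> bool:
    """Flag an article whose mobile share deviates from the share expected
    for comparable articles by more than `tolerance` (both hypothetical)."""
    return abs(mobile_share(mobile_views, desktop_views) - expected_share) > tolerance
```

A bot that spreads its requests over mobile web and desktop in realistic proportions passes this check, so a normal ratio does not rule out automated traffic.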

I think MobileFrontend was tagged because these pages are showing up in the app (which I think would be Wikipedia-Android-App-Backlog and/or Wikipedia-iOS-App-Backlog), but this isn't really the app's fault. The core issue is T123442: Pageview API: Better filtering of bot traffic on top enpoints. Topviews and the app are merely pulling data from there.
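
Since Topviews and the app both read the Pageview API's top endpoint, excluding a title is a step taken by each consumer. A minimal sketch, assuming the public REST path `/metrics/pageviews/top/{project}/{access}/{year}/{month}/{day}` and its JSON layout (`items[0].articles`); the function names and the exclusion set are hypothetical, not Topviews' actual code.

```python
import json
from datetime import date

# Public REST path of the Pageview API "top" endpoint.
TOP_ENDPOINT = "https://wikimedia.org/api/rest_v1/metrics/pageviews/top"


def top_url(project: str, access: str, day: date) -> str:
    """URL for the most-viewed articles of one project on one day."""
    return f"{TOP_ENDPOINT}/{project}/{access}/{day.year}/{day.month:02d}/{day.day:02d}"


def filter_top(payload: str, excluded: set[str]) -> list[tuple[str, int]]:
    """Return (article, views) pairs from a top-endpoint JSON response,
    skipping titles on a client-side exclusion list (hypothetical)."""
    articles = json.loads(payload)["items"][0]["articles"]
    return [
        (entry["article"], entry["views"])
        for entry in articles
        if entry["article"] not in excluded
    ]
```

Because the filtering happens in the consumer, removing a title from Topviews leaves the API response, and every other client reading it, unchanged.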

Thank you for your intervention and comment. I would recommend taking out the three articles about the musician and his bands as well. Their pageviews have been synchronized for weeks, and they are far too high given that these subjects have no corresponding presence in the news or the charts.
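
The "synchronized over weeks" observation can be quantified by correlating the articles' daily view counts. A minimal sketch; the 0.95 threshold and the function names are hypothetical, and related articles can correlate for legitimate reasons, so this is a hint rather than proof.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


def looks_synchronized(series: dict[str, list[float]], threshold: float = 0.95) -> bool:
    """True if every pair of daily view series correlates at or above `threshold`."""
    names = list(series)
    return all(
        pearson(series[a], series[b]) >= threshold
        for i, a in enumerate(names)
        for b in names[i + 1:]
    )
```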

https://tools.wmflabs.org/pageviews/?project=de.wikipedia.org&platform=all-access&agent=user&start=2019-06-17&end=2019-09-15&pages=Tobias_Sammet|Avantasia|Edguy

About the mobile frontend: I thought it might be an option to temporarily remove the "Most read on Wikipedia" section from the app if it is such an easy target for manipulation. That was just a suggestion; I don't know whether it's a good idea. Topviews, on the other hand, does not play a major role in the web interface.

Status update: The three articles have been in the top 5 since August. There are still 40,000 bot pageviews per day. The manipulators are completely fooling us.

Can someone please remove those articles from Topviews? I don't know any other solution than to temporarily move them to another namespace.

MusikAnimal claimed this task.

By now it's fairly obvious the traffic to these three articles is automated, so I have removed them from Topviews. I normally do a mobile/desktop comparison, but in this case they were hitting both mobile-web and desktop. Mobile-app traffic is more likely to be genuine (https://tools.wmflabs.org/pageviews/?project=de.wikipedia.org&platform=mobile-app&agent=user&range=latest-120&pages=Tobias_Sammet|Avantasia|Edguy), and the spikes you see there were surely because visitors reached the articles from the "Trending articles" section of the mobile app.

Indeed, this may at least exemplify a scenario where bad actors can surface lesser-known subjects in a highly visible place that presents them as "trending" topics. Twitter and the like must suffer from similar problems. I can't say for certain that the mobile apps are using the same API, but I suspect they are. I have created T236121 for the Android app; feel free to create one for the iPhone app too (I can't verify the issue as I don't own an iPhone).

Just as with T236121 I think we've done all we can do here. We've identified obvious false traffic and removed it from Topviews. The long-term solution is tracked at T123442: Pageview API: Better filtering of bot traffic on top enpoints.

The bot-spike removal we are working on would handle spikes like the ones on Formelsammlung Trigonometrie: https://tools.wmflabs.org/pageviews/?project=de.wikipedia.org&platform=all-access&agent=user&start=2019-06-01&end=2019-09-01&pages=Formelsammlung_Trigonometrie

For spikes like the ones on Tobias_Sammet it will not work, as the request rate is actually quite low (sustained constantly at 1.5 reqs/sec). It is not that this is hard to do; the bot detection (for now) targets high-volume spikes, and in this case the volume is not that high.
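
The difference between a high-volume spike and a constant 1.5 requests per second can be illustrated with a simple relative-spike detector: sustained traffic raises the baseline itself, so no single hour stands out. A minimal sketch; the median baseline and the factor of 10 are hypothetical and are not the parameters of Wikimedia's actual bot detection.

```python
from statistics import median

# A constant 1.5 requests per second, expressed per hour and per day.
SUSTAINED_PER_HOUR = int(1.5 * 3600)   # 5,400
SUSTAINED_PER_DAY = int(1.5 * 86400)   # 129,600


def spike_hours(hourly_views: list[int], factor: float = 10.0) -> list[int]:
    """Indices of hours whose views exceed `factor` times the median hour.

    The baseline is the series' own median, so traffic that is high but
    flat never stands out against it.
    """
    baseline = median(hourly_views)
    return [i for i, views in enumerate(hourly_views) if views > factor * baseline]
```

A day of flat traffic at `SUSTAINED_PER_HOUR` yields no flagged hours, while a single burst on an otherwise quiet day does.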

Der_Keks added a subscriber: Der_Keks.

I need to reopen the task, as I see that the articles are still at the top of the most viewed articles. Removing false traffic doesn't help; we need either to remove these articles from the index or to remove the whole list to counter this. If the articles are removed from the index permanently, Mr. Sammet and co. would have no chance to get into it if they became famous, so I propose a time-based "ban" of maybe 2 months.

I'd suggest removing the articles from the list for at least two months (I thought that had happened already?). It is very unlikely that Mr. Sammet and co. will become famous in the next few months.

That's why I suggested 2 months :)
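
The time-limited "ban" discussed here could work as an exclusion list whose entries expire on their own, so a subject that later becomes genuinely popular reappears automatically. A minimal sketch; the 61-day length, the data layout and the function name are hypothetical, and nothing in this task says Topviews or the apps implement it this way.

```python
from datetime import date, timedelta

# Hypothetical ban length: roughly two months.
BAN_LENGTH = timedelta(days=61)


def active_exclusions(exclusions: dict[str, date], today: date) -> set[str]:
    """Titles still excluded on `today`, given the date each ban started."""
    return {
        title
        for title, start in exclusions.items()
        if start <= today < start + BAN_LENGTH
    }
```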

I would like to add that we now regularly receive complaints in OTRS about this manipulation. It's a bit embarrassing that nothing has effectively changed so far.

@MusikAnimal Could you consider removing the three articles from Topviews completely? We get OTRS complaints about this topic every few days, and I have run out of ideas for how to answer them.

I can confirm this. I think the priority for this task should be raised.

Can you link me to an example? As far as I can tell they are being excluded from https://tools.wmflabs.org/topviews. Are you sure the OTRS complaints are about this tool, or are they about the "trending" list in the mobile app? I unfortunately can't help you with the mobile app :(

Please see T238357 for the upcoming task to label bot spikes as automated traffic; this will address part of this issue and result in more sensible top lists.

@MusikAnimal, this is what we have been seeing for months:

Screenshot.png (286 KB)

@MusikAnimal The complaints are about the mobile app and its trending list. I thought it used the same database as Topviews.

So, who can remove articles from the mobile app's trending topics?

I created a task at T236121 but it was declined (and for good reason, it's not really the app's fault). I recommended a short-term solution at T236121#5669186.

See also: T239018 (request to remove the whole trending section from the dewiki app)

Nuria renamed this task from Manipulation of pageview statistics to Manipulation of pageview statistics German Wikipedia. Mar 17 2020, 5:10 PM

The automated marker has been deployed; issues such as these should be mitigated going forward: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/BotDetection
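
With the marker in place, the Pageview API reports such traffic under the `automated` agent type rather than `user`. A minimal sketch of building per-article requests to compare the two, assuming the public REST path `/metrics/pageviews/per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}`; the helper name is hypothetical and nothing is fetched here.

```python
from urllib.parse import quote

# Public REST path of the Pageview API per-article endpoint.
PER_ARTICLE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
AGENTS = {"all-agents", "user", "spider", "automated"}


def per_article_url(
    project: str,
    article: str,
    agent: str,
    start: str,
    end: str,
    access: str = "all-access",
    granularity: str = "daily",
) -> str:
    """Build a per-article request; `start` and `end` are YYYYMMDD strings."""
    if agent not in AGENTS:
        raise ValueError(f"unknown agent type: {agent}")
    # Titles use underscores; everything else (including "/") is percent-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return f"{PER_ARTICLE}/{project}/{access}/{agent}/{title}/{granularity}/{start}/{end}"
```

Requesting the `user` and `automated` series for the same article and date range shows how much of its traffic is now labeled automated.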