Topviews Analysis of the Hungarian Wikipedia is flooded with spam
Open, High · Public

Description

Since October, due to some kind of spam, sex- and narcotics-related articles have been the most viewed articles on the Hungarian Wikipedia. For background, you can read about the scandal here.

Today as an example:
Home page 28k+
Cannabis 13k+
Oral sex 12k+

As you can see, the view counts of these articles stay constant over time and are almost identical to one another.


https://tools.wmflabs.org/pageviews/?project=hu.wikipedia.org&platform=all-access&agent=user&start=2019-10-01&end=2019-11-02&pages=Or%C3%A1lis_szex|Metil%C3%A9ndioxi-metamfetamin|Kannabisz|Kokain|Szifilisz|H%C3%ADmvessz%C5%91%7CAn%C3%A1lis_szex|LSD|Hepatitis_C|Kank%C3%B3
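One way to put a number on "constant" is the coefficient of variation (standard deviation divided by mean) of a page's daily view counts. The sketch below is only an illustration: the function names, the 0.05 threshold, and the sample counts are all invented here and do not come from the Pageviews tool or from the task.

```python
from statistics import mean, pstdev


def coefficient_of_variation(daily_views):
    """Population standard deviation divided by the mean of the daily counts."""
    avg = mean(daily_views)
    if avg == 0:
        return 0.0
    return pstdev(daily_views) / avg


def looks_flat(daily_views, threshold=0.05):
    """True if day-to-day variation is below `threshold` (an arbitrary cut-off)."""
    return coefficient_of_variation(daily_views) < threshold


# Hypothetical daily counts, invented for illustration only.
suspicious = [13000, 13100, 12950, 13050, 13020]
organic = [400, 950, 620, 1500, 480]

print(looks_flat(suspicious), looks_flat(organic))
```

Organic traffic usually swings with weekdays and news events, so an unusually flat series is a hint worth checking, not proof of automation.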

Event Timeline

Bencemac created this task. Nov 4 2019, 5:27 PM
Restricted Application added subscribers: MusikAnimal, Aklapper. Nov 4 2019, 5:27 PM
Bencemac updated the task description. Nov 4 2019, 5:28 PM
Nuria added a subscriber: Nuria. Nov 6 2019, 7:06 PM

Nuria added a comment. Nov 6 2019, 7:07 PM

This is a bot; see the patterns, which are symmetric per UA (I just looked at the Orális_szex page).

+------+-------+---------------------+--------------------------------------------------------------+
| c    | ip    | geocoded_data[city] | ua                                                           |
+------+-------+---------------------+--------------------------------------------------------------+
| 309  | 167.x | Frankfurt am Main   | Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) Appl |
| 300  | 167.x | Frankfurt am Main   | Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) App |
| 275  | 167.x | Frankfurt am Main   | Mozilla/5.0 (Linux; U; Android 4.4.2; en-us; SCH-I535 Build/ |
| 267  | 167.x | Frankfurt am Main   | Mozilla/5.0 (Android 7.0; Mobile; rv:54.0) Gecko/54.0 Firefo |
| 1661 | 167.x | Frankfurt am Main   | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:65.0) Gecko/201 |
| 1641 | 167.x | Frankfurt am Main   | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 |
| 159  | 167.x | Frankfurt am Main   | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/201 |
| 144  | 167.x | Frankfurt am Main   | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 |
| 102  | 167.x | Frankfurt am Main   | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 |
| 86   | 167.x | Frankfurt am Main   | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 |
+------+-------+---------------------+--------------------------------------------------------------+
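A breakdown like the one above groups requests for a single page by client and counts them. The query that produced the table is not shown in this task, so the following is only a minimal sketch over in-memory records; the record layout (`ip`, `city`, `ua`, `page_title`), the function name, and the sample rows are assumptions made for illustration.

```python
from collections import Counter


def count_by_client(requests, page_title):
    """Count requests to `page_title` per (ip, city, ua), most frequent first."""
    counts = Counter(
        (r["ip"], r["city"], r["ua"])
        for r in requests
        if r["page_title"] == page_title
    )
    return counts.most_common()


# Hypothetical request records, invented for illustration only.
sample = (
    [{"ip": "167.x", "city": "Frankfurt am Main", "ua": "UA-A", "page_title": "Orális_szex"}] * 3
    + [{"ip": "167.x", "city": "Frankfurt am Main", "ua": "UA-B", "page_title": "Orális_szex"}] * 2
    + [{"ip": "10.x", "city": "Budapest", "ua": "UA-C", "page_title": "Kokain"}]
)

for (ip, city, ua), c in count_by_client(sample, "Orális_szex"):
    print(c, ip, city, ua)
```

Many distinct user agents coming from one IP with similar request counts is the kind of pattern the comment points to.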

Nuria added a comment. Nov 6 2019, 7:20 PM

The numbers above cover just one hour.

Nuria added a comment. Nov 6 2019, 7:23 PM

Also, if you look at the pageviews from this IP over one day, these are the titles requested.

+-------+------------------------+
| c     | page_title             |
+-------+------------------------+
| 35370 | 6521_Pina              |
| 34046 | FASZ_Pirszósz_Grevenón |
| 32473 | Hüvely                 |
| 31998 | Hímvessző              |
| 29105 | Pina_(folyó)           |
| 28810 | Pina_(település)       |
| 28529 | Pina_(film)            |
| 26318 | Anális_szex            |
| 25991 | Orális_szex            |
| 25752 | Ondó                   |
+-------+------------------------+

Nuria added a comment. Nov 7 2019, 11:50 PM

After running the hu.wikipedia data through bot spike detection, the top list for 2019/10/16 looks like the following. Most rogue pages (marked in red) disappear; note that for a few pages about 80% of the traffic is bot in nature.
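To illustrate why rogue pages drop out of the top list once flagged traffic is excluded, here is a small sketch that re-ranks pages by their remaining views. It is not the bot spike detection algorithm mentioned above, which this task does not describe; the function name, the input layout, and all numbers are hypothetical.

```python
def rerank_without_bots(pages):
    """Re-rank pages by views left after removing flagged traffic.

    `pages` maps a title to (total_views, flagged_views).
    Returns (title, remaining_views, flagged_share) tuples, highest remaining first.
    """
    ranked = []
    for title, (total, flagged) in pages.items():
        remaining = total - flagged
        share = flagged / total if total else 0.0
        ranked.append((title, remaining, share))
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked


# Hypothetical counts, invented for illustration only.
example = {
    "Page_A": (10000, 8000),  # tops the raw list, but 80% of its traffic is flagged
    "Page_B": (3000, 0),
    "Page_C": (2500, 100),
}

for title, remaining, share in rerank_without_bots(example):
    print(f"{title}: {remaining} views, {share:.0%} flagged")
```

On a small wiki a single automated client can exceed the organic traffic of every page, which is why the ordering changes so much in this toy example.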

Nuria added a comment. Nov 7 2019, 11:53 PM

Pinging Product-Analytics here so they are aware that the effects of bots on "small" sites like this one can be dramatic.

@Nuria Thanks for the details! Is there anything further we can do?

@Bencemac Not for now. @JAllemandou and I are working out this quarter how best to deploy our bot spike detection algorithms; when we have more news we will send an update.

Ottomata assigned this task to Nuria. Mon, Nov 11, 4:35 PM
Ottomata triaged this task as High priority.
Ottomata moved this task from Incoming to Data Quality on the Analytics board.

Since 19th of October, the flood has stopped (17th and 18th of October).