
Chinese scraper (?) with multiple IP addresses overloading wsexport
Closed, Resolved · Public

Description

The requests come from a variety of Chinese IP addresses. Requests are of the form

<ip> - - [29/Dec/2015:12:24:55 +0000] "GET /wsexport/tool/book.php?lang=fr&format=pdf-a5&page=La_Fortune_de_Gaspard HTTP/1.1" 200 97999 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"

and are hitting wsexport every ~20 seconds. Since generating each export takes much longer than that, this is effectively killing wsexport.
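To see why a request every ~20 seconds is enough to take the tool down, here is a back-of-the-envelope sketch using a simple fluid approximation of an overloaded queue. The ticket only says an export takes "much more" than 20 s, so the 60 s service time and the single worker below are hypothetical numbers chosen purely for illustration, not measurements of wsexport.

```python
def utilization(interval_s: float, service_s: float, workers: int = 1) -> float:
    """Offered load rho = (arrival rate * service time) / workers.
    rho > 1 means requests arrive faster than they can be served."""
    return service_s / (interval_s * workers)


def backlog_after(duration_s: float, interval_s: float, service_s: float,
                  workers: int = 1) -> float:
    """Fluid approximation: while overloaded, the queue grows by
    (arrival rate - service rate) requests per second; otherwise it stays empty."""
    arrival_rate = 1.0 / interval_s
    service_rate = workers / service_s
    return max(0.0, (arrival_rate - service_rate) * duration_s)


# Hypothetical values: one request every 20 s, 60 s per export, one worker.
rho = utilization(interval_s=20, service_s=60)
queued = backlog_after(duration_s=3600, interval_s=20, service_s=60)
print(f"utilization={rho:.1f}, backlog after 1 h ~ {queued:.0f} requests")
```

Any utilization above 1 means the backlog grows without bound, so the tool never catches up, regardless of the exact service time.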

All of them use the same odd user agent:

Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36

which is a year-old Chrome version on Windows XP. As Chrome auto-updates, this suggests to me it's a scraper lying about the user agent.

This is linked to the following IP addresses, all Chinese:

valhallasw@tools-proxy-01:~$ sudo tail -n 100000 /var/log/nginx/access.log | grep wsexport | grep  "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36" | cut -d"-" -f1 | sort |uniq -c
     41 119.188.12.11
     10 119.188.12.7
     36 119.188.50.138
      2 218.26.232.136
     10 218.26.232.164
    111 61.54.24.78

Together these make up 210 of the 320 most recent requests to wsexport.
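The shell pipeline above can also be expressed as a small Python sketch, which makes the filtering explicit. It assumes nginx's default "combined" log format, as in the sample line quoted earlier. Unlike `grep wsexport`, which matches anywhere in the line, this version only looks at the request line, so it is slightly stricter. The function name `count_scraper_hits` is hypothetical.

```python
import re
from collections import Counter

SCRAPER_UA = ("Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36")

# nginx "combined" format: ip - user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" \d{3} \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def count_scraper_hits(lines, tool="/wsexport/", ua=SCRAPER_UA):
    """Return (hits per IP for the given user agent, total requests to the tool)."""
    per_ip = Counter()
    total = 0
    for line in lines:
        m = LOG_RE.match(line)
        if not m or tool not in m.group("request"):
            continue  # unparseable line, or a request for another tool
        total += 1
        if m.group("ua") == ua:  # exact match, like the fixed-string grep
            per_ip[m.group("ip")] += 1
    return per_ip, total
```

Run against the last 100000 lines of `/var/log/nginx/access.log`, `sum(per_ip.values()) / total` should correspond to the 210 / 320 share reported above.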

Blocking these requests by user agent is probably the most effective approach.
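The matching logic for a user-agent block can be sketched as a small WSGI middleware. This is illustrative only: wsexport is a PHP tool behind the Toolforge nginx proxy, so a real block would more likely live in the proxy configuration. The names `block_user_agents` and `BLOCKED_UAS` are hypothetical.

```python
SCRAPER_UA = ("Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36")
BLOCKED_UAS = frozenset({SCRAPER_UA})


def block_user_agents(app, blocked=BLOCKED_UAS):
    """Wrap a WSGI app so requests whose User-Agent exactly matches an
    entry in `blocked` get a 403 instead of reaching the app."""
    def middleware(environ, start_response):
        if environ.get("HTTP_USER_AGENT", "") in blocked:
            start_response("403 Forbidden",
                           [("Content-Type", "text/plain; charset=utf-8")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real tool, used only to demonstrate the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"ok\n"]


app = block_user_agents(hello_app)
```

The comparison is an exact string match, so ordinary users on current Chrome builds are unaffected; the trade-off is that the scraper can evade it simply by changing its user agent.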

Event Timeline

valhallasw raised the priority of this task to Needs Triage.
valhallasw updated the task description.
valhallasw added a project: Toolforge.
valhallasw added subscribers: valhallasw, Tpt.
Restricted Application added a project: Cloud-Services. (Dec 29 2015, 12:29 PM)
Restricted Application added subscribers: StudiesWorld, Aklapper.
valhallasw closed this task as Resolved. (Feb 3 2016, 5:29 PM)
valhallasw claimed this task.