Improve traffic surge handling for translatewiki.net
Closed, Resolved, Public

Description

Our uptime monitor flapped. The access log points to an abusive client as the cause. The downtime correlates with the periods when the abusive client requested Special:SupportedLanguages heavily:

grep [SNIP] /var/log/nginx/access.log* | grep SupportedLanguages | cut -f2 -d"[" | cut -f1 -d"]" | sort | uniq -c
      1 05/Apr/2020:01:28:53 +0200
      2 05/Apr/2020:01:52:18 +0200
      3 05/Apr/2020:01:52:19 +0200
      4 05/Apr/2020:01:52:20 +0200
      1 05/Apr/2020:01:52:21 +0200
      2 05/Apr/2020:01:52:22 +0200
      1 05/Apr/2020:01:52:23 +0200
      1 05/Apr/2020:01:52:24 +0200
      3 05/Apr/2020:01:52:25 +0200
      4 05/Apr/2020:01:52:26 +0200
      1 05/Apr/2020:01:52:27 +0200
      3 05/Apr/2020:01:52:28 +0200
      9 05/Apr/2020:01:52:29 +0200
     12 05/Apr/2020:01:52:30 +0200
      1 05/Apr/2020:01:52:31 +0200
      3 05/Apr/2020:01:52:34 +0200
      3 05/Apr/2020:01:52:35 +0200
      4 05/Apr/2020:01:52:36 +0200
      2 05/Apr/2020:01:52:38 +0200
      2 05/Apr/2020:01:52:39 +0200
      3 05/Apr/2020:01:52:40 +0200
      4 05/Apr/2020:01:52:41 +0200
      1 05/Apr/2020:01:52:42 +0200
     10 05/Apr/2020:01:52:44 +0200
      9 05/Apr/2020:01:52:45 +0200
      6 05/Apr/2020:01:52:46 +0200
      1 05/Apr/2020:01:52:49 +0200
      8 05/Apr/2020:01:52:50 +0200
     15 05/Apr/2020:01:52:51 +0200
     22 05/Apr/2020:01:52:52 +0200
     25 05/Apr/2020:01:52:53 +0200
     23 05/Apr/2020:01:52:54 +0200
      1 05/Apr/2020:01:52:55 +0200
      3 05/Apr/2020:01:52:56 +0200
      4 05/Apr/2020:01:52:57 +0200
      1 05/Apr/2020:01:52:58 +0200
      4 05/Apr/2020:01:52:59 +0200
      8 05/Apr/2020:01:53:00 +0200
     12 05/Apr/2020:01:53:01 +0200
      1 05/Apr/2020:01:53:02 +0200
      1 05/Apr/2020:07:57:58 +0200
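
The per-second tally above can also be produced without the shell pipeline. The following is a minimal Python sketch, assuming the log uses nginx's default combined format with the timestamp in square brackets. The names `requests_per_second` and `busiest_seconds` are hypothetical, and the client filter hidden behind [SNIP] is not reproduced here.

```python
import re
from collections import Counter

# Matches the first bracketed field of a log line, which in nginx's default
# "combined" format is the timestamp, e.g. [05/Apr/2020:01:52:53 +0200].
TIMESTAMP_RE = re.compile(r"\[([^\]]+)\]")


def requests_per_second(lines, needle="SupportedLanguages"):
    """Count log lines containing `needle`, grouped by timestamp (1 s resolution)."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = TIMESTAMP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def busiest_seconds(counts, top=3):
    """Return the `top` timestamps with the most matching requests."""
    return counts.most_common(top)
```

To use it on a real log, pass an open file object, for example `requests_per_second(open("/var/log/nginx/access.log"))`. Rotated and compressed logs would need to be opened separately.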

To prevent this in future, investigate the following possibilities:

Event Timeline

Restricted Application added a subscriber: Aklapper. Apr 5 2020, 7:30 AM

The Limit Requests module is not included in the nginx-light package we have installed. It is included in nginx-full.

Also, the delay parameter, which specifies a limit at which excessive requests become delayed, was added in nginx 1.15.7, but we're using 1.14.2.
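
For context on what request limiting would do during a burst like the one in the description, here is a toy leaky-bucket limiter in Python. It is only an illustration of the general technique, not nginx's code or configuration. The class name `LeakyBucket` and the parameters `rate` and `burst` are assumptions chosen for this sketch, and it rejects excess requests outright rather than delaying them.

```python
class LeakyBucket:
    """Toy leaky-bucket limiter (illustrative only, not nginx's implementation).

    `rate` is the sustained number of requests per second that drain from
    the bucket; `burst` is how many excess requests may queue up before
    further requests are rejected.
    """

    def __init__(self, rate, burst):
        self.rate = float(rate)
        self.burst = float(burst)
        self.excess = 0.0   # requests currently "in the bucket"
        self.last = None    # time of the last accepted request

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) is accepted."""
        if self.last is None:
            # The first request always passes and starts the clock.
            self.last = now
            return True
        leaked = (now - self.last) * self.rate
        excess = max(0.0, self.excess - leaked + 1.0)
        if excess > self.burst:
            # Over the limit: reject and leave the bucket state untouched.
            return False
        self.excess = excess
        self.last = now
        return True
```

With `rate=1` and `burst=2`, four simultaneous requests yield three accepted and one rejected, and a request two seconds later is accepted again because the bucket has drained.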

Change 586139 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[mediawiki/extensions/Translate@master] Make Special:SupportedLanguages do less work during web requests

https://gerrit.wikimedia.org/r/586139

Change 587702 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[translatewiki@master] puppet: Add daily run of updateTranslatorActivity.php

https://gerrit.wikimedia.org/r/587702

Pginer-WMF triaged this task as Medium priority. Apr 9 2020, 10:42 AM

Change 586139 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Make Special:SupportedLanguages do less work during web requests

https://gerrit.wikimedia.org/r/586139

abi_ assigned this task to Nikerabbit. Apr 9 2020, 11:18 AM
abi_ added a subscriber: abi_.

Niklas has submitted patches to restructure the way we generate the stats for the Supported languages page.

  1. https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/584905 - This patch reduced the number of queries needed to produce the per-language output, at the cost of the information being slightly out of date. This has been deployed on translatewiki.net.
  2. https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/586139 - This patch caches the data for 3 days, but updates it every day via a daily scheduled job, reducing the amount of work that needs to be done via a web request. This will be deployed on translatewiki.net next Wednesday.
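
The caching arrangement described in the second patch can be sketched as follows. This is a minimal Python illustration of the pattern (a 3-day expiry combined with a daily refresh), not the Translate extension's actual PHP code. The class `StatsCache` and its methods are hypothetical, and the fallback that recomputes on a cache miss during a web request is an assumption of this sketch.

```python
import time

# Cached data stays valid for 3 days, as described above.
CACHE_TTL_SECONDS = 3 * 24 * 60 * 60


class StatsCache:
    """Holds an expensive-to-compute value with a fixed time to live."""

    def __init__(self, compute, clock=time.time):
        self._compute = compute    # expensive function producing the stats
        self._clock = clock        # injectable clock, handy for testing
        self._value = None
        self._stored_at = None

    def refresh(self):
        """Recompute and store the value; meant to be run by the daily job."""
        self._value = self._compute()
        self._stored_at = self._clock()

    def get(self):
        """Return the cached value; recompute only if it is missing or expired.

        Assumption: a web request falls back to recomputing on a miss.
        """
        expired = (
            self._stored_at is None
            or self._clock() - self._stored_at >= CACHE_TTL_SECONDS
        )
        if expired:
            self.refresh()
        return self._value
```

Because the scheduled job refreshes the value every day while the expiry is three days, a web request only does the expensive work if the job has failed several days in a row.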

Change 587702 merged by jenkins-bot:
[translatewiki@master] puppet: Add daily run of updateTranslatorActivity.php

https://gerrit.wikimedia.org/r/587702

Nikerabbit closed this task as Resolved. Apr 27 2020, 8:05 AM
Nikerabbit moved this task from Backlog to System admin stuff on the translatewiki.net board.