
Production warning: Aborted connection 38950 to db: 'mwdb_wbstack_X' user: 'mwu_X' host: '10.108.4.7' (Got an error reading communication packets)
Closed, Resolved (Public)

Description

We've been seeing these warnings for quite some time but haven't been able to fully understand why they occur. This was first observed in T310066: Production error: sql-backup failed to start due to no database available but seems unrelated to that original problem.

https://cloudlogging.app.goo.gl/T9nXutUHCrmS5Aje7

The original suspicion was that this could be related to the secondary SQL not having enough memory, but seeing that this has now plateaued, this is probably not the case.

One alternative thread to investigate is the dropped-connection errors we are also seeing on the konnectivity-agent (https://cloudlogging.app.goo.gl/9jeWzRaxbSaDM7sm6). However, when trying to correlate the two errors, it seems the konnectivity-agent is actually referring to some other internal service most of the time.

AC

  • Figure out what's wrong and document it here
  • (Optional) Fix the problem

Event Timeline

When this was first reported we were at 2400 aborted connections: https://phabricator.wikimedia.org/T310066#7986077

We are now at 2800

MariaDB [(none)]> SHOW GLOBAL STATUS;
+--------------------------------------------------------+--------------------------------------------------+
| Variable_name                                          | Value                                            |
+--------------------------------------------------------+--------------------------------------------------+
| Aborted_clients                                        | 2859                                             |
| Aborted_connects                                       | 6                                                |
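To compare snapshots like the one above without reading the whole table by eye, the client output can be parsed into numbers. This is only a minimal sketch: the helper names `parse_status_table` and `aborted_counters` are made up for illustration, and the sample table is a shortened copy of the output above. On the server side, `SHOW GLOBAL STATUS LIKE 'Aborted%';` narrows the output to the same two counters.

```python
def parse_status_table(text: str) -> dict[str, str]:
    """Parse the ASCII table printed by the MariaDB client into a dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Data rows look like "| name | value |"; skip borders and blank lines.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header row and anything that is not a name/value pair.
        if len(cells) != 2 or cells[0] == "Variable_name":
            continue
        values[cells[0]] = cells[1]
    return values


def aborted_counters(text: str) -> dict[str, int]:
    """Keep only the Aborted_* counters, converted to integers."""
    return {
        name: int(value)
        for name, value in parse_status_table(text).items()
        if name.startswith("Aborted_")
    }


# Shortened copy of the SHOW GLOBAL STATUS output from this comment.
SAMPLE = """\
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Aborted_clients  | 2859  |
| Aborted_connects | 6     |
"""

print(aborted_counters(SAMPLE))
```

Running the same helper on a later snapshot and subtracting the values gives the number of new aborted connections between the two readings.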

So, in an attempt to correlate things, I manually restarted the secondary-sql pod today at 2022-06-16 12:27:07.

The theory is that this is a resource/caching-configuration problem, since we saw far fewer of these errors for a few days to a week after the secondary pod was last OOMkilled.

As the allocated memory starts depleting, the number of log entries from T310121: Production warning: [RedisBagOStuff] Rejected set() for X due to snapshot lag (late regeneration). and from this ticket goes up, which could indicate that we need to increase the memory overall. For this we have a new ticket, T310782: MariaDB: Follow recommended memory suggestions.

Resource usage looked like this after restart

{F35246576}

Up until the time the pod was restarted, we had a couple of aborted connections every minute or so, with a total of 2913 aborted connections reported in SHOW GLOBAL STATUS:

MariaDB [(none)]> SHOW GLOBAL STATUS;
+--------------------------------------------------------+--------------------------------------------------+
| Variable_name                                          | Value                                            |
+--------------------------------------------------------+--------------------------------------------------+
| Aborted_clients                                        | 2913                                             |

Regarding the theory that this is a resource/caching-configuration problem: the restart did pretty much nothing. The restarted pod has not even reached half of its allocated memory yet, but these aborted connections just keep happening.

MariaDB [(none)]> SHOW GLOBAL STATUS;
+--------------------------------------------------------+--------------------------------------------------+
| Variable_name                                          | Value                                            |
+--------------------------------------------------------+--------------------------------------------------+
| Aborted_clients                                        | 2933                                             |
| Aborted_connects                                       | 9                                                |

So, giving up on that idea, I tried to correlate the name of the MediaWiki database with the actual domain used for the requests, and I think I found something interesting.

https://cloudlogging.app.goo.gl/wabQJu6kkTFojtFUA

Will have a closer look tomorrow if there is anything we can do.

OK, so looking into these premature closures of the upstream, it seems most of them are due to a bot.

There is a bot called "MJ12bot" that is literally flooding the platform with traffic in a very stupid way (last 30 days: https://cloudlogging.app.goo.gl/QgZJJjJBCg9y6pyk6).

It seems to fire many requests, sometimes from different IPs at the same time, at URLs that just don't make much sense. It almost seems as if it sometimes gets stuck in a loop and can't get out.

Most of these aborted connections follow the same pattern: very long query strings and usually the same page, Special:CreateAccount.
I suspect Special:CreateAccount is rather heavy for us, since we are loading reCAPTCHA and multiple spam-prevention extensions. Hitting it from different IPs only seconds apart seems to result in these errors and aborted connections.

Since it's a paid service with no obvious public use I'd suggest we slow it down or disallow it completely following the instructions they provide on their website.

Slowing it down to one request every 20 seconds with robots.txt:

User-Agent: MJ12bot
Crawl-Delay: 20

Disabling it with robots.txt:

User-agent: MJ12bot
Disallow: /
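Before deploying either variant, the rules can be sanity-checked with Python's standard-library `urllib.robotparser`. This is only a sketch: the URL below is a placeholder, not one of our wikis, and the check only shows how a compliant parser reads the rules, not how the bot itself behaves. Note that `Crawl-delay` is a non-standard extension, so honouring it is up to each crawler.

```python
from urllib.robotparser import RobotFileParser


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The two variants proposed in this comment.
SLOW_DOWN = """\
User-Agent: MJ12bot
Crawl-Delay: 20
"""

BLOCK = """\
User-agent: MJ12bot
Disallow: /
"""

# Placeholder URL, used only for illustration.
URL = "https://wiki.example/index.php"

slow = load_rules(SLOW_DOWN)
block = load_rules(BLOCK)

print("delay for MJ12bot:", slow.crawl_delay("MJ12bot"))
print("slow-down still allows MJ12bot:", slow.can_fetch("MJ12bot", URL))
print("block allows MJ12bot:", block.can_fetch("MJ12bot", URL))
print("block allows other bots:", block.can_fetch("SomeOtherBot", URL))
```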

Some example URLs it's crawling. It is essentially the same page over and over again, many thousands of times a day:

title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=oldid=344607%26mobileaction=toggle_view_desktop&mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14577&returntoquery=direction=next%26oldid=344286%26mobileaction=toggle_view_desktop&mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14572&returntoquery=action=history%26mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=diff=prev%26oldid=344589&mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14571&returntoquery=direction=next%26oldid=105949%26mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=diff=91797%26oldid=86038 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=diff=344718%26oldid=344654 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14577&returntoquery=oldid=344280%26mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14570&returntoquery=direction=prev%26oldid=86610 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14577&returntoquery=diff=prev%26oldid=328392&mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14570&returntoquery=diff=329523%26oldid=329500 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14577&returntoquery=diff=next%26oldid=328385 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14570&returntoquery=diff=329515%26oldid=329514 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14577&returntoquery=diff=next%26oldid=328383 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14577&returntoquery=diff=cur%26oldid=91717&mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14577&returntoquery=diff=cur%26oldid=80573&mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q15512&returntoquery=direction=prev%26oldid=323373%26mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=MediaWiki:Minerva.css&returntoquery=direction=prev%26oldid=826%26mobileaction=toggle_view_mobile&mobileaction=toggle_view_mobile HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q15513&returntoquery=diff=prev%26oldid=322926 HTTP/1.1
title=Special:CreateAccount&returnto=Property:P10 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q15513&returntoquery=diff=prev%26oldid=322925 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q15513&returntoquery=diff=prev%26oldid=322922 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14581&returntoquery=diff=cur%26oldid=336043 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14581&returntoquery=diff=cur%26oldid=336037 HTTP/1.1
title=Special:UserLogin&returnto=Special:EditPage HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14581&returntoquery=diff=cur%26oldid=336034 HTTP/1.1
title=Special:UserLogin&returnto=Special:DispatchStats HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q6059 HTTP/1.1
title=Special:CreateAccount&returnto=MediaWiki:Mobile.css&returntoquery=action=edit%26oldid=827%26mobileaction=toggle_view_desktop&mobileaction=toggle_view_mobile HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q6 HTTP/1.1
title=Special:CreateAccount&returnto=MediaWiki:Math-tracking-category-texvc-deprecation&returntoquery=action=edit%26section=0%26mobileaction=toggle_view_desktop&mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=MediaWiki:Math-tracking-category-render-error&returntoquery=action=edit%26section=0%26mobileaction=toggle_view_desktop&mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q5261 HTTP/1.1
title=Special:CreateAccount&returnto=MediaWiki:Minerva.css&returntoquery=direction=prev%26oldid=838&mobileaction=toggle_view_mobile HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q3187 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q15513&returntoquery=diff=cur%26oldid=322930 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=oldid=344637%26mobileaction=toggle_view_desktop&mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=oldid=344635 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14595&returntoquery=diff=next%26oldid=321349 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q2008&returntoquery=diff=274633%26oldid=160923 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14595&returntoquery=diff=cur%26oldid=310395 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=oldid=344634%26mobileaction=toggle_view_desktop&mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=oldid=344633&mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14595&returntoquery=diff=cur%26oldid=171830 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q1959&returntoquery=oldid=8968 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14595&returntoquery=diff=cur%26oldid=171820 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14595&returntoquery=diff=cur%26oldid=171818 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14595&returntoquery=diff=cur%26oldid=171810 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14593&returntoquery=direction=prev%26oldid=169435%26mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14593&returntoquery=direction=next%26oldid=310121 HTTP/1.1
title=Special:CreateAccount&returnto=Item%3AQ29&returntoquery=mobileaction%3Dtoggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14583&returntoquery=diff=next%26oldid=80587 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14599&returntoquery=direction=next%26oldid=169447 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14594&returntoquery=oldid=87992%26mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14581&returntoquery=diff=336116%26oldid=156896 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14594&returntoquery=direction=prev%26oldid=310433%26mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=oldid=344641%26mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14594&returntoquery=direction=next%26oldid=76118%26mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14580&returntoquery=direction=next%26oldid=321503 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=diff=prev%26oldid=344674 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=diff=prev%26oldid=344528 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14580&returntoquery=diff=327917%26oldid=310999 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=diff=next%26oldid=344626&mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14577&returntoquery=direction=prev%26oldid=344260 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=direction=prev%26oldid=344583 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=direction=prev%26oldid=344570 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=diff=next%26oldid=169556 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=diff=cur%26oldid=344688 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=diff=prev%26oldid=344631 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=direction=prev%26oldid=169544 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=diff=prev%26oldid=344571&mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=direction=prev%26oldid=156670 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=direction=next%26oldid=76014 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=diff=prev%26oldid=156674 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=diff=prev%26oldid=105966 HTTP/1.1
title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=diff=next%26oldid=80575 HTTP/1.1
title=Special:CreateAccount&returnto=MediaWiki:Accesskey-pt-anontalk&returntoquery=action=edit%26section=0%26mobileaction=toggle_view_desktop&mobileaction=toggle_view_mobile HTTP/1.1
title=Special:CreateAccount&returnto=MediaWiki:Accesskey-n-recentchanges&returntoquery=action=edit%26section=0%26mobileaction=toggle_view_desktop&mobileaction=toggle_view_mobile HTTP/1.1
title=Special:CreateAccount&returnto=Item%3AQ29&returntoquery=mobileaction%3Dtoggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=Special:MobileDiff/192333&mobileaction=toggle_view_mobile HTTP/1.1
title=Special:CreateAccount&returnto=Property:P101 HTTP/1.1
title=Special:CreateAccount&returnto=Special:MobileDiff/130829 HTTP/1.1
title=Special:CreateAccount&returnto=Special:MobileDiff/130846...560434&returntoquery=mobileaction=toggle_view_mobile HTTP/1.1
title=Special:CreateAccount&returnto=MediaWiki:Minerva.css&returntoquery=action=edit%26oldid=57&mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=MediaWiki:Accesskey-ca-move&returntoquery=action=edit%26mobileaction=toggle_view_desktop&mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=MediaWiki:Accesskey-ca-edit&returntoquery=action=edit%26section=0%26mobileaction=toggle_view_mobile&mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=MediaWiki:Abusefilter-edit-builder-vars-tor-exit-node&returntoquery=action=edit%26section=0%26mobileaction=toggle_view_mobile HTTP/1.1
title=Special:CreateAccount&returnto=MediaWiki:Aboutpage&returntoquery=action=edit%26section=0%26mobileaction=toggle_view_mobile HTTP/1.1
title=Special:CreateAccount&returnto=MediaWiki:About&returntoquery=action=edit%26section=0%26mobileaction=toggle_view_mobile HTTP/1.1
title=Special:CreateAccount&returnto=MediaWiki:Vector.css&returntoquery=diff=prev%26oldid=53&mobileaction=toggle_view_desktop HTTP/1.1
title=Special:CreateAccount&returnto=MediaWiki:Duplicate-args-category&returntoquery=action=info%26mobileaction=toggle_view_desktop&mobileaction=toggle_view_mobile HTTP/1.1
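To put numbers on the pattern in a list like this, the request lines can be grouped by `title` and `returnto`. The following is a rough sketch using only the standard library. The function name `summarize` is made up for illustration, and the four sample lines are copied from the list above. It assumes each line is the query string followed by the HTTP version, as shown here.

```python
from collections import Counter
from urllib.parse import parse_qs


def summarize(request_lines):
    """Count requests per special page (title) and per return target."""
    titles = Counter()
    targets = Counter()
    for line in request_lines:
        # Drop the trailing " HTTP/1.1" so only the query string remains.
        query = line.rsplit(" HTTP/", 1)[0]
        params = parse_qs(query)
        titles[params.get("title", ["?"])[0]] += 1
        targets[params.get("returnto", ["?"])[0]] += 1
    return titles, targets


# A few lines copied from the list above.
SAMPLE = [
    "title=Special:CreateAccount&returnto=Item:Q14578&returntoquery=diff=91797%26oldid=86038 HTTP/1.1",
    "title=Special:CreateAccount&returnto=Item:Q14577&returntoquery=diff=next%26oldid=328385 HTTP/1.1",
    "title=Special:UserLogin&returnto=Special:EditPage HTTP/1.1",
    "title=Special:CreateAccount&returnto=Item%3AQ29&returntoquery=mobileaction%3Dtoggle_view_desktop HTTP/1.1",
]

titles, targets = summarize(SAMPLE)
print(titles.most_common())
print(targets.most_common())
```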

Since we don't have a robots.txt file yet, this is done the same way as before: in the nginx-ingress chart, by specifying the user-agent pattern.

staging/local: https://github.com/wmde/wbaas-deploy/pull/415

production: https://github.com/wmde/wbaas-deploy/pull/416

toan added a subscriber: Deniz_WMDE.

Thanks @Deniz_WMDE for deploying the changes. Moving back to blocked/stalled to keep an eye on it.

So, after a few days, the large number of aborted connections has stopped along with the drop in traffic from the bot.

{F35259006}

Over the weekend we had about one per day, which feels acceptable. I'm not sure we are ever going to get rid of these completely.

@toan For some reason the image in your comment does not show. Could you add a link so we can check the difference, please?

Looking at this together with @Rosalie_WMDE, we saw this just happened for a bunch of requests. Let's at least figure out what happened there and see if it's some other bot or user.

OK, we are definitely seeing this again, and this time it seems like normal usage through the API, which might indicate that something is still broken.

https://cloudlogging.app.goo.gl/9yG4vWrHsMUz7BSKA

Will take a closer look at what's happening next week unless anyone beats me to it.

Since the pods got more memory last week we've had 10 aborted clients.

MariaDB [(none)]> SHOW GLOBAL STATUS;
+--------------------------------------------------------+--------------------------------------------------+
| Variable_name                                          | Value                                            |
+--------------------------------------------------------+--------------------------------------------------+
| Aborted_clients                                        | 10                                               |
| Aborted_connects                                       | 0                                                |

That being said, we haven't seen that much traffic in that period (23 June to 28 June).

I just noticed that these errors can be reproduced locally with this: https://phabricator.wikimedia.org/T309070#8033358

2022-06-29  9:16:47 656 [Warning] Aborted connection 656 to db: 'mwdb_bda6d4a294' user: 'mwu_43c6ae9428' host: '172.17.0.1' (Got an error reading communication packets)
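For correlating these warnings with a specific wiki database or client host, the fields can be pulled out of the log line with a regular expression. This is a small sketch based only on the line format seen in this task. The pattern and the `parse_aborted` helper are illustrative, and other MariaDB versions or configurations may format the warning differently.

```python
import re

# Pattern derived from the warning format shown in this task.
ABORTED_RE = re.compile(
    r"Aborted connection (?P<conn_id>\d+) to db: '(?P<db>[^']*)' "
    r"user: '(?P<user>[^']*)' host: '(?P<host>[^']*)' "
    r"\((?P<reason>[^)]*)\)"
)


def parse_aborted(line: str):
    """Return the fields of an 'Aborted connection' warning, or None."""
    match = ABORTED_RE.search(line)
    return match.groupdict() if match else None


# The locally reproduced warning from this comment.
LINE = (
    "2022-06-29  9:16:47 656 [Warning] Aborted connection 656 to db: "
    "'mwdb_bda6d4a294' user: 'mwu_43c6ae9428' host: '172.17.0.1' "
    "(Got an error reading communication packets)"
)

print(parse_aborted(LINE))
```

Grouping the parsed `db` values over a larger log export would show whether the aborted connections cluster on a few wikis.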

@Evelien_WMDE: The project tag got archived and this open task has no other active project tags. Could you please either add an active project tag so this task can be found, or update the task status? Thanks a lot!