Page MenuHomePhabricator

s51059 is doing unnecessarily slow queries
Closed, ResolvedPublic

Description

The queries look like this:

SELECT * FROM externallinks LIMIT 80340000,15000

This not only makes the queries unnecessarily long (performing a full table scan every time that is run), it also potentially contributes to the recent labsdb inestability by consuming more resources than necessary.

Please rewrite your queries to use the PRIMARY KEY for paging; it is as easy, and millions of times less expensive.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
jcrespo renamed this task from s51059 is doing inappropiate queries to s51059 is doing unnecessarily slow queries.Aug 9 2016, 1:32 PM

@Cyberpower678: s51059 is tools.cyberbot, of which you are the maintainer.

Oh. It's better to not use the DB username when mentioning problems otherwise, I'll think I was subscribed by mistake.

That aside I haven't done anything recently with the externallinks. InternetArchiveBot doesn't use that table.

So can I block that user account?

I'm not sure. I can never remember my DB username, since it's just a bunch of numbers.

So I am indeed s51059, and IABot is actively using that account, but not the external links table.

My fear is, if it is not using that table, but queries are being sent with it, maybe it is compromised?

Queries are being received from 10.68.17.155:46093

More likely, it uses some library function that performs such queries internally.

Oh wait. I do have a bot script that I commissioned over 2 years ago that does use the external links table. But that bot has been in continuous operation since I commissioned it, and has been scanning the table in its entirety, so I doubt it would have caused a recent outage.

Well, right now, I have to go over all tools doing things incorrectly. I suggest either stop it or rewrite it to page using indexes as mentioned it above.

Hmm... I seem to have another problem. My Wikitech password is being rejected. It's a saved password in my vault.

That's cyberbot-exec-01.cyberbot.eqiad.wmflabs, which suggests that these are legitimate queries. A compromise does not sound likely (why would one run externallinks queries?).

Indeed it is, and quickly fixed my password problem, hooray for backups. :p

To query as you suggested would require a complete rewrite of the bot script, so the only thing I can do right now is disable it.

I've disabled the affected scripts.

I do not think a full rewrite would be needed- a limit seems to be kept on each query. Change that to a PK limit (it should not be more than a few lines of code changed).

It is ok to disable it if it will take you some time to do it.

The problem with the current approach is that it queries millions of rows on every execution; and even it is unlikely to be the main cause of memory issues; it certainly contributes to make the database unnecessarily slow for everyone else. We had at least 3 crashes last week to excessive memory usage. For example, since Saturday, your user queried 234439026351 rows but only sent 83583746. Ideally that should be as close to a 1:1 ratio as possible.

jcrespo claimed this task.

@Cyberpower678 Thank you for your quick response! Remember that there was no problem with using the database in the way you did it, I am just contacting users that could improve its queries to make them more efficient for everyone's benefit.

I cannot go and provide a patch for everyone's tools (there is thousands of them); merely a review- however please do not hesitate to reopen this issue and ask for assistance and probably someone else could help.

Thank you again!