database query timeouts since April 26th 2020
Closed, Declined (Public)

Description

use dewiki_p;
select page_title, cl_to
from page b, categorylinks, templatelinks
where cl_from = b.page_id and tl_from = b.page_id and b.page_namespace = 0 and tl_from_namespace = 0 and tl_namespace = 10 and tl_title not in (
   select page_title
   from page a
   where a.page_namespace = 10
)
order by page_title;

This query, run on dewiki.analytics.db.svc.eqiad.wmflabs, has been timing out since April 26th or 27th, 2020. It normally takes about 100 minutes, sometimes 70, sometimes 110.

Proposal: The databases grow every day, so the query timeout should be increased accordingly.
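
For reference, the query in the description can also be written with explicit joins and a correlated NOT EXISTS in place of NOT IN. This is only a sketch: it assumes page_title is never NULL (so that NOT IN and NOT EXISTS return the same rows) and that the replica views expose the same columns used above. The aliases p, cl, tl and t are introduced here for illustration. It has not been run against the replicas, and whether it finishes faster there is unknown.

```sql
-- Sketch only: intended to return the same rows as the query above,
-- assuming page_title is never NULL. Untested on the wiki replicas.
USE dewiki_p;

SELECT p.page_title, cl.cl_to
FROM page AS p
JOIN categorylinks AS cl ON cl.cl_from = p.page_id
JOIN templatelinks AS tl ON tl.tl_from = p.page_id
WHERE p.page_namespace = 0
  AND tl.tl_from_namespace = 0
  AND tl.tl_namespace = 10
  -- keep only template links whose target page does not exist in namespace 10
  AND NOT EXISTS (
        SELECT 1
        FROM page AS t
        WHERE t.page_namespace = 10
          AND t.page_title = tl.tl_title
      )
ORDER BY p.page_title;
```

The correlated subquery looks up one (namespace, title) pair per template link instead of comparing against the full list of template titles, which may or may not help depending on how the optimizer already handles the original NOT IN.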

Event Timeline

We are running with reduced capacity due to the issues being discussed in T249188, and the query killer has been set to kill queries that take more than 60 minutes, to avoid generating even more load.

@Marostegui: my client user is u4802

Is it possible to exempt this client so that my queries no longer fail?

No, I am sorry, we cannot do that. As I said, we are running in sort of a degraded state. Hopefully by next week we'll have the 3rd server back (fingers crossed!)

For the longer term, you should know that we are in constant conversation with the cloud team about the need to increase the resources available for the wikireplicas. That is how things will be able to run faster even with more data. Sadly this takes some time, but if the maintenance goes well, that will be the first step towards it. Apologies for the temporary disruption.

Declining as nothing can be done at the moment to mitigate the current timeouts.
We have finished reimporting all the data into labsdb1011. If all goes well, we will start replication soon, and during the week we might be able to repool it.