
s52721__pagecount_stats_p import is making labsdb1005 100% utilized (and lagging its backup slave)
Closed, ResolvedPublic

Description

This is not an issue if it is a temporary import, but it is a problem if it is intended to be a continuous process.

Either host this data on a separate instance or pause the import every X amount of time.

Event Timeline

jcrespo raised the priority of this task from to Needs Triage.
jcrespo updated the task description.
jcrespo added projects: Tools, Cloud-VPS.
jcrespo added subscribers: jcrespo, Stigmj.
Restricted Application added subscribers: StudiesWorld, Aklapper.

There are six temporary import tasks running right now, processing six months of pagecount data (they are at the very end now). There is also a cron job that runs every hour, checks for new pagecount files, and imports them if they exist.

I could insert sleeps in some places, but do you have a recommendation for how long they should be?

@Stigmj Tool Labs database management is based on the assumption that everyone behaves responsibly. I would recommend spending 50% of the time importing and 50% sleeping. Also, take a lock for each import so that you only use one thread at a time, which leaves the remaining resources available to other users.
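As an illustration of the 50/50 recommendation, here is a minimal Python sketch that times each import batch and then sleeps for a matching amount of time. The function name `import_with_duty_cycle`, the `import_batch` callback, and the `duty` parameter are hypothetical and not taken from the actual import script; the real script may be structured quite differently.

```python
import time


def import_with_duty_cycle(batches, import_batch, duty=0.5,
                           clock=time.monotonic, sleep=time.sleep):
    """Run import_batch on each batch, pausing so that only a `duty`
    fraction of wall-clock time is spent importing.

    All names here are illustrative assumptions. `clock` and `sleep`
    are injectable so the pacing can be checked without waiting.
    """
    if not 0 < duty <= 1:
        raise ValueError("duty must be in (0, 1]")
    total_pause = 0.0
    for batch in batches:
        started = clock()
        import_batch(batch)
        busy = clock() - started
        # With duty=0.5 the pause equals the time just spent importing.
        pause = busy * (1 - duty) / duty
        if pause > 0:
            sleep(pause)
        total_pause += pause
    return total_pause
```

Scaling the pause to the measured import time, rather than using a fixed value, keeps the ratio roughly constant even when some batches are much larger than others.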

I have ways to enforce that, but I would prefer to make it the users' responsibility first. To give you an idea of the impact, this is the current CPU usage:

labsdb1005.png (304×2 px, 46 KB)

(even though MySQL is not very CPU-hungry and is mostly blocking on IOPS)

And its slave, which has no load other than replication, is 7 hours behind:

labsdb1004_slave_lag.png (467×1 px, 22 KB)

Replication suffers from imports. If needed, we could import separately on the master and the slave.

I will trust you that the rate will drop soon, and will give you feedback if it does not.

I have not mentioned it before, but if this requires dedicated resources, because it could be useful to more than one person and is considered an important contribution, we could try to get separate resources for it (I cannot guarantee it, of course) so that it does not disrupt other tools' work.

I have put in random sleeps (between 60 and 600 seconds) between calls to my import script and implemented a lockfile mechanism so that only one instance is active at a time. Hopefully this will ease the load on the DB servers.
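For readers who want to try something similar, the following is a rough Python sketch of a lockfile plus random sleeps, assuming a POSIX system where `fcntl.flock` is available. It is not the actual import script: `single_instance`, `throttled_import`, `import_file`, and `lock_path` are hypothetical names chosen for illustration.

```python
import fcntl
import os
import random
import time
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Yield True if this process got the lock, False if it is already held."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Non-blocking exclusive lock: fail at once instead of queueing.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def throttled_import(files, import_file, lock_path,
                     sleep=time.sleep, rng=random):
    """Import files one at a time, sleeping 60 to 600 seconds after each.

    Returns the number of files imported, or 0 if another instance
    already holds the lock.
    """
    with single_instance(lock_path) as have_lock:
        if not have_lock:
            return 0
        done = 0
        for name in files:
            import_file(name)
            done += 1
            sleep(rng.uniform(60, 600))
        return done
```

An hourly cron job could call `throttled_import` directly: if the previous run is still working through its files, the new run returns immediately instead of adding a second concurrent import.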

This seems to have worked. Lag is now in the 0-10 range, which is acceptable. There is still high CPU usage on labsdb1005, but I think this is now due to other users.

Thank you very much for your collaboration.

jcrespo claimed this task.