Not an issue if it is a temporary import, but it is if it is intended to be a continuous process.
Either provide those on a separate instance or make a pause every X amount of time.
jcrespo, Dec 9 2015, 11:16 AM
F3061092: labsdb1004_slave_lag.png (Dec 9 2015, 2:31 PM)
F3061088: labsdb1005.png (Dec 9 2015, 2:31 PM)
There are six temporary import tasks running right now processing 6 months of pagecount data (at the very end now). There is a crontabbed task running every hour checking for new pagecount files and importing them if they exist.
I could insert a sleep in some places, but do you have any recommendation as to how long it should be?
@Stigmj tool-labs database management is based on the assumption that everyone behaves responsibly. 50% of the time importing and 50% sleeping is what I would recommend. Also, take a lock for each import so that you only use 1 thread at a time (that way you let other users use the available resources).
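As a rough illustration of the 50% importing / 50% sleeping recommendation, here is a minimal Python sketch. It assumes the import is split into batches and that a caller-supplied `import_batch` function does the actual work; both names are hypothetical and not part of any existing tool. After each batch it sleeps for as long as the batch took.

```python
import time


def run_with_duty_cycle(batches, import_batch, sleep=time.sleep, clock=time.monotonic):
    """Import each batch, then stay idle for as long as the import took.

    `import_batch` is a hypothetical callable supplied by the caller.
    `sleep` and `clock` are injectable so the pacing can be tested
    without actually waiting.
    """
    for batch in batches:
        started = clock()
        import_batch(batch)
        elapsed = clock() - started
        # Equal time busy and idle gives roughly a 50% duty cycle.
        sleep(elapsed)
```

Sleeping in proportion to the time spent working, rather than for a fixed interval, keeps the ratio close to 50% even when batch sizes vary.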
I have ways to enforce that, but I would want to make it the users' responsibility first. To give you an idea of the impact, this is the current CPU usage:
(despite mysql not being too CPU-hungry, but mostly blocking on IOPS)
And its slave, with no load other than replication, is 7 hours behind:
Replication suffers from imports; if needed, we could import separately on the master and the slave.
I will trust that the rate will be slower soon, and will give you feedback otherwise.
I have not mentioned it before, but if this requires specific resources, because it could be useful to more than 1 person and is considered an important contribution, we could try to get separate resources for it (I cannot guarantee it, of course) so that it does not disrupt other tools' work.
I have put in some random sleeps (between 60 and 600 seconds) between each call to my import script and implemented a lockfile mechanism so that only one instance can be active at a time. Hopefully this will ease the load on the DB servers.
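The actual import script is not shown in this thread, so the following is only a sketch of what a lockfile plus random sleep could look like in Python on a Unix host. The names `single_instance`, `throttled_import`, `import_file` and the lock path are all hypothetical. It uses `fcntl.flock` with a non-blocking exclusive lock, so a second invocation (for example from the hourly cron job) exits immediately instead of running in parallel.

```python
import fcntl
import random
import time
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Yield True if this process obtained the lock, False if it is already held."""
    handle = open(lock_path, "w")
    try:
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False
        yield acquired
    finally:
        # Closing the file releases the lock if we held it.
        handle.close()


def throttled_import(files, import_file, lock_path, sleep=time.sleep, rng=random):
    """Import files one at a time, pausing 60 to 600 seconds after each one.

    `import_file` is a hypothetical callable that loads one file.
    Returns False without importing if another instance holds the lock.
    """
    with single_instance(lock_path) as acquired:
        if not acquired:
            return False
        for name in files:
            import_file(name)
            sleep(rng.randint(60, 600))
    return True
```

Because the kernel drops an `flock` lock when the process exits, a crashed import does not leave a stale lock behind, which is one reason to prefer it over merely checking whether a lock file exists.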
This seems to have worked. Lag is in the 0-10 range, which is acceptable. There is still high CPU usage on labsdb1005, but I think this is now due to other users.
Thank you very much for your collaboration.