dbprov2002 slower to generate snapshots
Closed, Resolved, Public

Description

For the last week at least, dbprov2002 has been very slow to generate its snapshots, with single sections taking 5+ hours to compress the resulting files into a tar.gz.

Monitor (e.g. recent backup times) and research the cause of this: hardware issue? Software?

Event Timeline

It needs to finish the backups (T236406#5619287) and the ongoing snapshots before I can do a sanity restart.

jcrespo triaged this task as Medium priority. Oct 30 2019, 4:45 PM

Mentioned in SAL (#wikimedia-operations) [2019-10-31T16:14:06Z] <jynus> restart dbprov2002 after upgrade T236924

root@db1115.eqiad.wmnet[zarcillo]> select name, section, host, TIMESTAMPDIFF(MINUTE, start_date, end_date) as minutes FROM backups where type='snapshot' and start_date > now() - INTERVAL 7 day and source like '%.codfw.wmnet:%' order by host, section, start_date;
+----------------------------------+---------+------------------------+---------+
| name                             | section | host                   | minutes |
+----------------------------------+---------+------------------------+---------+
| snapshot.s4.2019-10-29--22-42-04 | s4      | dbprov2001.codfw.wmnet |      77 |
| snapshot.s4.2019-10-31--21-46-32 | s4      | dbprov2001.codfw.wmnet |     152 |
| snapshot.s4.2019-11-03--21-55-37 | s4      | dbprov2001.codfw.wmnet |     132 |
| snapshot.s5.2019-10-30--05-11-29 | s5      | dbprov2001.codfw.wmnet |      49 |
| snapshot.s5.2019-11-01--01-56-17 | s5      | dbprov2001.codfw.wmnet |      51 |
| snapshot.s5.2019-11-04--01-45-25 | s5      | dbprov2001.codfw.wmnet |      52 |
| snapshot.s6.2019-10-30--11-38-43 | s6      | dbprov2001.codfw.wmnet |      61 |
| snapshot.s6.2019-11-01--04-49-38 | s6      | dbprov2001.codfw.wmnet |      51 |
| snapshot.s6.2019-11-04--03-36-57 | s6      | dbprov2001.codfw.wmnet |      55 |
| snapshot.s8.2019-10-29--19-00-01 | s8      | dbprov2001.codfw.wmnet |      98 |
| snapshot.s8.2019-10-31--19-00-01 | s8      | dbprov2001.codfw.wmnet |     260 |
| snapshot.s8.2019-11-03--19-00-01 | s8      | dbprov2001.codfw.wmnet |     241 |
| snapshot.x1.2019-10-30--18-52-24 | x1      | dbprov2001.codfw.wmnet |      29 |
| snapshot.x1.2019-11-01--07-45-48 | x1      | dbprov2001.codfw.wmnet |      29 |
| snapshot.x1.2019-11-04--07-44-00 | x1      | dbprov2001.codfw.wmnet |      29 |
| snapshot.s1.2019-10-29--19-00-01 | s1      | dbprov2002.codfw.wmnet |     518 |
| snapshot.s1.2019-10-31--19-00-01 | s1      | dbprov2002.codfw.wmnet |      76 |
| snapshot.s1.2019-11-03--19-00-01 | s1      | dbprov2002.codfw.wmnet |      84 |
| snapshot.s2.2019-10-30--01-37-02 | s2      | dbprov2002.codfw.wmnet |     529 |
| snapshot.s2.2019-11-01--01-23-33 | s2      | dbprov2002.codfw.wmnet |     135 |
| snapshot.s2.2019-11-04--01-04-23 | s2      | dbprov2002.codfw.wmnet |      69 |
| snapshot.s3.2019-10-30--13-26-05 | s3      | dbprov2002.codfw.wmnet |     594 |
| snapshot.s3.2019-11-01--06-27-27 | s3      | dbprov2002.codfw.wmnet |     202 |
| snapshot.s3.2019-11-04--05-18-44 | s3      | dbprov2002.codfw.wmnet |     235 |
| snapshot.s7.2019-10-30--07-00-23 | s7      | dbprov2002.codfw.wmnet |     631 |
| snapshot.s7.2019-11-01--03-47-16 | s7      | dbprov2002.codfw.wmnet |     159 |
| snapshot.s7.2019-11-04--03-25-29 | s7      | dbprov2002.codfw.wmnet |     179 |
+----------------------------------+---------+------------------------+---------+
27 rows in set (0.00 sec)

Seems okish now (5x speedup), I will give the same treatment to the other dbprovs.
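As a rough check of the speedup figure, the dbprov2002 durations from the table above can be compared before and after the 2019-10-31 restart. This is only a sketch: the numbers are copied by hand from the query output, the "before" run is taken to be the first snapshot of each section and the "after" run the next one, and the variable names are made up for illustration.

```python
# Snapshot durations in minutes for dbprov2002, copied from the query output.
# "before" = run started 2019-10-29/30 (before the restart),
# "after"  = next run, started 2019-10-31/11-01 (after the restart).
before = {"s1": 518, "s2": 529, "s3": 594, "s7": 631}
after = {"s1": 76, "s2": 135, "s3": 202, "s7": 159}

# Per-section speedup: how many times faster the later run was.
speedups = {section: before[section] / after[section] for section in before}

# Overall speedup across all four sections, by total minutes.
overall = sum(before.values()) / sum(after.values())

for section, ratio in sorted(speedups.items()):
    print(f"{section}: {before[section]} -> {after[section]} min ({ratio:.1f}x)")
print(f"overall: {overall:.1f}x")
```

On these numbers the per-section ratios range from about 2.9x (s3) to about 6.8x (s1), and about 4.0x by total minutes, so the quoted figure is best read as an approximation.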

dbprov2* will always be slower due to the daily backups happening there. The extreme slowdown, however, seems fixed. Backups are up to date.