Just ran sudo lvextend -L+1100G /dev/mapper/tank-data && sudo xfs_growfs /srv on all the codfw hosts.
===== NODE GROUP =====
(4) pc[2011-2014].codfw.wmnet
----- OUTPUT of 'df -hT /srv' -----
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   8.7T  9.3G  8.7T   1% /srv
================
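For reference, the full per-host sequence, spelled out (the vgs check is an addition here, assuming the volume group is named tank, as the LV path suggests):

  sudo vgs tank                                  # confirm the VG has >=1100G free
  sudo lvextend -L+1100G /dev/mapper/tank-data   # grow the LV by 1.1T
  sudo xfs_growfs /srv                           # XFS grows online, against the mount point
  df -hT /srv                                    # verify the new size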
The new pc hosts in codfw are now in service. They're replicating from a blank start, so it will take about three weeks for them to be fully populated. Once that's done, we can promote one or more to primary and see how that affects performance.
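One way to keep an eye on progress while they catch up (a sketch, assuming direct access via the usual mysql.py wrapper):

  # replication health/lag on one of the new hosts
  mysql.py -h pc2011.codfw.wmnet -e 'show slave status\G' | grep -E 'Slave_(IO|SQL)_Running|Seconds_Behind_Master'
  # on-disk growth of the parsercache data
  df -hT /srv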
/srv resized on all eqiad hosts:
(4) pc[1011-1014].eqiad.wmnet
----- OUTPUT of 'df -hT /srv' -----
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   8.7T  9.3G  8.7T   1% /srv
Steps to update the replication tree after making pc2011 primary:
- downtime all of pc1, as we have circular replication in place with eqiad
- move other pc1/codfw nodes beneath pc2011:
- db-move-replica pc2010 pc2011
- db-move-replica pc2014 pc2011
- reset replication for all affected nodes:
- mysql.py -h pc1007 -e 'stop slave; reset slave all'
- mysql.py -h pc2007 -e 'stop slave; reset slave all'
- mysql.py -h pc2011 -e 'stop slave; reset slave all'
- re-setup replication for the remaining nodes using binlog coords (see the sketch after this list):
- mysql.py -h pc1007 -e "change master to master_host='pc2011..."
- mysql.py -h pc2007 -e "change master to master_host='pc2011..."
- mysql.py -h pc2011 -e "change master to master_host='pc1007..."
- re-enable GTID everywhere
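A sketch of the re-pointing and GTID steps with the MariaDB syntax spelled out; the binlog file/position placeholders and the FQDN are illustrative (the coordinates above are elided), and replication credentials are omitted:

  # on the new primary, capture the binlog coordinates
  mysql.py -h pc2011 -e 'show master status'
  # point a replica at it, substituting the FILE/POS values captured above
  mysql.py -h pc1007 -e "change master to master_host='pc2011.codfw.wmnet', master_port=3306, master_log_file='<FILE>', master_log_pos=<POS>; start slave"
  # once caught up, switch back to GTID (MariaDB syntax)
  mysql.py -h pc1007 -e "stop slave; change master to master_use_gtid=Slave_pos; start slave"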
Current state: After running for a day, the graphs for the new node (pc2011) are looking very promising. In particular, disk latency is massively improved.
Before:
- Read latencies: 3.92s to 38.1s. Avg: 14.8s
- Write latencies: 1.42s to 36.1s. Avg: 9.4s
After (pc2011):
- Read latencies: 172ms to 445ms. Avg: 260ms
- Write latencies: 623ms to 3.63s. Avg: 1.39s
This makes sense, but it's still good to see :)