There have been three attempts to copy data from es2014 to es2026, and all three failed.
The first failed after about 1TB, the second after 8.1TB and the third after about 70GB.
The error shown was:
WARNING: Firewall's temporary rule could not be deleted ERROR: Copy from es2014.codfw.wmnet:/srv/sqldata to es2026.codfw.wmnet:/srv/ failed
However, the nc processes were still running and data was still being transferred: even though the error was reported and the copy process had stopped, the netcat processes stayed alive on both the source and the target host.
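To confirm whether netcat processes are left behind after a failed copy, a check like the following could be run on both the source and the target host. This is a minimal sketch, assuming a Linux host with a standard /proc filesystem and that netcat shows up under one of the hypothetical default names listed in `NETCAT_NAMES`; `find_processes` is an illustrative helper, not part of any existing tool.

```python
from pathlib import Path

# Hypothetical default: names under which netcat commonly appears in /proc/<pid>/comm.
NETCAT_NAMES = ("nc", "netcat", "nc.openbsd", "nc.traditional")


def find_processes(names=NETCAT_NAMES, proc_root="/proc"):
    """Return sorted (pid, cmdline) pairs for processes whose comm is in `names`.

    Assumes a Linux-style /proc layout; `proc_root` is a parameter so the
    function can be pointed at a fake tree for testing.
    """
    matches = []
    root = Path(proc_root)
    if not root.is_dir():
        return matches
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            comm = (entry / "comm").read_text().strip()
            if comm not in names:
                continue
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # the process exited meanwhile or is not readable
        # cmdline arguments are NUL-separated; join them with spaces for display.
        cmdline = raw.replace(b"\x00", b" ").decode(errors="replace").strip()
        matches.append((int(entry.name), cmdline))
    return sorted(matches)


if __name__ == "__main__":
    for pid, cmdline in find_processes():
        print(pid, cmdline)
```

If this still lists nc processes after the copy command has already reported a failure, they are leftovers like the ones described above and can be inspected before retrying.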
The transfer is currently running again on es2026, into a directory other than /srv/sqldata, as it is suspected that the failures might be related to Puppet runs and to:
- Ferm being reloaded
- chmod/chown changing the permissions or ownership of /srv/sqldata
About 9TB of data needs to be transferred, so each attempt takes several hours.
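To put "several hours" into numbers, the sketch below estimates the raw transfer time for about 9TB at a few sustained throughputs. The throughput values are assumptions for illustration, not measured rates for es2014 or es2026, and the estimate ignores compression, checksumming and protocol overhead.

```python
def transfer_hours(size_tb, throughput_gbit_s):
    """Hours to move `size_tb` terabytes (10**12 bytes) at a sustained rate in Gbit/s."""
    size_bits = size_tb * 10**12 * 8
    seconds = size_bits / (throughput_gbit_s * 10**9)
    return seconds / 3600


if __name__ == "__main__":
    # Assumed sustained rates in Gbit/s; real throughput depends on disks, CPU and network.
    for rate in (1, 2.5, 5, 10):
        print(f"9 TB at {rate} Gbit/s: {transfer_hours(9, rate):.1f} h")
```

At an assumed 1 Gbit/s, for example, 9TB works out to 9 × 10¹² × 8 / 10⁹ = 72,000 s, i.e. 20 hours, so under that assumption a failure late in a copy (such as the one at 8.1TB) loses most of a day.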
Many more transfers are coming up to populate the new es hosts in eqiad and codfw.