Framework to transfer files over the LAN
Open, Normal, Public

Description

Jaime already started to work on a framework to transfer files (in the DB context: to clone databases):

https://gerrit.wikimedia.org/r/#/c/326155/

The transfer method must:

* Be as fast as the network bandwidth allows, with
  configurable throttling
* Be easy to use (take automatic decisions when safe)
* Allow both single files and entire directories to be
  synced. Directories may contain thousands of small files
* Keep the original permissions and ownership
* Have in-place checks to avoid damaging operations
* Allow encryption
* Allow configurable compression
* Allow configurable resource usage (e.g. number of CPUs)
* Checksum contents before and after the copy to check it has been
  done successfully
* Allow multicast-like transfers from 1 server to many
* Report the status at any time, and if it fails, why
* Handle the firewall automatically
* Not require a constantly open port or service

The current code is just the bare bones; it has to be integrated
with volan's packages for remote code execution.

Because of the above, rsync is not enough. We have to look at
multi-threaded FTP, tar + socat with user encryption, and BitTorrent.
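
As an illustration of the "tar + encryption over a socket" option mentioned above, here is a minimal sketch that only builds the sender-side shell pipeline as a string. It is not the code from the Gerrit change: the function name, parameters, host, port and key file are hypothetical, nc stands in for socat, and the default cipher is the one recommended later in this task. The optional pv stage covers the throttling requirement.

```python
import shlex


def build_sender_pipeline(source_dir, target_host, port, password_file,
                          cipher="chacha20", rate_limit=None):
    """Return a shell pipeline string: tar | [pv] | openssl enc | nc.

    Hypothetical helper for illustration only; it builds the command
    line and does not execute anything.
    """
    # Stream the whole directory to stdout as a tar archive.
    stages = [["tar", "cf", "-", "-C", source_dir, "."]]
    if rate_limit:
        # pv -L caps the throughput (e.g. "50m"); -q hides the progress bar.
        stages.append(["pv", "-q", "-L", str(rate_limit)])
    # Symmetric encryption, reading the passphrase from a file (assumed to
    # have been distributed to both hosts beforehand).
    stages.append(["openssl", "enc", "-" + cipher,
                   "-pass", "file:" + password_file])
    # Send the encrypted stream to the listening target.
    stages.append(["nc", target_host, str(port)])
    # Quote every argument so paths with spaces survive the shell.
    return " | ".join(
        " ".join(shlex.quote(arg) for arg in stage) for stage in stages
    )


if __name__ == "__main__":
    print(build_sender_pipeline("/srv/sqldata", "db2001.example", 4444,
                                "/root/transfer.key", rate_limit="50m"))
```

The receiving side would run the mirror image (listen, decrypt with `openssl enc -d`, untar), and the port would only need to be open for the duration of the copy.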

Event Timeline

Restricted Application added a subscriber: Aklapper. Jan 27 2017, 9:54 AM
jcrespo added subscribers: Rduran, jcrespo. Edited Apr 17 2018, 12:56 PM

@Rduran Do you think you can take care of this? There is a prototype at https://gerrit.wikimedia.org/r/326155 , but all the other remote-calling methods should be dropped in favor of cumin ( https://wikitech.wikimedia.org/wiki/Cumin ). Sadly, Cumin is Python 2 only for now.

jcrespo updated the task description. Apr 17 2018, 12:57 PM
Rduran claimed this task. Apr 19 2018, 8:01 AM
Marostegui moved this task from Backlog to In progress on the DBA board. Apr 20 2018, 8:09 AM
Restricted Application added a project: Operations. May 8 2018, 4:05 PM

@Vgutierrez suggested using https://github.com/vstakhov/hpenc , which I don't think is a bad idea at all: it would just change some of the executions of openssl and netcat to that tool, but the tool probably needs packaging and setup?

The recommended cipher, which is an easier change, is chacha20 or, alternatively, AES-GCM, rather than the randomly selected one in the commit.

Rduran added a comment. May 9 2018, 2:15 PM

hpenc looks interesting, so maybe we can keep it in mind for future improvements.

Yes, I was not suggesting doing it now, just documenting the suggestion for the future, or maybe they can even set it up for us in parallel. Changing the algorithm, assuming openssl on stretch supports it, should be a 10-character patch.

In stretch, chacha20 is available as "chacha20" and in jessie as "chacha20-poly1305". BTW, for a big enough block size (16384 bytes), chacha20 performs better than rc4 on one core :)

Rduran added a comment. May 9 2018, 2:35 PM

Thank you both! I'm using "chacha20" right now and it seems to work just fine (I'm using buster, but stretch is also on 1.1.0). Does jessie need to be supported too?

No, stick to stretch; that is OK, that is the target.

Change 432569 had a related patch set uploaded (by Rduran; owner: Rduran):
[operations/puppet@production] [WIP] Refactor code in transfer.py

https://gerrit.wikimedia.org/r/432569

Change 433558 had a related patch set uploaded (by Rduran; owner: Rduran):
[operations/software/wmfmariadbpy@master] [WIP] Refactor code in transfer.py

https://gerrit.wikimedia.org/r/433558

Change 432569 abandoned by Rduran:
[WIP] Refactor code in transfer.py

Reason:
Moved to operations/software/wmfmariadbpy

https://gerrit.wikimedia.org/r/432569

Marostegui triaged this task as Normal priority. Jun 8 2018, 3:40 PM

Change 433558 merged by Jcrespo:
[operations/software/wmfmariadbpy@master] [WIP] Refactor code in transfer.py

https://gerrit.wikimedia.org/r/433558

Change 446871 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software/wmfmariadbpy@master] transfer.py: Make checksum optional

https://gerrit.wikimedia.org/r/446871

Change 446871 merged by Jcrespo:
[operations/software/wmfmariadbpy@master] transfer.py: Make checksum optional

https://gerrit.wikimedia.org/r/446871

@jcrespo how do you feel about closing this task as resolved?

The original scope is far from met:

  • No throttling, although it would be easy to implement with pv
  • It is not intelligent (it takes no automatic decisions)
  • Compression is only on/off, not configurable
  • CPU resources are not configurable
  • Checksum works but it is very slow (serial execution before and after)
  • No multicast
  • No state reporting
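
On the slow checksum point, one possible direction is to hash files concurrently instead of serially. The following is a minimal sketch under stated assumptions, not what transfer.py does: the function names and the `workers` parameter are hypothetical, and SHA-256 is an arbitrary choice of digest. It also shows how a configurable worker count could address the CPU resources item.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root, workers=4):
    """Map each file path (relative to root) to its digest.

    Files are hashed by a pool of `workers` threads; hashlib releases
    the GIL while hashing large buffers, so threads can run in parallel.
    """
    relative_paths = []
    for current_dir, _subdirs, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(current_dir, name)
            relative_paths.append(os.path.relpath(full_path, root))
    relative_paths.sort()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = pool.map(
            lambda rel: file_digest(os.path.join(root, rel)), relative_paths
        )
        return dict(zip(relative_paths, digests))
```

Comparing `tree_digests(source_dir)` on the origin with `tree_digests(target_dir)` on the destination then verifies the copy; because the keys are relative paths, the two mappings are equal only if both trees hold the same files with the same contents.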

Additionally, I would like to see:

  • More work towards mysql provisioning
  • Configurable compression and decompression on both ends (transmit a packaged file or create one)

I think we can create a separate ticket for mysql provisioning automation, as part of the binary backups tasks.

ema moved this task from Triage to Watching on the Traffic board. Sep 5 2018, 8:09 AM
Marostegui moved this task from In progress to Next on the DBA board. Sep 12 2018, 5:23 AM
Marostegui moved this task from Next to Backlog on the DBA board.

transfer.py was modified to support taking hot mysql backups and to handle compression/decompression for provisioning.

It is still a bit of a clunky mess, and it would be nice for someone else to adopt it and maintain the basic transfer functionality better.