Page MenuHomePhabricator

Transferpy: Enable PBKDF2 usage
Closed, ResolvedPublic

Description

While reviewing https://gerrit.wikimedia.org/r/c/operations/software/transferpy/+/859007 I've noticed that at least on bullseye with openssl 1.1.1n-0+deb11u3, openssl enc issues a WARNING when PBKDF2 isn't used:

*** WARNING : deprecated key derivation used.
Using -iter or -pbkdf2 would be better.

I'd recommend adding -iter 310000 (enables PBKDF2 with 310k iterations). The 310k comes from https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html:

If FIPS-140 compliance is required, use PBKDF2 with a work factor of 310,000 or more and set with an internal hash function of HMAC-SHA-256.

Event Timeline

jcrespo triaged this task as High priority.
jcrespo edited projects, added Data-Persistence-Backup, Data-Persistence; removed SRE.

Change 859047 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/software/transferpy@master] Transferer: Enable PBKDF2 usage with 310000 iterations

https://gerrit.wikimedia.org/r/859047

I can confirm the warning was showing up on verbose mode:

100.0% (1/1) success ratio (>= 100.0% threshold) for command: '/bin/mkdir /tmp/...eqiad.wmnet_4401'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
----- OUTPUT of '/bin/bash -c "/u...qiad.wmnet 4401"' -----                                                                   
*** WARNING : deprecated key derivation used.                                                                                 
Using -iter or -pbkdf2 would be better.

Thanks to @Vgutierrez for bringing it up, we normally don't log warnings as this is executed most of the time unattended.

I am validating the patch by doing the same transfer on production and checking the time invested: time transfer.py --verbose --encrypt dbprov1001.eqiad.wmnet:/srv/backups/snapshots/latest/snapshot.m1.2022-05-26--11-54-36.tar.gz dbprov1002.eqiad.wmnet:/srv/backups/snapshots/ongoing

jcrespo changed the task status from Open to In Progress.Nov 21 2022, 1:00 PM

Change 859446 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/software/transferpy@master] Update changelog for release 1.1

https://gerrit.wikimedia.org/r/859446

The new encryption parameters seems to make the transfer 20% slower. I will have more details after a deeper analysis and with more testing, to make sure it is not caused by external factors.

Ignore my previous comments, I was comparing with the wrong transfer. The speed times were:

transfer 1.0 with encryption: 40m28.177s
transfer 1.1 with encryption: 36m52.044s

This means it was faster, although similar enough that could be just the random state of the network. I think this validates no regression, which was the point of the test. Nice.

No longer a warning on verbose mode.

Take into account that PBKDF2 is only used for key derivation purposes, the number of iterations could slow down a little bit this process, but actual encryption/decryption isn't affected by that.

Thank you, I am going to productionize this as is after I validated it had no regressions (and there was no warning anymore).

One last question: do you have a suggestion for a hashing algorithm that would be more secure than md5 or sha1 but also would be as lightweight as md5?

According to openssl speed SHA-256 isn't slower than MD5:

vgutierrez@cp5020:~$ openssl speed md5
[...]
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5             111276.09k   247604.25k   437966.59k   551410.01k   589116.76k   578333.35k

vgutierrez@cp5020:~$ openssl speed sha256
[...]
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256          140355.86k   363260.31k   728866.73k   958996.48k  1060072.11k  1065877.50k

Change 859047 merged by Jcrespo:

[operations/software/transferpy@master] Transferer: Enable PBKDF2 usage with 310000 iterations

https://gerrit.wikimedia.org/r/859047

Change 859446 merged by Jcrespo:

[operations/software/transferpy@master] Update changelog for release 1.1

https://gerrit.wikimedia.org/r/859446

According to openssl speed SHA-256 isn't slower than MD5

Thanks, I will give it a look and test it on the right context (piping content on a single thread) and explore implementing it for the next release.

For now this specific ticket's issue has been merged and deployed into production with no known (speed or functionality) regressions.

Thanks, I will give it a look and test it on the right context (piping content on a single thread) and explore implementing it for the next release.

Consider testing openssl dgst besides coreutils' /usr/bin/sha256sum

Thanks, I will give it a look and test it on the right context (piping content on a single thread) and explore implementing it for the next release.

Consider testing openssl dgst besides coreutils' /usr/bin/sha256sum

I think you may have found the key to our problem when measuring performance in the past. This is not a proper benchmark (the file fits into memory so file caching is in effect) -just a quick and dirty check-, but sha256sum seems to be almost an order of magnitude slower than openssl, which may had forced us in the wrong path 🤦‍♂️:

jynus@sangai:/media/home/jynus$ time (cat ~/Downloads/nodes2.csv | openssl dgst -md5)   
(stdin)= 1bdd1e67957b0e91f90dc9cf38e290cf

real    0m4.758s
user    0m4.234s
sys     0m1.820s

jynus@sangai:/media/home/jynus$ time (cat ~/Downloads/nodes2.csv | md5sum)
1bdd1e67957b0e91f90dc9cf38e290cf  -

real    0m4.706s
user    0m4.412s
sys     0m0.964s

jynus@sangai:/media/home/jynus$ time (cat ~/Downloads/nodes2.csv | openssl dgst -sha256)
(stdin)= 16658ff9a0ce58c89de2c94866b5561d4f5578404d6eeda42f189f9b87e7d41b

real    0m2.318s
user    0m1.817s
sys     0m1.699s

jynus@sangai:/media/home/jynus$ time (cat ~/Downloads/nodes2.csv | sha256sum)
16658ff9a0ce58c89de2c94866b5561d4f5578404d6eeda42f189f9b87e7d41b  -

real    0m10.841s
user    0m10.623s
sys     0m0.962s

Thank you a lot, definitely looking into this.

We realized it was due to the cpu flags sha_ni:

These are the results on a production-like host, with the same file but when the instruction extensions are not available (again, please don't take these as serious benchmarks, just informal checks):

✔️ root@db2102:~$ time (cat ~/nodes2.csv | md5sum)          
1bdd1e67957b0e91f90dc9cf38e290cf  -

real    0m5.956s
user    0m5.360s
sys     0m2.472s
✔️ root@db2102:~$ time (cat ~/nodes2.csv | openssl dgst -md5)
(stdin)= 1bdd1e67957b0e91f90dc9cf38e290cf

real    0m6.281s
user    0m5.347s
sys     0m3.258s
✔️ root@db2102:~$ time (cat ~/nodes2.csv | sha256sum)
16658ff9a0ce58c89de2c94866b5561d4f5578404d6eeda42f189f9b87e7d41b  -

real    0m17.145s
user    0m16.384s
sys     0m3.471s
✔️ root@db2102:~$ time (cat ~/nodes2.csv | openssl dgst -sha256)
(stdin)= 16658ff9a0ce58c89de2c94866b5561d4f5578404d6eeda42f189f9b87e7d41b

real    0m9.006s
user    0m7.963s
sys     0m3.478s