>>! In T151425#3503094, @Bawolff wrote:
> Hmm, the cdb thing is perhaps not the best data structure, really we should use bloom filters instead.
>
> For a "mere" 700 mb, we could have a bloom filter with a 0.01% (1 in 10,000) false positive rate containing all 306 million passwords.
>
> More realistically, 100,000 passwords is 234 kb at 0.01% false positive, 292 kb for 0.001%, 351 kb for 0.0001% (1 in a million).
>
> I guess it's not really clear what is an acceptable false positive rate in this context, but 1 in a million certainly seems acceptable beyond any doubt... Possibly other structures like Cuckoo filters could give even better trade-offs but I don't know much about them.
>
> https://hur.st/bloomfilter?n=100000&p=0.0001
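
For reference, the sizes quoted above can be reproduced from the standard Bloom filter sizing formula, m = -n·ln(p) / (ln 2)², with k = (m/n)·ln 2 hash functions. This is only a rough sketch for sanity-checking the numbers; the function names `bloom_size` and `kib` are made up for illustration and don't come from any of the libraries discussed here.

```python
import math


def bloom_size(n: int, p: float) -> tuple[int, int]:
    """Return (bits, hash_count) for an optimally sized Bloom filter
    holding n items at a target false positive rate p."""
    bits = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n * math.log(2)))
    return bits, hashes


def kib(bits: int) -> float:
    """Convert a bit count to KiB (1 KiB = 1024 bytes)."""
    return bits / 8 / 1024


if __name__ == "__main__":
    # The three false positive rates from the quoted comment:
    # 0.01%, 0.001% and 0.0001% (1 in a million).
    for p in (1e-4, 1e-5, 1e-6):
        bits, k = bloom_size(100_000, p)
        print(f"n=100,000 p={p:g}: {kib(bits):.1f} KiB, k={k}")

    # The full 306 million password list at 0.01%.
    bits, k = bloom_size(306_000_000, 1e-4)
    print(f"n=306,000,000 p=0.0001: {kib(bits) / 1024:.0f} MiB, k={k}")
```

For 100,000 passwords this comes out at roughly 234, 292.5 and 351 KiB, and a little under 700 MiB for the full list, which matches the figures in the quote.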

A quick look turns up numerous implementations on GitHub, some of which can be pulled in via Composer. A few have no stated licence, though, which is annoying.

| name | licence | last commit (dd/mm/yyyy) | composer | packagist | serializable | reproducible build | file size |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| [[https://github.com/mrspartak/php.bloom.filter|mrspartak/php.bloom.filter]] |[[https://github.com/mrspartak/php.bloom.filter/issues/9|Unknown]] | 04/04/2015 | Y |[[https://github.com/mrspartak/php.bloom.filter/issues/10|N]] | Y (serialise whole object) | N | 1.4M |
| [[https://github.com/makinacorpus/php-bloom|makinacorpus/php-bloom]] | [[https://github.com/makinacorpus/php-bloom/issues/1|Unknown]] | 30/08/2016 | Y | Y | Y | Y | 236K |
| [[https://github.com/pleonasm/bloom-filter|pleonasm/bloom-filter]] | BSD 2-clause | 16/08/2014 | Y | Y | [[https://github.com/pleonasm/bloom-filter/issues/3|N]] (jsonSerialize not implemented) | N/A | |
| [[https://github.com/dsx724/php-bloom-filter|dsx724/php-bloom-filter]] | Apache License 2.0 | 03/10/2014 |[[https://github.com/dsx724/php-bloom-filter/issues/6|N]] | [[https://github.com/dsx724/php-bloom-filter/issues/6|N]] | [[https://github.com/dsx724/php-bloom-filter/issues/7|N]] | N/A | |
| [[https://github.com/rocket-internet-berlin/RocketLabsBloomFilter|rocket-internet-berlin/RocketLabsBloomFilter]] | MIT | 18/05/2017 | Y | Y | Y (can save to redis too) | Y | 176K |
| [[https://github.com/maxwilms/bloom-filter|maxwilms/bloom-filter]] | MIT | 15/09/2015 | Y | Y | [[https://github.com/maxwilms/bloom-filter/issues/1|N]] | N/A | |
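
To make the "serializable" and "reproducible build" columns a bit more concrete, here is a minimal sketch of what I'd want from a filter, assuming "reproducible" means that the same password list always produces a byte-identical file. It is written in Python purely for illustration and is not the API of any library in the table; the class `BloomFilter` and its methods are hypothetical, and the double-hashing scheme over SHA-256 is just one simple way to get deterministic bit positions.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter with deterministic hashing."""

    def __init__(self, n: int, p: float) -> None:
        # Standard sizing: m bits and k hash functions for n items at rate p.
        self.m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # hashlib is used instead of the built-in hash(), which is salted
        # per process for strings and would break reproducibility.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )

    def to_bytes(self) -> bytes:
        # Small fixed header (m, k) followed by the raw bit array.
        return self.m.to_bytes(8, "big") + self.k.to_bytes(2, "big") + bytes(self.bits)

    @classmethod
    def from_bytes(cls, data: bytes) -> "BloomFilter":
        obj = cls.__new__(cls)
        obj.m = int.from_bytes(data[:8], "big")
        obj.k = int.from_bytes(data[8:10], "big")
        obj.bits = bytearray(data[10:])
        return obj


def build(passwords: list[str], p: float = 1e-6) -> BloomFilter:
    """Build a filter from a password list at false positive rate p."""
    bloom = BloomFilter(len(passwords), p)
    for password in passwords:
        bloom.add(password)
    return bloom


if __name__ == "__main__":
    common = ["123456", "password", "qwerty"]
    first = build(common)
    second = build(list(reversed(common)))
    # Insertion order does not matter, so both builds serialise identically.
    print(first.to_bytes() == second.to_bytes())
    restored = BloomFilter.from_bytes(first.to_bytes())
    print("password" in restored)
```

Serialising only the bit array plus a small header, rather than the whole object, keeps the file close to the theoretical size and makes it easy to check that two builds match.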