Requirement:
- PHP implementation (loosely: no Python dependency)
- Flexible for incorporating future signals without breaking changes
General approach:
- Weighted LSH as a starting point
- Depending on dataset creation step https://phabricator.wikimedia.org/T383060 the algo will be adjusted and finalized