In T270613, we implemented a PoC to see whether we could orchestrate a tool to help users find similar users on Wikimedia projects.
In this task, we want to revisit that PoC and, based on it, deploy an automated data pipeline on the new platform_eng Airflow server.
Done
- Review previously proposed architecture
- Identify issues and challenges with building out the job on the data pipeline, and update this ticket with the findings
- Review the service's storage and schema, and consider whether it is appropriate to port them to Cassandra