Opening this ticket as the central place to discuss how to securely connect WMF Kafka to Wikimedia Enterprise's external cloud infrastructure.
Problem: Wikimedia Enterprise's data feeds are built on the HTTP EventStream APIs, which are prone to connection time-outs and potential data loss. Although the team has built solutions to minimize the impact of this (see more in this library), it still represents a major risk to data transfer into the Wikimedia Enterprise systems.
Proposal: Create a robust, secure bridge from the Kafka streams that do not contain PII to Wikimedia Enterprise's infrastructure, so that Enterprise can connect directly to the WMF event platform.
Next Steps: Put together a technical solution that can be circulated to the relevant teams to map out the scope of work and timeline - @Ottomata, we have discussed this previously and had some thoughts; I can work with you to document them on this ticket!
Risks to consider / mitigate:
- Security Implications - Tagging WMF Security on this ticket so they can be part of the design and help ensure that we do not create undue risk to the PII-containing streams.
- Reliability - As we scope out a technical solution, we need to determine who will maintain the connection between the two infrastructures and how much work that will involve.