For Dumps 2.0, we would like a stream that resolves the user, comment, and content details whenever a visibility flag flips from FALSE to TRUE. We already have the Flink stream processing mechanism, so we can build this ourselves with guidance from the Event folks. Hopefully such an enriched stream will also be useful to others who would like this problem solved upstream of their own processes. The enriched stream should probably be compatible with the limited schema discussed in (1) over at T349845#9334970, so that if (1) is implemented, we don't need to reimplement the enriched stream.
(Further context at T349845#9334970)
(Note that although it would be nice to have T351565 first, that task does not block this one.)
In this task we should:
- Implement an enriched revision visibility stream as stated above
- Modify the PySpark code that consumes visibility changes so that it consumes from this new stream instead and updates data when flips occur
- Update the Airflow job
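
As a rough illustration of the enrichment step described above, here is a minimal plain-Python sketch (not Flink or PySpark code). All names in it are assumptions made for illustration: the event keys `rev_id`, `prior_visibility`, and `visibility`, the field names `user`/`comment`/`content`, the output key `resolved`, and the `resolve` callback that stands in for whatever lookup would fetch the now-visible details. The real stream schema and lookup mechanism may well differ.

```python
from typing import Callable, Dict, List

# Hypothetical field names; the real stream schema may differ.
FIELDS = ("user", "comment", "content")


def flipped_to_visible(prior: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Return the fields whose visibility went from False to True."""
    # A field missing from `prior` is treated as previously visible (no flip);
    # a field missing from `current` is treated as still hidden.
    return [f for f in FIELDS if not prior.get(f, True) and current.get(f, False)]


def enrich(event: dict, resolve: Callable[[int, str], object]) -> dict:
    """Return a copy of the event with details attached for each flipped field.

    `resolve(rev_id, field)` is a hypothetical lookup supplied by the caller.
    """
    flipped = flipped_to_visible(event["prior_visibility"], event["visibility"])
    enriched = dict(event)  # shallow copy, so the input event is left untouched
    enriched["resolved"] = {f: resolve(event["rev_id"], f) for f in flipped}
    return enriched
```

Under these assumptions, a consumer would only need to act on events whose `resolved` mapping is non-empty, since those are the ones where previously hidden data became visible again.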