Currently, we have a repository with 2 PyFlink pipelines. This repository was built with PyFlink rather than vanilla Flink so that other teams could easily create their own pipelines.
It turns out that no one is building their own pipelines, and the integration between Python and Flink has caused many issues in the past, including complications when upgrading versions.
We want to assess how hard it would be to move these 2 pipelines (content_history.py and page_content_change.py) to Java or Scala, and whether that would make our lives easier in the future.
We need to consider at least these things:
- Language: Java or Scala
- Build system: Maven, Gradle, SBT or others.
@dcausse has suggested avoiding the Scala Flink API as it is now deprecated. We could still consider using Scala, but with the Java API. They are also using Maven to build the project, so we could use it too unless there are other reasons to choose otherwise. We can also take their Search update pipeline as an example.
This task is completed if:
- The work to move the PyFlink pipelines is assessed and described.
- A decision is made on whether to move the pipelines out of Python or not.
- If doing the migration: Java or Scala is chosen. -> Latest Java possible.
- If doing the migration: Maven, Gradle, SBT or others. -> Maven
- If doing the migration: Tasks have been created for the pending work.