Implement automatic synchronization of refinery HQL files to their HDFS locations whenever changes are merged into the master branch of the GitLab repository.
Follow the project proposal as described in this document.
Design work done here: T360968: [Developer Experience] [SPIKE] Investigate process to automate deployment of folders and artifacts to HDFS
Done is:
- Existing workflow_utils project modified to support its original use cases as well as this new one
- Support added for interacting with GitLab repositories through a designated script
- artifact.py script modified, if needed, to support any additional functionality
- Implemented a stand-alone Flask web application that:
  - Supports configuration of different repositories through a config file
  - Provides REST API endpoints to initiate the synchronization process for each of the configured repositories
  - Supports pulling the latest Git repository updates through the workflow_utils package
  - Recognizes files/folders to be synchronized to HDFS locations, through config file(s)
  - Performs transactional synchronization from the Git repository to HDFS locations
- Implemented a synchronization task log web page that:
  - Provides a web endpoint that lists all synchronization tasks received from GitLab, with their status, in chronological order
- Application is fully operable within the context of our Kubernetes cluster:
  - App/service is accessible to REST API calls from WMF GitLab
  - Has functional access to WMF GitLab repositories
  - Has functional access to WMF HDFS
  - Will not be public facing
  - Will not be accessible to any hosts other than GitLab
  - Will require authentication
- Repository, k8s, DNS and other resources are named 'blunderbuss' rather than 'hdfs-synchronizer'
- T382348 Develop a GitLab CI/CD Component to be used by projects/repositories to integrate their CI/CD workflows with Blunderbuss
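
One possible reading of the "transactional synchronization" item above is stage-then-swap: copy everything into a staging directory next to the target, and only replace the target once the copy has fully succeeded. The sketch below is an illustration only, not the project's implementation. It uses the local filesystem as a stand-in for HDFS, and the function name `sync_transactionally` and the `.staging-` / `.previous` suffixes are hypothetical. A real version would go through whatever HDFS client workflow_utils provides.

```python
import shutil
import tempfile
from pathlib import Path


def sync_transactionally(source: Path, target: Path) -> None:
    """Replace `target` with a copy of `source`, all or nothing.

    Hypothetical sketch: local paths stand in for HDFS locations.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stage next to the target so the final rename stays on one filesystem.
    staging = Path(
        tempfile.mkdtemp(prefix=target.name + ".staging-", dir=target.parent)
    )
    backup = target.with_name(target.name + ".previous")

    # Step 1: copy into staging. On failure the live target is untouched.
    try:
        shutil.copytree(source, staging, dirs_exist_ok=True)
    except Exception:
        shutil.rmtree(staging, ignore_errors=True)
        raise

    # Step 2: move the live target aside, then move staging into place.
    if backup.exists():
        shutil.rmtree(backup)
    had_target = target.exists()
    if had_target:
        target.rename(backup)
    try:
        staging.rename(target)
    except Exception:
        # Roll back to the previous content if the swap fails.
        if had_target:
            backup.rename(target)
        shutil.rmtree(staging, ignore_errors=True)
        raise

    # Step 3: the new content is live, so the old copy can go.
    if had_target:
        shutil.rmtree(backup)
```

Calling `sync_transactionally(Path("checkout/hql"), Path("/tmp/hdfs/hql"))` (example paths only) either leaves the target fully updated or, if the copy fails, leaves it exactly as it was. Note that the swap itself is two renames, so there is a brief window in which the target path does not exist; whether that is acceptable depends on how readers of the HDFS location behave.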
this is blunderbuss :)