
Implement automated deployment of refinery HQL files to HDFS (via blunderbuss)
Closed, Resolved · Public · 13 Estimated Story Points

Description

Automatically synchronize the refinery HQL files to their HDFS locations whenever changes are merged into the master branch of the GitLab repository.

Follow the project proposal as described in this document.

Design work done here: T360968: [Developer Experience] [SPIKE] Investigate process to automate deployment of folders and artifacts to HDFS

Definition of done:

  • Existing workflow_utils project modified to support its original use cases as well as this new one
    • Add support for interacting with GitLab repositories through a designated script
    • If needed, modify artifact.py script to support any additional functionality
  • Implemented stand-alone Flask Web application that:
    • Supports configuration of different repositories through a config file
    • Provides REST API endpoints to initiate the synchronization process for each of the configured repositories
    • Supports pulling the latest Git repository updates through workflow_utils package
    • Recognizes files/folders to be synchronized to HDFS locations, through config file(s)
    • Performs transactional synchronization from Git repository to HDFS locations
  • Implemented synchronization task log web page
    • Provides a web endpoint that lists all synchronization tasks received from GitLab, with their status, in chronological order
  • Application that is fully operable within the context of our Kubernetes cluster
    • App/service is accessible to REST API calls from WMF GitLab
    • Has functional access to WMF GitLab repositories
    • Has functional access to WMF HDFS
    • Will not be public facing
    • Will not be accessible to any hosts other than GitLab
    • Will require authentication
  • Repository, k8s, DNS, and other resources are named 'blunderbuss' and not 'hdfs-synchronizer'
  • T382348 Develop a GitLab CI/CD Component to be used by projects/repositories to integrate their CI/CD workflows with Blunderbuss
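
To illustrate the "transactional synchronization" requirement above, here is a minimal sketch of one way to get all-or-nothing behaviour: copy into a staging directory, then swap it into place with a rename. It is an assumption-laden illustration, not the blunderbuss implementation. The function name `sync_transactionally` is hypothetical, and the local filesystem stands in for HDFS; a real version would use an HDFS client, and would need to confirm that its rename operation gives comparable guarantees.

```python
import os
import shutil
import tempfile
from pathlib import Path


def sync_transactionally(source: Path, target: Path) -> None:
    """Replace `target` with a copy of `source`; leave `target` untouched on failure.

    Hypothetical helper: local directories stand in for HDFS locations.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stage next to the target so the final rename stays on one filesystem.
    staging = Path(
        tempfile.mkdtemp(prefix=target.name + ".staging-", dir=target.parent)
    )
    backup = target.with_name(target.name + ".previous")
    try:
        # Step 1: copy everything into staging; the live target is not touched yet.
        shutil.copytree(source, staging, dirs_exist_ok=True)
        # Step 2: move the current target aside, then move staging into place.
        if backup.exists():
            shutil.rmtree(backup)
        if target.exists():
            os.rename(target, backup)
        try:
            os.rename(staging, target)
        except OSError:
            # Roll back: restore the previous contents before re-raising.
            if backup.exists():
                os.rename(backup, target)
            raise
    except Exception:
        shutil.rmtree(staging, ignore_errors=True)
        raise
    # Step 3: the swap succeeded, so the previous contents can be dropped.
    if backup.exists():
        shutil.rmtree(backup)
```

A failed copy (for example, a missing source directory) raises before the swap, so readers of the target location never see a partially copied tree.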

this is blunderbuss :)

Event Timeline

I talked to @BTullis about obtaining a functional test environment that mimics the production environment this service will operate in, and he kindly provided a list of steps for building one. The list is in the subtask ticket https://phabricator.wikimedia.org/T371994

Change #1090972 had a related patch set uploaded (by Bking; author: Bking):

[operations/dns@master] dse-k8s-services: add CNAME for blunderbuss (nee hdfs-synchronizer)

https://gerrit.wikimedia.org/r/1090972

Change #1090977 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] dse-k8s: add ingress config for net-new service

https://gerrit.wikimedia.org/r/1090977

Change #1090972 merged by Bking:

[operations/dns@master] dse-k8s-services: add CNAME for blunderbuss (nee hdfs-synchronizer)

https://gerrit.wikimedia.org/r/1090972

Change #1090977 merged by Bking:

[operations/puppet@production] dse-k8s: add ingress config for net-new service

https://gerrit.wikimedia.org/r/1090977

Ottomata renamed this task from "Implement automatic sync of refinery HQL files to HDFS" to "Implement automated deployment of refinery HQL files to HDFS (via blunderbuss)". · Dec 16 2024, 3:26 PM
amastilovic changed the task status from Open to In Progress. · Dec 17 2024, 4:05 PM
amastilovic updated the task description.