
Update the DAGs on the platform_eng airflow instance to use miniforge instead of condaforge and mambaforge
Closed, Resolved · Public

Description

As per the parent ticket, there are five DAGs running on the platform_eng airflow instance that use WMF Data Workflow Utils to build a runtime conda environment based on miniconda.

Due to upcoming licence changes in the Anaconda project, we wish to ensure that all of these environments use the latest version of the workflow utils conda pipeline, which switches from miniconda to miniforge.

Please would you take steps to upgrade these DAGs and deploy them, when convenient?

It should just be a case of ensuring that the .gitlab-ci.yml file references v0.19.0 of repos/data-engineering/workflow_utils, as described here:
https://wikitech.wikimedia.org/wiki/Data_Platform/Systems/Airflow/Developer_guide/Python_Job_Repos#GitLab_CI_setup
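To check a repository before and after the bump, a minimal Python sketch can read the `ref:` pinned for workflow_utils out of .gitlab-ci.yml and compare it with v0.19.0. It assumes the file uses GitLab's standard `include:` entry with `project`, `ref` and `file` keys, with `ref:` following `- project:` in the same list item. It is a line-based scan rather than a YAML parser. The helper names are hypothetical, and the `file:` path in the example is a placeholder, not the real workflow_utils template path.

```python
from __future__ import annotations

# Include entry to look for, and the minimum tag named in this task.
PROJECT = "repos/data-engineering/workflow_utils"
MIN_TAG = (0, 19, 0)

import re


def parse_tag(tag: str) -> tuple[int, ...]:
    """Turn a tag such as 'v0.19.0' into (0, 19, 0) so tags compare numerically."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag.strip().strip("'\""))
    if match is None:
        raise ValueError(f"not a vX.Y.Z tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def workflow_utils_ref(ci_yaml: str) -> str | None:
    """Return the `ref:` of the include entry whose `project:` is PROJECT.

    Line-based scan, not a YAML parser: assumes `- project:` starts the list
    item and `ref:` follows it within the same item.
    """
    in_entry = False
    for line in ci_yaml.splitlines():
        stripped = line.strip()
        if stripped.startswith("- project:"):
            value = stripped.split(":", 1)[1].strip().strip("'\"")
            in_entry = value == PROJECT
        elif in_entry and stripped.startswith("ref:"):
            return stripped.split(":", 1)[1].strip().strip("'\"")
        elif stripped.startswith("- "):
            # Any other list item ends the entry we were inside.
            in_entry = False
    return None


def meets_minimum(ci_yaml: str) -> bool:
    """True if workflow_utils is pinned to MIN_TAG or a later tag."""
    ref = workflow_utils_ref(ci_yaml)
    return ref is not None and parse_tag(ref) >= MIN_TAG


# Example include block; the `file:` path is a hypothetical placeholder.
EXAMPLE = """\
include:
  - project: 'repos/data-engineering/workflow_utils'
    ref: v0.19.0
    file: 'path/to/template.yml'
"""

print(workflow_utils_ref(EXAMPLE))  # v0.19.0
print(meets_minimum(EXAMPLE))       # True
```

Tags are compared as integer tuples rather than strings, because a plain string comparison would rank v0.9.0 above v0.19.0. The same check also accepts v0.20.0, the version later reported as in use on this task.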

Some of the jobs also install mambaforge manually.
This should no longer be required, and it would be best to remove it for the same reasons.
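Before removing a manual installation, it may help to list every line that refers to it. A minimal Python sketch, assuming a manual install is recognisable by the string "mambaforge" (in any case) in .gitlab-ci.yml or a build script, for example an installer name, download URL or install prefix. `mambaforge_lines` and the sample job are hypothetical.

```python
from __future__ import annotations


def mambaforge_lines(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, line) for each line mentioning mambaforge.

    Heuristic (an assumption): a manual install shows up as the word
    "mambaforge" in any case, e.g. an installer file name or an install prefix.
    """
    return [
        (number, line.rstrip())
        for number, line in enumerate(text.splitlines(), start=1)
        if "mambaforge" in line.lower()
    ]


# Hypothetical CI job that installs mambaforge by hand.
SAMPLE = """\
build:
  script:
    - wget https://example.org/Mambaforge-Linux-x86_64.sh
    - bash Mambaforge-Linux-x86_64.sh -b -p "$HOME/mambaforge"
    - make test
"""

for number, line in mambaforge_lines(SAMPLE):
    print(number, line.strip())
```

Each hit still needs a human decision, since a line may only mention mambaforge in a comment. The replacement environment comes from the workflow_utils pipeline, not from anything this scan does.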

There is a reference GitLab MR here, which you may find useful:
https://gitlab.wikimedia.org/repos/data-engineering/example-job-project/-/merge_requests/36/

Details

Related Changes in GitLab:
| Title | Reference | Author | Source Branch | Dest Branch |
| --- | --- | --- | --- | --- |
| update CI | repos/structured-data/section-topics!38 | mfossati | T379545 | main |
| Bump workflow utils & most dependencies | repos/structured-data/seal!3 | mfossati | T379545 | main |
| Bump workflow utils & all dependencies | repos/structured-data/image-suggestions!45 | mfossati | T379545 | main |

Event Timeline

Restricted Application added a subscriber: Aklapper.
mfossati changed the task status from Open to In Progress. · Dec 6 2024, 4:49 PM
mfossati claimed this task.

@BTullis, this is done on the Structured Content team's side, so I'm removing tags.

Pending action from @Htriedman.

@acooper: since security/differential-privacy seems to be within your team's scope, could you take care of this transition to miniforge? @Htriedman: as the main contributor to this project, is this something you could help with?

Hi! I tried making the seemingly small changes requested here. Specifically, I bumped the version of repos/data-engineering/workflow_utils from v0.14.0 to v0.19.0 in .gitlab-ci.yml (see the example I'm basing this on and the current state of the file at this link) and tried to push to a new branch. Unfortunately, when I push, I get a "You are not allowed to push code to this project" message. Looking at the current project members, it seems I was removed as a developer at some point.

@acooper, is there any way you could add me as a developer here so I can continue with this maintenance?

Hi all, given that Andy is no longer at the Foundation, I think @sbassett should perhaps be pulled in for these permissions.

Hey all - I've added @Htriedman as a Developer to https://gitlab.wikimedia.org/repos/security/differential-privacy, with access that expires on 2025-12-31. @Htriedman - if you need access beyond that, just let me know and we can likely re-authorize you. Let me know if you have any questions or need anything else.

This is now complete: @Htriedman has released a couple of new versions of the differential_privacy pipelines, and they now use workflow_utils version 0.20.0.

BTullis updated the task description.