Migrate existing Java packages to deploying to Gitlab, including new version of parent pom, validation that all dependencies are available, and validation that deployment to production still works
Open, Medium, Public, 8 Estimated Story Points

Description

Existing active projects need to be migrated to fetch dependencies from Gitlab and publish artifacts to Gitlab.

The parent POM might need further adaptation as we discover the specifics of existing projects.

Some external dependencies might not be available from Maven Central; those need to be tracked and added at the project level.

For each project, we want to validate:

  • releases are created by CI and published to Gitlab
  • production deployments are working

Projects to migrate
Data Engineering Maintained Repos

(See also T368927: [Epic] Migrate Data Platform Engineering maintained git repos to GitLab)

Details

Related Changes in Gerrit:
Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
Increase compatibility with projects hosted outside GitLab | repos/maven/wmf-jvm-parent-pom!24 | pfischer | wmf-packages | main
Use Maven CI/CD components. | repos/data-engineering/gobblin-wmf!3 | pfischer | maven-ci | main
Migrate from https://gitlab.wikimedia.org/repos/maven/gerrit-artefacts | repos/wmf-packages!1 | pfischer | package-importer | main
Forge mvn.yml | repos/maven/wmf-maven-ci!1 | pfischer | tinker | main
Add gitlab-ci script to populate project coordinates | repos/maven/wmf-jvm-parent-pom!23 | pfischer | gitlab-ci-profile | main
Draft: CI: build java via child pipeline | repos/data-engineering/metrics-platform!87 | pfischer | maven-ci | main
CI: build java via child pipeline | repos/data-engineering/metrics-platform!86 | pfischer | maven-ci | main
Add release/deployment hint for Gerrit-hosted projects | repos/maven/wmf-jvm-parent-pom!22 | pfischer | gerrit-artifacts | main
Add maven release capability to CI pipeline | repos/maven/wmf-jvm-parent-pom!21 | pfischer | gitlab-maven-release | main

Event Timeline

I wrote a script that comes up with the following order of migration (based on inter-project dependencies):

== Inter project dependencies grouped by project:
org.wikimedia.gobblin:gobblin-wmf -> {'org.wikimedia:eventutilities-parent'}
org.wikimedia.analytics:hdfs-tools -> {'org.wikimedia.analytics.refinery:refinery'}
org.wikimedia.analytics.refinery:refinery -> {'org.wikimedia:eventutilities-parent'}
org.wikimedia:eventutilities-parent -> set()
org.wikimedia.metrics:metrics-platform -> set()
org.wikimedia.search:mjolnir -> set()
org.wikimedia.search.highlighter:cirrus -> set()
org.wikimedia.search:opensearch-extra-analysis -> set()
org.wikimedia.search:opensearch-extra-parent -> set()
org.wikimedia.discovery.cirrus.updater:cirrus-streaming-updater-parent -> {'org.wikimedia:eventutilities-parent'}
org.wikimedia.search:glent -> set()
org.wikidata.query.rdf:query-service-parent -> {'org.wikimedia:eventutilities-parent'}

== Order of projects for migration:
org.wikimedia:eventutilities-parent
org.wikimedia.metrics:metrics-platform
org.wikimedia.search:mjolnir
org.wikimedia.search.highlighter:cirrus
org.wikimedia.search:opensearch-extra-analysis
org.wikimedia.search:opensearch-extra-parent
org.wikimedia.search:glent
org.wikimedia.discovery.cirrus.updater:cirrus-streaming-updater-parent
org.wikidata.query.rdf:query-service-parent
org.wikimedia.analytics.refinery:refinery
org.wikimedia.gobblin:gobblin-wmf
org.wikimedia.analytics:hdfs-tools
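
The script itself is not shown in this thread. A minimal sketch of how such an order could be derived, assuming Python 3.9+ and the standard-library graphlib module, is a topological sort over the dependency map listed above. The function name migration_order is hypothetical, and independent projects may come out in a different relative order than in the listing.

```python
from graphlib import TopologicalSorter

# Inter-project dependencies as listed above: project -> projects it depends on.
DEPENDENCIES = {
    "org.wikimedia.gobblin:gobblin-wmf": {"org.wikimedia:eventutilities-parent"},
    "org.wikimedia.analytics:hdfs-tools": {"org.wikimedia.analytics.refinery:refinery"},
    "org.wikimedia.analytics.refinery:refinery": {"org.wikimedia:eventutilities-parent"},
    "org.wikimedia:eventutilities-parent": set(),
    "org.wikimedia.metrics:metrics-platform": set(),
    "org.wikimedia.search:mjolnir": set(),
    "org.wikimedia.search.highlighter:cirrus": set(),
    "org.wikimedia.search:opensearch-extra-analysis": set(),
    "org.wikimedia.search:opensearch-extra-parent": set(),
    "org.wikimedia.discovery.cirrus.updater:cirrus-streaming-updater-parent": {
        "org.wikimedia:eventutilities-parent"
    },
    "org.wikimedia.search:glent": set(),
    "org.wikidata.query.rdf:query-service-parent": {"org.wikimedia:eventutilities-parent"},
}


def migration_order(dependencies):
    """Return the projects so that each one comes after all of its dependencies."""
    # TopologicalSorter expects node -> predecessors, which matches the map above.
    # It raises graphlib.CycleError if the projects depend on each other in a cycle.
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for project in migration_order(DEPENDENCIES):
        print(project)
```

Any order produced this way migrates org.wikimedia:eventutilities-parent before its dependents and refinery before hdfs-tools, which is the property the listing above relies on.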

Change #1135407 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[wmf-jvm-utils@master] Upgrade parent pom to latest version

https://gerrit.wikimedia.org/r/1135407

The migration process works as follows:

  • Make sure each project uses the latest WMF JVM Parent POM (1.92+), which defaults to GitLab as the repository for distributing Maven artefacts.
  • Point each project to Gerrit Artefacts as a temporary, package-registry-only repository for Gerrit-hosted source code.
  • Make sure (via the package importer) that Gerrit Artefacts holds the latest version of the WMF dependencies required by any of the projects; for example, org.wikimedia:eventutilities requires org.mediawiki.utils:http-client-utils:1.0.0.
  • If a project still builds afterwards (with no reference to Archiva as a source repository), it should be ready.
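
As a rough illustration of the first and last steps, a pre-flight check on a project's pom.xml could look like the sketch below. It is not part of the actual migration tooling: the function names are hypothetical, matching on the substring "archiva" in URLs is an assumption, and it only inspects the POM text it is given (not the effective POM or settings.xml).

```python
import xml.etree.ElementTree as ET

# Standard Maven POM namespace.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}
# Minimum parent POM version named in the steps above.
MIN_PARENT_VERSION = (1, 92)


def parse_version(text):
    """Turn "1.92" or "1.92-SNAPSHOT" into a comparable tuple such as (1, 92)."""
    core = text.strip().split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def check_pom(pom_xml):
    """Return a list of human-readable problems found in the given POM text."""
    root = ET.fromstring(pom_xml)
    problems = []

    version = root.findtext("m:parent/m:version", namespaces=NS)
    if version is None:
        problems.append("no parent POM declared")
    else:
        try:
            if parse_version(version) < MIN_PARENT_VERSION:
                problems.append(f"parent POM {version} is older than 1.92")
        except ValueError:
            # For example a property placeholder instead of a literal version.
            problems.append(f"cannot parse parent POM version {version!r}")

    # Assumption: any URL mentioning "archiva" is a leftover Archiva reference.
    for url in root.iterfind(".//m:url", NS):
        if url.text and "archiva" in url.text.lower():
            problems.append(f"Archiva reference: {url.text.strip()}")

    return problems
```

An empty result only means the POM itself looks migrated; whether the project still builds remains the real test.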

Q:

which defaults to GitLab as repository for distribution of maven artefacts.

Are we sure we want the pom to have references to GitLab? Where a project publishes seems more a CI related configuration than a pom related configuration? If I make a fork of a project and want to publish it elsewhere, I have to edit the pom?

If possible (maybe it is too hard), would it be better to make the parent pom's publish destination configurable via properties, and then set them in CI templates?

Related:

Are we sure we want the pom to have references to GitLab? Where a project publishes seems more a CI related configuration than a pom related configuration? If I make a fork of a project and want to publish it elsewhere, I have to edit the pom?

So do you suggest leaving the distributionManagement section empty and letting the CI pipeline specify altSnapshotDeploymentRepository? That clutters commands without reason IMHO (you also have to specify a repositoryId so they can be mapped to server section in settings.xml). What's your motivation? Avoid a vendor lock-in? Prevent local deploys?

If possible (maybe it is too hard), would it be better to make the parent pom's publish destination configurable via properties, and then set them in CI templates?

You are right and that's how it's implemented: All the relevant properties can be overridden. To cover the deployment of maven artefacts to GitLab for external projects, a project inheriting from the WMF JVM Parent POM will have to override a single property: gitlab.projectId. That property is used as part of the distributionManagement/repository|snapshotRepository/url. By default that would point to the same project that hosts the git repository too. However, for source code that is hosted on Gerrit, this should point to the Gerrit Artefacts project instead (see above). In case you want to deploy to a completely different repository, you can still do that by specifying -DaltSnapshotDeploymentRepository=….
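
To make the two override paths concrete, here is a small sketch. It assumes the deployment URL follows GitLab's project-level Maven endpoint (/api/v4/projects/<id>/packages/maven); how the parent POM actually composes the URL is not shown in this thread. The helper names, the project ID 1234 and the repository ID are hypothetical. The id::url value format is the one used by maven-deploy-plugin 3.x (2.x expected id::layout::url).

```python
def gitlab_maven_url(gitlab_host, project_id):
    """URL of a GitLab project-level Maven package registry."""
    return f"https://{gitlab_host}/api/v4/projects/{project_id}/packages/maven"


def deploy_command(project_id=None, alt_snapshot_repo=None):
    """Build an `mvn deploy` command line with optional overrides.

    project_id        -- overrides the gitlab.projectId property from the parent POM
    alt_snapshot_repo -- (repository id, url) pair for a completely different target
    """
    command = ["mvn", "deploy"]
    if project_id is not None:
        command.append(f"-Dgitlab.projectId={project_id}")
    if alt_snapshot_repo is not None:
        repo_id, url = alt_snapshot_repo
        # The repository id must match a <server> entry in settings.xml.
        command.append(f"-DaltSnapshotDeploymentRepository={repo_id}::{url}")
    return command


if __name__ == "__main__":
    # Hypothetical project ID, for illustration only.
    print(gitlab_maven_url("gitlab.wikimedia.org", 1234))
    print(" ".join(deploy_command(project_id=1234)))
```

Overriding the single property keeps the command short, while the alternative repository flag remains available for targets outside GitLab.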

By inheriting from WMF JVM Parent POM you already have GitLab-related properties and URL in the POM (see scm). I assume the motivation is to reduce boilerplate code in sub-projects.

Anyways, we have to touch the existing Jenkins jobs (if they release/deploy) to inform them about the GitLab access token to deploy artefacts.

Related:

To cover the deployment of maven artefacts to GitLab for external projects, a project inheriting from the WMF JVM Parent POM will have to override a single property: gitlab.projectId

By default that would point to the same project that hosts the git repository too

How are these set by default?

Parent pom.xml has these gitlab related properties set to default values for the wmf-jvm-parent-pom gitlab project. If these aren't overridden (on the CLI or via CI), mvn release will attempt to publish to the wmf-jvm-parent-pom project.

What's your motivation?

I suppose, decoupling and purpose? Hardcoding gitlab specific information into the project's pom just seems...wrong? We don't include which registry a docker image should be published to in a Dockerfile.

If it is too hard and maven is weird, then okay, but have we tried?

That clutters commands without reason

Yeah, it can, but that can be hidden in CI templates.

so they can be mapped to server section in settings.xml

Yeah... I encountered this too :) https://gitlab.wikimedia.org/otto/eventutilities/-/blob/36477fdbd29af421986187cbecd7ba31d7d67e8e/.gitlab-ci.yml#L52

To cover the deployment of maven artefacts to GitLab for external projects, a project inheriting from the WMF JVM Parent POM will have to override a single property: gitlab.projectId
By default that would point to the same project that hosts the git repository too

How are these set by default?

That's right, there is no project-specific default. That property has to be set.

Parent pom.xml has these gitlab related properties set to default values for the wmf-jvm-parent-pom gitlab project. If these aren't overridden (on the CLI or via CI), mvn release will attempt to publish to the wmf-jvm-parent-pom project.

What's your motivation?

I suppose, decoupling and purpose? Hardcoding gitlab specific information into the project's pom just seems...wrong? We don't include which registry a docker image should be published to in a Dockerfile.
If it is too hard and maven is weird, then okay, but have we tried?

Putting a Dockerfile and a POM in the same basket is comparing apples and oranges. The scope of a POM is wider and so are the capabilities of maven as the build system. I get the point of keeping that kind of meta information out of a project-build-descriptor, to keep it lean and focused. We are on the same page and if we want to move to a pure CI/CD solution then that makes perfect sense. However, neither are we there yet, nor is that the scope of this ticket. So for the sake of this ticket, we should focus on what's relevant: Get rid of Archiva (and use GitLab package repositories instead)! With that in mind I proposed the aforementioned strategy to migrate deployment and dependency resolution based on the Parent POM away from Archiva to GitLab. That's all. Migrating the projects to GitLab is on a different page.

Hi okay! Yeah all that makes sense!

I didn't mean for my idea to block you! Sorry about that! I meant it more like: "this seems like the right thing to do, have we tried? How hard is it?"

I have a tendency to try to do things generic from the start, which can make them harder to implement. I stop when it gets too hard and then make compromises. I was hacking on this back in February, and I got close to making it work without GitLab details in parent pom, but I ran out of time to make it fully work. Now I’m all context switched and can’t remember all the complexities.

We can always make improvements in the future, especially as we do more GitLab CI template based releasing.

Please proceed!

Change #1138388 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[integration/config@master] maven: update to 3.9.9

https://gerrit.wikimedia.org/r/1138388

@Ottomata, I am pulling the conversation here.

Wow this looks cool!
Do you plan/hope to include these in workflow_utils as part of T386406: Create Gitlab CI templates for JVM packages? Or do you intend to keep them separate?

I briefly discussed this with @Gehel. I first thought that releng would be the right place, but apparently they don't have anything maven-related yet. Then there's ci-tools, but the only maven-related thing there is a template for SonarQube integration. IIUC, the scope of WMF Data Workflow Utils is on deploying conda environments, which might involve maven-packaged JARs. Since my pipeline can be included directly or as a child pipeline, I would suggest the following: We keep the maven pipeline in the GitLab maven namespace for now and test if Workflow Utils can include it as intended. This gives us clean separation of concerns and reusability. What do you think?

IIUC, the scope of WMF Data Workflow Utils is on deploying conda environments

Hm, not quite! But it is a bit bloated.

The original use case for CI templates in workflow_utils was conda envs and python wheels, but really the templates are generic templates for building and deploying things with gitlab. They also support python wheels, npm packages, tarballs, etc. too.

https://gitlab.wikimedia.org/repos/data-engineering/workflow_utils/-/tree/main/gitlab_ci_templates/pipelines

I first thought that releng would be the right place

Yeah I agree that this would be much better somewhere above data-engineering.

Probably T382430: Create a GitLab CI/CD Component project for WMF CI/CD templates and components is the final right thing to do?

We keep the maven pipeline in the GitLab maven namespace for now and test if Workflow Utils can include it as intended. This gives us clean separation of concerns and reusability. What do you think?

Sounds good! I hope we don't forget or put this stuff in the backlog for long though! The fewer repos/docs/readmes people have to know about to use and understand package deployment, the better!

Take a look at how the templates are laid out in workflow_utils and see if it makes sense for what you are doing. We spent some time thinking about how to organize these things into reusable and parameterized pieces. It might be nice if we could at least start by organizing things in the same way.

Either way, since T386406: Create Gitlab CI templates for JVM packages exists, should we group this CI template under that ticket? This ticket here is about the actual migrating? Feel free to rewrite the other ticket to change the done criteria, e.g. maybe we don't actually want to ultimately put this in workflow_utils. I hope we can aim to have a standardized place for this stuff though.

Perhaps, instead of eventually migrating this to workflow_utils, we should intend to start a new global (in collab with dev exp team?) project for T382430 and put the maven templates there, and then eventually also migrate workflow_utils homed templates there too?

Change #1135407 merged by jenkins-bot:

[wmf-jvm-utils@master] Upgrade parent pom to latest version

https://gerrit.wikimedia.org/r/1135407

Change #1138388 abandoned by Hashar:

[integration/config@master] maven: update to 3.9.9

https://gerrit.wikimedia.org/r/1138388

Change #1145174 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[integration/config@master] assign maven-release job to search/wmf-jvm-utils

https://gerrit.wikimedia.org/r/1145174

Change #1145174 merged by jenkins-bot:

[integration/config@master] jjb: add maven-release job to search/wmf-jvm-utils

https://gerrit.wikimedia.org/r/1145174

Change #1146975 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[wmf-jvm-utils@master] Prepare POM for deployment to wmf-packages

https://gerrit.wikimedia.org/r/1146975

Change #1147003 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[integration/config@master] jjb: use local settings for all maven goals

https://gerrit.wikimedia.org/r/1147003

Change #1147003 merged by jenkins-bot:

[integration/config@master] jjb: use local settings for all maven goals

https://gerrit.wikimedia.org/r/1147003

Change #1147017 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] Revert "jjb: use local settings for all maven goals"

https://gerrit.wikimedia.org/r/1147017

Change #1147017 merged by jenkins-bot:

[integration/config@master] Revert "jjb: use local settings for all maven goals"

https://gerrit.wikimedia.org/r/1147017

Change #1146975 merged by jenkins-bot:

[wmf-jvm-utils@master] Prepare POM for deployment to wmf-packages

https://gerrit.wikimedia.org/r/1146975

Change #1147771 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[analytics/refinery/source@master] Update parent POM

https://gerrit.wikimedia.org/r/1147771

Change #1147774 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[wikimedia-event-utilities@master] Update parent POM

https://gerrit.wikimedia.org/r/1147774

Change #1148277 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[wikidata/query/rdf@master] Update parent POM

https://gerrit.wikimedia.org/r/1148277

Change #1148304 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[wikidata/query/deploy@master] Use GitLab package registry to fetch dist archive

https://gerrit.wikimedia.org/r/1148304

Hey @pfischer, with regard to Gobblin-WMF: we've migrated the project to GitLab and enabled the CI/CD release of packages to GitLab's Maven repository: https://gitlab.wikimedia.org/repos/data-engineering/gobblin-wmf

It'd be greatly appreciated if you'd take a look at the blubber.yaml and pom.xml files and ensure we're conforming to the new parent POM/release standards.

Change #1148793 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[search/extra@master] Deploy to central via Sonatype's Central Portal

https://gerrit.wikimedia.org/r/1148793

Change #1148794 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[search/extra-analysis@master] Deploy to central via Sonatype's Central Portal

https://gerrit.wikimedia.org/r/1148794

Change #1148817 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[search/glent@master] Update parent POM

https://gerrit.wikimedia.org/r/1148817

Change #1148821 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[search/highlighter@master] Deploy to central via Sonatype's Central Portal

https://gerrit.wikimedia.org/r/1148821

Change #1148277 merged by jenkins-bot:

[wikidata/query/rdf@master] Update parent POM

https://gerrit.wikimedia.org/r/1148277

Change #1148821 merged by jenkins-bot:

[search/highlighter@master] Deploy to central via Sonatype's Central Portal

https://gerrit.wikimedia.org/r/1148821

Change #1148793 merged by jenkins-bot:

[search/extra@master] Deploy to central via Sonatype's Central Portal

https://gerrit.wikimedia.org/r/1148793

Change #1148794 merged by jenkins-bot:

[search/extra-analysis@master] Deploy to central via Sonatype's Central Portal

https://gerrit.wikimedia.org/r/1148794

Change #1148817 merged by jenkins-bot:

[search/glent@master] Update parent POM

https://gerrit.wikimedia.org/r/1148817

Change #1147774 merged by jenkins-bot:

[wikimedia-event-utilities@master] Update parent POM

https://gerrit.wikimedia.org/r/1147774

Change #1148304 merged by DCausse:

[wikidata/query/deploy@master] Use GitLab package registry to fetch dist archive

https://gerrit.wikimedia.org/r/1148304

In order to migrate the refinery-source project, we need to update this script: https://github.com/wikimedia/analytics-refinery/blob/master/bin/update-refinery-source-jars#L101
I'll prioritize this so that this patch can be merged and applied: https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1147771
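
For context, fetching a released jar from any Maven-layout repository (GitLab's package registry included) comes down to building a path from the artifact coordinates. The sketch below is not the actual change to update-refinery-source-jars; the function names, the example artifact ID and the placeholder repository URL are assumptions for illustration.

```python
def artifact_path(group_id, artifact_id, version, classifier=None, packaging="jar"):
    """Relative path of a released artifact in the standard Maven repository layout."""
    group_path = group_id.replace(".", "/")
    file_name = f"{artifact_id}-{version}"
    if classifier:
        file_name += f"-{classifier}"
    file_name += f".{packaging}"
    return f"{group_path}/{artifact_id}/{version}/{file_name}"


def artifact_url(repository_url, group_id, artifact_id, version, **kwargs):
    """Full download URL of a released artifact below the given repository URL."""
    return repository_url.rstrip("/") + "/" + artifact_path(
        group_id, artifact_id, version, **kwargs
    )


if __name__ == "__main__":
    # Placeholder repository URL and example artifact ID (assumptions).
    print(
        artifact_url(
            "https://gitlab.example.org/api/v4/projects/1/packages/maven",
            "org.wikimedia.analytics.refinery",
            "refinery-job",
            "0.3.0",
        )
    )
```

This covers release versions only; SNAPSHOT artifacts use timestamped file names that have to be resolved through maven-metadata.xml.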

Change #1178895 had a related patch set uploaded (by Joal; author: Joal):

[analytics/refinery@master] Update artifacts download script to use gitlab

https://gerrit.wikimedia.org/r/1178895

Change #1147771 merged by jenkins-bot:

[analytics/refinery/source@master] Update parent POM

https://gerrit.wikimedia.org/r/1147771

Change #1178895 merged by Joal:

[analytics/refinery@master] Update artifacts download script to use gitlab

https://gerrit.wikimedia.org/r/1178895

Change #1180854 had a related patch set uploaded (by Joal; author: Joal):

[analytics/refinery/source@master] Update project to release v0.3.0

https://gerrit.wikimedia.org/r/1180854

Change #1180854 merged by jenkins-bot:

[analytics/refinery/source@master] Update project to release v0.3.0

https://gerrit.wikimedia.org/r/1180854