Move Wmfdata-Python from Github to Gitlab
Open, High, Public

Description

Wmfdata-Python's repo currently lives at https://github.com/wikimedia/wmfdata-python. However, other Data Platform Engineering repos are moving to Gitlab (T368927) so Wmfdata-Python should move as well. The new location should be https://gitlab.wikimedia.org/repos/data-engineering/wmfdata-python.

Plan

  • Update metadata.py to point to the GitLab repo.
  • Release v2.4 with the changes to metadata.py. Users will see the update notification as normal, with the update command still pointing to the GitHub repo.
  • Import the repository to GitLab.
    • We should do this relatively soon after releasing v2.4. Until we do, users who have updated to v2.4 will see a notification on each import saying "The check for a newer release of Wmfdata failed to complete. Consider checking manually."
  • Update the readme in the GitHub repo with a prominent notice that development has moved to GitLab and then archive the repo.
    • We cannot blank the GitHub repo at this point, because users will continue to use it to fetch v2.4.
  • Update links across Wikitech and Phabricator.
  • Update the Conda-Analytics conda-environment.yml to point to the GitLab repo.

Users will not get an update notification pointing to the GitLab repo until they (1) are running v2.4 and (2) we release a subsequent version (e.g. v2.4.1 or v2.5) on GitLab. For that reason, we don't need to send an announcement to users until we release that subsequent version.

Event Timeline

nshahquinn-wmf updated the task description.

+GitLab (Project Migration) (please add appropriate tags so people can find tasks - thanks!)

@fkaelin is interested in co-hosting a Wmfdata work session during the Research and Data Science offsite next month. Getting this done will make the session easier and help preserve its energy afterwards (by avoiding a disruptive migration soon after we've hopefully onboarded a bunch of new contributors).

nshahquinn-wmf raised the priority of this task from Medium to High. Oct 25 2024, 7:56 PM

I assume I do not have the permissions to create a Data Engineering repository on GitLab. I can probably get someone from Data Products to create the desired repo and give me permissions to manage it.

> Blank the GitLab repo, leaving only a readme pointing to the new repository and metadata.py.

Did you mean GitHub?

Also wanted to offer an alternative: archiving it.

> @fkaelin is interested in co-hosting a Wmfdata work session during the Research and Data Science offsite next month. Getting this done will make the session easier and help preserve its energy afterwards (by avoiding a disruptive migration soon after we've hopefully onboarded a bunch of new contributors).

I would discourage such a big change just before an offsite where you will depend on this library. Things break, plans change, and we are assuming the DE team can do a release of conda-analytics within a strict deadline.

From the point of view of users, they shouldn't care where a Python package comes from, be it GitLab or GitHub.

(regardless of _WHEN_ you actually do that) Suggesting you go to https://gitlab.wikimedia.org/projects/new and from there to *Import Project* -> https://gitlab.wikimedia.org/projects/new#import_project

Then either select GitHub and use a personal access token or select "Repository by URL" and simply paste the git clone URL for it.

Either way should import your repo history. Once that is done, you can make one last change on the GitHub side to leave just a README saying that things have moved.

No need to delete your history and start over on the new system.

> Blank the GitLab repo, leaving only a readme pointing to the new repository and metadata.py.

> Did you mean GitHub?

Yes, thanks!

> Also wanted to offer an alternative: archiving it.

What I've done previously is to archive and blank the repo (like I did with this one) just to make sure people don't keep reading the old code, but if there's a downside to that, I'm happy to just archive it instead.

> I would discourage such a big change just before an offsite where you will depend on this library. Things break, plans change, and we are assuming the DE team can do a release of conda-analytics within a strict deadline.

A new release of conda-analytics isn't required; if it was, I promise I would not be trying to get it done on this timeline! Since the existing version of conda-analytics is already built, it won't be affected if the hosting location changes.

Similarly, the worst-case impact on users who already have Wmfdata installed (if we accidentally remove metadata.py from the tip commit on GitHub, for instance) is that Wmfdata will print the startup message:

The check for a newer release of Wmfdata failed to complete. Consider checking manually.

rather than the intended:

You are using Wmfdata v.X.X.X, but v.2.4.0 is available. To update run pip install --upgrade git+https://gitlab.wikimedia.org/repos/data-engineering/wmfdata-python.git@release

> From the point of view of users, they shouldn't care where a Python package comes from, be it GitLab or GitHub.

In this case, the session is about contributing to the library, so hopefully people will be filing merge requests and working on CI and such. That will be easier if the repo is already in its permanent home.

> (regardless of _WHEN_ you actually do that) Suggesting you go to https://gitlab.wikimedia.org/projects/new and from there to *Import Project* -> https://gitlab.wikimedia.org/projects/new#import_project

> No need to delete your history and start over on the new system.

Yup, that's exactly what I'm planning to do! I already did it with one repo and was impressed with how thorough GitLab was in terms of pulling over old PRs and such 😁

> A new release of conda-analytics isn't required; if it was, I promise I would not be trying to get it done on this timeline! Since the existing version of conda-analytics is already built, it won't be affected if the hosting location changes.

Got it.

> In this case, the session is about contributing to the library, so hopefully people will be filing merge requests and working on CI and such. That will be easier if the repo is already in its permanent home.

Perfect then. Go for it!

@xcollazo thank you very much for the review and the go-ahead!

I just revised the plan because I had made a mistake about how the update notification works. It generates the update notification using the repo URL stored in the local code, so users will not see an update notification pointing to the GitLab repo until they (1) are running v2.4 and (2) we release a subsequent version (e.g. v2.4.1 or v2.5).

This actually makes the user impact of the migration smaller, because moving repos will not, in itself, trigger an update notification.
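The behavior described above can be sketched as a small Python model. This is a hypothetical illustration, not Wmfdata's actual code: `update_message`, `fetch_latest_version`, `parse_version`, and the in-memory `published` dict standing in for the network are all assumptions. The point it shows is that both the version lookup and the suggested `pip` command use the repo URL baked into the *installed* copy, which is why a v2.3 user still sees a GitHub command for v2.4, and a v2.4 user sees the failure notice until the GitLab repo exists.

```python
from typing import Callable, Optional, Tuple

FAILURE_MESSAGE = (
    "The check for a newer release of Wmfdata failed to complete. "
    "Consider checking manually."
)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.4.0' into (2, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def update_message(
    installed_version: str,
    installed_repo_url: str,
    fetch_latest_version: Callable[[str], str],
) -> Optional[str]:
    """Return the startup notice, or None if the installed copy is current.

    Hypothetical model: the repo URL comes from the installed code, never
    from the newest release.
    """
    try:
        # Assumption: in practice this would read metadata.py from the repo.
        latest_version = fetch_latest_version(installed_repo_url)
    except Exception:
        # Any failure (missing repo, missing metadata.py, network error).
        return FAILURE_MESSAGE
    if parse_version(latest_version) <= parse_version(installed_version):
        return None
    return (
        f"You are using Wmfdata v.{installed_version}, but v.{latest_version} "
        f"is available. To update run pip install --upgrade "
        f"git+{installed_repo_url}.git@release"
    )


if __name__ == "__main__":
    github = "https://github.com/wikimedia/wmfdata-python"
    gitlab = "https://gitlab.wikimedia.org/repos/data-engineering/wmfdata-python"
    # Hypothetical stand-in for the network: repo URL -> latest released version.
    published = {github: "2.4.0"}  # v2.4 released; GitLab import not done yet

    def fetch(url: str) -> str:
        return published[url]  # KeyError plays the role of a failed fetch

    print(update_message("2.3.0", github, fetch))  # v2.3 user: GitHub command
    print(update_message("2.4.0", gitlab, fetch))  # v2.4 user: failure notice

    published[gitlab] = "2.5.0"  # repo imported, v2.5 released on GitLab
    print(update_message("2.4.0", gitlab, fetch))  # v2.4 user: GitLab command
```

Comparing parsed integer tuples rather than raw strings keeps, for example, 2.10.0 ranked above 2.9.1.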

nshahquinn-wmf opened https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics/-/merge_requests/55

Update location of Wmfdata-Python repo

Shifted to a new MR, but still waiting for the merge.