
import subversion repos from Phabricator into Gitlab
Open, Needs Triage, Public

Description

Recently we tried to remove subversion from the Phabricator machines (https://gerrit.wikimedia.org/r/c/operations/puppet/+/785149)

But even though the remaining svn repos hosted on it have now been disabled, the removal caused user-visible errors (T307889).

The remaining 4 repos, that are now inactive but still exist readonly, are (created a global saved query):

https://phabricator.wikimedia.org/diffusion/query/6WUBLfM9eS2R/

https://phabricator.wikimedia.org/diffusion/TSVN/
https://phabricator.wikimedia.org/diffusion/SVNP/
https://phabricator.wikimedia.org/diffusion/SVN/
https://phabricator.wikimedia.org/diffusion/SVNM/

There are upstream gitlab docs about 2 different methods to migrate SVN repos to gitlab.

https://docs.gitlab.com/ee/user/project/import/svn.html

The better one uses https://subgit.com/

Let's try to migrate those 4 repos over to gitlab, then adjust all links from wikis to the old svn repos, then actually delete them from Phabricator, and as the last step remove the subversion package from the phabricator machines.

Event Timeline

(To whoever recently created the saved Diffusion query called "svn-repos" at https://phabricator.wikimedia.org/diffusion/query/6WUBLfM9eS2R/ : it was set as the "default" at https://phabricator.wikimedia.org/diffusion/query/edit/ and so was shown to everyone going to https://phabricator.wikimedia.org/diffusion/ , which was highly confusing. I removed it as default.)

Subversion got removed from Phabricator by mistake in https://gerrit.wikimedia.org/r/c/operations/puppet/+/785149 which has led to T307889 and now this task to try to remove subversion. The Puppet patch was wrong: we do need subversion to host the archived repositories.

Given:

  • Gitlab does not support hosting Subversion repositories
  • converting from Subversion to Git is not trivial (I have been involved in such conversions previously)
  • those repositories are kept for archival purposes
  • changing the canonical hosting place requires adjusting a lot of templates, links etc

Therefore I don't think we should spend any time converting the repos to git or migrating them to Gitlab. Keeping them on Phabricator Diffusion is good enough.

I propose to decline this task, and not to care about these SVN repos. Not every ancient code base must be archived by us forever.{{Citation needed}}

I don't think we need the Subversion data itself -- definitely don't need to go through the hassle of porting it all over to Gitlab. I propose we just tar+gzip the tree as it was on the final commit, find a decent-ish place to stash it in perpetuity, then call it a day. Then we can kill the SVN repos from Phab & then drop the subversion package from puppet.
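A minimal sketch of the tar+gzip step in Python, assuming the final revision has already been exported to a local directory (for example with `svn export`). The function name and paths are hypothetical and not part of any existing tooling.

```python
import tarfile
from pathlib import Path


def archive_tree(export_dir: str, archive_path: str) -> list[str]:
    """Pack an exported tree into a gzip-compressed tarball.

    Returns the member names written, so the archive can be sanity-checked.
    """
    root = Path(export_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the tree's own directory name
        # instead of the absolute path on the machine doing the export.
        tar.add(str(root), arcname=root.name)
    # Re-open the archive read-only and list what actually landed in it.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

Listing the members after writing is a cheap check that the archive is readable before the live repository is deleted.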

I would also say we can make a .tar.gz, and I would suggest putting it here, next to these:

https://dumps.wikimedia.org/other/misc/

.oh..look..there is already an SVN dump in there?

There are some dumps yes. For toolserver I previously traced the history to T60801.

My point above was that subversion was erroneously removed and I would rather keep the old subversion repos hosted on Phab. This way they can still be easily browsed, we do not have to update all the templates/wiki links pointing to old subversion revisions and we can forget about this task.

The removal wasn't an accident or really erroneous. It just didn't quite consider that some docs for old config variables and such still point to SVN and that someone might actually still use them.

I'm not quite sure what's best to do with old docs or how easy mass updating them is, but I don't think we should just keep SVN around forever either. Having all repos in a single canonical place (eventually gitlab) will be a good thing. The dumps can be kept forever if anyone ever needs to explore SVN data directly.

> I would rather keep the old subversion repos hosted on Phab

I would not. I'd personally like to see repo hosting in Phab shut down. It's hard to write useful docs for new technical contributors when we use at least 4 different code repo hosting venues. I support anything that reduces this number.

> This way they can still be easily browsed

But are they being actively browsed beyond a stray link to them? Or is this largely just a hypothetical?

> we do not have to update all the templates/wiki links pointing to old subversion revisions and we can forget about this task.

Maybe we should just update them? If a page of documentation has links to SVN repos I'd posit that it's either completely obsolete or at the very least could use a refresh. Why not just update the SVN links to the equivalent git sha1s and call it a day? Then we can forget about the repos forever -- instead of having this conversation again in N years when somebody kills SVN again.

For what it is worth, rTSVN toolserver-svn is ancient now (read-only since 2014) and archived at https://archive.org/details/toolserver-svn. I don't think we need it anywhere in a "live" svn service these days.

> we do not have to update all the templates/wiki links pointing to old subversion revisions and we can forget about this task.

> Maybe we should just update them? If a page of documentation has links to SVN repos I'd posit that it's either completely obsolete or at the very least could use a refresh. Why not just update the SVN links to the equivalent git sha1s and call it a day? Then we can forget about the repos forever -- instead of having this conversation again in N years when somebody kills SVN again.

The report at T307889 ("sh: 1: svn: not found on phabricator") was for a footnote on Manual:$wgWhitelistRead. The link there is made using Template:Rev. That template is transcluded a lot on mw.o. It would be cool for someone to write a bot that replaces those old svn links with git links, but I think we have other ways around this that will be faster. I think we can update the mw.o template to point to pages in https://static-codereview.wikimedia.org/MediaWiki/ instead of pointing to [[phabricator:rSVN{{{1}}}]].
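For pages that hard-code the interwiki link instead of using the template, the same retargeting could be sketched in Python roughly as below. It assumes each revision has a page named `<revision>.html` under the static-codereview base URL, which would need to be verified first. The function name and link-label handling are hypothetical.

```python
import re

# Assumption: revision N is served at STATIC_BASE + "N.html".
STATIC_BASE = "https://static-codereview.wikimedia.org/MediaWiki/"

# Matches [[phabricator:rSVN12345]] and [[phabricator:rSVN12345|some label]].
SVN_LINK = re.compile(r"\[\[phabricator:rSVN(\d+)(?:\|([^\]]*))?\]\]")


def rewrite_svn_links(wikitext: str) -> str:
    """Replace phabricator:rSVN interwiki links with external static links."""

    def repl(match: re.Match) -> str:
        rev = match.group(1)
        # Keep the author's label if there was one, else fall back to "rN".
        label = match.group(2) or f"r{rev}"
        return f"[{STATIC_BASE}{rev}.html {label}]"

    return SVN_LINK.sub(repl, wikitext)
```

Text without such links passes through unchanged, so the function can safely be run over whole pages.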