
Git access to wiki pages
Open, Lowest, Public

Description

Some developers prefer to deal with code using Git rather than editing directly on a wiki. This is a problem because a lot of code needs to be stored directly on a wiki.

I think a good solution would be to let developers clone groups of pages on a MediaWiki site to a local repository through Git.

Example:

git clone 'https://en.wikipedia.org/w/api.php?action=gitaccess&pages=MediaWiki:Gadget-foo.js|MediaWiki:Gadget-foo.css|MediaWiki:Gadget-foo'

...which could then be edited locally and pushed directly to the wiki. For some user scripts and the like, instead of listing each page, a parent page could be given (e.g. User:Foo/gadget, with contained pages such as User:Foo/gadget/main.js). Each cloned page would have the prefix stripped, and each new page created locally in the repository would be pushed to the wiki as a subpage.
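The prefix-stripping idea could look roughly like the following. This is only a sketch of the mapping described above; the function names `title_to_path` and `path_to_title` are made up for illustration and are not part of MediaWiki or any existing tool.

```python
def title_to_path(title: str, parent: str) -> str:
    """Map a subpage title to a repository-relative path by stripping the parent prefix."""
    prefix = parent + "/"
    if not title.startswith(prefix):
        raise ValueError(f"{title!r} is not a subpage of {parent!r}")
    return title[len(prefix):]


def path_to_title(path: str, parent: str) -> str:
    """Map a file created locally in the repository back to a subpage title."""
    return f"{parent}/{path}"


if __name__ == "__main__":
    parent = "User:Foo/gadget"
    print(title_to_path("User:Foo/gadget/main.js", parent))
    print(path_to_title("style.css", parent))
```

A new local file such as style.css would then be pushed as User:Foo/gadget/style.css.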

(Really, a better option for giving groups of pages would be T39865, which would solve both having to list each page and avoiding clunky titles.)

I'm unsure whether this would require T40795 to be resolved first, or whether it might be possible to just have "lossy" cloning.

Event Timeline

I don't think this will be supported, as Git requires a repository, which the pages on Wikipedia are not. Wikipedia also uses a database, not Git.

Aklapper triaged this task as Lowest priority. Nov 2 2017, 11:31 AM

MediaWiki already offers an API to edit pages.
Which actual problem is not already solved by this API, apart from specific technology preferences (git), which would justify adding further code complexity?
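For reference, editing a page through the existing action API comes down to fetching a CSRF token and then POSTing an `action=edit` request. The sketch below only builds the requests and does not send them; the helper names are illustrative, and a real client would also need a logged-in session.

```python
from urllib.parse import urlencode

# Endpoint taken from the example in the task description.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_token_url() -> str:
    """URL for a GET request that returns a CSRF token for the current session."""
    params = {"action": "query", "meta": "tokens", "type": "csrf", "format": "json"}
    return API_URL + "?" + urlencode(params)


def build_edit_body(title: str, text: str, summary: str, csrf_token: str) -> bytes:
    """Form-encoded POST body for action=edit, replacing the page content with `text`."""
    params = {
        "action": "edit",
        "title": title,
        "text": text,
        "summary": summary,
        "token": csrf_token,
        "format": "json",
    }
    return urlencode(params).encode("utf-8")


if __name__ == "__main__":
    print(build_token_url())
    # "TOKEN" is a placeholder; a real token comes from the query above.
    print(build_edit_body("MediaWiki:Gadget-foo.js", "/* code */", "Update gadget", "TOKEN"))
```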

> I don't think this will be supported, as Git requires a repository, which the pages on Wikipedia are not. Wikipedia also uses a database, not Git.

The idea is to generate a repository based off a wiki page on-demand, which would be reflected in both directions.

> Which actual problem is not already solved by this API, apart from specific technology preferences (git), which would justify adding further code complexity?

The problems being caused by those specific technology preferences are enormous. We already had the most prominent page on Wikimedia seized by some WMF developers who wanted to work on it but refused to edit it directly on the wiki. (Wiki pages are apparently "hard for engineers to maintain".) The page was literally taken off-wiki and moved to gerrit, in defiance of one of the most basic principles of Wikimedia.

If certain people absolutely must use git, I think it would be much better to have to deal with a lot of otherwise-unnecessary code complexity, than to have to deal with this kind of threat against how the projects are supposed to function.

More generally: The gap between developers using gerrit/differential and everyone else is getting dangerously large. We need a way for the different groups to contribute in the same areas, using roughly the same tools they're used to if need be. We need some level of overlap.

This would have to be a lossy conversion. Git has a lot of features not supported by MediaWiki, and stores a lot of metadata that MediaWiki pages don't have – in addition to the non-linear history you mentioned (merge commits), we also have commits which can change several files at the same time (MediaWiki edits only affect a single page), identifying authors of a commit by immutable name/email (while MediaWiki uses just a name, which can be changed), allowing people to create commits in the name of a different user (author/committer), commit messages which can be a lot longer than MediaWiki edit summaries, storing Unix file access permissions, and probably a few more. And in turn, MediaWiki has features that Git does not have, like edit tags (which are entirely different than Git tags) or content models. Converting between MediaWiki pages and files would create further incompatibilities (for example, "Foo//bar" is a valid page title but not a valid file path, and "Foo[]bar" is a valid file name but not a valid page title).
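To make the title/file name mismatch concrete, here is a rough sketch of the two checks. It is deliberately simplified: the set of characters rejected in titles is an approximation of MediaWiki's rules, and both function names are invented for this example.

```python
# Characters assumed to be rejected in page titles (a simplification of the real rules).
ILLEGAL_TITLE_CHARS = set("#<>[]|{}")


def title_is_valid_path(title: str) -> bool:
    """A title only works as a relative file path if no path component is empty, '.' or '..'."""
    return all(part not in ("", ".", "..") for part in title.split("/"))


def filename_is_valid_title(name: str) -> bool:
    """A file name only works as a page title if it contains none of the rejected characters."""
    return bool(name) and not (set(name) & ILLEGAL_TITLE_CHARS)


if __name__ == "__main__":
    for title in ("Foo/bar", "Foo//bar"):
        print(title, "usable as path:", title_is_valid_path(title))
    for name in ("Foo bar", "Foo[]bar"):
        print(name, "usable as title:", filename_is_valid_title(name))
```

Any real bridge would need an escaping scheme for the names that fail these checks, and that scheme would have to round-trip in both directions.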

Because a push from Git to MediaWiki would be a lossy operation, Git users would have to "resynchronize" their local repository every time after doing it; forgetting to do that would potentially create a very confusing situation (commit hashes not matching). Probably not a big deal for software engineers, but newbie developers might not be able to fix their local repository afterwards (other than deleting it and cloning again). Although I guess this isn't that different from the git-review workflow we currently require.
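The hash mismatch follows from how Git names commits: the ID is a hash over the full commit object, including author, committer and message. The sketch below computes a commit ID the way SHA-1 repositories do; the identities and messages are made-up sample data, and the truncation is just a stand-in for whatever the wiki side would drop.

```python
import hashlib
from typing import Optional

# Well-known ID of Git's empty tree, used here only so the example needs no real repository.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def git_commit_id(tree: str, parent: Optional[str], author: str, committer: str, message: str) -> str:
    """Hash a commit object: sha1(b'commit <size>\\0' + body)."""
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


if __name__ == "__main__":
    ident = "Example User <user@example.org> 1509622260 +0000"  # sample identity
    local = git_commit_id(EMPTY_TREE, None, ident, ident, "Fix typo\n\nLong explanation.\n")
    # Suppose the wiki only kept the first line of the message as the edit summary.
    regenerated = git_commit_id(EMPTY_TREE, None, ident, ident, "Fix typo\n")
    print(local)
    print(regenerated)
    print("same commit:", local == regenerated)
```

Since every later commit records its parent's ID, one changed commit changes the IDs of everything after it, which is why the local repository has to be resynchronized.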

I think we would have to actually store the resulting repository somewhere. For large gadgets with rich history (e.g. Popups or Twinkle), the repository would probably be several megabytes, which might not be practical to generate on-demand. (And then we need some throttling/cleanup code to avoid DoS attacks by someone generating a repository for every possible combination of pages, or requiring a whitelist in site configuration, or adding a user permission…)

None of the above means this would be impossible; it would just be more difficult than you're making it seem.

> Which actual problem is not already solved by this API, apart from specific technology preferences (git), which would justify adding further code complexity?

> The problems being caused by those specific technology preferences are enormous. We already had the most prominent page on Wikimedia seized by some WMF developers who wanted to work on it but refused to edit it directly on the wiki. (Wiki pages are apparently "hard for engineers to maintain".) The page was literally taken off-wiki and moved to gerrit, in defiance of one of the most basic principles of Wikimedia.

I'm not sure what this is about. The wikipedia.org portal? That's the only such situation I can think of.

I will agree though, that from my perspective, wiki pages are more annoying to maintain. They lack anything resembling code review (I've seen many typos break important gadgets that could have been easily avoided if only a second person looked at the change before it went live), it is usually impossible to test changes without deploying them "in production" (TemplateSandbox is a recent godsend, but this is still an issue for gadgets, or indeed things like the portal), and it is impossible to make matching changes to several pages at the same time (e.g. changing a dependency in a gadget definition while simultaneously updating the code to use the new one).

I am unfamiliar with how the portal was maintained before, but now that I'm thinking about it, I'm not sure if your proposal would be satisfactory. Presumably the portal site would still be deployed from the Git repository, so changes to any "matching" MediaWiki pages would not be immediately reflected, and they would have to be pulled into the master Git repository and reviewed before being deployed.

To clarify/summarize my thoughts: I think this would be absolutely cool, but I'm afraid that it would require the users to be familiar with the internals of both MediaWiki and Git to use this system (and especially to make sense of the situation when something inevitably goes wrong).

I think @MarkAHershberger is working on something like this?

I'm planning to work on T187749 ("Make it possible to use code from an external repository for editor-controlled Javascript/CSS"), which is a somewhat different approach to the same problem.

Making the MediaWiki API talk through the git protocol seems very hard. You need to implement some transfer protocol (there are a couple of PHP libraries for implementing git client functionality, but I don't think there are any for the server side), create a new authentication plugin, figure out how to convert git's dual author + committer identity to MediaWiki's single identity and back, store enough git metadata to be able to create valid commit chains (which brings up all kinds of fun questions, e.g. what happens on revision deletion?)... I don't mean to be negative, but this is just not going to happen in the shape described here.
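As a rough illustration of the identity problem alone, here is one way a pushed commit might be flattened into per-page edits. Everything in it is an assumption made for the example: the `GitCommit` and `WikiEdit` types, attributing the edit to the committer, folding the author into the summary, and the 500-character summary limit.

```python
from dataclasses import dataclass
from typing import Dict, List

SUMMARY_LIMIT = 500  # assumed edit summary limit, in characters


@dataclass
class GitCommit:
    author: str            # who wrote the change
    committer: str         # who created the commit (assumed here to map to a wiki user)
    message: str
    files: Dict[str, str]  # path -> new file content


@dataclass
class WikiEdit:
    title: str
    user: str
    summary: str
    text: str


def commit_to_edits(commit: GitCommit, parent_page: str) -> List[WikiEdit]:
    """Split one commit into one edit per file; only the first message line survives."""
    summary = commit.message.splitlines()[0] if commit.message else ""
    if commit.author != commit.committer:
        # The wiki has a single identity per edit, so the author is folded into the summary.
        summary = f"{summary} (author: {commit.author})"
    summary = summary[:SUMMARY_LIMIT]
    return [
        WikiEdit(f"{parent_page}/{path}", commit.committer, summary, text)
        for path, text in sorted(commit.files.items())
    ]


if __name__ == "__main__":
    commit = GitCommit(
        author="Alice <alice@example.org>",
        committer="Bob",
        message="Fix typo\n\nLonger explanation that has no place on the wiki side.",
        files={"main.js": "/* js */", "main.css": "/* css */"},
    )
    for edit in commit_to_edits(commit, "User:Foo/gadget"):
        print(edit)
```

Going the other way (wiki edits back to commits) cannot recover the original author line, message body or grouping, which is the "and back" part of the problem.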

If so, then there should be an external site in MediaWiki for developers. MediaWiki should have a storehouse for important code, so that any developer can modify it by editing, or collect it from this wiki. Thank you.

@Md_Tanbir_Islam: This ticket is specifically about Git and MediaWiki interaction; see the task description. MediaWiki itself already supports creating and storing JavaScript gadgets and user scripts as wiki pages; that already exists and is off-topic for this task. Thanks.

Using https://github.com/Git-Mediawiki/Git-Mediawiki I can at least get a plaintext backup of all wiki pages for the wikis I am working on. Unfortunately, it's unmaintained, but it would be a great feature if it were maintained upstream at Wikimedia!

Anyone is very welcome to fork it and maintain it; this does not require any "upstream" (which also sometimes struggles to maintain its upstream stuff)...

> Anyone is very welcome to fork it and maintain it; this does not require any "upstream" (which also sometimes struggles to maintain its upstream stuff)...

Anyone is welcome to submit pull requests, as well. There is some "new" life on that repo (see https://github.com/Git-Mediawiki/Git-Mediawiki/issues/33) and I may have time to begin contributing again.