Technology to transclude git content into wiki pages
Open, Lowest, Public

Description

We have content in git repositories that we want to show in wiki pages:

Documentation in git

Sample code from git

Minimum viable product

  1. Develop an extension such that an existing wiki page can transclude the latest content from git using some syntax, probably like Git2Pages' {{snippet}} parser function, with similar functionality.
  2. Using this, editors can
    • transclude wikitext into the page
    • provide source code to the <source> tag or parser function to transclude highlighted source code into the page.
  3. The extension is deployed on the production wiki for mediawiki.org:
    • caching strategy
    • continuous integration
    • security review
    • etc.

Possible/desirable features

After the MVP!

  • Transcluded content is available to Lua modules to operate on, e.g. transform hooks.txt into something useful.
  • It should be possible to invoke this functionality from a wikitext template, so that e.g. a wiki editor can write a Template:Show_extension_README or Template:PHP_code_snippet.
  • Render Markdown files (e.g. using PHP Markdown)
  • Render .txt files with some minimal formatting (like the <poem> tag?)
  • Render other formats if agreed (e.g. pandoc).

For sample code, editors need to transclude parts of Git files into a wiki page (see example). Possible approaches:

  • In the parser tag, specify a range of lines from the file, like Git2Pages' {{snippet}} parser function:
    • {{#snippet:repository=mediawiki/extensions/examples | filename=examples/ContentAction/ContentAction.php |startline=24 |endline=36}}
  • Could also/instead identify sections of code with starting and ending markers. Developers would annotate files in git with something like // begin name=simple-text-field and a matching // end name=simple-text-field, and then in the parser tag wiki editors specify {{#snippet: ... |section=simple-text-field }}.
    • Obviously developers might mistype, delete, or duplicate these section marker names, but it's developer error; the extension can set a hidden category "Pages with garbled Git section transclusion", or warn, or try to do the right thing.
  • Could leave it up to a Lua module to get sections of the file using either technique.
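As a rough illustration of the first two approaches, here is a minimal Python sketch (the extension itself would be PHP). The function names by_line_range and by_section are hypothetical, and the marker syntax simply follows the // begin name=... and // end name=... example above. Missing, duplicated, or misordered markers raise an error so the caller can warn or add a tracking category.

```python
import re

# Marker syntax assumed from the example above: "// begin name=foo" / "// end name=foo".
BEGIN = re.compile(r"//\s*begin\s+name=(\S+)")
END = re.compile(r"//\s*end\s+name=(\S+)")


def by_line_range(text, startline, endline):
    """Return lines startline..endline of text (1-indexed, inclusive)."""
    lines = text.splitlines()
    return "\n".join(lines[startline - 1:endline])


def _marker_lines(pattern, lines, name):
    """Return the indexes of all lines carrying the given marker for `name`."""
    found = []
    for index, line in enumerate(lines):
        match = pattern.search(line)
        if match and match.group(1) == name:
            found.append(index)
    return found


def by_section(text, name):
    """Return the lines strictly between the begin and end markers for `name`.

    Raises ValueError if a marker is missing, duplicated, or out of order.
    """
    lines = text.splitlines()
    begins = _marker_lines(BEGIN, lines, name)
    ends = _marker_lines(END, lines, name)
    if len(begins) != 1 or len(ends) != 1 or ends[0] < begins[0]:
        raise ValueError("garbled section markers for %r" % name)
    return "\n".join(lines[begins[0] + 1:ends[0]])
```

A Lua module, as in the last bullet, could apply the same logic to the raw file text instead.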
Unlikely/non features
  • Create sets of wiki pages from git files (e.g. doc/hooks.txt)
    • if you want that, probably publish HTML to http://doc.wikimedia.org; or develop a Lua module that invokes this task's functionality and grabs the right part of the file.
  • Push git content to wiki page using some kind of src->destWikiPage mapping file
    • instead just have a wiki bot insert the parser function into pages
  • Explore alternative syntax?: develop a special page e.g. Special:FromGit, and transclude that in other pages, e.g. {{[[Special:FromGit/project/git/path/path/to/file]]}} (This is a similar approach to {{Special:ApiHelp/query+categoryinfo}} which transcludes action API documentation.)
    • it's hard to specify all the parameters. What's the actual use case for this syntax?

Prior art

Information for Possible-Tech-Projects

  • Primary mentor:
  • Co-mentor: @Tgr
  • Other mentors:

Laucon added a comment.Oct 6 2015, 1:46 AM

For this project, must the applicants be able to program in PHP or can they be the kind of people who are excellent at documentation and just generally understand PHP and how it functions and what it does?

For this project, must the applicants be able to program in PHP

Yes. To get MediaWiki to do this will require writing new extension code in PHP; enough to adapt/rewrite the code in existing extensions Git2Pages and Github, and to understand @Tgr's critique of the former above.

or can they be the kind of people who are excellent at documentation and just generally understand PHP and how it functions and what it does?

That's me :-) WMF wants to actually use this code... once someone writes it.

Tgr added a comment.Oct 6 2015, 2:15 AM

The intern will need to write software in PHP. This is a programming task, not a documentation task; the software that needs to be written just happens to have documentation-related functionality.

If you have experience programming in another language, then learning PHP as part of the internship would IMO be fine. If this would be the first time you write code, then I don't think this task is good for that. (Looking at the list of other projects, maybe T91335 is good for someone just beginning to code?)

Tgr added a comment.Oct 6 2015, 2:18 AM

A random list of skills that will probably be needed for this task:

  • MediaWiki wikitext syntax
  • object-oriented PHP programming
  • database fundamentals (MySQL/MariaDB)
  • basic familiarity with caching (memcached)
  • unit testing
  • git rebase workflow

I would say you should be familiar with at least half of them; the other half can be learned along the way.

Laucon added a comment.Oct 6 2015, 2:16 PM

Thanks for explaining. After hearing this, to be honest, I don't think I'm completely suited for this project. I'm sorry. Are there any documentation projects in Wikimedia where the intern should understand how code functions but doesn't necessarily need to write code? If there isn't anything here like that, that's OK. I just wanted to ask. Thank you guys so much!

I find this project very interesting and would like to propose it for the Outreachy internship program. What would be the next step?
Do I need to do the second step, "create a new Phabricator task for your proposal", as mentioned on https://www.mediawiki.org/wiki/Outreachy/Round_11#Candidates? Thanks.

Tgr added a comment.Oct 6 2015, 6:45 PM

@Akangupt, yes, that would be the first step.

Spage edited the task description. (Show Details)Oct 6 2015, 9:10 PM

@Tgr, @Krinkle, and I talked about this (and generated documentation, improving doc.wikimedia.org, etc...). Although it's often better for wiki pages to link to files in git or to generated doc, there are use cases for the functionality in this task, so OK to proceed.

Extension sketch
  • HTTP request the raw file from a git server
  • and cache it, with a suitable expiration
  • and set an expiration in the parser cache
  • "do the right thing" with retrieved content ( .mediawiki, .php, .md)

This is just a generic extension to request page content over HTTP, with limitations.
What's the best existing Extension:GetStuffFromHTTP? Maybe Extension:GitHub is already it, and we're nearly done.
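The fetch-and-cache part of the sketch could look roughly like this in Python. The class name CachedFetcher, its parameters, and the one-hour default are assumptions for illustration only; a real MediaWiki extension would be written in PHP and would use MediaWiki's own HTTP and caching facilities. The fetch function and the clock are passed in so the expiry logic can be exercised without network access.

```python
import time


class CachedFetcher:
    """Cache fetched file content per URL, with a fixed time-to-live."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch      # callable: url -> content (e.g. an HTTP GET)
        self._ttl = ttl_seconds  # how long a cached copy stays valid
        self._clock = clock      # injectable so expiry can be tested
        self._cache = {}         # url -> (expires_at, content)

    def get(self, url):
        """Return cached content if still fresh, otherwise fetch and store it."""
        now = self._clock()
        entry = self._cache.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]
        content = self._fetch(url)
        self._cache[url] = (now + self._ttl, content)
        return content

    def purge(self, url):
        """Drop the cached copy, e.g. when notified that git has updated."""
        self._cache.pop(url, None)
```

Pages that transclude a fixed version rather than master HEAD could use a much longer expiration, since that content is not expected to change.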

Security concerns
  • whitelist the requests with a $wgGitServerUrl config
  • ping/rate limiter on requests to prevent abuse
  • sanitize HTML
  • sanitize if just handing wikitext to parser
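A minimal Python sketch of the first two concerns, assuming a single allowed base URL in the spirit of the $wgGitServerUrl setting mentioned above. The names is_allowed and RateLimiter, and the sliding-window policy, are hypothetical choices for illustration; sanitizing HTML and wikitext is not covered here.

```python
import collections
import time
from urllib.parse import urlsplit


def is_allowed(url, allowed_base):
    """Accept only URLs on the same scheme and host, under the allowed base path."""
    target = urlsplit(url)
    base = urlsplit(allowed_base)
    if target.scheme != base.scheme or target.netloc != base.netloc:
        return False
    # Reject path traversal such as /r/../somewhere-else.
    if ".." in target.path.split("/"):
        return False
    base_path = base.path.rstrip("/") + "/"
    return target.path.startswith(base_path)


class RateLimiter:
    """Allow at most max_requests calls within any window of per_seconds."""

    def __init__(self, max_requests, per_seconds, clock=time.monotonic):
        self._max = max_requests
        self._per = per_seconds
        self._clock = clock
        self._stamps = collections.deque()  # times of recently allowed requests

    def allow(self):
        now = self._clock()
        # Forget requests that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self._per:
            self._stamps.popleft()
        if len(self._stamps) >= self._max:
            return False
        self._stamps.append(now)
        return True
```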
Unknowns
  • how do we translate a wiki page with transcluded content ({{gitsnippet | file=database_tables.mediawiki}}) in it?
  • Provide the raw text to Lua for parsing and extraction
Possible
  • way to purge old
    • cron job to notify/update when git updates?
    • Note: often wiki pages shouldn't be transcluding master HEAD, they should be transcluding a fixed version "Here's a sample skin.json in MediaWiki 1.26"
Qgil changed the title from "technology to transclude git content into wiki pages" to "Technology to transclude git content into wiki pages".Oct 7 2015, 7:47 PM
Qgil edited the task description. (Show Details)

So, first I should start working on T114719: Convert Git2Pages to use extension registration? Thanks.

Tgr added a comment.Oct 8 2015, 8:34 AM

Probably not since it has two people working on it already. We can find another microtask.

Spage edited the task description. (Show Details)Oct 9 2015, 5:46 AM

@Tgr, I want to apply for this project for the next round of outreachy, kindly specify which microtask I should work on. Should I work on this T115388: Convert hooks.txt to YAML [Outreachy microtask] ?

@Shrutika719, since it's listed as a microtask and is still unresolved, feel free to work on it.

Tgr added a comment.Oct 16 2015, 6:10 PM

That task is already assigned; I'm not sure it is a good idea to have multiple people working on the same microtask in parallel, as the time of one of them will be wasted. In this case the task is flexible enough that they would probably come up with different solutions and both can be accepted, but I would prefer to give people real tasks.

That said, I can't really think of an easy task related to this project. I'll ask around...

@Tgr, did you look for any other microtask?

Tgr added a subscriber: bd808.Oct 18 2015, 9:32 PM

Yes; you could change maintenance/findHooks.php to work with a YAML file instead of the current textfile. That kind of presupposes T115388 but you could just convert a few hook definitions by hand, delete the rest, and use that for testing.

@bd808 can help with figuring out which YAML parser to use.

01tonythomas added a subscriber: 01tonythomas.

I am shifting this to Outreachy-Round-11 as the project description has two mentors and micro-tasks, and looks ready for the 11th edition of Outreachy (Dec 2015 - Mar 2016). Potential candidates should start by submitting their proposals as a blocker for this task, by November 02.

Feel free to revert it back, if this task has some relevant issues which might block its completion in this term of Outreachy.

Tgr added a comment.Oct 20 2015, 12:21 AM
In T91626#1733600, @Tgr wrote:

Yes; you could change maintenance/findHooks.php to work with a YAML file instead of the current textfile. That kind of presupposes T115388 but you could just convert a few hook definitions by hand, delete the rest, and use that for testing.

Turned it into a proper task: T115959

Spage edited the task description. (Show Details)Oct 20 2015, 9:49 AM
Devirk added a subscriber: Devirk.Oct 21 2015, 7:25 AM

Hello! I am interested in working on this project for the upcoming Outreachy Round 11. I have decent knowledge of PHP, git and database fundamentals. Hope that's enough to get started! Can anybody point out any microtasks so that I can start working on them? Thanks.

Hello everyone! I am interested in working on this project for the 11th round of the Outreachy internship. I have reasonable knowledge of PHP and Git. Can someone direct me on a possible path to get started?

@Tgr I am almost done with the microtask assigned to me. Is there any other microtask that needs to be done before submitting the proposal? If yes, kindly specify one.

Spage edited the task description. (Show Details)Oct 26 2015, 7:03 AM
Ankitashukla added a subscriber: Ankitashukla.
Tgr added a comment.Oct 27 2015, 8:56 PM

@Devirk, @Hansika11: sorry for the slow answer. There are already 3 people applying to this task; unless you have a very strong preference for it, I would suggest finding another one with less competition.

@Shrutika719 no more microtask required (you are of course always welcome to do more and it might help us in judging your coding skills), but you need to create a proposal as described in https://www.mediawiki.org/wiki/Outreachy/Round_11#Candidates.

It can provide source code to the <source> tag or parser function to transclude highlighted source code into the page.

Does this project demand to have all the functionalities of SyntaxHighlight_GeSHi ?

It can require that extension and take advantage of it and even propose changes to it. We certainly don't want to reinvent the wheel.
It seems there are two ways to get syntax highlighting.

  1. Add parameters to this extension's parser function (or parser tag) to specify syntax highlighting, lang, etc., then somehow invoke SyntaxHighlight_GeSHi's processing.
  2. Let wiki editors embed this extension's parser function (or parser tag) inside the source tag.

For 2, I played around with Git2Pages on my local wiki and with some changes got this invocation to work:

{{#tag:source|{{#snippet:repository=https://gerrit.wikimedia.org/r/mediawiki/extensions/BoilerPlate |filename=BoilerPlate.php}} |lang=php}}

I.e. I invoked <source> as a parser function using {{#tag:}} (Manual:Tag extensions says "All tag extensions can also be called as a parser function using {{#tag:tagname|input|attribute_name=value}} which will have pre save transform applied"). For its input text to highlight, I invoked Git2Pages' {{#snippet:...}} parser function, which pulled in code from a repo. Git2Pages output has to change when supplying plain text for SyntaxHighlight_GeSHi; the changes are in P2252.
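If a bot or template generator were to emit such invocations, composing the wikitext is simple string assembly. The Python helper below is a hypothetical sketch (the name highlighted_snippet and its refusal of values containing template syntax are assumptions, not part of Git2Pages); it reproduces the invocation shown above.

```python
def highlighted_snippet(repository, filename, lang):
    """Build a {{#tag:source}} call wrapping a {{#snippet}} call.

    Values containing wikitext template syntax are rejected rather than
    escaped, to keep this sketch simple.
    """
    for value in (repository, filename, lang):
        if any(token in value for token in ("|", "{{", "}}")):
            raise ValueError("unsafe value for wikitext: %r" % value)
    snippet = "{{#snippet:repository=%s |filename=%s}}" % (repository, filename)
    return "{{#tag:source|%s |lang=%s}}" % (snippet, lang)


print(highlighted_snippet(
    "https://gerrit.wikimedia.org/r/mediawiki/extensions/BoilerPlate",
    "BoilerPlate.php",
    "php",
))
```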

This comment was removed by Akangupt.
Spage edited the task description. (Show Details)Nov 3 2015, 3:59 AM
Spage edited the task description. (Show Details)Nov 5 2015, 5:21 AM
Devirk removed a subscriber: Devirk.Nov 5 2015, 5:57 PM

@Akangupt and others,
This task's description, under Possible/desirable features, mentioned:

  • Transcluded content appears in a tag or template that identifies source, so that users can edit the text around it.
    • for simple transclusion MVP, simply invoke it from a wiki template (similar to Template:Api_help).

I think the first bullet is a holdover from when this task was titled "technology to push or pull remote text content into wiki pages". If a system could push git content into a wiki page then editors would obviously want to know where it came from, e.g.

blah blah blah
<!-- the following wikitext came from extensions/SemanticResultFormats/README.wiki, inserted by MyMagicPushContent on 2015-11-06 -->
Semantic Result Formats (a.k.a. SRF) is an extension to MediaWiki that ...
The individual formats can be added 
...
<!-- end of wikitext from extensions/SemanticResultFormats/README.wiki -->
blah blah blah

Now that this task has been refined and narrowed to a pull technology, that doesn't apply. Wiki editors will see

blah blah blah
{{#MyMagicGitInclude: project=mediawiki/extensions/Wikibase| file=README.wiki}}
blah blah blah

So I will remove the mis-feature. I apologize for the confusion!

As for the sub-bullet, it should be possible to invoke this functionality from a wikitext template, so that e.g. a wiki editor can write a Template:Show_extension_README or Template:PHP_code_snippet. I assume that is doable, it probably just requires fiddling around to quote {{, |, =, etc. I will add that to the description.

Spage edited the task description. (Show Details)Nov 5 2015, 10:40 PM
Spage edited the task description. (Show Details)Nov 5 2015, 11:09 PM
Sumit added a subscriber: Sumit.Feb 19 2016, 8:15 PM
NOTE: Outreachy round 12 applications are now open and GSoC 2016 is round the corner. This project was featured for Outreachy round 11 and has a well defined scope. Are you ready to mentor the project this season? If yes, then we'll feature this for Outreachy round 12 and GSoC 2016 as well. Please reply back in comments.
Restricted Application added a subscriber: JEumerus.Feb 19 2016, 8:15 PM
This comment was removed by nikhil_yadala.

Hello developers,
I am interested in taking up this project for GSoC 2016. I am a CSE sophomore at IIT Guwahati (India). I have enough expertise in C, C++, PHP, Python and Git. Currently I am developing a Python to C++ translator.
Would anyone guide me in getting acquainted with this project?

thanks,
Nikhil Yadala

Hello @nikhil_yadala. If you're new to MediaWiki, start by going through https://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker to get acquainted with it. Then look for and try to solve bugs marked Easy here on Phabricator. After you gain a basic idea of MediaWiki, you can try to understand what the above task is about. Feel free to ask questions about anything you don't follow, here or on IRC. Thanks for your interest!

Kc5vcx added a subscriber: Kc5vcx.Feb 22 2016, 2:23 AM

Hi,

I'm interested in doing this for GSoC.
The microtasks mentioned on this page are either closed or have a patch submitted. Could someone add a few more?

Qgil added a comment.Feb 22 2016, 8:48 PM

For what it's worth, @Spage cannot mentor this project in this round.

@Tgr, @Ankitashukla, are you volunteering as mentors for this project idea in the upcoming round?

Hi Sumit,
thanks a lot. I have cloned the project and set it up successfully on my Ubuntu machine, except for the skins (which I guess I will be able to set up without any problem). I have also gone through the links that you referred to. Now I am trying to solve some beginner-level bugs.
Meanwhile, is there anything else that I should know (like understanding the function of a certain module), particularly in reference to this project?

thanks,
Nikhil Yadala.

@Qgil @nikhil_yadala Unfortunately I won't be able to mentor during this round of GSOC and Outreachy.

@Tgr would you be willing to mentor this project for this round?
@Qgil @Ankitashukla @Spage any suggestions on who might be interested in mentoring this project for the upcoming GSoC/Outreachy rounds?

aaron removed a subscriber: aaron.Feb 24 2016, 6:15 PM

@Bawolff would you be willing to mentor this project for this round of GSoC/Outreachy?

Galorefitz edited the task description. (Show Details)Mar 1 2016, 2:57 PM

I'm keen to work on this project. Kindly brief me about where to start. Your help is deeply appreciated. Thank you.

Another query I have is whether this "Bug fixing" is meant to facilitate a comprehensive understanding of MediaWiki or is a selection criterion. I've been working on MediaWiki for a month now on my local machine and don't feel the need for a "basic understanding". Thank you.

Qgil added a comment.Mar 2 2016, 10:15 AM

My gut feeling tells me that this is not a good project idea candidate for this round. It is a very low priority in the Wikimedia tech context, and I don't expect new mentors to show up and volunteer. I will move it from the Featured column to Missing mentors, but in fact I believe Check in the next round would be more accurate.

GSoC and Outreachy candidates are strongly encouraged to look somewhere else for this round.

Qgil lowered the priority of this task from "Normal" to "Lowest".

Another query I have is whether this "Bug fixing" is meant to facilitate a comprehensive understanding of MediaWiki or is a selection criterion.

Hi @Khannaanant262129! Please see https://www.mediawiki.org/wiki/Outreach_programs/Life_of_a_successful_project#Answering_your_questions for information on best audiences and successful questions. Thank you!

ZhouZ moved this task from Backlog to Assigned on the WMF-Legal board.Apr 14 2016, 1:16 AM
ZhouZ added a subscriber: ZhouZ.
Sumit added a comment.Sep 11 2016, 5:00 PM

My gut feeling tells me that this is not a good project idea candidate for this round. It is a very low priority in the Wikimedia tech context, and I don't expect new mentors to show up and volunteer. I will move it from the Featured column to Missing mentors, but in fact I believe Check in the next round would be more accurate.

GSoC and Outreachy candidates are strongly encouraged to look somewhere else for this round.

@Qgil although we have a good scope here, do you think Wikimedia tech priority on this has changed since then to warrant this for an Outreachy internship project for the current round?

@Bawolff would you be willing to mentor this project for this round of GSoC/Outreachy?

Sorry, I don't want to be a mentor this round (or, I guess, last round, which was the one you asked about).

Qgil added a comment.Sep 12 2016, 9:13 AM

Right now we don't have a clear use case of this, but we might have it for the next round. Let's keep this task open just in case.

Sumit added a comment.Mar 1 2017, 5:10 AM

@Qgil is there a use case for this project now? We're assessing its feasibility for the current round of GSoC/Outreachy. If not, can we decline this?