==Project
====Possible Mentors
- Primary Mentor: @Spage
- Co-mentor: @Tgr
- Other mentor(s): @Ankitashukla
====Synopsis
A lot of the documentation for MediaWiki lives in git repositories. People resort to copy-pasting this content into wiki pages, resulting in copies of the same content lying around in multiple locations, which drift apart as some get updated and some don't. A solution is to develop an extension that can transclude content (snippets of sample code, documentation, etc.) from files on a git server into wiki pages as and when needed, and that allows users to preprocess this transcluded content (using Lua modules, for example).
====Prior Art
[[https://www.mediawiki.org/wiki/Extension:Git2Pages | Extension:Git2Pages]] - An extension that gets snippets of text from a file on a git server using start and end line numbers. This extension sparse-clones the repository locally and then appends the text to the wiki page. Sparse-cloning a repository is a major security concern when dealing with the WMF cluster.
[[https://www.mediawiki.org/wiki/Extension:GitHub | Extension:GitHub]] - An extension that gets a file from GitHub via an HTTP request for embedding in wiki pages. This extension does not allow for partial transclusion (snippets) of a file.
====Workflow of the extension
# A wiki page editor requests a snippet of documentation or code using a magic word that allows them to specify the git repository source and, optionally, a start and end point for the snippet (something similar to `{{#snippet:}}` in [[https://www.mediawiki.org/wiki/Extension:Git2Pages#Usage | Extension:Git2Pages]]).
# The extension queries (HTTP) the git server for the requested snippet, caches it with a suitable expiration date. A cron job deletes the cache when transcluded text is updated on git (in the case where the text is transcluded from `HEAD`). The functionality for this would be something similar to the parser function for [[https://www.mediawiki.org/wiki/Extension:GitHub | Extension:GitHub]]’s magic word `{{#github:}}`.
The git servers for which support will be provided are GitHub, GitLab, Bitbucket, Diffusion and Gitblit (git.wikimedia.org).
# The extension receives the snippet of text it requested, and (if required, according to the `raw` parameter setting) converts it to:
- HTML - For normal text and markdown.
- contents of a `<syntaxhighlight>` tag - for sample code snippets (see [[https://www.mediawiki.org/wiki/Extension:SyntaxHighlight_GeSHi | SyntaxHighlight]]).
- nothing, render as is - for wikitext.
# The extension then sanitizes the HTML and/or wikitext, and renders it onto the wiki page.
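As a sketch of step 1, an invocation on a wiki page might look like the following. The parser function name here is hypothetical (the final name is yet to be decided); the parameters are the ones proposed in Phase I below:

```
{{#transcludegit: source=wikimedia/mediawiki-extensions-Example
 | filename=README.md
 | startline=10
 | endline=25
}}
```

Extension:Git2Pages uses `{{#snippet:}}` for the same purpose; whatever name is chosen, it must not collide with magic words registered by other deployed extensions.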
====Phases
- **Phase I** : Write the parser function (gets called with a magic word) to query the git server for the required code/text snippet
- Input from the wiki page to the extension: a magic word that calls the parser function.
- This will have the following parameters:
- **source** (repository): Depends on the technology that hosts the repository. No default.
- **filename:** Name of the file from which the text is to be transcluded. No default.
- **commit-id:** (optional) The commit ID the file is supposed to be pulled from. Defaults to HEAD.
- **raw:** (optional) Accepts the values yes or no. This parameter allows users to get plain text (with none of the extension’s formatting based on file extensions) so that they can format the text as they like or use it for further processing (e.g. processing the text through a Lua module). Defaults to no.
- **file-extension:** (optional) Overrides the detected extension of the file provided, or allows the user to specify one in case it has no extension. No default.
- **startline:** (optional) The line number that transclusion begins from. Defaults to the start of the file.
- **endline:** (optional) The line number that transclusion ends at. Defaults to the end of the file.
- **start marker:** (optional) The start marker (something like //start-section: blah, or any other string/word that marks the beginning of the piece of text that is to be transcluded) for the piece of text that is to be transcluded. This is to make sure we are transcluding the intended content. No default.
- **end marker:** (optional) The end marker (something like //end-section: blah, or any other string/word that marks the end of the piece of text that is to be transcluded) for the piece of text that is to be transcluded. This is to make sure we are transcluding the intended content. No default.
- **Phase II** : Querying the git server (HTTP) and rendering received text
- Provide support for various git hosting technologies and servers (GitHub, Bitbucket, GitLab, Gitblit, Diffusion (using the Conduit [[https://phabricator.wikimedia.org/conduit/method/diffusion.filecontentquery/ | API method: diffusion.filecontentquery]])). The received text will, for now, be fed into the database as is.
- All text is to be converted to its rendering equivalent, as and when requested by the user (see point 3 in the workflow of the extension), sanitized (if needed), and rendered.
- Rendering:
All the transcluded text is going to be wrapped in a div, in order to tell the wiki user explicitly that the content they are seeing has been transcluded. This div will also mention what file and repository the text has been transcluded from. Let's say the string for this is:
```
$transclusionBox = '<div class="transclusion-box">' . $metaInfo;
```
Here is how some formats are going to be transcluded:
- For wikitext,
```
$output = $transclusionBox . $parser->recursiveTagParseFully( $content ) . '</div>';
return array( $output , 'nowiki' => true, 'noparse' => true, 'isHTML' => true );
```
- For plain text,
```
$output = $transclusionBox . '<poem><nowiki>' . htmlspecialchars( $content ) . '</nowiki></poem>' . '</div>';
return array( $output, 'nowiki' => false, 'noparse' => false, 'isHTML' => true );
```
or
```
$output = $transclusionBox . '<pre>' . htmlspecialchars( $content ) . '</pre></div>';
return array( $output, 'nowiki' => true, 'noparse' => true, 'isHTML' => true );
```
depending on preference. A `<pre>` tag might be a good idea since code is generally seen in monospace. On the other hand, the `<poem>` tag is prettier.
- For code,
```
$output = $transclusionBox . '<syntaxhighlight lang="' . $lang . '">' . $content . '</syntaxhighlight></div>';
return array( $output, 'nowiki' => false, 'noparse' => false, 'isHTML' => true );
```
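Before any of the rendering above happens, the snippet itself has to be selected from the fetched file using the startline/endline and marker parameters from Phase I. A minimal sketch of that selection step in plain PHP (the function name is hypothetical, and the assumption here is that the markers must appear on the first and last selected lines, failing which the transclusion is rejected):

```php
<?php
/**
 * Select a snippet from a file's contents by 1-indexed line bounds,
 * then verify the optional start/end markers so we know we are
 * transcluding the intended content even if the file changed upstream.
 * Returns the snippet as a string, or false if a marker check fails.
 */
function extractSnippet( $text, $startLine = null, $endLine = null,
	$startMarker = null, $endMarker = null
) {
	$lines = explode( "\n", $text );
	// Default to the whole file when no bounds are given
	$start = $startLine !== null ? $startLine - 1 : 0;
	$end = $endLine !== null ? $endLine - 1 : count( $lines ) - 1;
	$lines = array_slice( $lines, $start, $end - $start + 1 );
	if ( $startMarker !== null && strpos( $lines[0], $startMarker ) === false ) {
		return false; // first selected line does not carry the start marker
	}
	if ( $endMarker !== null &&
		strpos( $lines[count( $lines ) - 1], $endMarker ) === false
	) {
		return false; // last selected line does not carry the end marker
	}
	return implode( "\n", $lines );
}
```

With this shape, a marker mismatch can be surfaced to the editor as an error message instead of silently transcluding the wrong lines.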
- **Phase III** : Caching and updating transcluded text
- All transcluded content will be cached using BagOStuff, or the MediaWiki parser cache, with a set expiration time (probably one to three days).
- Transcluded content will also be stored in a database (using a DB table to maintain a record of the source and parameters). Where a specific commit ID is given, the snippet is stored in the database and never updated. For transclusions from `HEAD`, a cron job that updates the database and purges the cache when the source file changes is a very probable feature (see Probable Features).
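Whichever cache backend is used, each unique combination of transclusion parameters needs its own cache key. A minimal sketch of the key construction in plain PHP (the key prefix and function name are hypothetical; in the extension proper this would go through MediaWiki's key-building helpers rather than raw string concatenation):

```php
<?php
/**
 * Derive a deterministic cache key from the transclusion parameters,
 * so each unique (source, filename, commit, range) combination is
 * cached separately. The variable parts are hashed to keep the key
 * short and free of characters that memcached disallows.
 */
function transclusionCacheKey( $source, $filename, $commitId = 'HEAD',
	$startLine = '', $endLine = ''
) {
	$params = md5( implode( '|', array(
		$source, $filename, $commitId, $startLine, $endLine
	) ) );
	return 'gittransclude:snippet:' . $params;
}
```

The same key can double as the lookup key in the DB table recording sources and parameters, so the cron job can find every cache entry that a changed file invalidates.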
- **Phase IV** : Deployment of the extension on MediaWiki
- Create the required documentation page for the extension.
- Announce the deployment on the `wikitech-l` mailing list.
===Deliverables
- **The MVP**
- Parser function with the parameters source, filename, commit-id, raw, file-extension, startline and endline.
- HTTP requests to GitHub and Gitblit.
- Caching and saving the file content to the database, and rendering wikitext (.mediawiki), plain text (.txt) and code (.php, .py, .json, etc.).
- **The Extension**
- Parser function with all the parameters listed in Phase I.
- HTTP requests to GitHub, GitLab, Bitbucket, Diffusion and Gitblit.
- Caching and saving content to the database (probable), and rendering wikitext, markdown, plain text and code.
- **Documentation**: For the extension’s usage and setup, on the extension’s wiki page.
===Probable Features
- Set up webhooks for the different git servers being supported. Gitblit instances will still be handled by the cron job.
- Add support for commit-ish, by converting the branch and commit-id parameters to one commit-ish parameter. This, though, will be done only if it is agreed upon that shifting to this parameter will not affect user-friendliness (Not many people understand what a commit-ish is).
- Enable the cron job to update the database and cache in case the file has been updated for commit-id set to `HEAD`( using a DB table for maintaining a record of the source and parameters). The cases where a specific commit ID is given, the snippet gets stored in a database and is never updated.
- Fetch extension.json and parse it to feed it as input to the infobox template. This will need additional parameters in the parser function:
- **params:** TemplateParameter( CorrespondingFileKey ); a list of such parameters, separated by commas. No default.
For example,
```{{#TranscludeGit: source=wikimedia/mediawiki-extensions-MultimediaViewer | branch=master | filename=extension.json | params= author(author), license(license-name), version(version))}}```
will return `author=<author> |license=<license-name> |version=<version>`. This can then be used as `{{TNT|Extension |{{#TranscludeGit …}} }}`.
This, though, will not resolve the problem of nested JSON/YAML objects. Further ideation on this will be done after feedback from and discussion with wiki editors.
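The `params=` mapping above could be implemented along these lines. This is a sketch under stated assumptions: the function name is hypothetical, only top-level keys of extension.json are handled (reflecting the nested-object caveat just mentioned), and list values such as `author` are joined with commas:

```php
<?php
/**
 * Map top-level keys of a decoded extension.json onto template
 * parameter assignments, producing output in the
 * "author=... |license=... |version=..." form described above.
 * $params maps template parameter name => key in extension.json.
 * Returns false if the JSON cannot be decoded.
 */
function mapExtensionJson( $json, array $params ) {
	$data = json_decode( $json, true );
	if ( $data === null ) {
		return false;
	}
	$out = array();
	foreach ( $params as $templateParam => $fileKey ) {
		if ( isset( $data[$fileKey] ) ) {
			$value = $data[$fileKey];
			if ( is_array( $value ) ) {
				$value = implode( ', ', $value ); // e.g. a list of authors
			}
			$out[] = $templateParam . '=' . $value;
		}
	}
	return implode( ' |', $out );
}
```

Nested JSON/YAML objects would need a richer key syntax (e.g. dotted paths), which is exactly the open question deferred to editor feedback.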
====Timeline
| **Time Period** | **Task(s)** |
| Nov 17 - Dec 7 | Community Bonding Period, request for a gerrit repository and a Wikimedia labs instance for the extension. Go through existing art (Extension:GitHub, Extension:Git2Pages) and decide structure of the source code. |
| **Weeks 1, 2** (Dec 7 - Dec 20) | Prepare the parser function such that it can fetch a file from GitHub and git.wikimedia.org (Gitblit) given the repo, branch, filename, `raw` parameter, file-extension and commit hash. No caching yet. (Phases I, II) |
| **Weeks 3, 4** (Dec 21 - Jan 3) | Implement startline and endline parameters. Render wikitext and code files. |
| **Week 5** (Jan 4 - Jan 10) | Deploy MVP on a labs instance. Open for testing, announce on wikitech-l for community review. Basic documentation on the extension’s wiki page. |
| **Week 6** (Jan 11 - Jan 17) | Work on start-marker and end-marker parameters of the parser. |
| **Week 7** (Jan 18 - Jan 24) | Fix major issues that come along with community reviews. Add support for markdown. |
| **Weeks 8, 9** (Jan 25 - Feb 7) | Work on supporting other git servers (GitLab, Bitbucket, Diffusion). (Phase II) |
| **Week 10** (Feb 8 - Feb 14) | Implement caching using BagOStuff. (Phase III) |
| **Weeks 11, 12** (Feb 15 - Feb 28) | Work on probable features. Enable the extension to be called as a template. |
| **Week 13** (Feb 29 - March 6) | Fix minor issues, file bugs. Complete documentation on the extension’s wiki page. Deploy the extension. (Phase IV) |
==Profile
**Name:** Smriti Singh
**Email:** smritis.31095@gmail.com
**IRC handle (freenode):** galorefitz
**Internet Presence** : [[https://www.mediawiki.org/wiki/User:Galorefitz | MediaWiki user page]]
**Location:**
1. Manama, Kingdom of Bahrain (December) (GMT+03:00)
2. Hyderabad, India (January - March) (GMT+05:30)
**Typical Working Hours:**
1. **December**: 6-8 hours a day, between 6:30 a.m - 8:30 p.m (GMT) (at least 40 hours a week)
2. **January - March** : 6-8 hours a day, between 10:30 a.m - 7:30 p.m (GMT) (at least 40 hours a week)
====Communication
I will submit weekly reports on a Phabricator task tracking the progress of the project. I will be available on `#wikimedia-dev` and `#wikimedia-tech` on IRC (freenode) during my working hours, so I can be reached there. I will also respond on the relevant Phabricator tasks. The source code will be pushed to Gerrit, where it will be accessible, viewable and reviewable.
====Contributions
The [[https://phabricator.wikimedia.org/T114719 | microtask]] I completed for this project helped me find a [[https://phabricator.wikimedia.org/T115206 | bug]] in the extension (Git2Pages). I investigated it, but couldn't completely resolve it, and so posted my findings on the filed task. My contributions to the community can be viewed [[https://gerrit.wikimedia.org/r/#/q/owner%3A%22Galorefitz%22,n,z | here]] and [[https://phabricator.wikimedia.org/p/Galorefitz/ | here]].
====About Me
**Education:** In progress. I am a Computer Science Student (currently in my third year), studying at the International Institute of Information Technology (IIIT), Hyderabad, India.
**How did you hear about this program?** From senior year students in college.
**Why MediaWiki?** I started contributing to MediaWiki in May, 2015. The people here were quick to respond, very encouraging, and amazingly helpful. Contributing, seeing my work make a difference to the community, felt great. Not many communities have all of the above, and this is what encouraged me to choose MediaWiki.
**Why this project?** Well, documentation is important. Even more so for newcomers, who are just beginning to maneuver their way through **so much code**. If the documentation that they (and for that matter, anyone) have access to has several versions, that say different things, it gets confusing. It’s our responsibility, as a community, to ensure that the documentation being provided by us is consistent and up-to-date.
===Additional Information (as mandated by [[https://wiki.gnome.org/Outreachy#Application_Form | Gnome]])
**Do you meet the eligibility requirements outlined [[https://wiki.gnome.org/Outreachy#Eligibility | here]]** ? - Yes.
**Preferred pronoun** - she
**Prior Commitments** -
December, 2015 : None.
January, 2016 - March, 2016 : College, which will take approximately 30 hours a week (including examination times).
**Course details**
My college has a system of electives for every semester, and so I am not sure right now what courses I’ll be taking and how many credits they’ll be. The total number of credits, though, will be either 16 or 20. For more information, please refer to https://www.iiit.ac.in/academics/curriculum/undergraduate/BTech-CSE (Year III, Semester II)