
Extension to Transclude Git content into Wiki Pages (Outreachy proposal)
Closed, Declined (Public)

Description

Project

Possible Mentors
Synopsis

A lot of the documentation for MediaWiki lives in git. People resort to copy-pasting this content into wiki pages, which leaves copies of the same content in multiple locations that drift apart as some get updated and some don't. A solution to this is to develop an extension that can transclude content (snippets of sample code, documentation, etc.) from files on a git server into wiki pages as and when needed, and allow users to preprocess this transcluded content (using Lua modules, for example).

Prior Art

Extension:Git2Pages - An extension to get snippets of text from a file on GitHub using start and end line numbers. This extension sparse clones the needed file locally and then appends the text to the wiki page. Sparse cloning a file is a major security concern when dealing with the WMF cluster.

Extension:GitHub - An extension that gets a file from GitHub via an HTTP request for embedding in wiki pages. This extension does not allow for partial transclusion (snippets) from a file.

Workflow of the extension
  1. A wiki page editor requests a snippet of documentation or code using a magic word that lets them specify the git repository source and, optionally, a start and end point for the snippet (something similar to {{#snippet:}} in Extension:Git2Pages).
  2. The extension queries (HTTP) the git server for the requested snippet and caches it with a suitable expiration time. A cron job purges the cache when the transcluded text is updated on git (in the case where the text is transcluded from HEAD). The functionality for this would be similar to the parser function behind Extension:GitHub’s magic word {{#github:}}. The git servers that will be supported are GitHub, Diffusion and Gitblit (git.wikimedia.org). (A minimal fetch sketch follows this list.)
  3. The extension receives the snippet of text it requested and converts it (if required, according to the ‘raw’ parameter setting) to:
    • HTML - for normal text and markdown.
    • contents of a <syntaxhighlight> tag - for sample code snippets (see SyntaxHighlight).
    • nothing, render as is - for wikitext.
  4. The extension then sanitizes the HTML and/or wikitext, and renders it onto the wiki page.
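For step 2, here is a minimal fetch sketch, assuming GitHub's raw.githubusercontent.com URL scheme and MediaWiki's Http::get(); the function name, parameters and line-range handling are illustrative, not part of the proposal itself.

// Sketch only: fetch raw file content for a repo, commit-ish and path.
// Error handling, other git servers and caching are left out here.
function fetchGitHubSnippet( $repo, $commitish, $path, $startLine = null, $endLine = null ) {
	$url = 'https://raw.githubusercontent.com/' . $repo . '/' .
		rawurlencode( $commitish ) . '/' . $path;
	$content = Http::get( $url );
	if ( $content === false ) {
		return false; // the request failed
	}
	// Cut the file down to the requested line range, if one was given
	if ( $startLine !== null || $endLine !== null ) {
		$lines = explode( "\n", $content );
		$start = ( $startLine !== null ) ? $startLine - 1 : 0;
		$end = ( $endLine !== null ) ? $endLine : count( $lines );
		$content = implode( "\n", array_slice( $lines, $start, $end - $start ) );
	}
	return $content;
}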
Phases
  • Phase I : Write the parser function (gets called with a magic word) to query the git server for the required code/text snippet; a minimal registration sketch follows the parameter list below.
    • Wiki page to the extension: A magic word that calls the parser function.
    • This will have the following parameters:
      • source (repository): Depends on the technology that hosts the repository. No default.
      • filename: Name of the file from which the text is to be transcluded. No default.
      • commit-id: (optional) The commit ID the file is supposed to be pulled from. Defaults to HEAD.
      • raw: (optional) Accepts values yes or no. This parameter allows users to get plain text (with none of the extension’s formatting based on file extensions) so that they can format the text as they like or use it for further processing (e.g. feeding it to a Lua module). Defaults to no.
      • file-extension: (optional) Overrides the detected extension of the file provided, or allows the user to specify one in case it has no extension. No default.
      • startline: (optional) The line number that transclusion begins from. Defaults to the start of the file.
      • endline: (optional) The line number that transclusion ends at. Defaults to the end of the file.
      • start marker: (optional) The start marker (something like //start-section: blah, or any other string/word) that marks the beginning of the piece of text to be transcluded. This is to make sure we are transcluding the intended content. No default.
      • end marker: (optional) The end marker (something like //end-section: blah, or any other string/word) that marks the end of the piece of text to be transcluded. This is to make sure we are transcluding the intended content. No default.
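A hypothetical registration sketch for this parser function (the names TranscludeGitHooks and transcludegit are placeholders; the proposal has not settled on names): the function is hooked in via ParserFirstCallInit and its key=value arguments are collected for later use.

// Sketch only: wire up the parser function and split its named arguments.
$wgHooks['ParserFirstCallInit'][] = 'TranscludeGitHooks::onParserFirstCallInit';

class TranscludeGitHooks {
	public static function onParserFirstCallInit( Parser $parser ) {
		// 'transcludegit' must also be registered as a magic word id
		$parser->setFunctionHook( 'transcludegit', array( __CLASS__, 'render' ) );
		return true;
	}

	public static function render( Parser $parser /* , ...arguments */ ) {
		// Collect named arguments such as source=..., filename=..., startline=...
		$args = array();
		foreach ( array_slice( func_get_args(), 1 ) as $arg ) {
			$parts = explode( '=', $arg, 2 );
			if ( count( $parts ) === 2 ) {
				$args[ trim( $parts[0] ) ] = trim( $parts[1] );
			}
		}
		// ...fetch, cache and format the requested snippet using $args...
		return '';
	}
}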
  • Phase II : Querying the git server (HTTP) and rendering received text
    • Provide support for various git hosting technologies and servers (GitHub, Gitblit, Diffusion (using the conduit API method: diffusion.filecontentquery)). The received text will, for now, be fed into the database as is.
    • All text is to be converted to its rendering equivalent as and when requested by the user (see point 3 in the workflow above), sanitized (if needed), and rendered.
    • Rendering: All transcluded text is going to be within a div, in order to tell the wiki user explicitly that the content they are seeing has been transcluded. This div will also mention what file and repository the text has been transcluded from. Let's say the string for the same is:
$transclusionBox = '<div class="transclusion-box">' . $metaInfo;

Here is how some formats are going to be transcluded:

  • For wikitext,
$output = $transclusionBox . $parser->recursiveTagParseFully( $content ) . '</div>';
return array( $output, 'nowiki' => true, 'noparse' => true, 'isHTML' => true );
  • For plain text,
$output = $transclusionBox . '<poem><nowiki>' . htmlspecialchars( $content ) . '</nowiki></poem></div>';
return array( $output, 'nowiki' => false, 'noparse' => false, 'isHTML' => true );

or

$output = $transclusionBox . '<pre>' . htmlspecialchars( $content ) . '</pre></div>';
return array( $output, 'nowiki' => true, 'noparse' => true, 'isHTML' => true );

depending on preference. A <pre> tag might be a good idea since code is generally seen in monospace. On the other hand, the <poem> tag is prettier.

  • For code,
$output = $transclusionBox . '<syntaxhighlight lang="' . $lang . '">' . $content . '</syntaxhighlight></div>';
return array( $output, 'nowiki' => false, 'noparse' => false, 'isHTML' => true );
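Which of these rendering paths gets taken could be decided from the detected file extension (or the file-extension override); a rough sketch with an illustrative, non-final mapping:

// Sketch only: map a file extension to one of the rendering paths above.
function detectFormat( $filename, $fileExtension = null ) {
	$ext = ( $fileExtension !== null )
		? strtolower( $fileExtension )
		: strtolower( pathinfo( $filename, PATHINFO_EXTENSION ) );
	if ( in_array( $ext, array( 'mediawiki', 'wiki' ) ) ) {
		return 'wikitext';   // parsed with recursiveTagParseFully()
	} elseif ( in_array( $ext, array( 'md', 'markdown' ) ) ) {
		return 'markdown';   // converted to HTML before output
	} elseif ( in_array( $ext, array( 'php', 'py', 'js', 'json', 'css', 'lua' ) ) ) {
		return 'code';       // wrapped in <syntaxhighlight>
	}
	return 'plain';          // <pre> or <poem>, as discussed above
}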
  • Phase III : Caching and updating transcluded text
    • Caching will be done using BagOStuff or the MediaWiki parser cache, with an expiration time (probably a day). All transcluded content will be cached (a rough sketch follows below).
    • As a very probable feature, the transcluded content will be stored in a database and cached. A cron job would then update the database and purge the cache in case the file from which the content has been transcluded gets updated.
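A rough sketch of the expiring cache, assuming a plain BagOStuff obtained via wfGetCache() and a one-day TTL; the key layout and function names are illustrative, and fetchGitHubSnippet() is the hypothetical helper from the workflow sketch above.

// Sketch only: return cached content, fetching and caching it on a miss.
function getCachedSnippet( $source, $filename, $commitish ) {
	$cache = wfGetCache( CACHE_ANYTHING ); // a BagOStuff instance
	$key = wfMemcKey( 'transcludegit', md5( $source . '|' . $filename . '|' . $commitish ) );
	$content = $cache->get( $key );
	if ( $content === false ) {
		$content = fetchGitHubSnippet( $source, $commitish, $filename );
		if ( $content !== false ) {
			$cache->set( $key, $content, 86400 ); // expire after one day
		}
	}
	return $content;
}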
  • Phase IV : Deployment of the extension on MediaWiki
    • Create the required documentation page for the extension.
    • Announce the deployment on wikitech-l mailing list

Deliverables

  • The MVP
    • Parser function with the parameters source, filename, commit-id, raw, file-extension, startline and endline.
    • HTTP requests to GitHub.
    • Caching file content and rendering wikitext (.mediawiki), plain text (.txt) and code (.php, .py, .json, etc).
  • The Extension
    • Parser function with all the parameters listed in Phase I.
    • HTTP requests to GitHub, Diffusion and Gitblit.
    • Caching and saving content to the database (probable), and rendering wikitext, markdown, plain text and code.
  • Documentation: For the extension’s usage and setup, on the extension’s wiki page.

Probable Features

  • Set up webhooks for the different git servers that are supported. Gitblit instances will still be handled by the cron job.
  • Add support for commit-ish, by converting the branch and commit-id parameters to one commit-ish parameter. This, though, will be done only if it is agreed upon that shifting to this parameter will not affect user-friendliness (Not many people understand what a commit-ish is).
  • Enable the cron job to update the database and cache in case the file has been updated, for commit-id set to HEAD (using a DB table for maintaining a record of the source and parameters). In cases where a specific commit ID is given, the snippet gets stored in the database and is never updated.
  • Fetch extension.json and parse it to feed it as input to the infobox template. This will need additional parameters in the parser function:
    • params: TemplateParameter( CorrespondingFileKey ); a list of such parameters, separated by commas. No default.

For example,

{{#TranscludeGit: source=wikimedia/mediawiki-extensions-MultimediaViewer | branch=master | filename=extension.json | params= author(author), license(license-name), version(version)}}

will return author=<author> |license=<license-name> |version=<version>. This can then be used as {{TNT|Extension |{{#TranscludeGit …}} }}.
This, though, will not resolve the problem of nested JSON/YAML objects. Further ideation on this will be done after feedback from and discussion with wiki editors.
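A rough sketch of how the params= mapping could work once the file content has been fetched; the helper name is hypothetical, and nested values are skipped, per the note above.

// Sketch only: turn "TemplateParameter(FileKey)" pairs into "name=value |..." text.
function buildInfoboxArguments( $json, $paramSpec ) {
	$data = json_decode( $json, true );
	if ( !is_array( $data ) ) {
		return '';
	}
	$pieces = array();
	// $paramSpec looks like "author(author), license(license-name), version(version)"
	foreach ( explode( ',', $paramSpec ) as $spec ) {
		if ( preg_match( '/^\s*([\w-]+)\s*\(\s*([\w-]+)\s*\)\s*$/', $spec, $m )
			&& isset( $data[ $m[2] ] ) && is_scalar( $data[ $m[2] ] )
		) {
			// Nested JSON objects (e.g. a list of authors) are not handled here.
			$pieces[] = $m[1] . '=' . $data[ $m[2] ];
		}
	}
	return implode( ' |', $pieces );
}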

Timeline

Time Period | Task(s)
Nov 17 - Dec 7 | Community Bonding Period; request a Gerrit repository and a Wikimedia Labs instance for the extension. Go through prior art (Extension:GitHub, Extension:Git2Pages) and decide the structure of the source code.
Weeks 1, 2 (Dec 7 - Dec 20) | Prepare the parser function such that it can fetch a file from GitHub and git.wikimedia.org (Gitblit) given the repo, branch, filename, raw parameter, file-extension and commit hash. Cache the transcluded text. (Phases I, II & III)
Weeks 3, 4 (Dec 21 - Jan 3) | Implement the startline and endline parameters. Render wikitext and code files.
Week 5 (Jan 4 - Jan 10) | Deploy the MVP on a Labs instance. Open it for testing and announce on wikitech-l for community review. Write basic documentation on the extension’s wiki page.
Week 6 (Jan 11 - Jan 17) | Work on the start-marker and end-marker parameters of the parser function.
Week 7 (Jan 18 - Jan 24) | Fix major issues that come out of community reviews. Add support for markdown.
Weeks 8, 9 (Jan 25 - Feb 7) | Work on supporting other git servers (Diffusion, and any others whose need is felt). (Phase II)
Weeks 10, 11 (Feb 8 - Feb 21) | Work on probable features. Enable the extension to be called as a template.
Weeks 12, 13 (Feb 22 - Mar 7) | Fix minor issues, file bugs. Complete documentation on the extension’s wiki page. Deploy the extension. (Phase IV)

Profile

Name: Smriti Singh
Email: smritis.31095@gmail.com
IRC handle (freenode): galorefitz
Internet Presence : MediaWiki user page
Location:

  1. Manama, Kingdom of Bahrain (December) (+03:00 GMT)
  2. Hyderabad, India (January - March) ( +05:30 GMT)

Typical Working Hours:

  1. December: 6-8 hours a day, between 6:30 a.m - 8:30 p.m (GMT) (at least 40 hours a week)
  2. January - March : 6-8 hours a day, between 10:30 a.m - 7:30 p.m (GMT) (at least 40 hours a week)
Communication

I will submit weekly reports on a Phabricator task tracking the progress of the project. I will be available on #wikimedia-dev and #wikimedia-tech on IRC (freenode) during my working hours, so I can be reached there. I will also respond on the relevant Phabricator tasks. The source code will be pushed to Gerrit, where it will be viewable and reviewable.

Contributions

The microtask I completed for this project helped me find a bug in the extension (Git2Pages). I investigated it but couldn't completely resolve it, so I posted my findings on the filed task. My contributions to the community can be viewed here and here. I haven't contributed to any other FOSS organisations yet, but plan to in the future.

About Me

Education: In progress. I am a computer science student (currently in my third year) at the International Institute of Information Technology (IIIT), Hyderabad, India.

How did you hear about this program? From senior year students in college.

Why MediaWiki? I started contributing to MediaWiki in May 2015. The people here were quick to respond, very encouraging, and amazingly helpful. Contributing and seeing my work make a difference to the community felt great. Not many communities have all of the above, and this is what encouraged me to choose MediaWiki.

Why this project? Well, documentation is important. Even more so for newcomers, who are just beginning to maneuver their way through so much code. If the documentation that they (and, for that matter, anyone) have access to exists in several versions that say different things, it gets confusing. It’s our responsibility, as a community, to ensure that the documentation we provide is consistent and up-to-date.

Additional Information (as mandated by GNOME)

Do you meet the eligibility requirements outlined here? - Yes.

Preferred pronoun - she

Prior Commitments -
December 2015: None.
January 2016 - March 2016: College, which will take approximately 25 hours a week (including examination times).

Course details
My college has a system of electives for every semester, so I am not sure right now what courses I’ll be taking or how many credits they’ll be worth. The total, though, will be 12 or 16 credits (a normal, full-time course load is 20-24 credits). For more information, please refer to https://www.iiit.ac.in/academics/curriculum/undergraduate/BTech-CSE (Year III, Semester II).

Event Timeline

Galorefitz updated the task description.
Galorefitz raised the priority of this task from to Normal.
Galorefitz claimed this task.
Galorefitz moved this task from Backlog to Proposals Submitted on the Outreachy-Round-11 board.
Galorefitz updated the task description. Oct 22 2015, 1:07 PM
Galorefitz set Security to None.
revi added a subscriber: revi. Oct 22 2015, 2:30 PM
Galorefitz updated the task description. Oct 22 2015, 4:56 PM
Galorefitz updated the task description. Oct 22 2015, 5:02 PM
Galorefitz updated the task description.

Hi! We noticed you're a student. How much time do you think you can commit to the project per week? And would you be taking time off for exams or other commitments? A rough estimate of hours per week you can put in would be good.

Galorefitz updated the task description. Oct 26 2015, 3:06 PM

Updated a per-week estimate too. Thank you. :)

It might be good to add why Git2Pages doesn't meet current needs (Is it because it's transcluding the snippet from a local git repo instead of fetching it from some HTTP server? If so, a paragraph about why we don't want to do that would be good).

It would be nice if we could have automatic cache deletion when the git repo is updated, instead of just a timeout.

Git2Pages doesn't meet the current needs because:

  • It sparse clones the git repository files that it needs, which could be a security hazard for the WMF cluster, as @Tgr already pointed out.
  • Other issues with the extension can be seen here (@Bawolff's awesomely written review)

I'll add a para about that to the proposal soon. :)

About cache deletion, yes. @Spage has already suggested on the task adding a cron job to notify/update when git updates. I'll add that right away :)

Also, what do you think about asking the user for the commit hash they want to pick the file from? As @Spage has already pointed out, sometimes people do not want to pick from HEAD. So would it be a better idea to default the extension to saving the present commit hash (when the text is first transcluded), or to set it to HEAD?

Comments?

Galorefitz updated the task description. Oct 28 2015, 2:53 PM

I think HEAD is the most common case, so it probably makes sense to default to it (With the code auto-updating the snippet when stuff changes). However, ideally one would be able to specify any sort of git branchy thing (e.g. You can specify HEAD, HEAD^, HEAD~, HEAD~3, or a branch master, REL1_24, a random sha1 hash, or a random tag) [But that's probably not a critical feature].

Tgr added a comment. Oct 28 2015, 8:12 PM

I think HEAD is the most common case, so it probably makes sense to default to it (With the code auto-updating the snippet when stuff changes). However, ideally one would be able to specify any sort of git branchy thing (e.g. You can specify HEAD, HEAD^, HEAD~, HEAD~3, or a branch master, REL1_24, a random sha1 hash, or a random tag) [But that's probably not a critical feature].

Indeed, just use a commitish (which defaults to HEAD), the extension doesn't have to understand what it means, just plug it in at the right place in the request. Git servers usually understand any safe commitish: https://github.com/wikimedia/mediawiki/tree/HEAD@%7B2+weeks+ago%7D or https://git.wikimedia.org/tree/mediawiki%2Fextensions%2FTwitterLogin/HEAD~3

We are approaching the Outreachy'11 application deadline, and if you want your proposal considered for this round, do sign up and add your proposal at https://outreachy.gnome.org/ before November 02 2015, 07:00 pm UTC. You can copy-paste the above proposal into the Outreachy application system and keep polishing it over here. Keep in mind that your mentors and the organization team will be evaluating your proposal here in Phabricator, and you are free to ask for and get more reviews per https://www.mediawiki.org/wiki/Outreach_programs/Life_of_a_successful_project#Answering_your_questions

Tgr added a comment. Oct 31 2015, 12:46 AM

Synopsis:

  • It might be worth mentioning the structured data use case (e.g. pulling extension.json and providing the data therein to the wiki template that creates the extension infobox) since it might take some dedicated functionality to do that. Of course if you decide that does not fit into the roadmap that's fine.

Workflow:

  • You should give some more thought to how the source repo can be specified. There is no uniform way of putting a base URL + branch name + file name together into a full URL. (Have a look at how GitHub and git.wikimedia.org do it. You could check some other large git hosts, too.) Also worth considering what in your extension is specific to git - if it's just fetching some URL via HTTP, could it work with a mercurial/svn/whatever repo as well?

Phases:

  • Step I includes most of the work and because of that it's a bit vague (e.g. no mention of caching). Might be worth breaking it up.
  • Security is something to keep in mind from start. If you leave it to the end of the project to add it, you will end up with an insecure architecture or will have to do huge rewrites.
  • Testing helps your work, when done right; no point in leaving it to the end. Also if you don't write tests at the same time you write your classes, you might write code that's hard to test and be in trouble later. You should do it in parallel with other work.
  • IMO converting other formats to wikitext is not really useful.

Deliverables/timeline:

  • the "Sources of transcluded text" part is a bit unclear. Do you mean about adding format conversions (e.g. markdown -> HTML)?
  • "The Extension": not clear how that is a deliverable. Presumably when the other three items have been delivered you already have an extension that can transclude snippets of code into a wiki page.
  • I don't understand what the task for Week 2 is, can you rephrase that?

Profile:

  • It's a good idea to commit to a specific schedule (say, weekly reports on some wiki page).
  • For January-March, you are committing to 70 hours of study + work per week. That's pretty hardcore. Are you sure that's achievable?

It might be worth mentioning the structured data use case (e.g. pulling extension.json and providing the data therein to the wiki template that creates the extension infobox) since it might take some dedicated functionality to do that. Of course if you decide that does not fit into the roadmap that's fine.

Oh.. Having the extension template being fed from git would be sooo cool.

Synopsis:

  • It might be worth mentioning the structured data use case (e.g. pulling extension.json and providing the data therein to the wiki template that creates the extension infobox) since it might take some dedicated functionality to do that. Of course if you decide that does not fit into the roadmap that's fine.

Yes, that would be really cool :D
I could definitely add this in as an additional feature and implement it if time permits.

Workflow:

  • You should give some more thought to how the source repo can be specified. There is no uniform way of putting a base URL + branch name + file name together into a full URL. (Have a look at how GitHub and git.wikimedia.org do it. You could check some other large git hosts, too.) Also worth considering what in your extension is specific to git - if it's just fetching some URL via HTTP, could it work with a mercurial/svn/whatever repo as well?

Yes, that's correct. I will edit the proposal to specify what cases I would be accounting for specifically.

Phases:

  • Step I includes most of the work and because of that it's a bit vague (e.g. no mention of caching). Might be worth breaking it up.

Yes, I agree, it probably should be broken up. But I will not be implementing caching in step I. I'll add to a later step though.

  • Security is something to keep in mind from start. If you leave it to the end of the project to add it, you will end up with an insecure architecture or will have to do huge rewrites.
  • Testing helps your work, when done right; no point in leaving it to the end. Also if you don't write tests at the same time you write your classes, you might write code that's hard to test and be in trouble later. You should do it in parallel with other work.
  • IMO converting other formats to wikitext is not really useful.

True. I will remove that right away.

Deliverables/timeline:

  • the "Sources of transcluded text" part is a bit unclear. Do you mean about adding format conversions (e.g. markdown -> HTML)?

Clarifying :)

  • "The Extension": not clear how that is a deliverable. Presumably when the other three items have been delivered you already have an extension that can transclude snippets of code into a wiki page.

I thought it does count as a deliverable, the extension itself. That is, in fact, the main deliverable. But yes, the others (and more) add up to it.

  • I don't understand what the task for Week 2 is, can you rephrase that?

Yup. On it.

Profile:

  • It's a good idea to commit to a specific schedule (say, weekly reports on some wiki page).

I plan to upload weekly reports on phab, as was done for GSoC this year. (T100998: [tracking] GSoC 2015 and Outreachy 10 Weekly reports)

  • For January-March, you are committing to 70 hours of study + work per week. That's pretty hardcore. Are you sure that's achievable?

Yes. I will be picking my courses depending upon my experience in December (work load, will I be able to do it, etc). So, yes, I'll be able to give the project 40 hours a week. If I see the need, I'll reduce course load. :)

We find that you have university/school during the Outreachy round 11 internship period (Dec 2015 - March 2016). Please fill in the answers to the following questions too in your proposal description so that we stick to the Outreachy norms. Thank You!

https://wiki.gnome.org/Outreachy#Application_Form

Will you have any other time commitments, such as school work, exams, research, another job, planned vacation, etc., between December 7, 2015 and March 7, 2016? How many hours a week do these commitments take? If a student, please list the courses you will be taking between December 7, 2015 and March 7, 2016, how many credits you will be taking, and how many credits a full-time student normally takes at your school:

Galorefitz updated the task description. Nov 8 2015, 9:43 PM
Galorefitz removed a subscriber: Devirk.

We find that you have university/school during the Outreachy round 11 internship period (Dec 2015 - March 2016). Please fill in the answers to the following questions too in your proposal description so that we stick to the Outreachy norms. Thank You!

https://wiki.gnome.org/Outreachy#Application_Form

Will you have any other time commitments, such as school work, exams, research, another job, planned vacation, etc., between December 7, 2015 and March 7, 2016? How many hours a week do these commitments take? If a student, please list the courses you will be taking between December 7, 2015 and March 7, 2016, how many credits you will be taking, and how many credits a full-time student normally takes at your school:

I have already mentioned this under "Prior Commitments" and "Course details". :)

Nicely done! A few comments:

  • Phabricator's diffusion needs to be one of the supported repositories. We are moving away from GitBlit (git.wikimedia.org). I would argue it even needs to be the first supported, though there's a big issue with its "call signs" that needs to be resolved.
  • What's the name of the extension? What's the name of the parser tag? Are both TranscludeGit ?

"Saving the content to the database...

  • What do you mean, an actual DB table? Do we need this for first version? I believe to start this extension should just rely on the page cache and maybe the parser cache. If not, spell out what you're doing with a database.

... and rendering wikitext (.mediawiki), plain text (.txt) and code (.php, .py, .json, etc)."

  • Those three formats each hide a lot of work; I would split them into separate steps. I would pick rendering wikitext first since, as I understand it, it's the simplest.
    • How does the wiki editor specify which format she wants? Does the parser function infer it from the file extension? It seems you need a parameter to override.

Fetch extension.json and parse it to feed it as input to the infobox template

  • Why is that part of this extension? Surely it is part of "use [raw text] for further processing (eg. processing text through a lua module)". This extension would make the file contents available to Lua modules, which can do stuff like parse extension.json, process the docs/hooks.txt file, etc. If you just mean this is a use case for the extension, or that you propose to build such a Lua module, say so.
This comment was removed by Spage.

I am sorry for the late reply. I had some hardware issues with my laptop.

  • Phabricator's diffusion needs to be one of the supported repositories. We are moving away from GitBlit (git.wikimedia.org). I would argue it even needs to be the first supported, though there's a big issue with its "call signs" that needs to be resolved.

Oh, alright. I'll add Diffusion to the final set of supported git servers. It can be done using the conduit API method diffusion.filecontentquery. I'll add that to the proposal, but there are issues with this (T117621: Diffusion support for viewing the raw file in the browser) that will probably be fixed in time. This method is unstable, though, and the extension will probably require an update later:

Unstable Method: See T2784 - migrating Diffusion working copy calls to conduit methods. Until that task is completed (and possibly after) these methods are unstable.
  • What's the name of the extension? What's the name of the parser tag? Are both TranscludeGit ?

That is a dummy name. I haven't thought of a name for the extension yet, nor for the parser function.

"Saving the content to the database...

  • What do you mean, an actual DB table? Do we need this for first version? I believe to start this extension should just rely on the page cache and maybe the parser cache. If not, spell out what you're doing with a database.

Yes, I mean an actual DB table. I think we do. If we have only caching as our first strategy, then that would mean fetching content via HTTP requests even in cases where it might not be needed (i.e. where the content is transcluded from a specific commit ID, not HEAD, and therefore does not change).

We can use only caching for storing the transcluded content when it is being transcluded from HEAD, but even then we would need a DB table for maintaining a record of the source and parameters of the transcluded content for cron/web hooks to update/purge the cache.

... and rendering wikitext (.mediawiki), plain text (.txt) and code (.php, .py, .json, etc)."

  • Those three formats each hide a lot of work; I would split them into separate steps. I would pick rendering wikitext first since, as I understand it, it's the simplest.

All the transcluded text is going to be within a div in order to tell the wiki user explicitly that the content they are seeing has been transcluded. In this div, I will also be mentioning what file and repository the text has been transcluded from. Let's say the string for the same is:

$transclusionBox = '<div class="transclusion-box">' . $metaInfo;

Here is what I am going to do in order to transclude the three formats:

  • For wikitext,
$output = $transclusionBox . $parser->recursiveTagParseFully( $content ) . '</div>';
return array( $output, 'nowiki' => true, 'noparse' => true, 'isHTML' => true );
  • For plain text,
$output = $transclusionBox . '<poem><nowiki>' . htmlspecialchars( $content ) . '</nowiki></poem></div>';
return array( $output, 'nowiki' => false, 'noparse' => false, 'isHTML' => true );

or

$output = $transclusionBox . '<pre>' . htmlspecialchars( $content ) . '</pre></div>';
return array( $output, 'nowiki' => true, 'noparse' => true, 'isHTML' => true );

depending on preference. A <pre> tag might be a good idea since code is generally seen in monospace. On the other hand, the <poem> tag is prettier. Comments?

  • For code,
$output = $transclusionBox . '<syntaxhighlight lang="' . $lang . '">' . $content . '</syntaxhighlight></div>';
return array( $output, 'nowiki' => false, 'noparse' => false, 'isHTML' => true );
  • How does the wiki editor specify which format she wants? Does the parser function infer it from the file extension? It seems you need a parameter to override.

Oh, yes. The parser function primarily guesses from the file extension. I'll add a new parameter that allows the user to specify a supported file format right away :)

Fetch extension.json and parse it to feed it as input to the infobox template

  • Why is that part of this extension? Surely it is part of "use [raw text] for further processing (eg. processing text through a lua module)". This extension would make the file contents available to Lua modules, which can do stuff like parse extension.json, process the docs/hooks.txt file, etc. If you just mean this is a use case for the extension, or that you propose to build such a Lua module, say so.

The "raw" parameter already provides raw text to the wiki editor, in case they want to feed the transcluded text to a lua module for further processing. I, on the other hand, do not intend to write a lua module to form the extension template (the infobox from extension.json). As @Tgr has already pointed out, doing things with lua is difficult. We can do almost the same things using just php. I thought it would be a cool additional feature if the extension did some basic text processing on its own, and gave out the parameters for the extension template when fed with extension.json. :)

Galorefitz updated the task description. Nov 12 2015, 8:11 PM
Tgr added a comment. Nov 13 2015, 12:47 AM

It can be done using the conduit API method: diffusion.filecontentquery.

Using the API requires a token and is unlikely to be the simplest solution.

there are issues with this ( T117621: Diffusion support for viewing the raw file in the browser ) that will probably be fixed in time.

The browser downloading the content instead of displaying it usually means either a Content-Disposition: attachment header or a Content-Type that the browser does not consider displayable. An HTTP fetch done from PHP code won't care about either.

I haven't thought of a name for the extension yet, neither the parser function.

There are only two hard things in Computer Science: cache invalidation and naming things. This project will have plenty of both :)

If we have only caching as our first strategy, then that would mean fetching content via HTTP requests even in cases where it might not be needed (i.e. where the content is transcluded from a specific commit ID, not HEAD, and therefore does not change).

The right way of looking at this is what's the effect on the usability of the extension (an extra second delay on every one in a million page views? an extra second delay on every page save?) and how that compares to other usability improvements/fixes that you could do with the same development effort. In a real project you rarely have the opportunity to get everything right; you need to identify the things that have the largest impact so you can choose those to get right. (Not saying DB storage can't be one of those, but you should be able to back that up with some kind of argument about impact.)

We can use only caching for storing the transcluded content when it is being transcluded from HEAD, but even then we would need a DB table for maintaining a record of the source and parameters of the transcluded content for cron/web hooks to update/purge the cache.

Or you can just make the cache expiration time short, and ignore that content might be outdated for a short time. Again, depends on how much users care about the occasional small(?) delay in rendering and the slightly outdated content vs. other features.

All the transcluded text is going to be within a div in order to tell the wiki user explicitly that the content they are seeing has been transcluded. In this div, I will also be mentioning what file and repository the text has been transcluded from.

I'd recommend doing generic tasks on the extension side and leaving specific tasks (which are simple but hard to predict) to the wiki side, otherwise the extension might end up inflexible. Adding a caption is for example very easy in wikitext, and hard to predict how exactly it should work (should it link to the file? how can it be styled? etc).

It can be done using the conduit API method: diffusion.filecontentquery.

Using the API requires a token and is unlikely to be the simplest solution.

I'll investigate this further, but I can't think of any other way right now.

If we have only caching as our first strategy, then that would mean fetching content via HTTP requests even in cases where it might not be needed (i.e. where the content is transcluded from a specific commit ID, not HEAD, and therefore does not change).

The right way of looking at this is what's the effect on the usability of the extension (an extra second delay on every one in a million page views? an extra second delay on every page save?) and how that compares to other usability improvements/fixes that you could do with the same development effort. In a real project you rarely have the opportunity to get everything right; you need to identify the things that have the largest impact so you can choose those to get right. (Not saying DB storage can't be one of those, but you should be able to back that up with some kind of argument about impact.)

We can use only caching for storing the transcluded content when it is being transcluded from HEAD, but even then we would need a DB table for maintaining a record of the source and parameters of the transcluded content for cron/web hooks to update/purge the cache.

Or you can just make the cache expiration time short, and ignore that content might be outdated for a short time. Again, depends on how much users care about the occasional small(?) delay in rendering and the slightly outdated content vs. other features.

You're right, but I don't think putting a database in place would take a significant amount of time, so I would definitely try to implement this later in case I am left with time, at least for text transcluded from specific commit IDs.

All the transcluded text is going to be within a div in order to tell the wiki user explicitly that the content they are seeing has been transcluded. In this div, I will also be mentioning what file and repository the text has been transcluded from.

I'd recommend doing generic tasks on the extension side and leaving specific tasks (which are simple but hard to predict) to the wiki side, otherwise the extension might end up inflexible. Adding a caption is for example very easy in wikitext, and hard to predict how exactly it should work (should it link to the file? how can it be styled? etc).

I have already provided a "raw" parameter, so if the user does not want any of the formatting provided by the extension, they can set that parameter to yes, and put their own formatting around it. But do you want the extension to give the user raw text by default? Maybe shift from "raw" to a "prettify" parameter?

Galorefitz updated the task description. Nov 14 2015, 1:51 PM
Galorefitz updated the task description.
Tgr added a comment. Nov 14 2015, 7:37 PM

Using the API requires a token and is unlikely to be the simplest solution.

I'll investigate this further, but I can't think of any other way right now.

You can view the text of a file in the browser without any use of the API. There is no reason the extension couldn't do the same.

I have already provided a "raw" parameter, so if the user does not want any of the formatting provided by the extension, they can set that parameter to yes, and put their own formatting around it. But do you want the extension to give the user raw text by default? Maybe shift from "raw" to a "prettify" parameter?

Personally I would avoid dealing with formatting at all, it can easily be done on the wiki side.

@Galorefitz

  • When exactly do your classes start in January? (Google wouldn't say :) ).
  • You left out Past experience from your proposal. You made some fixes in gerrit in May (thanks!), have you been active in any other FOSS projects as a user and a contributor?

@Tgr: Personally I would avoid dealing with formatting at all, it can easily be done on the wiki side.
Well, we need something. I would like the extension to support

  1. HTML of parsed wikitext — transclude file=tables.wiki
  2. HTML by running markdown through a converter — transclude file=README.md
  3. a raw format that works in the syntax highlighter — {{#tag:source| {{transclude file=Setup.php }} |lang=php}}
  4. HTML of plain text on the page — transclude file=UPDATING
  5. (plus tgr's idea of being able to supply parsed JSON to a Lua module that sounds amazing :) )

Perhaps plain text could be produced by invoking {{#tag:poem...}} the same way as source code, but that's making wiki editors do a lot of work. It's the lowest priority.

Tgr added a comment. Nov 14 2015, 10:03 PM

Parsing is hard but generic and should happen on the extension side - that includes parsing JSON/YAML to nested arrays, Markdown to HTML, PHP to some kind of reflection object (although that last one is well beyond the scope of the project). Everything else can be done conveniently on the wiki side with #tag:source, <pre> and similar as long as the extension returns raw escaped wikitext.

You can view the text of a file in the browser without any use of the API. There is no reason the extension couldn't do the same.

The hash generated in the URL for viewing the file is very long and unintelligible. I am not sure how we could create that using just the filename and commit-id/branch, when one part of the link changes every time you load the same file in raw, and the link expires on reload. I was planning to use the API for this reason. Of course, if we can create the hash, then it'll be doable, but right now I don't see how.

@Galorefitz

  • When exactly do your classes start in January? (Google wouldn't say :) ).

My classes officially start on 31st December (yes, we don't get a holiday for new year), but we'll have add-drop (attending classes, deciding what courses to take) for the next 15 days or so, so technically speaking they start around January 15th.

  • You left out Past experience from your proposal. You made some fixes in gerrit in May (thanks!), have you been active in any other FOSS projects as a user and a contributor?

No, I haven't contributed to any other FOSS organisations. It said here that I "can respond literally to the questions or find other ways to provide equivalent information", so I thought the links to my phabricator and gerrit account would suffice. I'll add the other FOSS Organisations part though. :)

Galorefitz updated the task description. Nov 15 2015, 1:24 PM
Galorefitz updated the task description. Nov 16 2015, 7:48 AM
Galorefitz updated the task description. Nov 16 2015, 8:00 AM
01tonythomas closed this task as Declined. Nov 23 2015, 8:45 AM

Thank you for your proposal. Sadly, the Outreachy administration team has made it strict that candidates with any kind of academic/other commitments are not approved for this round. Please consider talking with your mentors about further steps, and we hope to see your application (the same one too, if the consensus still exists) in the next round of GSoC/Outreachy. Closing this as declined, and all the very best for the next round!