OCG should download resourceLoader js/css dependencies
Open, High, Public

Description

Each MediaWiki page has CSS/JS dependencies, and the list is not always the same (it depends mostly on which extensions are used in a page). Within MediaWiki, the component that provides this feature is called the "ResourceLoader".

Currently, the OCG crawler module does not retrieve the ResourceLoader dependencies. Until now, the OCG PDF backend did not need them. But if we want to generate ZIM files, which are based on HTML output, correct handling of the CSS/JS dependencies is mandatory. This ticket is about retrieving the list of dependencies needed for each page and downloading them within OCG and MWoffliner.

Steps:
1 - Start a small NodeJS demo program for the new functionality
2 - Given an article and an API URL, code a function able to download the list of JS and CSS dependencies
3 - Merge this code so that OCG can download these dependencies
4 - Merge this code to allow MWoffliner to download these dependencies and render them correctly in the HTML files

Hint:
The list of "default" modules is at https://github.com/wikimedia/parsoid/blob/master/lib/mediawiki.DOMPostProcessor.js#L322-L329
An API query to get the set of modules for the page is:
https://en.wikipedia.org/wiki/Special:ApiSandbox#action=parse&format=json&page=Egyptian_hieroglyphs&prop=modules%7Cjsconfigvars
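
A minimal sketch of the query in the hint, for step 2. It only builds the request URL and reads the module lists out of a decoded response, so it runs without network access. The function names, the `https://en.wikipedia.org/w/api.php` endpoint and the sample response are assumptions made for illustration; the `modulescripts` and `modulestyles` keys are what `action=parse` is expected to return next to `modules`, and should be checked against a real response.

```javascript
'use strict';

// Build the action=parse request URL for one page.
// apiUrl is assumed to be the wiki's api.php endpoint.
function buildModulesQuery(apiUrl, title) {
  const url = new URL(apiUrl);
  url.searchParams.set('action', 'parse');
  url.searchParams.set('format', 'json');
  url.searchParams.set('page', title);
  url.searchParams.set('prop', 'modules|jsconfigvars');
  return url.toString();
}

// Pull the module lists out of a decoded JSON response body.
// Missing keys become empty lists so callers can iterate safely.
function extractModules(body) {
  const parse = (body && body.parse) || {};
  return {
    modules: parse.modules || [],
    scripts: parse.modulescripts || [],
    styles: parse.modulestyles || [],
  };
}

// Hand-written sample with made-up module names, not real API output.
const sample = {
  parse: {
    title: 'Example',
    modules: ['ext.a', 'ext.b'],
    modulestyles: ['ext.a.styles'],
  },
};

console.log(buildModulesQuery('https://en.wikipedia.org/w/api.php', 'Egyptian_hieroglyphs'));
console.log(extractModules(sample));
```

The resulting URL can be passed to whatever HTTP client the bundler already uses; the fetch itself is left out on purpose.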

Internship project details

  • Primary mentor: (Phabricator username)
  • Co-mentor: (Phabricator username)
  • Other mentors: (optional, Phabricator username)
  • Skills: (Phabricator tags are welcome)
  • Estimated project time for a senior contributor: (must be 2-3 weeks)
  • Microtasks: (links to Phabricator tasks that must be completed in order to become a strong candidate)
Kelson created this task. Oct 6 2015, 6:15 PM
Kelson updated the task description.
Kelson raised the priority of this task to High.
Kelson added subscribers: Galorefitz, Niharika, Adishaporwal and 15 others.
GWicke added a subscriber: GWicke. Oct 6 2015, 8:20 PM

T105845 is about defining per-page / component metadata, which will include per-page RL modules.

Qgil updated the task description. Oct 7 2015, 3:03 PM
Qgil set Security to None.
Qgil moved this task from Backlog to Need Discussion on the Possible-Tech-Projects board.
cscott updated the task description. Oct 19 2015, 5:46 PM

From IRC, some hints:
(12:18:56 PM) adisha: now I want to ask how to proceed to next microtask that is " Code a nodejs program/javascript function able, given a wikipedia article URL, to list&download resourceLoader js+css dependencies"
(12:22:26 PM) cscott-free: well, the first micro-micro-task would probably be just "add a patch to mw-ocg-bundler that does *something*"
(12:26:59 PM) cscott-free: adisha: take a look at https://gerrit.wikimedia.org/r/108033 perhaps.
(12:27:23 PM) cscott-free: that's a smallish patch (86 lines), but for your first step could be even smaller.
(12:28:23 PM) cscott-free: look at the changes there to index.js and think about adding a new stage to fetch css+js.
(12:29:02 PM) cscott-free: create a new file named 'modules.js', creating a new Modules(metabook.wikis) class, and a new modules.db database.
(12:29:30 PM) cscott-free: for the first step you can leave the database empty and don't worry about doing the actual query to fetch stuff yet. just concentrate on the infrastructure to add a new stage to the bundler.
(12:30:50 PM) cscott-free: most of the work will just be familiarization stuff: getting the latest bundler, figuring out how to make a new class, getting it uploaded to gerrit.
(12:31:25 PM) cscott-free: don't be afraid to do too little, as soon as you've made any interesting change at all, upload it to gerrit so we can have a look and make hopefully helpful suggestions.
(12:32:30 PM) cscott-free: basically this is the stage where i'd print out a bunch of source code and read through it on paper until i have some basic idea of what it's doing. your technique may vary, of course!
(12:35:26 PM) cscott-free: adisha: you might also want to google "javascript promise tutorial" and have a read around if you've never used them before. http://www.html5rocks.com/en/tutorials/es6/promises/ is the first hit I get for that, and it looks like a reasonable place to start.
(12:38:25 PM) cscott-free: you might want to take a step back and read about how node.js handles asynchronous or blocking operations.
(12:39:48 PM) cscott-free: http://blog.ometer.com/2011/07/24/callbacks-synchronous-and-asynchronous/ might help with the basic terminology, http://blog.izs.me/post/59142742143/designing-apis-for-asynchrony describes "zalgo", which is semi-humorous but tends to come up from time to time.
(12:40:52 PM) cscott-free: adisha: no worries, this is the hard part of getting your head around a new codebase. it's worth taking your time to understand stuff.
(12:41:04 PM) cscott-free: and of course, don't be afraid to ask questions here if that helps
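
A rough sketch of the "infrastructure only" first step described in the IRC hints: a `Modules` class with an empty database, plugged in as one more promise-returning stage. This is not the actual mw-ocg-bundler code. The `fetchData` method, the `runStages` helper, the shape of the `wikis` argument and the in-memory `Map` standing in for modules.db are all hypothetical.

```javascript
'use strict';

// Hypothetical stage class; the real bundler's interfaces may differ.
class Modules {
  constructor(wikis) {
    this.wikis = wikis;
    // In-memory stand-in for modules.db, left empty for the first step.
    this.db = new Map();
  }

  // Placeholder stage: does no query yet, just resolves with the empty db.
  fetchData() {
    return Promise.resolve(this.db);
  }
}

// Run stages strictly one after another; each stage may return a promise.
function runStages(stages) {
  return stages.reduce(
    (chain, stage) => chain.then(() => stage()),
    Promise.resolve()
  );
}

const log = [];
// The wiki descriptor below is made up for illustration.
const modules = new Modules([{ baseurl: 'https://en.wikipedia.org/w' }]);

const done = runStages([
  () => { log.push('existing stage'); },
  () => modules.fetchData().then((db) => {
    log.push('modules stage: ' + db.size + ' entries');
  }),
]);

done.then(() => console.log(log));
```

Because every stage is chained with `then`, the modules stage always runs after the stage before it has finished, whether that stage is synchronous or returns a promise.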

Change 247614 had a related patch set uploaded (by Adishaporwal):
WIP: OCG should download resourceLoader js/css dependencies

https://gerrit.wikimedia.org/r/247614

Niharika removed a subscriber: Niharika. Oct 20 2015, 6:35 PM

Change 247614 merged by jenkins-bot:
Adding module.js and creating modules.db to download css/js dependencies.

https://gerrit.wikimedia.org/r/247614

rjlabs removed a subscriber: rjlabs. Nov 2 2015, 2:19 AM

Change 251719 had a related patch set uploaded (by Adishaporwal):
WIP: Get the list of unique modules required to download in bundler

https://gerrit.wikimedia.org/r/251719
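
For illustration only, and not the code in the patch set above: one way to collapse per-page module lists into a single list of unique modules to download. The function name and the module names in the sample are hypothetical.

```javascript
'use strict';

// Merge several per-page module lists into one sorted list without duplicates.
function uniqueModules(perPageLists) {
  const seen = new Set();
  for (const list of perPageLists) {
    for (const name of list) {
      seen.add(name);
    }
  }
  return Array.from(seen).sort();
}

// Made-up module names for two pages that share one dependency.
const pageA = ['site', 'ext.math.styles'];
const pageB = ['ext.cite.styles', 'site'];

console.log(uniqueModules([pageA, pageB]));
```

Sorting is not required for correctness; it only keeps the output stable between runs, which makes bundles easier to compare.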

Sumit added a subscriber: Sumit. Mar 1 2016, 5:37 PM
IMPORTANT: This is a message posted to all tasks under "Need Discussion" at Possible-Tech-Projects. Wikimedia has been accepted as a mentor organization for GSoC '16. If you want to propose this task as a featured project idea, we need a clear plan with community support, and two mentors willing to support it.
Qgil added a comment. Mar 2 2016, 10:45 AM

This task has patches for review. Is it still a good fit for Possible-Tech-Projects?

Sumit added a comment. Sep 10 2016, 9:35 PM

Outreachy-13 is about to start. Anyone willing to mentor this project?

At Kiwix we are going to pay a developer to fix this in mwoffliner. It could then easily be mocked up in OCG. For this reason, this is maybe not the best topic for Outreachy.

Removing Possible-Tech-Projects; feel free to add it back if the need arises again.

As already announced in Tech News, OfflineContentGenerator (OCG) will not be used anymore after October 1st, 2017 on Wikimedia sites. OCG will be replaced by Electron. You can read more on mediawiki.org.

Qgil removed a subscriber: Qgil. Sep 14 2017, 10:33 AM