Each MediaWiki page has CSS/JS dependencies, and the list of dependencies varies from page to page (mostly depending on which extensions the page uses). Within MediaWiki, the component that provides this feature is called the "ResourceLoader".
Currently, the OCG crawler module does not retrieve the ResourceLoader dependencies. Until now, the OCG PDF backend did not need them. But if we want to generate ZIM files, which are based on HTML output, correct handling of the CSS/JS dependencies is mandatory. This ticket is about retrieving the list of dependencies required by each page, and downloading those dependencies, within OCG & MWoffliner.
Steps:
1 - Start a small Node.js demo program for the new functionality
2 - Given an article and an API URL, write a function that downloads the list of JS and CSS dependencies
3 - Merge this code into OCG so that it can download these dependencies
4 - Merge this code into MWoffliner so that it can download these dependencies and render them correctly in the HTML files
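Step 2 could start from something like the sketch below. It is only a rough starting point, not the final OCG/MWoffliner code: the function names and the shape of the returned object are made up for illustration, and it assumes Node.js 18+ so that `fetch` is available globally. The `modules`, `modulescripts`, `modulestyles` and `jsconfigvars` keys are the ones `action=parse` returns when `prop=modules|jsconfigvars` is requested.

```javascript
// Sketch only: function names and the returned shape are assumptions.

// Build the action=parse query for one article title.
function buildModulesQuery(apiUrl, title) {
  const url = new URL(apiUrl);
  url.search = new URLSearchParams({
    action: 'parse',
    format: 'json',
    page: title,
    prop: 'modules|jsconfigvars',
  }).toString();
  return url.toString();
}

// Pull the module lists out of an already decoded action=parse response.
// Missing keys fall back to empty values instead of throwing.
function extractDependencies(response) {
  const parse = (response && response.parse) || {};
  return {
    modules: parse.modules || [],
    scripts: parse.modulescripts || [],
    styles: parse.modulestyles || [],
    jsConfigVars: parse.jsconfigvars || {},
  };
}

// Download and decode the dependency list for one article.
// Relies on the global fetch of Node.js 18+ (an assumption of this sketch).
async function fetchDependencies(apiUrl, title) {
  const res = await fetch(buildModulesQuery(apiUrl, title));
  if (!res.ok) {
    throw new Error('API request failed with HTTP status ' + res.status);
  }
  const body = await res.json();
  if (body.error) {
    throw new Error('API error: ' + body.error.code);
  }
  return extractDependencies(body);
}
```

A call such as `fetchDependencies('https://en.wikipedia.org/w/api.php', 'Egyptian_hieroglyphs')` would then resolve to the module lists for that article. Keeping the URL building and the response parsing in separate pure functions should make it easier to reuse them in both OCG and MWoffliner.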
Hint:
The list of "default" modules is at https://github.com/wikimedia/parsoid/blob/master/lib/mediawiki.DOMPostProcessor.js#L322-L329
An API query to get the set of modules for a page is:
https://en.wikipedia.org/wiki/Special:ApiSandbox#action=parse&format=json&page=Egyptian_hieroglyphs&prop=modules%7Cjsconfigvars
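Once the module names are known, the modules themselves still have to be downloaded. A possible way, sketched below, is to request them from the wiki's ResourceLoader entry point (`load.php`), which takes a pipe-separated `modules` list and an `only` parameter (`styles` or `scripts`). The helper name, the default skin and language, and the module names used in the example are assumptions for illustration; real values would come from the API response and the wiki configuration.

```javascript
// Hypothetical helper: builds one load.php URL for a group of modules.
// `only` is expected to be 'styles' or 'scripts'.
// The skin and lang defaults are assumptions of this sketch.
function buildLoadUrl(loadUrl, modules, only, options = {}) {
  const url = new URL(loadUrl);
  url.search = new URLSearchParams({
    modules: modules.join('|'),
    only: only,
    skin: options.skin || 'vector',
    lang: options.lang || 'en',
  }).toString();
  return url.toString();
}

// Example with illustrative module names:
// buildLoadUrl('https://en.wikipedia.org/w/load.php',
//              ['site.styles', 'ext.cite.styles'], 'styles');
```

Requesting all style modules in one URL and all script modules in another keeps the number of downloads per article small, which matters when a crawler processes many pages.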
Internship project details
- Primary mentor: (Phabricator username)
- Co-mentor: (Phabricator username)
- Other mentors: (optional, Phabricator username)
- Skills: (Phabricator tags are welcome)
- Estimated project time for a senior contributor: (must be 2-3 weeks)
- Microtasks: (links to Phabricator tasks that must be completed in order to become a strong candidate)