Add ZIM format support to OCG
Open, High, Public

Tokens
"Love" token, awarded by Pabouk. "Love" token, awarded by pyroshroom. "Heartbreak" token, awarded by Tadaa3x. "Love" token, awarded by Arjunaraoc. "Love" token, awarded by Gmehn. "Heartbreak" token, awarded by Oznogon.
Assigned To
None
Authored By
Kelson, Oct 5 2014

Description

MediaWiki is the wiki engine behind Wikipedia, all Wikimedia projects and thousands of other web sites. It is cutting-edge free software that powers feature-rich web sites that anybody can edit. MediaWiki-hosted content can be made available for offline usage through the Collection extension (written in PHP). The Collection extension makes it easy to create collections/selections of articles, so-called books; here is how it works on the English Wikipedia. Once created, books can be exported in the PDF format. The PDF exporting backend itself is not provided by the Collection extension; it is done with a JavaScript-based solution called OCG. OCG is a Node.js daemon able to transform a book definition into a PDF, and it should be able to do the same for the ZIM format. The ZIM format stores web pages (with images, videos, etc.) in one highly compressed file; these pages can then be read anywhere with a reader like Kiwix. A stub of a solution has already been written, and MWOffline is already functional. This task is mostly about merging these two pieces of code.
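
To make "book definition" concrete, here is a minimal sketch of turning one into the ordered list of articles an export backend would have to fetch. The object below is hypothetical: it is loosely modelled on the Collection extension's "metabook" JSON, and the field names (`type`, `title`, `items`) are assumptions for illustration, not a statement of what OCG actually consumes.

```javascript
'use strict';

// Hypothetical book definition, loosely modelled on the Collection
// extension's "metabook" JSON; the exact field names are an assumption.
const book = {
  type: 'collection',
  title: 'Offline sampler',
  items: [
    { type: 'article', title: 'Kiwix' },
    {
      type: 'chapter',
      title: 'Formats',
      items: [
        { type: 'article', title: 'ZIM (file format)' },
        { type: 'article', title: 'PDF' },
      ],
    },
  ],
};

// Walk the (possibly nested) item tree and return article titles in
// reading order. Chapters only group articles; they hold no content.
function listArticles(node) {
  const titles = [];
  for (const item of node.items || []) {
    if (item.type === 'article') {
      titles.push(item.title);
    } else if (item.type === 'chapter') {
      titles.push(...listArticles(item));
    }
  }
  return titles;
}

console.log(listArticles(book));
```

Whatever the output format (PDF or ZIM), a backend would start from a flat, ordered list like this one before fetching and rendering each page.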

  • Primary mentor: @cscott
  • Co-mentor:
  • Other mentors: (optional, Phabricator username)
  • Skills: Node.js, HTML, PHP, packaging
  • Estimated project time for a senior contributor: 2-3 weeks
  • Microtasks: T113736

Details

Reference
bz71660
bzimport set Reference to bz71660. Nov 22 2014, 3:45 AM
Kelson created this task. Oct 5 2014, 9:44 AM

CC'ing the OCG team - please take a look at this.

I cannot even get there to reproduce:
Clicking any of the "EPUB · ODT · PDF (A4) · PDF (Letter) · ZIM" links just reloads the page over and over for me, so I cannot use them (FF32).

mehranwiki wrote:

Yes, I cannot create a ZIM file from the Book namespace either.

Kelson renamed this task from No ZIM support to Bring back books export in ZIM format in extension:Collection/OCG.
Kelson added a project: Collection.
Kelson set Security to None.
Kelson added a subscriber: Nemo_bis.
Krenair renamed this task from Bring back books export in ZIM format in extension:Collection/OCG to Add ZIM format support to OCG. Mar 29 2015, 6:37 PM
Krenair removed a project: Collection.
Oznogon added a subscriber: Oznogon.
rjlabs added a subscriber: rjlabs. Apr 15 2015, 5:41 PM

About time to get .zim (and epub) options back online for the general audience.

mw-ocg-bundler and mw-ocg-zimwriter seem to work. (Thanks cscott!)

Why not bring back to the UI?

mw-ocg-zimwriter doesn't work yet. Help wanted!

rjlabs added a comment. Edited Apr 16 2015, 6:06 PM

CScott - If I were a good programmer, I'd dive right in at the coding level to help, because there's a HUGE gap here: no .zim output from Wikipedia, and no easy admin-level .zim output from MediaWiki. I'm happy to do the legwork of trying to find you some qualified help. What programming language are you working in (or what are the choices), and do you need help understanding the .zim file format? I've toured http://www.openzim.org and http://www.kiwix.org trying to find active participants. Emmanuel Engelhart (kelson at kiwix.org) and Tommi Maekitalo (tommi at tntnet.org) might be able to help get the ball rolling.

Personally, I'm looking to pack Kiwix and .zim files as the primary offline help system for OsmAnd (the extremely popular GPS / mobile mapping system that uses OpenStreetMap data offline). The help files for all this will be authored on the OpenStreetMap MediaWiki and spun out to be the offline help system on the Android, iOS and desktop (Java) versions of the software. The global / any-language capabilities are really attractive. All projects are free/open source. Kiwix is an obvious solution as long as zim files are easily produced on the fly by MediaWiki admins. Since maps are huge, and cell coverage is spotty or completely nonexistent in many areas, offline use of maps (and the help system) is critical. So our need dovetails nicely with the whole zim "philosophy".

Some scribbled notes below to perhaps uncover what the next best steps might be? I can't imagine the Kiwix people are happy about having zero zim output from Wikipedia and every other MediaWiki system.... PDF the only alternative? I cringe :)

Is this the temporary solution suggested for the casual MediaWiki admin to create .zim files?

http://www.openzim.org/wiki/Build_your_ZIM_file has some tool suggestions:

I also value being able to create zim files directly from mediawiki sites. I'm already spread too thin to be able to help practically unless my circumstances change materially. However I am involved in kiwix and willing to help test and debug whatever's involved. Emmanuel knows how to wake me up if/when the time is right :)

Thanks

Julian
PS: I'm also helping create a wikibook for educational use (Computing aimed at 11 to 14 year old pupils in the UK schooling system) where it'd be great to make the book available as a zim file.

greg removed a subscriber: greg. Apr 24 2015, 3:36 AM
Gmehn awarded a token. May 10 2015, 5:49 PM
Arjunaraoc added a subscriber: Arjunaraoc.
pyroshroom added a subscriber: pyroshroom.
Kelson updated the task description. Sep 24 2015, 8:27 PM

Here's what we have:

IIRC, last time I looked at the code, some tweaks to https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-bundler might be needed as well. I already added one in 55aa6bea33e29053b76b2043d2c96bcb2f4f1964 since the zimwriter backend needs to rewrite redirects. I believe there were other minor issues involving stylesheets & etc -- for example, the Parsoid DOM includes a stylesheet URL, but we don't actually fetch it in the bundler. (And in this case a better solution would be to use the API to query the actual style modules necessary, instead of just stashing the result of ResourceLoader; see T69540: Produce/preserve the metadata about additional ResourceLoader modules required by extension tags). I'm happy to do the mw-ocg-bundler side of this work; just create phab tickets for specific items and link them here as blockers.
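
The comment above does not say how the zimwriter backend rewrites redirects, so the following is only a sketch of one plausible reading: resolve each redirect to its final target and point article links straight at that target. The function names, the redirect map, and the assumption that links look like `href="./Title"` are all illustrative and are not taken from mw-ocg-bundler or mw-ocg-zimwriter.

```javascript
'use strict';

// Follow a chain of redirects to its final target. `redirects` maps a
// redirect title to the title it points at. Returns null on a loop.
function resolveRedirect(title, redirects) {
  const seen = new Set();
  let current = title;
  while (redirects.has(current)) {
    if (seen.has(current)) return null;
    seen.add(current);
    current = redirects.get(current);
  }
  return current;
}

// Rewrite every href="./Title" link in an HTML string so that links to
// redirects point straight at the resolved article. Fragments (#section)
// are left in place; links caught in a redirect loop are left untouched.
function rewriteLinks(html, redirects) {
  return html.replace(/href="\.\/([^"#]+)/g, (match, title) => {
    const target = resolveRedirect(title, redirects);
    return target === null ? match : `href="./${target}`;
  });
}

// Hypothetical redirect table for illustration only.
const redirects = new Map([
  ['UK', 'United_Kingdom'],
  ['Britain', 'UK'],
]);
console.log(rewriteLinks('<a href="./Britain#History">Britain</a>', redirects));
```

Resolving redirects up front like this would let the packaged file contain only real articles; the alternative of storing redirect entries in the ZIM file itself is equally possible and may be what the backend actually does.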

Qgil updated the task description. Sep 28 2015, 10:01 AM

If you want to feature this project idea at https://www.mediawiki.org/wiki/Outreachy/Round_11 please edit the description adding the mentors, skills required, and microtasks for candidates. Thank you!

@cscott, would you be interested in mentoring this as an internship project for Outreachy?

Kelson updated the task description. Sep 28 2015, 6:32 PM
cscott updated the task description. Oct 6 2015, 6:24 PM
Adishaporwal updated the task description. Oct 6 2015, 6:25 PM
Adishaporwal updated the task description. Oct 6 2015, 6:31 PM
Kelson updated the task description. Oct 6 2015, 6:35 PM

@cscott, would you be interested in / would time allow mentoring this as an internship project for Outreachy?

01tonythomas added a subscriber: 01tonythomas.

I am moving this to Outreachy-Round-11, as the project description has two mentors and micro-tasks and looks ready for the 11th edition of Outreachy (Dec 2015 - Mar 2016). Potential candidates should start by submitting their proposals as a blocker for this task by November 02.

Feel free to revert it back, if this task has some relevant issues which might block its completion in this term of Outreachy.

rjlabs removed a subscriber: rjlabs. Nov 23 2015, 1:40 PM
Sumit added a subscriber: Sumit. Feb 19 2016, 8:16 PM
NOTE: Outreachy round 12 applications are now open and GSoC 2016 is round the corner. This project was featured for Outreachy round 11 and has a well defined scope. Are you ready to mentor the project this season? If yes, then we'll feature this for Outreachy round 12 and GSoC 2016 as well. Please reply back in comments.
Niharika removed a subscriber: Niharika. Feb 20 2016, 5:19 AM
Qgil removed a subscriber: Qgil. Feb 22 2016, 9:31 PM
Sumit added a comment. Edited Mar 2 2016, 1:55 PM

@cscott, @Kelson, you were listed as mentors for this project during Outreachy-11. Are you willing to do the same for this round of GSoC/Outreachy?

Kelson added a comment. Mar 2 2016, 4:43 PM

@Sumit
Thx for proposing, but unfortunately I have no time to do that currently.

Sumit updated the task description. Mar 2 2016, 5:03 PM
Sumit added a comment. Mar 2 2016, 5:06 PM
IMPORTANT: Moving to missing mentors as we do not have two mentors confirmed for this at the moment. Interested in co-mentoring? Do add your name to the task description. A Possible-Tech-Projects task requires a minimum of one primary mentor and a co-mentor to be featured for GSoC/Outreachy. Prospective students? Do take a look at the Wikimedia mentors pool at https://www.mediawiki.org/wiki/Outreach_programs/Possible_mentors, and try connecting this project with a co-mentor to get it featured for this round. A co-mentor needn't necessarily be from a technical background, as per https://www.mediawiki.org/wiki/Outreach_programs/Life_of_a_successful_project#Coming_up_with_a_proposal. Feel free to change the status accordingly if both mentors agree.
Aashaka added a subscriber: Aashaka. Mar 7 2016, 5:32 PM

I would like to work on this project as a part of Outreachy round 12/ GSoC 2016. I am fairly good at PHP and know some Node.JS. I have read about the ZIM format and OCG. I intend to look at the present stub of solution implemented in the next couple days, and in parallel solve the microtasks. @cscott, will you be willing to work as a mentor for this project?

Sumit added a comment. Sep 10 2016, 5:10 PM

This looks like a much-discussed project. @cscott, would you be willing to mentor this project for Outreachy-13 (Dec 6 - March 6)?

Eugene233 added a subscriber: Eugene233. Edited Mar 22 2017, 4:38 AM

Hi all,
I am a software engineering student and I am quite new to Wikimedia.

While browsing the possible projects, I read through this one and it seems very interesting. I am willing to take on this project during GSoC '17. @cscott, if you agree, I can move ahead directly with looking deeply at the project. Thanks

Uhm, sorry for the late notice here: There are ideas to replace OCG by Electron which might turn this task into something to better not spend time on. :-/

@Aklapper @Eugene233 On our side this is still pretty important, even if we have no focus on it due to lack of resources. I have posted a comment in that direction here: https://phabricator.wikimedia.org/T146757#2959943. That said, unlike OCG, the electron-renderer (effort) seems to be self-focused and to offer few opportunities for reuse with other formats.

Electron will never support ZIM, AFAIK. I think OCG is still the only option for actual *offline* collection creation.

@cscott Thx for confirming my feeling.

"mwoffliner" is now available as an npm module, so it can be directly/easily used in OCG.

We are currently fixing the problem with mocking the ResourceLoader for offline usage in mwoffliner, and also making it use the mobile layout. This should be finished in a few weeks.

Then, it would be smart to move away from the zimwriterfs binary call and use node-libzim directly. Once that's done, it should be relatively easy to bring ZIM export to OCG.
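
For context, a "zimwriterfs binary call" from Node.js looks roughly like the sketch below: the ZIM file is produced by spawning an external process over a directory of pre-rendered HTML. This is not mwoffliner's actual code. The helper names are hypothetical, and the flag names follow the zimwriterfs README as I remember it from that period, so treat them as an assumption that may not match every zimwriterfs version. No node-libzim calls are shown because its API is not described in this thread.

```javascript
'use strict';

const { spawn } = require('child_process');

// Build the zimwriterfs argument list. Flag names follow the zimwriterfs
// README of the period (assumption: they may differ between versions).
function buildZimwriterfsArgs(meta, htmlDir, zimFile) {
  const flags = ['welcome', 'favicon', 'language', 'title',
    'description', 'creator', 'publisher'];
  const args = flags
    .filter((name) => meta[name] !== undefined)
    .map((name) => `--${name}=${meta[name]}`);
  // The HTML directory and the output file are positional arguments.
  return [...args, htmlDir, zimFile];
}

// Run the binary and resolve with the output path once it exits cleanly.
// Requires zimwriterfs to be installed and on the PATH.
function runZimwriterfs(meta, htmlDir, zimFile) {
  return new Promise((resolve, reject) => {
    const child = spawn('zimwriterfs',
      buildZimwriterfsArgs(meta, htmlDir, zimFile), { stdio: 'inherit' });
    child.on('error', reject);
    child.on('close', (code) => (code === 0
      ? resolve(zimFile)
      : reject(new Error(`zimwriterfs exited with code ${code}`))));
  });
}

// Print the command line that would be run, without running it.
console.log(['zimwriterfs', ...buildZimwriterfsArgs(
  { welcome: 'index.html', language: 'eng', title: 'My book' },
  './html', 'book.zim')].join(' '));
```

The drawback of this approach is that the daemon depends on an external binary being installed and can only report failure through an exit code, which is presumably part of why an in-process library binding is preferred.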

"New Reader" and "global reach" teams are pretty supportive to that feature AFAIK and this is important to Kiwix project too. Looks like we just need to gather supportive people to get enough support to get dev resources to "finish the job".

Nz-jon added a subscriber: Nz-jon. May 8 2017, 5:00 PM
Restricted Application added a subscriber: jeblad. Aug 25 2017, 1:49 AM
jeblad removed a subscriber: jeblad. Aug 25 2017, 11:35 PM

I'm a dev resource and willing to work on this task but I cannot work out whether it's currently parked. I am new to this community and would appreciate being steered in the right direction. @cscott can you help?

@Inveteratransmog: Hi and welcome! :) Wikimedia plans to replace OCG on its servers and OCG might get archived. Hence I would not recommend spending time on this task. I'm not sure about the exact state of ZIM plans - the Kiwix folks or the WMF Readers team might know best? :-/

As already announced in Tech News, OfflineContentGenerator (OCG) will not be used anymore after October 1st, 2017 on Wikimedia sites. OCG will be replaced by Electron. You can read more on mediawiki.org.