
Add ZIM format support to OCG
Closed, Declined
Public

Assigned To
None
Authored By
Kelson
Oct 5 2014, 9:44 AM
Referenced Files
None
Tokens
"Like" token, awarded by geraki. "Love" token, awarded by Cxbrx. "Love" token, awarded by deltonio2. "Love" token, awarded by MichaelSchoenitzer. "Love" token, awarded by Pabouk. "Love" token, awarded by pyroshroom. "Heartbreak" token, awarded by Tadaa3x. "Love" token, awarded by Arjunaraoc. "Love" token, awarded by Gmehn. "Heartbreak" token, awarded by Oznogon.

Description

MediaWiki is the wiki engine behind Wikipedia, all Wikimedia projects and thousands of other websites. It is cutting-edge free software that powers feature-rich websites anybody can edit. Content hosted on MediaWiki can be made available for offline use through the Collection extension (written in PHP). The Collection extension makes it easy to create collections/selections of articles, so-called books; here is how it works on the English Wikipedia. Once created, books can be exported in PDF format. The PDF export backend itself is not provided by the Collection extension; it is done with a JavaScript-based solution called OCG. OCG is a Node.js daemon able to transform a book definition into a PDF, and it should be able to do the same for the ZIM format. The ZIM format stores web pages (with images, videos, etc.) in one highly compressed file; these pages can then be read anywhere with a reader like Kiwix. A stub of a solution has already been written and MWOffliner is already functional. This task is mostly about merging these two pieces of code.

  • Primary mentor: @cscott
  • Co-mentor:
  • Other mentors: (optional, Phabricator username)
  • Skills: Node.js, HTML, PHP, packaging
  • Estimated project time for a senior contributor: 2-3 weeks
  • Microtasks: T113736

Details

Reference
bz71660

Related Objects

Event Timeline


Here's what we have:

IIRC, last time I looked at the code, some tweaks to https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-bundler might be needed as well. I already added one in 55aa6bea33e29053b76b2043d2c96bcb2f4f1964, since the zimwriter backend needs to rewrite redirects. I believe there were other minor issues involving stylesheets, etc. For example, the Parsoid DOM includes a stylesheet URL, but we don't actually fetch it in the bundler. (And in this case a better solution would be to use the API to query the actual style modules necessary, instead of just stashing the result of ResourceLoader; see T69540: Produce/preserve the metadata about additional ResourceLoader modules required by extension tags). I'm happy to do the mw-ocg-bundler side of this work; just create phab tickets for specific items and link them here as blockers.
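The idea of querying the API for the style modules a page needs could look roughly like the following. This is only a minimal sketch in Python (chosen for brevity; OCG itself is Node.js), assuming the MediaWiki action API's `action=parse` with `prop=modules`, whose JSON response lists style-only modules under `modulestyles`. The function names and the sample response are hypothetical, and nothing here is taken from mw-ocg-bundler.

```python
from typing import List
from urllib.parse import urlencode


def build_modules_query(api_url: str, page_title: str) -> str:
    """Build an action=parse URL that asks only for the page's ResourceLoader modules."""
    params = {
        "action": "parse",
        "page": page_title,
        "prop": "modules",
        "format": "json",
        "formatversion": "2",
    }
    return api_url + "?" + urlencode(params)


def style_modules(parse_response: dict) -> List[str]:
    """Return the de-duplicated, sorted style module names from a decoded response."""
    names = parse_response.get("parse", {}).get("modulestyles", [])
    return sorted(set(names))


# Hypothetical, hand-written response fragment used purely for illustration.
sample_response = {
    "parse": {
        "title": "Example",
        "modules": [],
        "modulescripts": [],
        "modulestyles": ["ext.cite.styles", "mediawiki.page.gallery.styles"],
    }
}

print(build_modules_query("https://en.wikipedia.org/w/api.php", "Main Page"))
print(style_modules(sample_response))
```

A bundler could then fetch exactly those modules rather than whatever stylesheet URL happens to appear in the Parsoid DOM; how that fetch is done is left open here.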

If you want to feature this project idea at https://www.mediawiki.org/wiki/Outreachy/Round_11 please edit the description adding the mentors, skills required, and microtasks for candidates. Thank you!

@cscott, would you be interested in mentoring this as an internship project for Outreachy?

@cscott, would you be interested in / would time allow mentoring this as an internship project for Outreachy?

01tonythomas subscribed.

I am shifting this to Outreachy-Round-11, as the project description has two mentors and micro-tasks and looks ready for the 11th edition of Outreachy (Dec 2015 - Mar 2016). Potential candidates should start by submitting their proposals as a blocker for this task by November 02.

Feel free to revert it back, if this task has some relevant issues which might block its completion in this term of Outreachy.

NOTE: Outreachy round 12 applications are now open and GSoC 2016 is around the corner. This project was featured for Outreachy round 11 and has a well-defined scope. Are you ready to mentor the project this season? If yes, then we'll feature this for Outreachy round 12 and GSoC 2016 as well. Please reply in the comments.

@cscott, @Kelson, you were listed as mentors for this project during Outreachy-11. Are you willing to do the same for this round of GSoC/Outreachy?

@Sumit
Thx for proposing but I have unfortunately no time to do that currently.

IMPORTANT: Moving to missing mentors, as we do not have two mentors confirmed for this at the moment. Interested in co-mentoring? Do add your name in the task description. A Possible-Tech-Projects task requires a minimum of one primary mentor and a co-mentor to be featured for GSoC/Outreachy. Prospective students? Do take a look at the Wikimedia mentors pool at https://www.mediawiki.org/wiki/Outreach_programs/Possible_mentors, and try connecting this project with a co-mentor to get it featured for this round. A co-mentor needn't necessarily be from a technical background, as per https://www.mediawiki.org/wiki/Outreach_programs/Life_of_a_successful_project#Coming_up_with_a_proposal. Feel free to change the status accordingly once both mentors have agreed.

I would like to work on this project as a part of Outreachy round 12 / GSoC 2016. I am fairly good at PHP and know some Node.js. I have read about the ZIM format and OCG. I intend to look at the existing stub of a solution in the next couple of days and, in parallel, solve the microtasks. @cscott, would you be willing to work as a mentor for this project?

This looks like a much-discussed project. @cscott, would you be willing to mentor this project for Outreachy-13 (Dec 6 - March 6)?

Hi all,
I am a software engineering student and I am quite new to Wikimedia.

While browsing the possible projects, I read through this one and it seems very interesting. I am willing to take on this project during GSoC '17. @cscott, if you agree, I can go ahead and start looking at the project in depth. Thanks

Uhm, sorry for the late notice here: There are ideas to replace OCG with Electron, which might make this task something better not to spend time on. :-/

@Aklapper @Eugene233 On our side this is still pretty important, even if we have no focus on it due to lack of resources. I have posted a comment in that direction here: https://phabricator.wikimedia.org/T146757#2959943. That said, in contrast to OCG, the electron-renderer effort seems to be narrowly focused and to offer few opportunities to be reused for other formats.

Electron will never support ZIM, AFAIK. I think OCG is still the only option for actual *offline* collection creation.

@cscott Thx for confirming my feeling.

"mwoffliner" is now available as an npm module, so it can be directly/easily used in OCG.

We are currently fixing the problem with mocking the ResourceLoader for offline usage in mwoffliner, and we are also switching to the mobile layout. This should be finished in a few weeks.

Then, it would be smart to move away from calling the zimwriterfs binary and use node-libzim directly. Once that's done, it should be relatively easy to bring ZIM export to OCG.
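For context, the "binary call" mentioned here amounts to shelling out to zimwriterfs with a directory of already-rendered HTML. Below is a minimal sketch in Python of assembling such a call, assuming the option names documented in the zimwriterfs README of that period (`--welcome`, `--favicon`, `--language`, `--title`, `--description`, `--creator`, `--publisher`, followed by the HTML directory and the output file); newer releases may differ. The function name and all sample values are hypothetical, and this is not mwoffliner's actual code.

```python
from typing import List


def zimwriterfs_argv(
    html_dir: str,
    zim_path: str,
    *,
    welcome: str,
    favicon: str,
    language: str,
    title: str,
    description: str,
    creator: str,
    publisher: str,
) -> List[str]:
    """Assemble the argument vector for one zimwriterfs invocation (nothing is executed)."""
    return [
        "zimwriterfs",
        "--welcome=" + welcome,          # entry page, relative to html_dir
        "--favicon=" + favicon,          # icon file, relative to html_dir
        "--language=" + language,        # ISO 639-3 code, e.g. "eng"
        "--title=" + title,
        "--description=" + description,
        "--creator=" + creator,
        "--publisher=" + publisher,
        html_dir,                        # directory holding the rendered pages
        zim_path,                        # ZIM file to write
    ]


# Hypothetical values for a single exported book.
argv = zimwriterfs_argv(
    "./book_html",
    "book.zim",
    welcome="index.html",
    favicon="favicon.png",
    language="eng",
    title="My book",
    description="A collection of articles",
    creator="Wikipedia",
    publisher="Kiwix",
)
print(" ".join(argv))
```

The list could be handed to `subprocess.run(argv, check=True)`. The drawback of this route is that every page must first be written to disk and the writer runs as a separate process, which is the overhead an in-process binding such as node-libzim would avoid.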

"New Reader" and "global reach" teams are pretty supportive to that feature AFAIK and this is important to Kiwix project too. Looks like we just need to gather supportive people to get enough support to get dev resources to "finish the job".

I'm a dev resource and willing to work on this task but I cannot work out whether it's currently parked. I am new to this community and would appreciate being steered in the right direction. @cscott can you help?

@Inveteratransmog: Hi and welcome! :) Wikimedia plans to replace OCG on its servers and OCG might get archived. Hence I would not recommend spending time on this task. I'm not sure about the exact state of ZIM plans - the Kiwix folks or the WMF Readers team might know best? :-/

As already announced in Tech News, OfflineContentGenerator (OCG) will not be used anymore after October 1st, 2017 on Wikimedia sites. OCG will be replaced by Electron. You can read more on mediawiki.org.

Is this still a valid task that should be tagged as Possible-Tech-Projects, given @Aklapper's comment above?

Aklapper lowered the priority of this task from High to Low. Jan 18 2018, 10:49 AM

@srishakatux: OCG is dead, see T150871. This very task awaits a decision in T161312.

Aklapper lowered the priority of this task from Low to Lowest. Apr 22 2019, 6:04 PM
Cxbrx rescinded a token.
Cxbrx awarded a token.
Cxbrx subscribed.

Declining this task as OCG has been dead for years.