
Outreachy proposal for Add ZIM format support to OCG
Closed, Declined (Public)


This is a proposal for Outreachy (Round 11) to add ZIM support to OCG.


Public URL: T73660

Name and contact information

Name: Adisha Porwal
IRC Nick: adisha
MediaWiki User: Adishaporwal
Location: India
Time Zone: UTC+5:30
Typical working hours: 3:00 PM to 12:30 AM (Indian Standard Time)

Internet Presence

GitHub Profile: adishap
LinkedIn Profile: Adisha Porwal
Twitter: @AdishaPorwal


MediaWiki is the wiki engine behind Wikipedia, all Wikimedia projects and thousands of other websites. MediaWiki-hosted content can be made available for offline use through the Collection MediaWiki extension (written in PHP). The Collection extension makes it easy to create collections (selections) of articles, so-called books, and is already installed on all Wikimedia projects and many other wikis. Once created, books can be exported in PDF format. The PDF export backend itself is not provided by the Collection extension; it is handled by a Node.js (JavaScript) backend called the Offline Content Generator (OCG). At present, OCG supports only the PDF format. PDF is a great format but does not allow a web-like experience; for that we have the ZIM file format. The ZIM format can store a huge number of web pages (with images, videos, etc.) in one highly compressed file. This project will add ZIM support to OCG.


  1. Node.js
  2. HTML
  3. PHP
  4. Debian packaging

How will it benefit MediaWiki or Wikimedia projects?

  1. MediaWiki-hosted content can be made available offline in ZIM format and read anywhere with a reader like Kiwix. Any visitor will be able to build their own collection of articles for offline use, even with many hundreds of articles.
  2. The project will also help to integrate the functionality of OCG and MWOffliner, an existing standalone solution for exporting MediaWiki content to ZIM.


  1. C. Scott Ananian
  2. Kelson

Milestones and Deliverables

Milestone | Description | Duration | Deliverable
Milestone 1 | Envision Phase | Before 17 November 2015 | Development environment setup
Milestone 2 | Community Bonding Period | 17 November - 7 December 2015 | Workflow of project
Milestone 3 | Downloading ResourceLoader CSS/JavaScript dependencies | 7 December 2015 - 9 December 2015 | Modules export functionality by OCG
Milestone 4 | Generation of standalone HTML tree on filesystem | 10 December 2015 - 27 December 2015 | Standalone HTML tree generation code
Milestone 5 | Build a custom loader for CSS/JavaScript dependencies | 28 December 2015 - 15 January 2016 | Custom loader script so that the HTML tree works correctly offline
Milestone 6 | ZIM file generation (zimwriterfs integration) | 16 January 2016 - 3 February 2016 | Ability to convert HTML tree to ZIM format
Milestone 7 | Debian package creation for zimwriterfs | 4 February 2016 - 25 February 2016 | Debian package of zimwriterfs
Milestone 8 | Final Code Review and Documentation | 26 February 2016 - 7 March 2016 | Source code, Project Report


Before 17 November - Milestone 1

  • Remain in constant touch with mentor(s) and community.
  • Get familiar with the development environment.
  • Get familiar with the workings of Node.js and packaging.
  • Study the required docs.
  • Fix some bugs along the way and get my hands dirty with code.

17 November 2015 to 6 December 2015 - Milestone 2

  • Discuss, prepare and finalize the workflow for the development phase with mentors and community.
  • Get familiar with the architecture and implementation of OCG and MWOffliner.

7 December 2015
Actual Coding period begins.

7 December 2015 to 9 December 2015 - Milestone 3
This milestone can be achieved in the following steps:

Step 1:
1.1 - Create a new "do nothing" stage in mw-ocg-bundler, protected by a command-line flag, to hold the new code.
1.2 - Turn on this flag when running tests.
Step 2:
Add code to the stage to download the set of modules required for each page in the collection, and save it to 'modulesDb'.
Step 3:
3.1 - Combine the per-page lists of modules to get the list of *unique* modules to download.
3.2 - Add the "default modules" required on every page to this set.
Step 4:
Download the modules in this list from ResourceLoader.
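
Step 3 can be sketched roughly as below. This is only an illustration: the function name, the shape of the per-page data and the module names are all placeholders, not code or identifiers taken from mw-ocg-bundler.

```javascript
// Hypothetical helper, not part of mw-ocg-bundler.
// modulesByPage maps a page title to the module names that page requires.
function collectUniqueModules(modulesByPage, defaultModules) {
  // Start from the default modules needed on every page (step 3.2).
  const unique = new Set(defaultModules);
  // Merge in the per-page lists; the Set drops duplicates (step 3.1).
  for (const modules of Object.values(modulesByPage)) {
    for (const name of modules) {
      unique.add(name);
    }
  }
  // Sort so that repeated runs produce the same download list.
  return Array.from(unique).sort();
}

// Placeholder data for illustration only.
const exampleModulesByPage = {
  'Page A': ['ext.cite.styles', 'mediawiki.page.ready'],
  'Page B': ['mediawiki.page.ready', 'ext.math.styles'],
};
console.log(collectUniqueModules(exampleModulesByPage, ['site', 'jquery']));
```

Using a Set keeps the merge simple and makes it cheap to add further sources of module names later.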

Steps 1 and 2 are attempted in the patch.

This is also a microtask for the project, so it is expected to be done before this period starts. If it is completed by then, I will begin researching the implementation of Milestone 4.

10 December 2015 to 27 December 2015 - Milestone 4

  • Build a standalone HTML tree ("self-sufficient" HTML content with images, JavaScript and stylesheets) from the pre-existing ZIP file (known as a bundle) that is generated by mw-ocg-bundler (the MediaWiki article spider tool).

28 December 2015 to 15 January 2016 - Milestone 5

  • Develop a custom loader (a JavaScript tool) that will load modules when the HTML article is displayed.
  • Rewrite/clean the HTML so that it works properly offline.
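
As a very rough sketch of the simplest possible approach, the downloaded modules could be referenced with static tags injected into each page. This is an assumption for illustration, not the planned loader: the `modules/` directory layout and both function names are hypothetical, and a real loader would also have to respect the order of module dependencies.

```javascript
// Hypothetical layout: each module saved as modules/<name>.css or modules/<name>.js.
function localModuleTags(styleModules, scriptModules) {
  const links = styleModules.map(
    (name) => '<link rel="stylesheet" href="modules/' + encodeURIComponent(name) + '.css">'
  );
  const scripts = scriptModules.map(
    (name) => '<script src="modules/' + encodeURIComponent(name) + '.js"></script>'
  );
  return links.concat(scripts).join('\n');
}

// Inserts the tags just before the first closing </head> tag of a page.
function injectIntoHead(html, tags) {
  // A function replacer is used so that "$" in the tags is inserted literally.
  return html.replace('</head>', () => tags + '\n</head>');
}

const examplePage = '<html><head><title>Demo</title></head><body></body></html>';
console.log(injectIntoHead(examplePage, localModuleTags(['ext.cite.styles'], ['site'])));
```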

16 January 2016 to 20 January 2016

  • Test the completed portion of the project.
  • Document the achieved milestones.
  • Get familiar with the workings of MWOffliner.

20 January 2016 to 26 January 2016

  • Invoke 'zimwriterfs' (a console tool that creates ZIM files from a locally stored directory containing an HTML tree) from OCG.
  • Discuss future work on the related milestone with the mentor(s).

26 January 2016
Mid Term Evaluation

27 January 2016 to 3 February 2016 - Milestone 6

  • Convert the HTML tree to ZIM format using 'zimwriterfs'.
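
Calling zimwriterfs from Node.js might look roughly like the sketch below. The helper names and the `meta` object are hypothetical, and the option names are the ones I believe the zimwriterfs README documents; they should be checked against `zimwriterfs --help` for the installed version.

```javascript
const { execFile } = require('child_process');

// Builds the zimwriterfs argument list: metadata options first, then the
// HTML directory and the output ZIM file. Option names are assumptions to
// verify against `zimwriterfs --help`.
function buildZimwriterfsArgs(meta, htmlDir, zimPath) {
  return [
    '--welcome=' + meta.welcome,
    '--favicon=' + meta.favicon,
    '--language=' + meta.language,
    '--title=' + meta.title,
    '--description=' + meta.description,
    '--creator=' + meta.creator,
    '--publisher=' + meta.publisher,
    htmlDir,
    zimPath,
  ];
}

// Runs zimwriterfs without a shell, so no argument quoting is needed.
// callback receives (error, stdout, stderr), as with any execFile call.
function runZimwriterfs(meta, htmlDir, zimPath, callback) {
  execFile('zimwriterfs', buildZimwriterfsArgs(meta, htmlDir, zimPath), callback);
}
```

Keeping the argument builder separate from the process call makes it possible to unit test the arguments without having zimwriterfs installed.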

4 February 2016 to 22 February 2016 - Milestone 7

  • Get familiar with packaging.
  • Create a Debian package of zimwriterfs, as it is required for easy, controlled installation in production.

23 February 2016 to 29 February 2016 - Milestone 8

  • Code review by me and the mentors.
  • Act on the feedback from the code review.
  • Conduct several tests.
  • Document the project.

1 March 2016 to 7 March 2016

  • A buffer period for final polishing of the work.

7 March 2016
Firm Pen Down


Communication of progress

  • IRC channel: I'll stay online on IRC at #kiwix, #mediawiki-parsoid and #wikimedia-dev on freenode during my working hours.
  • Email: I shall keep the mentors updated on my work through direct emails.
  • The project progress will be updated weekly on a sub-page of my user page.
  • Mailing List: I will use it to communicate progress regularly.
  • Blog: I shall keep my blog updated with regular updates on my work, ideas and helpful posts.

Where would I turn for help?

Search by myself

  • Self-research (initially) through available documentation, articles, blog posts and forums.

Seek help from community

  • Ask the community on the IRC channels for help.
  • Post queries to the relevant mailing lists or through direct emails to the mentor and related developers.

Source Code

  • Source code will be pushed to Gerrit.

About Me

I am Adisha Porwal, a fifth-year student in an integrated master's program at IIPS-DAVV, majoring in computer science.

I am an enthusiastic and active member of the Development Center (DC), a part of my college. The Development Center aims at bringing people closer to open source technologies. As a DC member, I have conducted workshops on Python, HTML and CSS for college students, and visited a village to introduce girls to computer basics.

Programming is something I like. I use Python, PHP, JavaScript, CSS, HTML and MySQL for my projects. My projects can be found on my GitHub profile.

Anyone contributing to the open source world should know version control systems. I am familiar with Git internals. For my projects, I use GitHub to version-control my code and share it with the world. While contributing to Wikimedia, I became familiar with the Gerrit code review system.

During the Outreachy internship, I promise to work for at least 40 hours a week.

Other Commitments:
I have a winter break from 1 December 2015 to 5 January 2016. My tenth semester will start on 6 January 2016. The schedule for the tenth semester has not been announced yet, but I do not expect to have end-semester exams before 1 March 2016. My college will take a maximum of 15 hours per week. Other than the internship and my college, I don't have any other commitments that could interfere.

Courses I will be taking at my college during the internship period:

Formal Language Theory | 4
Parallel Processing | 4
Research in Computing | 6
Comprehensive Viva | 4

Current Experience with Mediawiki

I really enjoy contributing to MediaWiki. The support from the community made my contributions possible. While making my initial contributions, I have learnt something new every day, and I am sure I will keep learning in the future.

So far I have:

  • Set up the development environments for MediaWiki core, OCG, MWOffliner and zimwriterfs.
  • Gained basic familiarity with the code and coding conventions.
  • Understood the process of submitting a patch and getting it reviewed (Phabricator, Gerrit and Git).

Microtasks and Bugs

  • Currently working on T114788: OCG should download resourceLoader js/css dependencies.
  • T98829: Search input cut off in no JavaScript mode (merged).
  • T103727: Empty message on watchlists is not center aligned (merged).

Past Experience

FOSS Projects

My first encounter with FOSS was Linux.

As a FOSS user and a huge fan of FOSS, I use Ubuntu 14.04 as my operating system, Mozilla Firefox for browsing, VLC for media files, PHP, Python and other open source languages for programming, and Wikipedia for reading about anything new.

However, I am a newbie to the open source community. This is my first effort at contributing to a FOSS project, and I am really excited about it. I began with MediaWiki a few months ago and have submitted some patches to the MobileFrontend extension and mw-ocg-bundler.

Other Projects

  • Alumni Portal for institute (GitHub Link)
  • Complaint Management System for an organisation (GitHub Link)
  • Team and Score Management System for a college event (GitHub Link)

More of my work can be found on my GitHub profile.

Other Information

Do you meet the eligibility requirements outlined?: Yes
Preferred pronoun: she
Education: Student at International Institute of Professional Studies, DAVV, graduating in December 2016
How did you hear about this program: From a friend, Shaifali Agarwal, who was a past intern of Outreachy (Round 9) and GSoC 2015.

Event Timeline

Hi! We noticed you're a student. How much time do you think you can commit to the project per week? And would you be taking time off for exams or other commitments? A rough estimate of hours per week you can put in would be good.

We are approaching the Outreachy'11 application deadline, and if you want to have your proposal considered to be part of this round, do sign up and add your proposal at before November 02 2015, 07:00 pm UTC. You can copy-paste the above proposal to the Outreachy application system, and keep on polishing it over here. Keep in mind that your mentors and the organization team will be evaluating your proposal here in Phabricator, and you are free to ask and get more reviews complying

Adishaporwal lowered the priority of this task from High to Medium. Nov 1 2015, 4:41 AM

This looks good to me.

To break up the Milestone 3 task a little more:

  1. Create a new "do nothing" stage in mw-ocg-bundler, protected by a command-line flag, which will hold our new code.

  1-prime) Turn on this flag when running tests.

  2. Add code to the stage to download the set of modules required for each page in the collection, and save it in a db.
  3. Add the list of modules to get the list of *unique* modules required to download.

  3-prime) Add the list of "default modules" required on every page to this set.

  4. Download this list of modules from resourceloader

And step 4 could be split up by the type of module, for example, 4a) download the JavaScript, and 4b) download the CSS.
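
A minimal sketch of that 4a/4b split, assuming the modules are fetched through MediaWiki's `load.php` entry point with its `modules` and `only` query parameters. The function name and the base URL are placeholders, and the real requests would likely also need parameters such as `lang` and `skin`.

```javascript
// Hypothetical helper: builds one load.php URL per resource type.
// baseUrl is a placeholder for the wiki's load.php endpoint.
function buildLoadUrls(baseUrl, moduleNames) {
  const urls = {};
  // 4a) 'scripts' asks for the JavaScript, 4b) 'styles' asks for the CSS.
  for (const only of ['scripts', 'styles']) {
    const query = new URLSearchParams({
      modules: moduleNames.join('|'), // pipe-separated module names
      only: only,
    });
    urls[only] = baseUrl + '?' + query.toString();
  }
  return urls;
}

console.log(buildLoadUrls('https://example.org/w/load.php', ['site', 'jquery']));
```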

Your patch in does #1 and #2 above (but not 1-prime, which can be a follow-up patch). I encourage writing small patches, so the rest of the milestone can be 4 or 5 or 6 additional separate small patches, getting one step closer to the goal each time.

I also think milestones 4 and 5 might be better swapped. Once you get a basic HTML tree, you'll have something useful to look at, even though it will be a little ugly because it won't have the correct JS and CSS. Writing the loader script should be easier then because you'll have all the JS and CSS and HTML there on disk, you just need to write a little bit of code to get them loaded appropriately, and you'll be able to see right away when you have it working.

(The "rewrite/clean HTML to work offline properly" part of milestone 5 could become part of the "write the loader task" milestone, since they are both incremental steps to improve the output once you basically have all the bits on disk and are just hooking them up and tweaking them.)

@Adishaporwal, every single edit you make sends out a notification/email to a bunch of people. Try restricting your edits, please.

We see that you will have university/school during the Outreachy Round 11 internship period (Dec 2015 - March 2016). Please also answer the following questions in your proposal description so that we stick to the Outreachy norms. Thank you!

Will you have any other time commitments, such as school work, exams, research, another job, planned vacation, etc., between December 7, 2015 and March 7, 2016? How many hours a week do these commitments take? If a student, please list the courses you will be taking between December 7, 2015 and March 7, 2016, how many credits you will be taking, and how many credits a full-time student normally takes at your school:

Thank you for your proposal. Sadly, the Outreachy administration team has ruled that candidates with any kind of academic or other commitments cannot be approved for this round. Please consider talking with your mentors about further steps, and we hope to see your application (the same one, if the consensus still exists) in the next round of GSoC/Outreachy. Closing this as declined, and all the very best for the next round!