This is a proposal for Outreachy (Round 11) to add ZIM support to OCG.
Proposal
Public URL: T73660
Name and contact information
Name: Adisha Porwal
Email: porwaladisha@gmail.com
IRC Nick: adisha
Mediawiki User: Adishaporwal
Location: India
Time Zone: UTC+5:30
Typical working hours: 3:00 PM to 12:30 AM (Indian Standard Time)
Internet Presence
Github Profile: adishap
LinkedIn Profile: Adisha Porwal
Twitter : @AdishaPorwal
Synopsis
MediaWiki is the wiki engine behind Wikipedia, all Wikimedia projects, and thousands of other websites. MediaWiki-hosted content can be made available for offline use through the Collection extension for MediaWiki (written in PHP). The Collection extension makes it easy to create collections (selections) of articles, called books, and is already installed on all Wikimedia projects and many other wikis. Once created, books can be exported in PDF format. The PDF export backend is not part of the Collection extension; it is provided by a Node.js (JavaScript) backend called the Offline Content Generator (OCG). At present, OCG supports only the PDF format. PDF is a great format, but it does not allow a web-like experience; for that there is the ZIM file format. ZIM can store a huge number of web pages (with images, videos, etc.) in a single, highly compressed file. This project will add ZIM support to OCG.
Skills
- Node.js
- HTML
- PHP
- Debian packaging
How it will benefit MediaWiki or Wikimedia projects?
- MediaWiki-hosted content will be available offline in ZIM format, readable anywhere with a reader such as Kiwix. Any visitor will be able to build their own collection of articles for offline use, even one with many hundreds of articles.
- The project will also help integrate the functionality of OCG with that of MWOffliner, an existing standalone solution for exporting MediaWiki content to ZIM.
Mentors
Milestones and Deliverables
| Milestone | Description | Duration | Deliverable |
|---|---|---|---|
| Milestone 1 | Envision Phase | Before 17 November 2015 | Development environment setup |
| Milestone 2 | Community Bonding Period | 17 November - 7 December 2015 | Project workflow |
| Milestone 3 | Downloading ResourceLoader CSS/JavaScript dependencies | 7 December 2015 - 9 December 2015 | Module export functionality in OCG |
| Milestone 4 | Generation of standalone HTML tree on filesystem | 10 December 2015 - 27 December 2015 | Standalone HTML tree generation code |
| Milestone 5 | Build a custom loader for CSS/JavaScript dependencies | 28 December 2015 - 15 January 2016 | Custom loader script that makes the HTML tree work correctly offline |
| Milestone 6 | ZIM file generation (zimwriterfs integration) | 16 January 2016 - 3 February 2016 | Ability to convert the HTML tree to ZIM format |
| Milestone 7 | Debian package creation for zimwriterfs | 4 February 2016 - 25 February 2016 | Debian package of zimwriterfs |
| Milestone 8 | Final Code Review and Documentation | 26 February 2016 - 7 March 2016 | Source code, Project Report |
Schedule
Before 17 November - Milestone 1
- Remain in constant touch with mentor(s) and the community.
- Get familiar with the development environment.
- Get familiar with the workings of Node.js and packaging.
- Study the required documentation.
- Fix some bugs along the way and get my hands dirty with the code.
17 November 2015 to 6 December 2015 - Milestone 2
- Discuss, prepare and finalize the workflow for the development phase with mentors and the community.
- Get familiar with the architecture and implementation of OCG and MWOffliner.
7 December 2015
Actual coding period begins.
7 December 2015 to 9 December 2015 - Milestone 3
This milestone can be achieved in the following steps:
Step 1:
1.1 - Add a "do nothing" stage to mw-ocg-bundler, protected by a command-line flag, to hold the new code.
1.2 - Turn on this flag when running tests.
Step 2:
Add code to the stage to download the set of modules required for each page in the collection, and save it to 'modulesDb'.
Step 3:
3.1 - Merge the per-page module lists to get the list of *unique* modules that need to be downloaded.
3.2 - Add the "default modules" required on every page to this set.
Step 4:
Download the modules in the resulting list from ResourceLoader.
Steps 1 and 2 are attempted in the patch.
This is also a microtask for the project, so it is expected to be done before then. If it is completed, I will begin research on the implementation of Milestone 4.
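Steps 3 and 4 above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (`collectUniqueModules`, `buildLoadUrl`), the example wiki URL, and the module names are hypothetical and are not taken from the mw-ocg-bundler code. The only parts that come from ResourceLoader itself are the `load.php` entry point and its `modules` (pipe-separated) and `only` query parameters.

```javascript
// Hypothetical helpers; not part of mw-ocg-bundler.

// Step 3: merge the per-page module lists and the default modules into
// one sorted list without duplicates.
function collectUniqueModules(modulesByPage, defaultModules) {
  const unique = new Set(defaultModules);
  for (const modules of Object.values(modulesByPage)) {
    for (const name of modules) {
      unique.add(name);
    }
  }
  return Array.from(unique).sort();
}

// Step 4: build a ResourceLoader request URL for the given modules.
// `only` is expected to be 'scripts' or 'styles'.
function buildLoadUrl(loadPhpUrl, modules, only) {
  const url = new URL(loadPhpUrl);
  url.searchParams.set('modules', modules.join('|'));
  url.searchParams.set('only', only);
  return url.toString();
}

// Illustrative input: module names and the wiki URL are placeholders.
const modulesByPage = {
  'Page A': ['ext.cite', 'mediawiki.toc'],
  'Page B': ['mediawiki.toc', 'ext.math']
};
const defaultModules = ['site', 'mediawiki.legacy.shared'];

const allModules = collectUniqueModules(modulesByPage, defaultModules);
const stylesUrl = buildLoadUrl('https://example.org/w/load.php', allModules, 'styles');

console.log(allModules);
console.log(stylesUrl);
```

Keeping the merge separate from the URL construction means the unique list can be stored once and reused for both the `scripts` and the `styles` request.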
10 December 2015 to 27 December 2015 - Milestone 4
- Build a standalone HTML tree ("self-sufficient" HTML content with images, JavaScript and stylesheets) from the existing ZIP file (known as a bundle) generated by mw-ocg-bundler (a MediaWiki article spider tool).
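As a rough sketch of what writing the tree to disk might look like, the snippet below saves a list of articles as HTML files in an output directory. It makes several assumptions: the `articles` array stands in for whatever is read out of the bundle (the bundle format is not modelled here), and `titleToFileName` and `writeHtmlTree` are hypothetical names with a made-up file naming scheme.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Turn an article title into a file name that is safe on common
// filesystems (hypothetical scheme: spaces become underscores and
// reserved characters become hyphens).
function titleToFileName(title) {
  return title.replace(/ /g, '_').replace(/[\/\\?%*:|"<>]/g, '-') + '.html';
}

// Write every article into outDir and return the paths that were created.
function writeHtmlTree(outDir, articles) {
  fs.mkdirSync(outDir, { recursive: true });
  return articles.map(({ title, html }) => {
    const filePath = path.join(outDir, titleToFileName(title));
    fs.writeFileSync(filePath, html, 'utf8');
    return filePath;
  });
}

// Example run in a temporary directory with placeholder articles.
const outDir = fs.mkdtempSync(path.join(os.tmpdir(), 'html-tree-'));
const written = writeHtmlTree(outDir, [
  { title: 'Main Page', html: '<html><body>Hello</body></html>' },
  { title: 'AC/DC', html: '<html><body>Band</body></html>' }
]);
console.log(written);
```

Images, scripts and stylesheets would need the same treatment, and the links between pages would have to follow whatever naming scheme is finally chosen.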
28 December 2015 to 15 January 2016 - Milestone 5
- Develop a custom loader (a JavaScript tool) that will load the modules when an HTML article is displayed.
- Rewrite and clean the HTML so that it works properly offline.
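One part of making the HTML work offline is pointing article links at local files instead of the wiki. The sketch below illustrates the idea under assumptions: `rewriteArticleLinks` is a hypothetical helper, `/wiki/` is assumed to be the wiki's article path, and the `.html` suffix is an assumed naming scheme. It uses a regular expression for brevity; a real implementation would more likely work on a parsed DOM.

```javascript
// Rewrite links such as href="/wiki/Foo_Bar#History" to local files
// such as href="Foo_Bar.html#History". Links that do not start with
// the article path (for example absolute external URLs) are left alone.
function rewriteArticleLinks(html, articlePath = '/wiki/') {
  // Escape regex metacharacters in the article path.
  const escaped = articlePath.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Capture the title up to the closing quote, fragment or query string.
  const pattern = new RegExp('href="' + escaped + '([^"#?]+)', 'g');
  return html.replace(pattern, (match, title) => 'href="' + title + '.html');
}

const sample =
  '<a href="/wiki/Foo_Bar#History">Foo</a> ' +
  '<a href="https://example.org/wiki/X">external</a>';
const rewritten = rewriteArticleLinks(sample);
console.log(rewritten);
```

Links to pages outside the collection are not handled here; they would need a separate rule, such as being turned into plain text or into links to the online wiki.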
16 January 2016 to 20 January 2016
- Test the completed portion of the project.
- Document the milestones achieved.
- Get familiar with the workings of MWOffliner.
20 January 2016 to 26 January 2016
- Invoke 'zimwriterfs' (console tool to create ZIM files from a locally stored directory containing an HTML tree) using OCG.
- Discuss future work on the related milestone with mentor(s).
26 January 2016
Mid Term Evaluation
27 January 2016 to 3 February 2016 - Milestone 6
- Convert the HTML tree to ZIM format using 'zimwriterfs'.
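Invoking zimwriterfs from Node.js could look roughly like the sketch below. The option names follow the zimwriterfs usage text as I understand it (welcome page, favicon, language, title, description, creator, publisher, followed by the HTML directory and the output file). The helper names, the option values and the paths are placeholders, and how this would be wired into OCG is not shown.

```javascript
const { spawn } = require('child_process');

// Build the argument list for zimwriterfs from a plain options object.
// Every option listed here is treated as required in this sketch.
function buildZimwriterfsArgs(options, htmlDir, zimFile) {
  const flags = ['welcome', 'favicon', 'language', 'title',
    'description', 'creator', 'publisher'];
  const args = flags.map((name) => {
    if (!options[name]) {
      throw new Error('Missing zimwriterfs option: ' + name);
    }
    return '--' + name + '=' + options[name];
  });
  return args.concat([htmlDir, zimFile]);
}

// Run zimwriterfs and resolve with the ZIM path when it exits cleanly.
// Requires zimwriterfs to be installed and available on the PATH.
function runZimwriterfs(options, htmlDir, zimFile) {
  return new Promise((resolve, reject) => {
    const args = buildZimwriterfsArgs(options, htmlDir, zimFile);
    const child = spawn('zimwriterfs', args, { stdio: 'inherit' });
    child.on('error', reject);
    child.on('close', (code) => {
      if (code === 0) {
        resolve(zimFile);
      } else {
        reject(new Error('zimwriterfs exited with code ' + code));
      }
    });
  });
}

// Placeholder values, only to show the resulting command line.
const exampleArgs = buildZimwriterfsArgs({
  welcome: 'index.html',
  favicon: 'favicon.png',
  language: 'eng',
  title: 'My book',
  description: 'Collection exported from a wiki',
  creator: 'Wikipedia',
  publisher: 'OCG'
}, './html-tree', 'book.zim');
console.log('zimwriterfs ' + exampleArgs.join(' '));
```

Passing the arguments as an array to `spawn` avoids shell quoting problems when titles or descriptions contain spaces.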
4 February 2016 to 22 February 2016 - Milestone 7
- Get familiar with packaging.
- Create a Debian package of zimwriterfs, as it is required for easy and controlled installation in production.
23 February 2016 to 29 February 2016 - Milestone 8
- Code review by me and the mentors.
- Incorporate the feedback from the code review.
- Conduct several tests.
- Document the project.
1 March 2016 to 7 March 2016
- A buffer period required for final polishing of work.
7 March 2016
Firm Pen Down
Participation
Communication of progress
- IRC channel: I'll stay online on IRC at #kiwix, #mediawiki-parsoid and #wikimedia-dev on freenode during my working hours.
- Email: I shall keep the mentors updated on my work through regular emails.
- The project progress will be updated weekly on a subpage of my user page.
- Mailing List: I will use it to communicate progress regularly.
- Blog : I shall keep my blog updated with regular updates of my work, ideas and helpful posts.
Where I would turn for help?
Search by myself
- Self-research (initially) through available documentation, articles, blog posts and forums.
Seek help from community
- Ask the community on the IRC channels for help.
- Post queries to the relevant mailing lists or through direct emails to the mentor and related developers.
Source Code
- Source code will be pushed to Gerrit.
About Me
I am Adisha Porwal, a fifth-year student in an integrated master's program at IIPS-DAVV, majoring in computer science.
I am an enthusiastic and active member of the Development Center (DC), a part of my college that aims to bring people closer to open source technologies. As a DC member, I have conducted workshops on Python, HTML and CSS for college students and visited a village to teach girls computer basics.
I like programming. I use Python, PHP, JavaScript, CSS, HTML and MySQL for my projects, which can be found on my GitHub profile.
To contribute to the open source world, one should be familiar with version control systems. I am familiar with Git internals. For my projects, I use GitHub to version and share my code with the world. While contributing to Wikimedia, I became familiar with the Gerrit code review system.
During the Outreachy internship, I promise to work at least 40 hours a week.
Other Commitments:
I have a winter break from 1 December 2015 to 5 January 2016. My tenth semester will start on 6 January 2016. The schedule for the tenth semester has not been announced yet, but I do not expect to have end-of-semester exams before 1 March 2016. College will take a maximum of 15 hours per week. Other than the internship and college, I don't have any commitments that could interfere.
Courses I will be taking at my college during the internship period:
| Course | Credit |
|---|---|
| Formal Language Theory | 4 |
| Parallel Processing | 4 |
| Research in computing | 6 |
| Comprehensive Viva | 4 |
Current Experience with MediaWiki
I really enjoy contributing to MediaWiki. The support from the community made my contributions possible. While making my initial contributions, I learnt something new every day, and I am sure I will keep learning in the future.
So far I have:
- Set up the development environments for MediaWiki core, OCG, MWOffliner and zimwriterfs.
- Gained basic familiarity with the code and coding conventions.
- Understood the process of submitting a patch for review (Phabricator, Gerrit and Git).
Microtasks and Bugs
- Currently working on T114788: OCG should download ResourceLoader JS/CSS dependencies.
- T98829: Search input cut off in no JavaScript mode (merged).
- T103727: Empty message on watchlists is not center aligned (merged).
Past Experience
FOSS Projects
My first encounter with FOSS was Linux.
As a FOSS user and a huge fan of FOSS, I use Ubuntu 14.04 as my operating system, Mozilla Firefox for browsing, VLC for media files, PHP, Python and other open source languages for programming, and Wikipedia for reading about anything new.
However, I am a newcomer to the open source community. This is my first attempt at contributing to a FOSS project, and I am really excited about it. I began with MediaWiki a few months ago and have submitted some patches to the MobileFrontend extension and mw-ocg-bundler.
Other Projects
- Alumni Portal for my institute (GitHub Link)
- Complaint Management System for an organisation (GitHub Link)
- Team and Score Management System for a college event (GitHub Link)
More of my work can be found on my GitHub profile.
Other Information
Do you meet the eligibility requirements outlined?: Yes
Preferred pronoun: she
Education: Student at International Institute of Professional Studies, DAVV, graduating in December 2016
How did you hear about this program: From a friend, Shaifali Agarwal, who is a past intern of Outreachy (round 9) and GSoC 2015.