
Create a production-ready zim content pack generation and upload service
Closed, Invalid; Public

Description

In response to New Readers research showing the need for better offline support, as well as a well-received Community Wishlist proposal, the Wikimedia Apps team is working on improving the offline user experience by adding support for loading Wikipedia articles from ZIM files. On the app side, the technical work is nearly complete; searching and loading articles from one or more ZIM files loaded onto the device works well, and only productization work (i.e., user onboarding and integration with the rest of the app's functionality) remains to be done.

In addition to developing our general knowledge and competency around this technology, there are a couple of areas for exploration and possible improvement around the content of the ZIM files over those currently available:

  • We'd like to expand beyond Kiwix’s library of existing Wikipedia ZIM files. (T169905)
  • The HTML content of the articles in the existing ZIM files has a lot of Kiwix-specific formatting, which the Wikipedia app needs to strip before displaying it.[1] Ideally the articles in the ZIM file shouldn't be adulterated in any way, and should be identical to the content received if a network request were made to get the same article. (T172764)
  • We'd like to expand upon and improve the metadata that is "baked into" the various ZIM files offered to our users. This metadata is what the user sees when deciding which compilation to download, so it must be worded very clearly and meaningfully. (T164760)

There are also concerns around hosting ZIM files to be downloaded in-app:

  • We need to use infrastructure that we can scale for hosting content that we serve to our apps. That means that we need to find production WMF hardware to host the ZIM files that we serve. After some internal discussion, Swift, the service used to host all of the media content uploaded to Wikimedia Commons, has emerged as a strong candidate for hosting these files, and we need to test Swift’s capacity for handling and serving files of this size. (T172123)
  • Wikimedia's production Swift service does not permit uploading files from Cloud VPS. If Swift is indeed used for hosting ZIM files, they'll have to be uploaded from a ZIM file generation service set up consistently with the requirements for running in the Wikimedia production environment. We're therefore working on prototyping an mwoffliner instance set up as it would be if running in Wikimedia production. (T172769)

In this work we plan to leverage the excellent tools developed by OpenZIM/Kiwix, and to work with them on desired updates and contribute our work back upstream.

[1] E.g., the unexpected license footer here:

main-MainActivity-08112017154234.png (1×1 px, 240 KB)

Related Objects

Status Subtype Assigned Task
Declined Dbrant
Resolved RHo
Resolved Mholloway
Invalid None
Invalid Fjalapeno
Resolved Mholloway
Invalid Fjalapeno
Invalid None
Resolved Mholloway
Resolved Mholloway
Invalid None
Invalid None
Invalid None
Invalid None
Invalid Mholloway
Resolved Fjalapeno
Invalid JMinor
Invalid None
Invalid None
Resolved Tbayer
Resolved Fjalapeno
Invalid None
Resolved fgiunchedi
Invalid Fjalapeno

Event Timeline

Removing myself as assignee so I don't accidentally auto-assign myself stuff when creating subtasks...

Mholloway renamed this task from Create a page compilation file generation and upload service to Create a production-ready page compilation file generation and upload service. Aug 11 2017, 1:00 PM
Mholloway renamed this task from Create a production-ready page compilation file generation and upload service to Create a production-ready zim content pack generation and upload service.
Mholloway updated the task description.
Mholloway updated the task description.

We'd like to expand beyond Kiwix’s library of existing Wikipedia ZIM files

It should be noted that until 2014 people were able to produce their own (small) ZIM files with the Collection extension, until the OCG regression (T73660).

Thanks, @Nemo_bis. Unfortunately, with the need for OCG to be deprecated, it hasn't made sense to rebuild that functionality. Could you help me understand exactly how this worked? I hadn't seen it in action. Users were able to create ZIM files and download them themselves, right? Was there any sort of repository for ZIM files that had been created by others (and moderation to go with it)? Happy to look at documentation if something exists; I didn't see it linked in the task there.

Mholloway changed the task status from Open to Stalled. Nov 13 2017, 6:13 PM

This is stalled, possibly indefinitely. Consider reopening if and when this work picks back up.