A list of content packs will be available for users to select and download in the Wikipedia Android app.
We need to determine an initial list of which content packs should be available for v1.
Initial list
The initial list of requested content packs is here:
https://docs.google.com/spreadsheets/d/1iz-eWiG5LpOCsEYzlrgkddn8Vf-BsC-ppul2MKke80M/edit?usp=sharing
Note: the list of languages will likely grow slightly before release.
Criteria for inclusion
- Building off existing ZIM collections or curation efforts
- Coverage over an initial set of priority languages. Language choices will come from 3 sources:
  - New Readers target countries' primary language(s)
  - Top 10 Wikipedia Android app countries' primary languages
  - Languages under threat of censorship
- Coverage over a few sizes (esp. some at or under 5GB)
Content
Because of the large number of variations for each package, we propose keeping the content focused and simple, with 4 initial topics/themes:
- Medicine (same as Kiwix)
- Wikipedia 1.0 (same as Kiwix)
- Top 5000 articles in a language
- Top 50,000 articles in a language
Handling missing languages
For some combinations of topic+language, no Kiwix collection or clean list of article URLs will exist (esp. Wikipedia 1.0). In those cases we'd like to pursue the proposal by Doc James to use the Wikidata identifier of items in the English version to find whatever content is available. That is, for each article in the English collection, determine whether the article is available in the target language by using the Wikidata inter-language mapping. If the article exists in the target language, include it; if it doesn't, skip it.
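The include-or-skip rule above can be sketched as follows. This is only an illustration, not the actual generation pipeline: the function name `map_titles_to_language` and the shape of `sitelinks_by_title` are assumptions, and the sitelink data is assumed to have been fetched already (for example from the Wikidata `wbgetentities` API with `props=sitelinks`).

```python
from typing import Dict, Iterable, List


def map_titles_to_language(
    english_titles: Iterable[str],
    sitelinks_by_title: Dict[str, Dict[str, str]],
    target_site: str,
) -> List[str]:
    """Return target-language titles for the English articles that have one.

    sitelinks_by_title is a hypothetical, pre-fetched mapping from an English
    title to {site id: local title}, e.g.
    {"Malaria": {"enwiki": "Malaria", "frwiki": "Paludisme"}}.
    """
    selected = []
    for title in english_titles:
        local_title = sitelinks_by_title.get(title, {}).get(target_site)
        if local_title:
            # The article exists in the target language: include it.
            selected.append(local_title)
        # Otherwise the article is skipped.
    return selected
```

The order of the English collection is preserved, so any ranking in the source list carries over to the target-language list.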
Top N criteria
For "top N" collections we propose using the top N articles by pageviews in the 30 days prior to the generation of the collection.
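A minimal sketch of this selection, assuming daily pageview counts are already available as `(title, day, views)` rows. The function name `top_n_articles`, the row format, and the tie-breaking by title are assumptions made for illustration; the proposal does not specify them.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, List, Tuple


def top_n_articles(
    daily_views: Iterable[Tuple[str, date, int]],
    generation_date: date,
    n: int,
    window_days: int = 30,
) -> List[str]:
    """Rank articles by total pageviews in the window before generation_date."""
    # The window covers the window_days days before generation_date,
    # excluding the generation date itself.
    start = generation_date - timedelta(days=window_days)
    totals: Counter = Counter()
    for title, day, views in daily_views:
        if start <= day < generation_date:
            totals[title] += views
    # Sort by views descending; break ties by title so the result is stable.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:n]]
```

For the proposed packs, `n` would be 5000 or 50000.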
Variations for media assets
For each content set, in each language, we also need to have three variants (similar to existing no-vid, no-pics, all variants). For our purposes we propose 3 levels:
- All content, including any embedded audio, video or images ("Complete")
- All content excluding video or audio, but including images ("No Media")
- All content excluding video, audio or images ("No Images or Media")
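To give a sense of how many packages this implies, the sketch below enumerates every topic, language, and variant combination. The topic and variant labels come from the lists in this document; the function name `enumerate_packs` and the language codes in the usage note are placeholders, not the final priority list.

```python
from itertools import product
from typing import Dict, Iterable, List

TOPICS = ["Medicine", "Wikipedia 1.0", "Top 5000", "Top 50,000"]
VARIANTS = ["Complete", "No Media", "No Images or Media"]


def enumerate_packs(languages: Iterable[str]) -> List[Dict[str, str]]:
    """List one entry per topic, language, and media variant combination."""
    return [
        {"topic": topic, "language": language, "variant": variant}
        for topic, language, variant in product(TOPICS, list(languages), VARIANTS)
    ]
```

With two placeholder languages, `enumerate_packs(["en", "fr"])` yields 4 x 2 x 3 = 24 packages, which is why the topic list is kept short.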
Finalizing descriptions and names
Lastly, Communications will weigh in on the final names and descriptions, but placeholders are included in the initial table.
These will need to be translated. Question for the app team: can we do these translations through translatewiki.net, or should we find another path?