
Determine the list of content packs to make available in V1
Closed, Invalid (Public)

Description

A list of content packs will be available for users to select and download in the Wikipedia Android app.

We need to determine an initial list of which content packs should be available for v1.

Initial List

The initial list of requested content packs is here:
https://docs.google.com/spreadsheets/d/1iz-eWiG5LpOCsEYzlrgkddn8Vf-BsC-ppul2MKke80M/edit?usp=sharing

*Note that the list of languages will likely grow slightly before release.

Criteria for inclusion

  • Building off existing ZIM collections or curation efforts
  • Coverage over an initial set of priority languages. Language choices will come from 3 sources:
    • New Readers target countries' primary language(s)
    • Top 10 Wikipedia Android app countries' primary languages
    • Languages under threat of censorship
  • Coverage over a few sizes (esp. some at or under 5GB)

Content

Because of the large number of variations for each package, we propose keeping the content focused and simple, with 4 initial topics/themes:

  • Medicine (same as Kiwix)
  • Wikipedia 1.0 (same as Kiwix)
  • Top 5000 articles in a language
  • Top 50,000 articles in a language

Handling missing languages

For some combinations of topic+language, no Kiwix collection or clean list of article URLs will exist (esp. Wikipedia 1.0). In those cases we'd like to pursue the proposal by Doc James to use the Wikidata identifier of items in the English version to find whatever content is available. That is, for each article in the English collection, determine whether the article is available in the target language by using the Wikidata inter-language mapping. If the article exists in the target language, include it; if it doesn't, skip it.
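The include-or-skip rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `map_collection`, the in-memory `sitelinks` dictionary and the placeholder item IDs are hypothetical, and how the mapping would actually be fetched from Wikidata is left open.

```python
from typing import Dict, Iterable, List

# Hypothetical shape: Wikidata item ID -> {wiki code -> article title}.
Sitelinks = Dict[str, Dict[str, str]]


def map_collection(english_qids: Iterable[str],
                   sitelinks: Sitelinks,
                   target_wiki: str) -> List[str]:
    """Return target-language titles for the items that have an article there."""
    titles = []
    for qid in english_qids:
        title = sitelinks.get(qid, {}).get(target_wiki)
        if title:
            # The article exists in the target language: include it.
            titles.append(title)
        # Otherwise the item is skipped.
    return titles


# Placeholder data for illustration only (not real Wikidata items).
sample_sitelinks = {
    "Q1": {"enwiki": "Alpha", "frwiki": "Alpha (fr)"},
    "Q2": {"enwiki": "Beta"},
    "Q3": {"enwiki": "Gamma", "frwiki": "Gamma (fr)"},
}
french_pack = map_collection(["Q1", "Q2", "Q3"], sample_sitelinks, "frwiki")
```

Here `french_pack` keeps the order of the English collection and simply omits the item that has no French article.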

Top N criteria

For "top N" collections we propose using the top N articles by pageviews in the 30 days prior to the generation of the collection.
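As a rough sketch of this criterion, the snippet below sums daily pageviews over the 30 days before the generation date and keeps the N highest totals. The function name `top_n_articles`, the `(title, day, views)` record format and the alphabetical tie-break are assumptions made for illustration; the task does not specify where the pageview data comes from or how ties are resolved.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, List, Tuple


def top_n_articles(daily_views: Iterable[Tuple[str, date, int]],
                   generated_on: date,
                   n: int,
                   window_days: int = 30) -> List[str]:
    """Rank articles by total views in the window_days before generated_on."""
    start = generated_on - timedelta(days=window_days)
    totals = Counter()
    for title, day, views in daily_views:
        # Count only days inside the window; the generation day is excluded.
        if start <= day < generated_on:
            totals[title] += views
    # Highest total first; ties broken alphabetically (an assumption).
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:n]]


# Made-up records for illustration only.
records = [
    ("A", date(2017, 8, 31), 100),
    ("B", date(2017, 8, 15), 300),
    ("A", date(2017, 8, 2), 250),
    ("C", date(2017, 8, 1), 9999),   # one day before the window starts
    ("C", date(2017, 9, 1), 9999),   # the generation day itself
    ("D", date(2017, 8, 20), 50),
]
top_two = top_n_articles(records, generated_on=date(2017, 9, 1), n=2)
```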

Variations for media assets

For each content set, in each language, we also need to have three variants (similar to existing no-vid, no-pics, all variants). For our purposes we propose 3 levels:

  • All content, including any embedded audio, video or images ("Complete")
  • All content excluding video or audio, but including images ("No Media")
  • All content excluding video, audio or images ("No Images or Media")
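The three levels can be expressed as a small lookup from variant to the media kinds it keeps. This is a minimal sketch: the `Variant` enum, the `ALLOWED_MEDIA` table, the `filter_assets` helper and the `"image"`/`"audio"`/`"video"` labels are hypothetical names, not part of any existing tooling.

```python
from enum import Enum
from typing import Dict, FrozenSet, Iterable, List, Tuple


class Variant(Enum):
    COMPLETE = "Complete"
    NO_MEDIA = "No Media"
    NO_IMAGES_OR_MEDIA = "No Images or Media"


# Which embedded media kinds each variant keeps (article text is always kept).
ALLOWED_MEDIA: Dict[Variant, FrozenSet[str]] = {
    Variant.COMPLETE: frozenset({"image", "audio", "video"}),
    Variant.NO_MEDIA: frozenset({"image"}),
    Variant.NO_IMAGES_OR_MEDIA: frozenset(),
}


def filter_assets(assets: Iterable[Tuple[str, str]],
                  variant: Variant) -> List[str]:
    """Return the names of the (name, kind) assets that the variant keeps."""
    allowed = ALLOWED_MEDIA[variant]
    return [name for name, kind in assets if kind in allowed]


# Made-up asset list for illustration only.
assets = [("map.png", "image"), ("anthem.ogg", "audio"), ("clip.webm", "video")]
kept_no_media = filter_assets(assets, Variant.NO_MEDIA)
```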

Finalizing descriptions and names

Lastly, we'll have Communications weigh in on the final names and descriptions, but placeholders are included in the initial table.

These will need to be translated. Question for app team: can we do these translations through translatewiki.net, or should we find another path?

Event Timeline

The initial step will be to review the existing Kiwix library. Ideally we'd also get some kind of download counts or indicators of demand, but I'll see what we can find out.

@JMinor let me know how/if I can support on this.

Thanks @atgo! Let's add this to our agenda for our 1:1 sync up around offline stuff next week.

For the public record, I just started digging into this, but my thinking was:

  • Define some desired criteria for the shape of the overall available library (what languages, what size ranges, what topical areas?)
  • Review existing zim list to pick out what already exists that can be used
  • Figure out where gaps exist and file tasks to create any needed additional zims
  • Write a wiki with the initial list and short descriptions, and then mark which will need localization
  • Finalize and socialize...

@Fjalapeno we also wanted to file a task around tracking downloads and usage of these packs, so we can determine where to focus and give communities feedback about what content people are using. Is there a task for that, or should I open one on the reading-infrastructure board?

Mholloway renamed this task from "Determine the list of compilations to make available in V1" to "Determine the list of content packs to make available in V1". Aug 11 2017, 12:48 PM
Mholloway updated the task description.
Mholloway updated the task description.
Mholloway subscribed.

@Mholloway More like in carbonite. Refer to T195518 and the Wiki page for details.