Get all the pages in Hebrew for which there are corresponding articles in other languages
Closed, Resolved (Public)

Description

Goal: Create a list of Hebrew pages that have corresponding articles in other languages, as a first step toward finding pages that were translated.

Points to think about:

  • We might want to create a separate list for each other language (EN, RU, ...).
  • Make sure that we are not missing information that is recorded in the text of the page itself.

Possible approaches to completing the task:

  1. Working with the API Sandbox:
    • An example query for one article: Michael Jordan
    • Write a script that first collects all the Hebrew page titles (or their IDs) and then runs the language-links query on each of them.
    • It might be good to take a time slice and look only at edits within a particular date range.
  2. Working with Wikimedia dumps:
    • Download the "Wiki interlanguage link records" dump.
    • If we want the other language's page IDs, also download the "Name/value pairs for pages" dump for that language.
    • Build an SQL query that returns the IDs.
  3. Working with Wikidata dumps:
    • Download the dump wikidatawiki-latest-langlinks.sql.gz.
    • Build an SQL query that returns the IDs.
NOTE: Decide what to do with inline interlanguage links and with contradictions between Wikidata and local links.
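
A minimal Python sketch of approach 1 follows, assuming the standard MediaWiki Action API on he.wikipedia.org with `action=query&prop=langlinks`. The function names and the User-Agent string are hypothetical, batching of titles and continuation handling are left out, and the parser accepts both JSON response layouts the API can return. It is meant as a starting point, not as the script used for this task.

```python
import json
import urllib.parse
import urllib.request

# Hebrew Wikipedia Action API endpoint.
API_URL = "https://he.wikipedia.org/w/api.php"


def build_langlinks_params(titles, lang=None):
    """Build the query parameters for a prop=langlinks request.

    The API caps the number of titles per request (50 for regular users),
    so callers are expected to pass the titles in batches.
    """
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": "|".join(titles),
        "lllimit": "max",
        "format": "json",
    }
    if lang is not None:
        params["lllang"] = lang  # restrict to one target language, e.g. "en"
    return params


def extract_langlinks(response):
    """Map each page title to {language code: title in that language}."""
    pages = response.get("query", {}).get("pages", {})
    # formatversion=1 returns a dict keyed by page ID; formatversion=2 a list.
    page_iter = pages.values() if isinstance(pages, dict) else pages
    result = {}
    for page in page_iter:
        links = {}
        for link in page.get("langlinks", []):
            # Linked title is under "*" (formatversion=1) or "title" (formatversion=2).
            links[link["lang"]] = link.get("title", link.get("*"))
        result[page["title"]] = links
    return result


def fetch_langlinks(titles, lang=None):
    """Call the live API once; continuation is not handled in this sketch."""
    query = urllib.parse.urlencode(build_langlinks_params(titles, lang))
    request = urllib.request.Request(
        API_URL + "?" + query,
        headers={"User-Agent": "langlinks-sketch/0.1"},  # hypothetical agent name
    )
    with urllib.request.urlopen(request) as reply:
        return extract_langlinks(json.load(reply))
```

Pages that come back with no `langlinks` entry map to an empty dict, which makes it easy to separate Hebrew pages with counterparts from those without. A full run would also need to follow the `continue` values in the response and page through all article titles.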

Event Timeline

Livnetata claimed this task.
Livnetata raised the priority of this task from to Needs Triage.
Livnetata updated the task description.
Livnetata changed Security from none to None.
Livnetata added subscribers: Livnetata, Jsahleen, Amire80.

Update:

  • By working with the Hebrew Wikipedia dumps, I can get a list of the Hebrew page IDs (and, if wanted, the titles) that have a corresponding page in English.
  • It seems that we should try to work with Wikidata, as the information there is solid.
  • Therefore, I tried to understand what Magnus is doing in the "not-in-the-other-language" tool, but right now I can't find the script (his repository is a mess). I will try to contact him and ask for more information, as I think understanding how this tool works might help us in the future.
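
The dump-based lookup from the first point could look roughly like the sketch below. It assumes the `page` and `langlinks` dumps have been loaded into a database, and uses the standard MediaWiki column names (`page_id`, `page_namespace`, `page_title`, `ll_from`, `ll_lang`, `ll_title`). The real dumps are MySQL; SQLite and the cut-down tables are used here only so the example is self-contained, and the function names are hypothetical.

```python
import sqlite3

# Join each article to its interlanguage link for one target language.
QUERY = """
SELECT p.page_id, p.page_title, ll.ll_title
FROM page AS p
JOIN langlinks AS ll ON ll.ll_from = p.page_id
WHERE p.page_namespace = 0
  AND ll.ll_lang = ?
ORDER BY p.page_id
"""


def create_tables(conn):
    """Create cut-down stand-ins for the MediaWiki page and langlinks tables."""
    conn.executescript(
        """
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER NOT NULL,
            page_title TEXT NOT NULL
        );
        CREATE TABLE langlinks (
            ll_from INTEGER NOT NULL,
            ll_lang TEXT NOT NULL,
            ll_title TEXT NOT NULL
        );
        """
    )


def pages_with_counterpart(conn, lang):
    """Return (page_id, page_title, title in `lang`) for main-namespace pages."""
    return conn.execute(QUERY, (lang,)).fetchall()
```

Calling `pages_with_counterpart(conn, "en")` on a database populated from the Hebrew dumps would return one row per Hebrew article that links to an English page. The namespace filter keeps only articles (namespace 0), so talk pages and templates are excluded.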