
Create a tool to auto-populate categories through Wikidata/other wiki comparison
Open, Needs Triage, Public

Description

Currently, when a new category is created, even if it is linked to Wikidata and to categories in other languages, there is no easy way to generate a list of the articles on a given wiki that should be placed in it. Even when someone has such a list, they have to tag each article manually. I'd like to see a tool that generates such lists and makes populating categories as easy as Commons:Help:Gadget-Cat-a-lot does.
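The "populate" half of this request boils down to checking whether a page's wikitext already carries a category link and appending one if it does not. A minimal Python sketch, assuming plain wikitext in and out and the English `Category:` prefix only; `add_category` and `normalize` are hypothetical helpers, not part of Cat-a-lot or any existing tool, and categories added by templates are not detected:

```python
import re

# Matches [[Category:Name]] or [[Category:Name|sort key]]. Namespace prefixes
# are case-insensitive in MediaWiki, hence re.IGNORECASE. A colon-prefixed
# link such as [[:Category:Name]] (a plain link, not a tag) does not match.
CATEGORY_LINK = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]", re.IGNORECASE
)


def normalize(title: str) -> str:
    """Approximate MediaWiki title normalisation: underscores become spaces,
    whitespace runs collapse, and the first letter is upper-cased."""
    title = " ".join(title.replace("_", " ").split())
    return title[:1].upper() + title[1:]


def add_category(wikitext: str, category: str) -> str:
    """Return wikitext with [[Category:<category>]] appended, unless the page
    already links that category directly (template-added ones are missed)."""
    wanted = normalize(category)
    existing = {normalize(m.group(1)) for m in CATEGORY_LINK.finditer(wikitext)}
    if wanted in existing:
        return wikitext  # already tagged: leave the page untouched
    return wikitext.rstrip("\n") + f"\n[[Category:{wanted}]]\n"
```

The first-letter upper-casing mirrors the default `$wgCapitalLinks` behaviour; a wiki that disables it would need an exact comparison instead. A real tool would also fetch and save pages through the MediaWiki API (for example with Pywikibot) and use the target wiki's localized namespace names.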

See also a related discussion at w:Wikipedia:Village pump (technical)/Archive_141

--Piotrus (talk) 05:15, 10 November 2015 (UTC)

This card tracks a proposal from the 2015 Community Wishlist Survey: https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey

This proposal received 23 support votes, and was ranked #38 out of 107 proposals. https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey/Categories#Create_a_tool_to_auto-populate_categories_through_Wikidata.2Fother_wiki_comparison

Existing tools that solve (part of) this proposal:

Event Timeline

DannyH raised the priority of this task to Needs Triage.
DannyH updated the task description.
DannyH subscribed.
IMPORTANT: If you are a community developer interested in working on this task: The Wikimedia Hackathon 2016 (Jerusalem, March 31 - April 3) focuses on #Community-Wishlist-Survey projects. There is some budget for sponsoring volunteer developers. THE DEADLINE TO REQUEST TRAVEL SPONSORSHIP IS TODAY, JANUARY 21. Exceptions can be made for developers focusing on Community Wishlist projects until the end of Sunday, January 24, but not beyond. If you or someone you know is interested, please REGISTER NOW.

I submitted a task for AWB that would help with this at T125971.

valhallasw subscribed.

I have been working on a proof of concept for this. It is still rough around the edges (and not very fast). This is the current output:

> Kategoria:Polscy spadochroniarze                          : Category:Polish skydivers
> > Kategoria:Cichociemni                                   : Category:Cichociemni
      Maciej Kalenkiewicz                               : Maciej Kalenkiewicz
      Hieronim Dekutowski                               : Hieronim Dekutowski
      Jan Piwnik                                        : Jan Piwnik
      Stefan Bałuk                                      : Stefan Bałuk
      Stanisław Jankowski                               : Stanisław Jankowski
      Leopold Okulicki                                  : Leopold Okulicki
      Adam Boryczka                                     : Adam Boryczka
      Franciszek Koprowski                              : Franciszek Koprowski
      Bolesław Kontrym                                  : Bolesław Kontrym
      Adolf Pilch                                       : Adolf Pilch
      Władysław Kochański (oficer AK)                   : Władysław Kochański
      Kazimierz Iranek-Osmecki                          : Kazimierz Iranek-Osmecki
      Cichociemni                                       : Cichociemni
      Marian Gołębiewski (żołnierz)                     : Marian Gołębiewski (soldier)
      Elżbieta Zawacka                                  : Elżbieta Zawacka
      Jan Nowak-Jeziorański                             : Jan Nowak-Jeziorański
      Wacław Kopisto                                    : Wacław Kopisto
      Jan Rogowski (cichociemny)                        : Jan Rogowski
> > > Kategoria:Osoby związane z cichociemnymi              : Category:Cichociemni
        Jerzy Zubrzycki                                   : Jerzy Zubrzycki
        Lech Bądkowski                                    : Lech Bądkowski
        Tadeusz Chciuk-Celt                               : Tadeusz Chciuk-Celt
> > > > Kategoria:Kurierzy i emisariusze rządu RP (1939–1945): Category:Cichociemni
          Jan Nowak-Jeziorański                             : Jan Nowak-Jeziorański
          Tadeusz Chciuk-Celt                               : Tadeusz Chciuk-Celt
          Jerzy Lerski                                      : Jerzy Jan Lerski
          Józef Retinger                                    : Józef Retinger
          Jan Karski                                        : Jan Karski
          Kazimierz Iranek-Osmecki                          : Kazimierz Iranek-Osmecki
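The listing above is a depth-annotated tree: each category line carries one `> ` marker per level, and the pages under it are indented and paired with their target-wiki titles. A rough Python sketch of a formatter for that layout, assuming a hypothetical `CategoryNode` structure; the fixed column width and the indentation rule only approximate the proof of concept's actual output:

```python
from dataclasses import dataclass, field


@dataclass
class CategoryNode:
    """One source-wiki category and its mapping (hypothetical structure)."""
    source: str                      # category title on the source wiki
    target: str                      # mapped category title on the target wiki
    pages: list[tuple[str, str]] = field(default_factory=list)  # (source page, target page)
    children: list["CategoryNode"] = field(default_factory=list)


def format_tree(node: CategoryNode, depth: int = 1, width: int = 60) -> list[str]:
    """Render a node and its descendants, one line per category or page.

    Category lines get one "> " marker per level; page lines are indented by
    2 * depth + 2 spaces. The left column is padded to `width` so the ": "
    separators line up (a longer title simply pushes its separator right).
    """
    left = "> " * depth + node.source
    lines = [f"{left:<{width}}: {node.target}"]
    indent = " " * (2 * depth + 2)
    for source_page, target_page in node.pages:
        left = indent + source_page
        lines.append(f"{left:<{width}}: {target_page}")
    for child in node.children:
        lines.extend(format_tree(child, depth + 1, width))  # subcategories go one level deeper
    return lines
```

Printing `"\n".join(format_tree(root))` for a root node gives one line per category and per listed page, in the same pre-order as the listing above.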

The general mechanism is as follows:

  1. build the category tree on the source wiki (in this case pl.wikipedia)
  2. try to map each source wiki category to the most specific target wiki category. For example, Kategoria:Polscy spadochroniarze maps to Category:Polish skydivers, but the subcategories of Kategoria:Cichociemni are all mapped to Category:Cichociemni
  3. for each page in the source category, determine whether a corresponding page exists on the target wiki
  4. check whether the target page is already in the target category; if not, list it.
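The four steps can be sketched against in-memory data. This is a minimal Python sketch, not the proof of concept's code: the dictionaries `subcats`, `members`, `links` (a stand-in for Wikidata sitelinks or interlanguage links) and `target_members` are hypothetical inputs that a real tool would fill from the MediaWiki API, and step 2 is read as "use the category's own link if it has one, otherwise inherit the nearest mapped ancestor's target category":

```python
from typing import Optional

Row = tuple[str, str, str, str]  # (source category, target category, source page, target page)


def find_missing(root: str,
                 subcats: dict[str, list[str]],
                 members: dict[str, list[str]],
                 links: dict[str, str],
                 target_members: dict[str, set[str]]) -> list[Row]:
    """List target-wiki pages that are missing from their mapped category.

    subcats:        source category -> its subcategories (step 1 input)
    members:        source category -> pages directly in it
    links:          source title -> target title, for categories and pages
    target_members: target category -> pages already in it on the target wiki
    """
    rows: list[Row] = []
    seen: set[str] = set()

    def walk(category: str, inherited: Optional[str]) -> None:
        if category in seen:  # category graphs may contain cycles
            return
        seen.add(category)
        # Step 2: this category's own link if it has one, else the
        # nearest mapped ancestor's target category (possibly None).
        target_cat = links.get(category, inherited)
        if target_cat is not None:
            already = target_members.get(target_cat, set())
            for page in members.get(category, []):
                target_page = links.get(page)  # step 3: counterpart on the target wiki?
                if target_page is not None and target_page not in already:
                    rows.append((category, target_cat, page, target_page))  # step 4
        for sub in subcats.get(category, []):  # step 1: walk the source tree
            walk(sub, target_cat)

    walk(root, None)
    return rows
```

Tracking visited categories is not optional in practice: Wikipedia category graphs can contain cycles, and without the `seen` set the recursion would never terminate on them.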

I have tried to summarize the progress on this task at https://meta.wikimedia.org/wiki/Wikimedia_Blog/Drafts/WIP_Wikimedia_Hackathon_2016_post#The_connection_with_the_Community_Wishlist. Is there a good screenshot on Commons that we can reuse? Is there a place to test what was demoed in Jerusalem?

Unfortunately, there is nothing beyond the two screenshots in this task; I have not found the time to code up a web interface. The full output for the example category is at https://en.wikipedia.org/wiki/User_talk:Piotrus/Archive_55#Auto-populating_categories_with_wikidata ; maybe we should just copy that formatted list to a mw.org page as an example output?

@valhallasw would you welcome help from an Outreachy intern (Dec 6 to March 6) on this task? The application period is open until Oct 17.

If yes, let us know and we'll feature this task.

cawiki and ruwiki use Lua to populate categories from Wikidata properties.
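A property-based approach works the other way round: instead of copying membership from another wiki, it derives categories from the statements on each page's Wikidata item. A rough Python analogue, not the actual cawiki/ruwiki Lua modules; the `claims` and `rules` inputs are hypothetical, with `claims` assumed to be already reduced to property IDs mapped to item IDs:

```python
def categories_from_claims(claims: dict[str, list[str]],
                           rules: dict[tuple[str, str], str]) -> list[str]:
    """Return the categories whose (property, value) rule matches a statement.

    claims: property ID -> item IDs, e.g. {"P106": [...]} for occupation.
    rules:  (property ID, item ID) -> category name.
    Both inputs are hypothetical; the result follows the order of `claims`
    and contains no duplicates.
    """
    found: list[str] = []
    for prop, values in claims.items():
        for value in values:
            category = rules.get((prop, value))
            if category is not None and category not in found:
                found.append(category)
    return found
```

The rule table is where the editorial judgment lives: each wiki decides which property values justify which category, which is also why such rules are usually kept per wiki rather than shared.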

@valhallasw would you be interested in continuing to work on this task? Do you need some help to get this done? I'm asking as we are recruiting projects and mentors for the upcoming rounds of GSOC/Outreachy.