
Investigation: Check how many categories and articles would be affected by changes
Closed, Resolved (Public)

Description

Some solutions we're looking into would only work with changes to article or category pages. Let's find some numbers to get a rough idea of how many of them would be affected.

  • Get the number of articles having the Frau category on de.wikipedia.org
  • Get the number of categories that could be affected by gendered labels on de.wikipedia.org

Event Timeline

awight renamed this task from Investigation: Check how we many categories / articles would be affected by changes to Investigation: Check how many categories and articles would be affected by changes. Jul 17 2019, 8:55 AM

I'm going to estimate the number of gendered categories by producing a list of all occupations and country demonyms, and matching against category labels. So for example, the occupation "Arzt" will match the category "Altägyptischer Arzt".
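The matching idea can be sketched in Python as follows. This is only an illustration, not the code actually used for this task: the function names are hypothetical, and whole-word matching is an assumption. Titles in the `page` table use underscores in place of spaces, and Python's `\b` treats `_` as a word character, so the word boundaries are spelled out explicitly.

```python
import re


def build_matcher(labels):
    """Compile one regex that matches any label as a whole word (assumed rule)."""
    # Normalize spaces to underscores so labels line up with page_title values.
    cleaned = {label.strip().replace(" ", "_") for label in labels}
    cleaned.discard("")
    if not cleaned:
        return None
    # Longest labels first so the alternation prefers the most specific label.
    escaped = sorted((re.escape(label) for label in cleaned), key=len, reverse=True)
    # A label must start at the beginning of the title or after "_", " " or "(",
    # and must end at the end of the title or before "_", " ", ")" or ",".
    return re.compile(r"(?:^|[_ (])(?:" + "|".join(escaped) + r")(?=$|[_ ),])")


def matching_categories(labels, categories):
    """Return the category titles that contain at least one label."""
    matcher = build_matcher(labels)
    if matcher is None:
        return []
    return [title for title in categories if matcher.search(title)]
```

With the labels "Arzt" and "Politiker", this keeps "Altägyptischer_Arzt" and "Politiker_(Fehring)" but not "Arztpraxis", where "Arzt" is only a prefix of a longer word.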

Dump of all occupations (TODO: filter out entries without a label)

SELECT ?item ?itemLabel 
WHERE 
{
  ?item wdt:P31 wd:Q28640.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
}
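For the TODO above: when an item has no label in the requested language, the label service returns the bare item ID (for example "Q123") as `?itemLabel`. One way to filter such rows after downloading the results could look like this. It is a sketch that assumes the rows have already been parsed into (item, label) pairs; `drop_unlabeled` is a hypothetical helper name.

```python
import re

# The label service falls back to the bare item ID when no label exists.
QID = re.compile(r"Q\d+")


def drop_unlabeled(rows):
    """Keep only (item, label) pairs whose label is a real label."""
    return [
        (item, label)
        for item, label in rows
        if label and not QID.fullmatch(label)
    ]
```

A real label that happens to look exactly like an item ID would also be dropped, which seems acceptable for a rough estimate.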

Demonyms:

SELECT ?item ?itemLabel
WHERE 
{
  ?item wdt:P31 wd:Q217438.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
}

Categories:

mysql dewiki -e 'select page_title from page where page_namespace=14' > categories.csv

Update: the demonym data is incomplete, with only ten or so labels in German, so I'll skip demonyms for now.

A bit of code to generate a nice grep pattern file: P8758
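The paste itself is not reproduced here. As a rough idea of what such a generator might do, here is a minimal sketch that writes one extended-regex pattern per occupation label, suitable for `grep -E -f`. The boundary rule `(^|_)…(_|$)`, the function names, and the default output path are assumptions for illustration, not necessarily what P8758 does.

```python
# Characters with special meaning in POSIX extended regular expressions.
ERE_META = set(".[]()*+?{}|^$\\")


def ere_escape(text):
    """Backslash-escape only the ERE metacharacters in text."""
    return "".join("\\" + ch if ch in ERE_META else ch for ch in text)


def pattern_lines(labels):
    """Build one whole-word pattern per distinct, non-empty label."""
    lines = []
    for label in sorted(set(labels)):
        # page_title values use underscores instead of spaces.
        label = label.strip().replace(" ", "_")
        if label:
            lines.append("(^|_)" + ere_escape(label) + "(_|$)")
    return lines


def write_pattern_file(labels, path="grep_cats.txt"):
    """Write the patterns to path, one per line."""
    with open(path, "w", encoding="utf-8") as fh:
        # Never emit an empty line: an empty grep pattern matches every input line.
        fh.writelines(line + "\n" for line in pattern_lines(labels))
```

Skipping blank labels matters here because a single empty line in the pattern file would make every category match.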

Matching is done like this:

grep -E -f grep_cats.txt categories.csv

34,711 categories match; here's the full list:

A sample of 20 for manual validation:

shuf -n 20 occupations_cats.txt

Hochschullehrer_(Landwirtschaftliche_Universität_Athen)
Politiker_(Fehring)
Bürgermeister_(Leutkirch_im_Allgäu)
Fußballspieler_(Aydınspor_1923)
Übersetzer_ins_Malayalam
Polnischer_Meister_(Fechten)
Hammerwerfer_(Usbekistan)
Hochschullehrer_(Lauenburg)
Politiker_(Baden-Württemberg)
Beamter_(Thailand)
Bürgermeister_(Erfurt)
Offizier_des_Oldenburgischen_Haus-_und_Verdienstordens_des_Herzogs_Peter_Friedrich_Ludwig
Kammerherr_(Russland)
Maler_des_Impressionismus
Hockeyspieler_(Frankfurter_Sportclub_Sachsenhausen_Forsthausstraße)
Fotograf_(Graz)
Fußballspieler_(AC_Juvenes/Dogana)
Fußballspieler_(Paradou_AC)
Jüngerer_Bürgermeister_(Reichsstadt_Frankfurt)
Bürgermeister_(Münster,_Tirol)
Lea_WMDE claimed this task.
Lea_WMDE moved this task from Demo to Done on the WMDE-QWERTY-Spike-2019-07-09 board.