
Clean up History and Society.Society in the topic taxonomy.
Closed, Resolved, Public

Description

From @Isaac

For History and Society.Society, there is a mixture of ethnic groups, general human rights, and a few odd inclusions such as Forestry (yaml) -- I'm happy to work with others to clean that up a bit. Probably could move the different ethnic/language groups to Linguistics or appropriate geographies and let the topic focus on Sociology, Feminism, Human Rights, etc.

Event Timeline

Here is my rough first pass on the existing WikiProjects, with the huge caveat that I might get the geographic topics wrong (e.g., whether a WikiProject belongs in West Asia or Central Asia); ideally someone more familiar with these distinctions would help clean that up. I rearranged the order below so the recommendations are a bit easier to follow. Very open to iteration.

Society:
 - WikiProject Gender Studies [Keep]
 - WikiProject LGBT studies [Keep]
 - WikiProject Sexology and sexuality [Keep]
 - WikiProject Ageing and culture [Keep]
 - WikiProject Animal rights [Keep]
 - WikiProject Corruption [Keep]
 - WikiProject Cultural Evolution [Keep]
 - WikiProject Disability [Keep]
 - WikiProject Globalization [Keep]
 - WikiProject Home Living [Keep]
 - WikiProject Human rights [Keep]
 - WikiProject Human Rights in Sri Lanka [Keep]
 - WikiProject Nonviolence [Keep]
 - WikiProject Ethnic groups [Keep]
 - WikiProject Anthropology [Keep]
 - WikiProject Sociology [Keep]
 - WikiProject Feminism [Keep]
 - WikiProject Indian caste system [Keep though there are only a few associated articles]

 - WikiProject African diaspora [This one is tough. I'd lean towards Keep though it's a very diverse set of articles (https://en.wikipedia.org/wiki/Special:WhatLinksHere/Template:WikiProject_African_diaspora). If we know someone involved with the project, we could ask them where they see this fitting too]
 - WikiProject Awards [Could keep -- not really sure what to do, there are a massive number of articles associated with this WikiProject: https://en.wikipedia.org/wiki/Category:Awards_articles ]

 - WikiProject Environment [Move to STEM.Earth and environment]
 - WikiProject Fisheries and Fishing [Move to STEM.Earth and environment]
 - WikiProject Forestry [Move to STEM.Earth and environment]
 - WikiProject Agriculture [This one is hard as it touches on plants, animals, people, businesses, farming techniques. I think maybe the most sensible thing is either to remove or put in STEM.Earth and environment]

 - WikiProject Modern Western Europe [Can remove -- I don't think it tagged any articles because I can't find an associated template: https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Modern_Western_Europe ]
 - WikiProject Alternative views [Remove entirely -- this will look bad from a conspiracy-theory standpoint but it's a huge hodgepodge of topics that are only connected by being related to fringe theories -- see: https://en.wikipedia.org/wiki/Category:WikiProject_Alternative_Views_articles for a list of articles]
 - WikiProject Asian Americans [Can delete -- they use the WikiProject United States template with a field to indicate it's Asian-Americans, so any article tagged by this project is being mapped to Geography.North America]
 - WikiProject Franco-Americans [Similar to Asian Americans, uses WikiProject North America template so can be removed as it is currently being mapped to Geography.North America]

 - WikiProject Pakistani history [Move to History and Society.History]
 - WikiProject Russian history [Move to History and Society.History]

 - WikiProject Arab world [Could move to Geography.Asia.West Asia though frankly it probably makes more sense if there is a MENA (Middle East North Africa: https://en.wikipedia.org/wiki/MENA) topic -- see https://en.wikipedia.org/wiki/Special:WhatLinksHere/Template:WikiProject_Arab_world ]
 - WikiProject Assyria [Move to Geography.Asia.West Asia (again plug for MENA as its own topic)]
 - WikiProject Israel Palestine Collaboration [Move to Geography.Asia.West Asia, again with a plug for MENA]

 - WikiProject Azerbaijan [Move to Geography.Asia.West Asia and also probably remove from Culture.Biography]
 - WikiProject Pashtun [Move to Geography.Asia.West Asia and also probably remove from Culture.Biography]
 - WikiProject Taiwan [Remove; already in Geography.Asia.East Asia and also probably remove from Culture.Biography]
 - WikiProject Tamil civilization [Move to Geography.Asia.South Asia]

 - WikiProject Basque [Move to Geography.Europe.Southern Europe and also probably remove from Culture.Biography]
 - WikiProject Clans of Scotland [Move to Geography.Europe.Northern Europe]

 - WikiProject Berbers [Move to Geography.Africa.Northern Africa and also probably remove from Culture.Biography]
 - WikiProject Igbo [Move to Geography.Africa.Western Africa and also probably remove from Culture.Biography]
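
To make the recommendations above concrete, here is a minimal Python sketch of applying a few of them. It assumes a flat mapping from topic name to a list of WikiProjects, which is a simplification for illustration and not the actual layout of the taxonomy YAML; `apply_changes` and the `(project, source, target)` tuple format are hypothetical names, and the starting contents of the lists are made up for the example.

```python
SOCIETY = "History and Society.Society"

def apply_changes(taxonomy, changes):
    """Return a copy of `taxonomy` with moves and removals applied.

    Each change is (project, source_topic, target_topic); a target of
    None means the project is only removed from the source topic.
    """
    updated = {topic: list(projects) for topic, projects in taxonomy.items()}
    for project, source, target in changes:
        if project in updated.get(source, []):
            updated[source].remove(project)
        if target is not None:
            projects = updated.setdefault(target, [])
            if project not in projects:  # avoid duplicate entries
                projects.append(project)
    return updated

# Hypothetical starting state, flattened for illustration.
taxonomy = {
    SOCIETY: [
        "WikiProject Sociology",
        "WikiProject Forestry",
        "WikiProject Taiwan",
        "WikiProject Alternative views",
    ],
    "STEM.Earth and environment": [],
    "Geography.Asia.East Asia": ["WikiProject Taiwan"],
}

changes = [
    ("WikiProject Forestry", SOCIETY, "STEM.Earth and environment"),  # move
    ("WikiProject Taiwan", SOCIETY, None),             # already in East Asia
    ("WikiProject Alternative views", SOCIETY, None),  # remove entirely
]

updated = apply_changes(taxonomy, changes)
```

Returning a copy rather than editing in place keeps the original mapping available for comparing before and after.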

@Isaac -- thanks for working on this. Do you need input from me (or anyone else)?

@MMiller_WMF thanks for chiming in.

  • Short-term: I spoke with @Halfak about this and he's comfortable with me making the pull request to make the changes outlined above. I will probably do that but I think we should be thinking more long-term too about how we want to handle changes to the taxonomy.
  • Long-term: A lot of the changes we are making right now aren't particularly controversial (e.g., WikiProject Forestry should be part of STEM.Earth and environment), but we are also making changes that are much trickier (e.g., what topics should WikiProject African diaspora map to, given that it doesn't cleanly align with any of our existing hierarchy) and that then affect newcomer tasks and presumably more technologies as we expand this work out. I think there is also a very legitimate discussion to be had about whether the Middle East / North Africa should be a separate topic space in Geography or split up into regions of Asia/Africa as it currently is. Ideally this would involve seeking input from the individual WikiProjects and/or individuals who are more familiar with the topics, but that's maybe a larger lift than what we can do right now. I would appreciate support if you could help us think about how to address this long-term challenge of giving the community more say over the taxonomy. If you know people we could bring in right now to not just evaluate the model predictions but also take a step back and evaluate our taxonomy (and the changes I suggested above), that is welcome too.

As I was trying to figure out how to interpret what some of these WikiProjects covered, I realized that I could go about it empirically. I went through our dataset of articles and their associated WikiProjects and calculated what topics would still be applied to those articles if a given WikiProject were removed. In the spreadsheet below, I provide the top ten most prevalent topics for each WikiProject that had at least 25 associated articles (so if you don't see a WikiProject, either it's a bug in our collection or that WikiProject barely tags any articles). For instance:

  • 30% of articles in WikiProject Biography are sports bios, 22% about Europeans, 15% North America, but then 13% would still have Culture.Biography* applied because e.g., WikiProject Women Scientists or WikiProject Living people tagged the article too.
  • For articles tagged by WikiProject Lepidoptera, the top topic that would otherwise be applied is only STEM.STEM* for 5.1% of articles, suggesting that most articles under WikiProject Lepidoptera don't have another WikiProject that has tagged them (so if we removed this WikiProject from our dataset, we'd lose a lot of data).
  • We can see that 100% of articles in WikiProject UK Parliament constituencies would get mapped to Politics/Government anyhow so we could remove that WikiProject and it would have minimal effect.
  • We can see that WikiProject Awards is mainly about media-related awards and so perhaps is best removed from general Society.
  • We can see that WikiProject African Diaspora covers a lot of people, mainly in North America, and touches on a broad array of topics (Media, History, Society, Arts, Politics, etc.).

The data is here: https://docs.google.com/spreadsheets/d/1uXelaSPyR6EGmQoUKkf5rsXuMXBNxxqyHKFPdFEn_tc/edit?usp=sharing
Hopefully it's useful in thinking about where some of these WikiProjects should move!
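
For anyone who wants to reproduce the calculation described above, here is a minimal Python sketch. It is not the script that produced the spreadsheet; the function names and input shapes are assumptions: `articles` maps an article title to the WikiProjects that tagged it, and `project_topics` maps a WikiProject to its topics. It also does not model the starred parent-level topics (e.g., Culture.Biography*).

```python
from collections import Counter

def topics_without(project, article_projects, project_topics):
    """Topics an article would still receive if `project` were ignored."""
    topics = set()
    for other in article_projects:
        if other != project:
            topics.update(project_topics.get(other, ()))
    return topics

def leave_one_out_prevalence(articles, project_topics,
                             min_articles=25, top_n=10):
    """For each WikiProject, list the most prevalent topics that its
    articles would get from the *other* WikiProjects tagging them,
    as (topic, share of the project's articles) pairs."""
    totals = Counter()  # articles tagged by each project
    counts = {}         # project -> Counter of remaining topics
    for projects in articles.values():
        for project in set(projects):
            totals[project] += 1
            remaining = topics_without(project, projects, project_topics)
            counts.setdefault(project, Counter()).update(remaining)

    result = {}
    for project, total in totals.items():
        if total < min_articles:  # skip projects with too few articles
            continue
        result[project] = [
            (topic, n / total)
            for topic, n in counts[project].most_common(top_n)
        ]
    return result
```

A WikiProject whose top share is low (as with Lepidoptera above) is one whose articles are mostly not tagged by any other project, so removing it would lose data; a top share near 100% suggests the project is redundant for labeling purposes.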

Taxonomy updated -- changes would still need to be made in the drafttopic Makefile (here and here), but that would require retraining and redeploying the existing models, which is largely on hold, so I will leave that for future development. These updates will be incorporated into any new experimental models, though, as they don't affect the resulting taxonomy, just the quality of training data and predictions.