
Start developing metrics for content-diversity gaps
Closed, ResolvedPublic

Description

Tracking the work with a contractor on developing metrics for the content gaps related to diversity.

Event Timeline

Update week 2021-01-25:

  • excited to start the work with contractor (Marc); first official day on 2021-01-26
  • we are getting set up in terms of email, meetings, documentation, meta-page
  • starting first discussions on how to begin:
    • we will start with a set of 3-4 gaps (gender, geography, age/recency) for which there are existing approaches to map the gap to the content of articles (e.g. gender via biographies); for the other gaps this mapping is less clear
    • we will start to get an overview of the academic literature to understand existing approaches
    • we will identify the different stakeholders in the gaps (communities, affiliates, etc.) to capture the perspective of those impacted by the metrics of the gap (and to know whom to reach out to for feedback)

Update week 2021-02-01:

  • Marc started by revisiting resources and literature on content gaps
  • the main focus was to identify and classify the different stakeholders in the corresponding gaps
  • the main motivation is that this will serve as a starting point to reach out to the relevant communities for feedback on the metrics of the gaps

Update week 2021-02-08:

  • discussed in depth different stakeholders for gaps.
  • identified a smaller subset of stakeholders to reach out to in exploratory phase about metrics
  • started to define questions for initial interviews
  • main objective is to better understand stakeholders' priorities and the process by which they try to bridge the gaps (e.g. how they work on content relevant to the gap), and to take into account their expertise

Update week 2021-02-15:

  • we went through the classification of different stakeholders from the community (mostly related to gender) and reached out to several stakeholders from different countries and with different maturity levels; we conducted the first interviews and scheduled one more for next week
  • extracted main themes and compared them with existing research on measuring gender gaps. We identified considerable overlap, which serves as a starting point to propose metrics for the gender content gap. this was quite reassuring, as there seems to be a clear path to define metrics which could gain support from stakeholders.
  • started to look in more detail into the problem of how to identify articles with relevant content (gender, geography, etc). The aim is that once we can identify relevant articles for different gaps, we can re-use (parts of) the metrics we developed for the gender gap.

Update week 2021-02-22:

  • discussed in more detail how we map the categories of each gap to the content in order to identify relevant articles. A general framework requires some set of annotations (e.g. wikidata-properties, categories, wikiprojects, etc). The main issues will be: i) how to find a set of annotations that works across different languages without much manual curation; ii) how to deal with incomplete annotations. for example, the gender-property of wikidata has high coverage for biographies, but for other gaps the coverage will probably be much lower and have systematic biases. for this we would probably have to apply methods that use existing labels to find further articles related to that category (similar to the approach Isaac is developing for geography). the next step will be applying this general framework to the set of gaps we chose for the first round of analysis (gender, geography, recency)
  • Marc had several interviews with stakeholders about potential metrics for gaps. we are currently identifying the specific needs they articulated in order to find candidate metrics that have broad consensus and awareness but are still meaningful.
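
The annotation-based mapping discussed this week can be illustrated with a minimal sketch for the gender gap. This is not the project's implementation: the function name `assign_gender_group` and the simplified layout of the input items are hypothetical, and the toy items are invented. The identifiers P31 (instance of), P21 (sex or gender), Q5 (human), Q6581072 (female), Q6581097 (male) and Q515 (city) are existing Wikidata identifiers. Biographies without a P21 value are kept as "unknown" so that incomplete annotations (issue ii) stay visible instead of silently dropping out.

```python
from collections import Counter

# Existing Wikidata identifiers: P31 = instance of, P21 = sex or gender,
# Q5 = human, Q6581072 = female, Q6581097 = male.
GENDER_LABELS = {"Q6581072": "female", "Q6581097": "male"}


def assign_gender_group(claims):
    """Map one item's claims to a group of the gender gap.

    `claims` uses a hypothetical simplified layout: property id -> list of value ids.
    Returns None if the item is not a biography (not relevant to this gap),
    "unknown" if it is a biography without a P21 annotation,
    and "other" for P21 values outside GENDER_LABELS.
    """
    if "Q5" not in claims.get("P31", []):
        return None
    values = claims.get("P21", [])
    if not values:
        return "unknown"
    return GENDER_LABELS.get(values[0], "other")


# Toy items, invented for illustration only.
items = {
    "item_a": {"P31": ["Q5"], "P21": ["Q6581072"]},
    "item_b": {"P31": ["Q5"], "P21": ["Q6581097"]},
    "item_c": {"P31": ["Q5"]},    # biography with a missing annotation
    "item_d": {"P31": ["Q515"]},  # a city, not relevant to this gap
}

groups = {item_id: assign_gender_group(claims) for item_id, claims in items.items()}
counts = Counter(group for group in groups.values() if group is not None)
print(counts)
```

The same pattern would need a different property set per gap, and for gaps with lower annotation coverage the "unknown" share is the number to watch.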

Update week 2021-03-01:

  • none (focus week)

Update week 2021-03-08:

  • we started to discuss in detail the awareness and support of the metrics for the gender gaps among the different stakeholders from the conducted interviews (link to sheet)
  • there is a wide variety of metrics that are used or wanted, but there are some main threads emerging around: selection (number of relevant articles in each project), extent (several metrics that could be associated with quality such as length, no. of images, no. of references, etc), power (how often articles enter deletion process), and visibility (how often they appear, e.g., on the main page)
  • the aim is to identify the main groups of metrics, prioritize them, and assess if and how they could be used as metrics for other content-gaps as well

A quick note to thank you and Marc for working on this and the weekly updates. I read them every week and they are very helpful for me to have a better sense of where the work is at, and also to learn as you move forward.

Update week 2021-03-15:

  • we identified 3 main facets of metrics for content gaps from the discussion with affiliates:
    • i) selection-score (counting the number of articles),
    • ii) extent-score which could be obtained from a compound score of the mentioned features (length, no. of images, no. of references, etc) similar to the proposal in (Lewoniewski et al 2019), which can be applied across languages and is available via wikirank.
    • iii) visibility score capturing how much articles are featured on the main page
    • iv) framing-related metrics were considered important as well but are much harder to operationalize and will differ in their operationalization across the different gaps; thus, requiring additional future research (which is probably out of scope for this project)
  • these main themes were derived mostly from discussion around gender gap, the next step is to assess how much these metrics can be applied to the other gaps (geography, cultural context, recency, sexual orientation). if possible/available, identify stakeholders for these gaps.
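
As a rough illustration of the three facets above, the following sketch computes a selection, extent and visibility score per group from a handful of invented article records. The compound extent score here is an assumption made for the example (an unweighted mean of features scaled by their maximum); it is a stand-in and not the formula from (Lewoniewski et al 2019). The field names and the `extent` helper are hypothetical.

```python
from collections import defaultdict

# Toy article records, invented for illustration only.
articles = [
    {"group": "female", "length": 4000, "images": 2, "references": 10, "on_main_page": True},
    {"group": "male", "length": 8000, "images": 4, "references": 20, "on_main_page": False},
    {"group": "male", "length": 2000, "images": 0, "references": 5, "on_main_page": False},
    {"group": "female", "length": 1000, "images": 1, "references": 0, "on_main_page": False},
]
FEATURES = ("length", "images", "references")

# Scale each feature by its maximum so that all features lie in [0, 1];
# fall back to 1 to avoid dividing by zero when a feature is 0 everywhere.
max_value = {f: max(a[f] for a in articles) or 1 for f in FEATURES}


def extent(article):
    """Unweighted mean of the max-scaled features (an assumed compound score)."""
    return sum(article[f] / max_value[f] for f in FEATURES) / len(FEATURES)


by_group = defaultdict(list)
for article in articles:
    by_group[article["group"]].append(article)

scores = {}
for group, members in by_group.items():
    scores[group] = {
        "selection": len(members),                          # number of articles
        "selection_share": len(members) / len(articles),    # share of all articles
        "extent": sum(extent(a) for a in members) / len(members),
        "visibility": sum(a["on_main_page"] for a in members),
    }

for group, values in sorted(scores.items()):
    print(group, values)
```

Scaling by the per-wiki maximum keeps the example short; any real compound score would need a normalization that is comparable across languages.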

Update week 2021-03-22:

  • we have a general framework to identify content relevant to a specific gap that we think will work for 5 content gaps: gender (biographies), geography (places), sexual orientation, cultural context/background, and time/recency. we have been discussing and coordinating with Isaac and Jaime around the geography gap to make this consistent with similar efforts for gaps related to editors and readers
  • for the moment we have paused discussions around "Important topics" such as medicine, as it is often not clearly defined what should be captured; we will revisit this at a later stage; this will also allow us to capture other aspects beyond the narrow operationalization of the 5 content-gaps above (e.g. instead of gender-biographies, the interviews revealed interest in coverage of gender-related topics such as feminism more generally)
  • metrics: we continued discussions around metrics, and the selection and extent scores mentioned above seem good candidates for the first set of metrics because they: i) have a high degree of awareness among the community, ii) are mature in terms of having been used in multiple publications, iii) are actionable, iv) are straightforward to apply to all languages, v) are (mostly) straightforward to apply across the different gaps.
  • the plan is to prepare a more polished and detailed written summary of this conclusion to also identify missing steps. in the long-run, we will then aim to start working on a prototype-implementation of these metrics for the 5 gaps

Update week 2021-03-29:

  • polishing the write-up of the general procedure to identify relevant content and choosing relevant metrics
  • discussed some of the specifics around some of the gaps, specifically time/recency and cultural context. For example, there are different possibilities for how to choose which articles contain time references, and for how we can assess the accuracy of identifying the set of articles we associate with a gap. for gender, both precision and recall are known to be very high; for gaps such as time or sexual orientation, we do not know. this will likely require some form of manual assessment of a smaller random subsample. ensuring high recall (or at least an absence of systematic bias) is important for making sure that metrics such as selection are trustworthy
  • we started to plan the work related to the prototype-implementation of metrics for 5 gaps: gender, geography, cultural background, time, and sexual orientation. the plan is to be able to calculate the relevant metrics for each gap for one Wikipedia for one snapshot. we are confident at the moment we can do this in the next 4-6 weeks.
  • Marc started to do literature review on works that tried to measure gaps in Wikipedia related to geography, time, and sexual orientation (there are not many compared to gender but we want to make sure we have not missed anything important)

Update week 2021-04-05:

  • discussed strategies for validating the set of articles relevant for the gap. Given a set of labels (in most cases from Wikidata, to scale across all languages), we want to know the precision (the share of selected articles with the correct label) and the recall (the share of relevant articles that are captured by the labels used). The recall is important when comparing the total number of articles with respect to a gap in a project. While for the gender gap the Wikidata-labels related to gender have a high coverage of biography-articles, we know that other labels in Wikidata for identifying other gaps have a much lower and biased coverage (such as P172); this in turn would make estimates of selection unreliable.
  • reviewed literature around gaps of sexual orientation and age/recency. We discussed in more detail what is the best way to identify relevant articles
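
The validation idea can be made concrete with a small sketch. It assumes a manually reviewed set of articles for which the truly relevant ones are known; the article ids, the set names and the `precision_recall` helper are all hypothetical. It also shows why low recall makes a raw selection count unreliable: the number of truly relevant articles equals the selected count times precision divided by recall.

```python
def precision_recall(selected, relevant):
    """Return (precision, recall) of a label-based selection.

    selected: article ids picked up by the labels (e.g. a Wikidata property)
    relevant: article ids judged relevant in a manual review
    """
    selected, relevant = set(selected), set(relevant)
    true_positives = len(selected & relevant)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Toy review outcome, invented for illustration only.
selected_by_labels = {"a1", "a2", "a3", "a4"}
relevant_by_review = {"a1", "a2", "a3", "a5", "a6", "a7"}

precision, recall = precision_recall(selected_by_labels, relevant_by_review)

# Raw selection score versus the count implied by precision and recall.
observed_selection = len(selected_by_labels)
implied_selection = observed_selection * precision / recall

print(f"precision={precision:.2f} recall={recall:.2f}")
print(f"observed selection={observed_selection}, implied selection={implied_selection:.0f}")
```

In practice precision and recall would themselves be estimates from a random subsample, so the implied count carries that sampling uncertainty; if recall also differs between groups, the comparison between groups is biased and not just the totals.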

Update week 2021-04-12:

  • started this week a write-up of the work in this document.
    • this focuses on 5 content gaps and lays out a strategy/proposal for measuring those gaps
    • the aim was to bring together the individual pieces into a coherent story; this will serve as a comprehensive guideline for implementation containing the justification for the choices made along the way.
    • we will probably spend the next week filling in the gaps

Update week 2021-04-19:

  • continued to work on the write-up
  • prepared slides for the presentation of initial results for next tuesday-meeting

Update week 2021-04-26:

@MGerlach Thank you for the update, and my thanks to Marc for the presentation on Tuesday. While I had to miss it sadly, I caught up on the slides today and I find this deeper view to the work very fascinating. And I mentioned this to you already but will put it here for visibility, too: the kind of paper the two of you are working on will be really valuable and can help further shape the conversation around participatory approaches for metrics definition which can benefit Wikimedia and the broader ecosystem.

Update week 2021-05-04:

  • this week we started the hands-on work for the code to implement the metrics for the 5 gaps, specifically pre-processing the dumps and retrieving relevant information

Update week 2021-05-17:

  • we were able to finish a first iteration on processing a snapshot of the dumps to retrieve all articles of a wiki relevant to a specific gap, assign them to one of the gap's groups, and retrieve the relevant features to calculate selection and extent scores; this was done for one snapshot of all Wikipedia-projects for the 5 gaps under consideration
  • the next step is to derive a lighter table only containing the selection and extent metrics for the respective gaps to be used for generating the histograms for each project/gap
  • next priority is to finish the technical documentation for generating tables and scores for easier replication in follow-up tasks
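
A minimal sketch of how such a lighter table could be derived, assuming per-article rows that already carry an extent score. The row layout, the column names and the values are invented for illustration and do not reflect the actual tables produced by the project code.

```python
import csv
import io
from collections import defaultdict

# Hypothetical per-article rows: (wiki, gap, group, extent score of the article).
article_rows = [
    ("cawiki", "gender", "female", 0.40),
    ("cawiki", "gender", "male", 0.60),
    ("cawiki", "gender", "male", 0.20),
    ("enwiki", "gender", "female", 0.50),
]

# Accumulate article count and summed extent per (wiki, gap, group).
totals = defaultdict(lambda: [0, 0.0])
for wiki, gap, group, extent_score in article_rows:
    entry = totals[(wiki, gap, group)]
    entry[0] += 1
    entry[1] += extent_score

# One summary row per (wiki, gap, group), ready for plotting histograms.
summary = [
    {"wiki": wiki, "gap": gap, "group": group,
     "selection": count, "mean_extent": round(total / count, 4)}
    for (wiki, gap, group), (count, total) in sorted(totals.items())
]

# Write the lighter table as CSV (to an in-memory buffer in this sketch).
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["wiki", "gap", "group", "selection", "mean_extent"]
)
writer.writeheader()
writer.writerows(summary)
print(buffer.getvalue())
```

Keeping only counts and means per group makes the table small enough to regenerate plots quickly, at the cost of losing the per-article distribution.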

Update week 2021-05-24:

  • finished processing of data to generate relevant tables with scores for the different gaps
  • created a first set of visualizations for the selection and extent metric for all 5 gaps for different wikis; we will likely make some iterations since this will be important when getting final round of feedback from community
  • we are finishing the technical documentation on how to calculate the gaps (including the visualization)
  • we are preparing a final round of community feedback now that we have calculated a set of metrics for different wikis

Update week 2021-05-31:

  • spent more time iterating on the technical documentation to make it easier to replicate metrics with the current code (repo: https://github.com/marcmiquel/WDO/tree/wcdo/src_data )
  • revising example visualizations of the metrics. for each gap, we have 3 metrics (selection, extent, visibility). below is an example for the geography-gap comparing different projects. We have similar plots where we compare the same project with itself at different points in time to follow the time evolution.

geography-gap: selection-metric (number of articles)

geography_selection.png (735×793 px, 96 KB)

geography-gap: extent-metric (length/images/references/etc)

geography_extent.png (745×796 px, 122 KB)

geography-gap: visibility-metric (number of featured articles)

geography_visibility.png (626×796 px, 102 KB)

Update week 2021-06-07:

  • we are closing the technical documentation document and will upload to meta
  • we are closing the example-set of visualizations for all the metrics of the 5 gaps
  • we are preparing the final round of community engagement

Update week 2021-06-14:

  • created short summary-presentation of results for final round of community engagement (to be sent out next week)
  • submitted proposal for session at Wikimania around indicators for wikimedia projects (in collaboration with work around indicators for knowledge integrity and community health)
  • plan for the remaining weeks is to finish the non-technical documentation and upload to the meta project-page

Update week 2021-06-21:

Update week 2021-06-28:

Closing this task as the project is completed.