
Quality Model: Streamline
Closed, ResolvedPublic

Description

Work with Knowledge Gaps project to implement quality model for computing extent scores.

Related Objects

Status | Subtype | Assigned | Task
Open | | Isaac |
Resolved | | Isaac |

Event Timeline

Weekly updates:

  • New quality model: https://meta.wikimedia.org/wiki/Research:Prioritization_of_Wikipedia_Articles/Language-Agnostic_Quality#V2
    • This release has expanded features (categories + links), requires only wikitext as a dependency, was evaluated not just on English but also on Arabic and French, and in general I think is a better fit to the actual distribution of article quality. I also updated the APIs for the quality model UI and user script and published a dump.
    • I also discovered and fixed a bug that had been driving up quality-score predictions in the V1 model. The feature normalization thresholds vary by wiki, but I had, e.g., set a minimum of 10 references for a high-quality article in each wiki, even if the highest-quality articles in that wiki had fewer than 10 references. The bug was that instead of enforcing that value as a minimum, I accidentally enforced it as a maximum, effectively driving up the predicted quality of articles in the bigger wikis with more developed articles (see the sketch after this list). I should have caught this, but it only happened in the bulk prediction pipeline, not in the API I set up or in model development. I definitely suggest that anyone using the scores update to the V2 model!
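
To make the normalization bug concrete, here is a minimal sketch (the function names, threshold values, and feature are hypothetical illustrations, not the pipeline's actual code) of how clamping a per-wiki threshold with a global floor should work, and how applying the floor as a ceiling instead inflates scores:

```
def normalize(raw_count, wiki_threshold, global_floor=10):
    """Scale a raw feature count (e.g., number of references) to [0, 1]
    using a per-wiki threshold for what a top-quality article looks like.

    Intended behavior: never let the threshold drop below a global floor,
    so very small wikis still need a reasonable number of references for
    a full score on this feature.
    """
    threshold = max(wiki_threshold, global_floor)   # correct: floor acts as a minimum
    return min(raw_count / threshold, 1.0)


def normalize_buggy(raw_count, wiki_threshold, global_floor=10):
    """The V1 bulk-pipeline bug: the floor was applied as a ceiling, so wikis
    whose top articles have many references got a *lower* threshold, and
    moderately sourced articles scored near 1.0.
    """
    threshold = min(wiki_threshold, global_floor)   # bug: floor acts as a maximum
    return min(raw_count / threshold, 1.0)


# Example: a wiki where top-quality articles typically have ~60 references.
print(normalize(30, wiki_threshold=60))        # 0.5  (article is halfway there)
print(normalize_buggy(30, wiki_threshold=60))  # 1.0  (score inflated by the bug)
```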

Weekly updates:

  • No updates -- waiting for work to start on porting to GitLab, which will undoubtedly lead to some other improvements

Weekly updates:

  • No major updates to the model but I presented work to Campaigns and they were interested in how to also surface what work is most needed on an article and which sections would benefit most from improvement.
  • I updated the interface to display information about the quality of each article component (it was always in the model, just not surfaced): https://wiki-topic.toolforge.org/quality?lang=en
  • I also created a simple user script for English Wikipedia that shows the predicted quality score and a simple suggestion of what needs the most improvement. As a bonus it doesn't use the API (all the features come from processing the HTML on the page), so it's super fast / lightweight and has no external dependencies; a sketch of the idea follows this list. It can be tested by enabling the script as in: https://en.wikipedia.org/wiki/User:Isaac_(WMF)/common.js#L-4
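
For illustration, a rough Python sketch of the kind of client-side feature extraction the user script relies on (the actual script is JavaScript running on the rendered page; the regexes and feature set here are simplified assumptions, not the script's real logic):

```
import re

def extract_features_from_html(html: str) -> dict:
    """Count simple quality-related features from a rendered article page.
    Approximates what a client-side script can do without any API calls.
    """
    return {
        # footnote markers rendered by <ref> tags
        "references": len(re.findall(r'class="reference"', html)),
        # top- and second-level section headings
        "sections": len(re.findall(r"<h[23][ >]", html)),
        # images embedded in the article body
        "images": len(re.findall(r"<img\b", html)),
        # internal wikilinks
        "wikilinks": len(re.findall(r'href="/wiki/', html)),
        # crude proxy for page length
        "page_length": len(html),
    }

# Usage: fetch the rendered HTML of an article and inspect the counts.
# import urllib.request
# html = urllib.request.urlopen(
#     "https://en.wikipedia.org/wiki/Python_(programming_language)").read().decode()
# print(extract_features_from_html(html))
```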

Weekly updates: worked with DS to make sure his implementation of the model for all of wikitext history made sense. Discussed how to set the right thresholds for each feature (use only the current version of the wikitext to determine 'top quality' articles, not every revision).
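
A minimal sketch of that thresholding idea, assuming hypothetical inputs (per-article feature counts computed from the latest revision only, plus assessed quality labels); the median is an illustrative choice, not necessarily the statistic the project uses:

```
import statistics

def feature_thresholds(current_articles, quality_labels, feature_names,
                       top_label="FA"):
    """Derive per-feature normalization thresholds from the current snapshot.

    current_articles: dict of page title -> dict of feature counts, computed
                      from each article's *latest* revision only.
    quality_labels:   dict of page title -> assessed quality class.
    Returns the median feature value among top-quality articles, used as the
    "expected" value for a full score on that feature.
    """
    top_titles = [t for t, label in quality_labels.items() if label == top_label]
    return {
        name: statistics.median(current_articles[t][name] for t in top_titles)
        for name in feature_names
    }
```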

Working with DS and PD on getting historical evaluation of quality working. Sent them a patch for images in templates/galleries, but the main issue is still assumed to be that the expected feature values for high-quality articles should be selected based only on the current snapshot and not on all of history (using all of history overweights high-quality, highly-edited articles).
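
To illustrate why the media patch matters: counting only explicit [[File:...]] links misses images supplied through <gallery> blocks or bare filenames in template parameters. A simplified sketch (the regexes are assumptions, not the actual patch):

```
import re

IMAGE_EXT = r"(?:jpe?g|png|gif|svg|tiff?|webp)"

def count_images(wikitext: str) -> int:
    """Count images in wikitext, including those inside <gallery> blocks and
    those passed as bare filenames in template parameters, not just explicit
    [[File:...]] / [[Image:...]] links.
    """
    count = 0
    # explicit file links, e.g. [[File:Example.jpg|thumb|...]]
    count += len(re.findall(r"\[\[\s*(?:File|Image)\s*:", wikitext, re.IGNORECASE))
    # one image per non-empty line inside <gallery>...</gallery>
    for gallery in re.findall(r"<gallery[^>]*>(.*?)</gallery>", wikitext,
                              re.DOTALL | re.IGNORECASE):
        count += sum(1 for line in gallery.splitlines() if line.strip())
    # bare filenames in template parameters, e.g. | image = Example.jpg
    count += len(re.findall(r"=\s*[^=|\[\]\n]+\." + IMAGE_EXT, wikitext,
                            re.IGNORECASE))
    return count
```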

Will close this in a week, as the current model has continued to perform as expected even for larger wikitext snapshots. A future V3 model will likely address a few additional bugs (ignore empty sections but include level-4 and deeper ones, perhaps remove commented-out wikitext first), but it's likely worth waiting for a few additional features before doing that (Wikidata item completeness, penalizing maintenance templates, a language-agnostic readability metric).
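
A hedged sketch of what the V3 section handling could look like (regex-based and deliberately simplified; the real pipeline may differ): strip commented-out wikitext first, then count only sections that actually have content, treating headings down to level 6 as boundaries:

```
import re

def count_nonempty_sections(wikitext: str) -> int:
    """Count sections that contain actual content, treating headings of any
    level (== through ======) as section boundaries, after removing
    commented-out wikitext so hidden headings and text are not counted.
    """
    # drop HTML comments first (they may hide headings or draft text)
    text = re.sub(r"<!--.*?-->", "", wikitext, flags=re.DOTALL)
    # split on heading lines of level 2..6; the pieces are the lead plus
    # one body per section
    bodies = re.split(r"^={2,6}[^=\n][^\n]*?={2,6}\s*$", text, flags=re.MULTILINE)
    # a section counts only if its body has non-whitespace content
    return sum(1 for body in bodies if body.strip())
```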

Closing this out. Met with DS and PD this week to discuss historical quality; their models are working as expected now that they are using the current-snapshot features and the updated media extraction.