Add ORES article quality predictions to the WDQS
Closed, Duplicate · Public

Description

From @Spinster

We have started (experimentally) tracking content progress of the Dutch (multilingual) WikiProject [https://nl.wikipedia.org/wiki/Wikipedia:Wikiproject/Wiki_goes_Caribbean Wiki goes Caribbean] via Wikidata. Can ORES article quality for supported Wikipedia languages be added to this process of assessment, and if so, what would be the best way to get there?
Topics related to the WikiProject are (manually) tracked via a Wikidata P5008 statement (query: https://w.wiki/WJW )
Do note: this set of topics is dynamic - Wikidata items can be added or removed as the project progresses
We've started (experimentally) tracking coverage of these topics on various relevant Wikipedias and on Commons, see [https://docs.google.com/spreadsheets/d/1c_RYfqwPGuRiO2MJ38iO5_Ibg6c2MzfasbjnrfBdY6A/edit#gid=0 this spreadsheet] which is used for measuring coverage progress over time on nlwiki, enwiki, papwiki, eswiki and Commons.
Question: can average ORES article quality for these topics on (at least) English Wikipedia, and later also Dutch Wikipedia, be included in measurements as well?
Would this produce 'meaningful' numbers/scores that are 'legible'/'interpretable' by laypeople (with some explanation if needed) and indeed indicate general quality development over time?
If so, are there already (non-coder friendly) tools with which relevant ORES article quality scores can be retrieved for a given set of Wikidata items / a Wikidata query / a set of Wikipedia articles?
If not, does it make sense to e.g. submit a feature request for tools like PetScan to provide ORES article quality scores as output?
All other tips and input very welcome.
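
As a rough illustration of the "average ORES article quality" question above, here is a minimal Python sketch. It assumes the public ORES v3 scoring endpoint (https://ores.wikimedia.org/v3/scores/) is reachable and returns the usual per-revision probability map for the articlequality model. The ordinal weights (Stub = 0 up to FA = 5) are an assumption made here to collapse the English Wikipedia classes into one number, not an official scale, and the helper names are hypothetical. Turning Wikidata items into article revision IDs is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: ordinal weights for the enwiki quality classes.
# Other wikis use different class labels and need their own mapping.
ENWIKI_WEIGHTS = {"Stub": 0, "Start": 1, "C": 2, "B": 3, "GA": 4, "FA": 5}


def build_scores_url(wiki, rev_ids, base="https://ores.wikimedia.org/v3/scores"):
    """Build the ORES v3 URL that scores the given revision IDs."""
    query = urlencode({
        "models": "articlequality",
        "revids": "|".join(str(r) for r in rev_ids),
    })
    return f"{base}/{wiki}/?{query}"


def fetch_scores(wiki, rev_ids):
    """Fetch the raw JSON response (performs a network request)."""
    request = Request(
        build_scores_url(wiki, rev_ids),
        headers={"User-Agent": "quality-tracking-sketch/0.1"},  # hypothetical agent name
    )
    with urlopen(request) as response:
        return json.load(response)


def weighted_quality(probabilities, weights=ENWIKI_WEIGHTS):
    """Collapse a class-probability map into one number (expected class rank)."""
    return sum(weights[label] * p for label, p in probabilities.items())


def average_quality(response, wiki, weights=ENWIKI_WEIGHTS):
    """Average the weighted quality over all revisions that were scored."""
    values = []
    for models in response[wiki]["scores"].values():
        result = models["articlequality"]
        if "score" not in result:  # e.g. an error entry for a missing revision
            continue
        values.append(weighted_quality(result["score"]["probability"], weights))
    return sum(values) / len(values) if values else None
```

With these weights, a set of articles averaging around 1 sits near "Start" class and one averaging around 3 sits near "B" class, which may be easier to explain to laypeople than raw probabilities. Recording the same number at regular intervals would give a trend line alongside the coverage counts.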

Event Timeline

We already store article quality predictions in the ores_classification table on the wikis where we have support.

We also store some topic-related predictions in Elasticsearch (see the "articletopic:foo" search keyword). I'm not sure how much the Elasticsearch and WDQS infrastructure overlap, but that might be relevant.

Spinster removed a subscriber: SandraF_WMF.
Spinster subscribed.
dcausse subscribed.

As we try to split the graph as much as we can, I think the proper approach would be to store this data in a dedicated graph, exposed through its own SPARQL endpoint and connected to WDQS through SPARQL federation.
This is not something the Search Team is likely to have time to work on in the near term, so if someone has the bandwidth to set up such an endpoint we'd be happy to add it to the federation endpoints whitelist.
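
To make the federation idea concrete, here is a minimal Python sketch that builds such a federated query. The endpoint URL and the quality predicate are hypothetical placeholders (no such service exists yet), and the project item is passed in as a parameter rather than guessed. The wd:, wdt: and schema: prefixes are the ones WDQS predefines.

```python
import re

# Hypothetical placeholders: neither this endpoint nor this predicate exists.
QUALITY_ENDPOINT = "https://example.org/ores/sparql"
QUALITY_PREDICATE = "https://example.org/ores/articlequality"


def build_federated_query(project_qid, endpoint=QUALITY_ENDPOINT,
                          predicate=QUALITY_PREDICATE,
                          wiki_url="https://en.wikipedia.org/"):
    """Build a WDQS query that joins P5008 items with remote quality scores."""
    # Only accept well-formed item IDs so nothing else is spliced into the query.
    if not re.fullmatch(r"Q[1-9]\d*", project_qid):
        raise ValueError(f"not a Wikidata item ID: {project_qid!r}")
    return f"""
SELECT ?item ?article ?quality WHERE {{
  ?item wdt:P5008 wd:{project_qid} .
  ?article schema:about ?item ;
           schema:isPartOf <{wiki_url}> .
  SERVICE <{endpoint}> {{
    ?article <{predicate}> ?quality .
  }}
}}
""".strip()
```

The first part of the query runs on WDQS and selects the articles for the tracked items. The SERVICE block is then evaluated by the dedicated endpoint, which would only work once that endpoint is on the federation whitelist.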