
On en.wikipedia, ref tags inserted by the shortened footnote template, {{sfn}}, are not counted in ORES features
Closed, Resolved, Public

Description

On en.wikipedia, many articles, especially well-developed ones, use the {{sfn}} template or a similar one in place of a ref tag. These templates do not contribute to the ref_tags count, even though each one replaces a ref tag and the end result for a reader is essentially the same as using a ref tag.

For example, this article with more than 100 footnotes has only 4 ref tags:

https://en.wikipedia.org/w/index.php?title=Cymbeline&oldid=902872698
https://ores.wmflabs.org/v3/scores/enwiki/?models=articlequality&revids=902872698&features
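To make the gap concrete, here is a minimal sketch of counting footnotes both ways on raw wikitext. It is not the ORES or revscoring implementation: the regular expressions, the function names, and the choice to treat {{sfnp}} and {{sfnm}} as "similar" templates are all assumptions made for illustration.

```python
import re

# Hypothetical patterns, written for this illustration only.
# Matches opening and self-closing <ref> tags, but not </ref> or <references />.
REF_TAG_RE = re.compile(r"<ref\b[^>]*>", re.IGNORECASE)
# Matches {{sfn|...}} and, as an assumption, the close variants {{sfnp|...}} and {{sfnm|...}}.
SFN_RE = re.compile(r"\{\{\s*sfn[pm]?\s*\|", re.IGNORECASE)


def count_ref_tags(wikitext):
    """Count <ref> tags only, which undercounts footnotes in sfn-heavy articles."""
    return len(REF_TAG_RE.findall(wikitext))


def count_sfn_templates(wikitext):
    """Count shortened-footnote template calls."""
    return len(SFN_RE.findall(wikitext))


def count_footnotes(wikitext):
    """Count <ref> tags plus ref-tag-equivalent templates."""
    return count_ref_tags(wikitext) + count_sfn_templates(wikitext)
```

On an article written mostly with shortened footnotes, `count_ref_tags` would stay small while `count_footnotes` would track what a reader actually sees in the footnote list.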

Relatedly, the 'paragraphs without refs total length' feature covers nearly the whole article content, even though most of the prose is in paragraphs with references.
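The same effect can be sketched for the paragraph-level feature. This assumes paragraphs are separated by blank lines and that a paragraph counts as referenced when it contains a citation marker. The actual ORES feature may be computed differently, and the names below are hypothetical.

```python
import re

# Hypothetical markers: <ref> tags alone, versus <ref> tags plus sfn-style templates.
REF_RE = re.compile(r"<ref\b", re.IGNORECASE)
REF_OR_SFN_RE = re.compile(r"<ref\b|\{\{\s*sfn[pm]?\s*\|", re.IGNORECASE)


def unreferenced_length(wikitext, marker_re):
    """Total character length of paragraphs that contain no citation marker."""
    # Assumption: a blank line separates paragraphs.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", wikitext)]
    return sum(len(p) for p in paragraphs if p and not marker_re.search(p))
```

Calling `unreferenced_length(text, REF_RE)` treats every sfn-only paragraph as unreferenced, while `unreferenced_length(text, REF_OR_SFN_RE)` leaves only the paragraphs that carry no citation at all.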

ORES models would likely be more accurate in quality predictions if ref-tag-equivalent templates were factored in. This would also improve the utility of the extracted features for downstream use (i.e., it would help me out with outreachdashboard.wmflabs.org, which now uses ORES features to show how many references get added or removed in any given revision).

Event Timeline

Ragesoss created this task. Jul 2 2019, 11:39 PM
Restricted Application added subscribers: Liuxinyu970226, Aklapper.

Looks like we're merged. Next step is to retrain the models.

https://github.com/wikimedia/articlequality/pull/88

I also included a demonstration that our new "sfn + <ref>" tags feature counts 135 instances in https://en.wikipedia.org/w/index.php?title=Cymbeline&oldid=902872698

There was an issue with pytest version compatibility. I've pinned an old version of pytest to fix this for now. I think long term, we'll want to update the bad dependency (more-itertools) to the current version (7.2.0).

OK, looks like we're passing tests now. I've also made a PR to bump this requirement in revscoring, so we can deal with this better in the next release of revscoring. See https://github.com/wikimedia/revscoring/pull/444

Halfak closed this task as Resolved. Aug 7 2019, 3:53 PM