
articlequality scores a very short article really high in Persian Wikipedia
Open, Low, Public

Description

This fairly short article receives a score of 4.39 (FA) from ORES. This result seems wrong.

Event Timeline

Restricted Application added a subscriber: Aklapper.
awight renamed this task from articeqaulity scores a very short article really high in Persian Wikipedia to articlequality scores a very short article really high in Persian Wikipedia. (Sep 26 2018, 6:27 PM)

I bet this has something to do with our training set including redirects or other short wikitext pages that were rated highly. We should look through our training set to see whether it contains many examples of highly rated short articles.
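One way to do that check is to scan the labeled observations for highly rated pages that are either redirects or very short. The sketch below is only illustrative. The observation format (`rev_id`, `label`, `text` keys), the set of "high" labels, the 5000-character cutoff, and the helper names are all hypothetical. Redirect detection only looks for the English `#REDIRECT` magic word, so localized aliases used on Persian Wikipedia would need to be added.

```python
from typing import Iterable, TypedDict


# Hypothetical observation format; the real articlequality training files may differ.
class Observation(TypedDict):
    rev_id: int
    label: str
    text: str


# Assumption: which quality classes count as "highly rated".
HIGH_LABELS = frozenset({"FA", "GA"})


def is_redirect(text: str) -> bool:
    """True if the wikitext starts with the English #REDIRECT magic word (case-insensitive)."""
    return text.lstrip().upper().startswith("#REDIRECT")


def suspicious_observations(
    observations: Iterable[Observation], min_length: int = 5000
) -> list[tuple[int, str, int, bool]]:
    """Return (rev_id, label, text length, is_redirect) for highly rated
    observations that are redirects or shorter than min_length characters."""
    flagged = []
    for obs in observations:
        if obs["label"] not in HIGH_LABELS:
            continue
        redirect = is_redirect(obs["text"])
        if redirect or len(obs["text"]) < min_length:
            flagged.append((obs["rev_id"], obs["label"], len(obs["text"]), redirect))
    return flagged


if __name__ == "__main__":
    sample: list[Observation] = [
        {"rev_id": 1, "label": "FA", "text": "#REDIRECT [[Foo]]"},
        {"rev_id": 2, "label": "FA", "text": "x" * 30000},
        {"rev_id": 3, "label": "Stub", "text": "short"},
        {"rev_id": 4, "label": "GA", "text": "y" * 1200},
    ]
    for row in suspicious_observations(sample):
        print(row)
```

On the toy sample, the redirect (rev 1) and the 1200-character GA (rev 4) are flagged, while the long FA and the short Stub are not. Any flagged rev_ids would still need a human look before relabeling or dropping them.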

Here are our FA quality observations. It looks like the first two are excessively short. Could these be mislabeled?

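| Quality | Text length | rev_id |
| --- | --- | --- |
| FA | 2698 | 22209198 |
| FA | 2783 | 22931379 |
| FA | 9268 | 21595797 |
| FA | 9268 | 21595797 |
| FA | 17927 | 22790480 |
| FA | 18572 | 19544614 |
| FA | 19255 | 21422786 |
| FA | 20313 | 22332851 |
| FA | 20593 | 22236347 |
| FA | 21709 | 22271457 |
| FA | 22754 | 21265916 |
| FA | 22755 | 22925983 |
| FA | 23587 | 22343509 |
| FA | 23589 | 22762032 |
| FA | 23750 | 22960348 |
| FA | 23763 | 22357296 |
| FA | 24267 | 19416403 |
| FA | 25661 | 21352991 |
| FA | 25976 | 22267990 |
| FA | 26512 | 21276368 |
| FA | 27918 | 21298974 |
| ... snip ... | | |
| FA | 142803 | 22345176 |
| FA | 142938 | 22951851 |
| FA | 145626 | 22914865 |
| FA | 146537 | 22846950 |
| FA | 146540 | 22329970 |
| FA | 148656 | 22287634 |
| FA | 153672 | 22122809 |
| FA | 153672 | 22122809 |
| FA | 154083 | 22848012 |
| FA | 154136 | 22284243 |
| FA | 160097 | 21935665 |
| FA | 160104 | 22304823 |
| FA | 160823 | 22908335 |
| FA | 160834 | 22156021 |
| FA | 161865 | 22286873 |
| FA | 168988 | 22083811 |
| FA | 168988 | 22083811 |
| FA | 197997 | 22289502 |
| FA | 207018 | 22278332 |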
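A quick way to surface the "excessively short" FA rows is to compare each length against the median length of the class. The sketch below does this for the rows listed above. The `short_outliers` helper and the 0.2 ratio are arbitrary choices for illustration, not part of ORES. Because the "... snip ..." rows are missing, the median here is computed from the visible rows only. With the full set, the median and therefore the cutoff would likely be higher, so the flagged rows could change.

```python
import statistics

# (text length, rev_id) for the FA rows visible in the listing above;
# rows hidden behind "... snip ..." are not included.
FA_ROWS = [
    (2698, 22209198), (2783, 22931379), (9268, 21595797), (9268, 21595797),
    (17927, 22790480), (18572, 19544614), (19255, 21422786), (20313, 22332851),
    (20593, 22236347), (21709, 22271457), (22754, 21265916), (22755, 22925983),
    (23587, 22343509), (23589, 22762032), (23750, 22960348), (23763, 22357296),
    (24267, 19416403), (25661, 21352991), (25976, 22267990), (26512, 21276368),
    (27918, 21298974),
    (142803, 22345176), (142938, 22951851), (145626, 22914865), (146537, 22846950),
    (146540, 22329970), (148656, 22287634), (153672, 22122809), (153672, 22122809),
    (154083, 22848012), (154136, 22284243), (160097, 21935665), (160104, 22304823),
    (160823, 22908335), (160834, 22156021), (161865, 22286873), (168988, 22083811),
    (168988, 22083811), (197997, 22289502), (207018, 22278332),
]


def short_outliers(rows, ratio=0.2):
    """Return rows whose text length is below ratio * the median length of the rows."""
    median = statistics.median(length for length, _ in rows)
    threshold = ratio * median
    return [(length, rev_id) for length, rev_id in rows if length < threshold]


if __name__ == "__main__":
    print(short_outliers(FA_ROWS))
    # → [(2698, 22209198), (2783, 22931379)]
```

On the visible rows the median is 27215, so the cutoff is about 5443 characters and only the first two revisions fall below it, which matches the two rows singled out above.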

It's quite likely that they are mislabeled, but two mislabeled examples alone shouldn't make the model misclassify future cases that badly.