If you run this request, it shows that we only track one image in this version of the article, but there are actually quite a few. That dramatically affects the predicted quality. https://ores.wikimedia.org/v3/scores/nlwiki/60819226/articlequality?features
I think the right next step is to implement some tests to see if we detect the following image links:
```
[[Bestand:Stevie Wonder 1967 (1).jpg|thumb|In 1967 tijdens een repetitie voor een optreden in een [[TROS]]-programma]]
[[Bestand:Burt Bacharach - jam session.jpg|thumb|Stevie Wonder tijdens een optreden met [[Burt Bacharach]] in de jaren zestig]]
```
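A test along these lines could be sketched with a plain regular expression. This is a minimal sketch, not the revscoring implementation: the helper name `count_image_links` and the list of File-namespace prefixes (`Bestand`, `Afbeelding`, `File`, `Image`) are assumptions for illustration. It counts only the opening `[[Prefix:` of each link, so nested links inside a caption such as `[[TROS]]` do not change the count.

```python
import re

# Hypothetical File-namespace prefixes for nlwiki; an assumption, not taken
# from revscoring's or articlequality's configuration.
IMAGE_PREFIXES = ("Bestand", "Afbeelding", "File", "Image")

# Match the opening "[[" of a link whose target starts with one of the
# prefixes followed by a colon. Nested caption links like [[TROS]] and
# colon-escaped links like [[:Bestand:...]] do not match.
IMAGE_LINK_RE = re.compile(
    r"\[\[\s*(?:" + "|".join(IMAGE_PREFIXES) + r")\s*:",
    re.IGNORECASE,
)


def count_image_links(wikitext: str) -> int:
    """Count image links by their opening '[[Prefix:' token."""
    return len(IMAGE_LINK_RE.findall(wikitext))


# The two image links quoted above, one per line.
SAMPLE = (
    "[[Bestand:Stevie Wonder 1967 (1).jpg|thumb|In 1967 tijdens een repetitie "
    "voor een optreden in een [[TROS]]-programma]]\n"
    "[[Bestand:Burt Bacharach - jam session.jpg|thumb|Stevie Wonder tijdens een "
    "optreden met [[Burt Bacharach]] in de jaren zestig]]"
)


def test_detects_both_bestand_links() -> None:
    assert count_image_links(SAMPLE) == 2
```

A real test for the feature would run the same sample through the `revision.image_links` feature itself rather than a regex; the sketch only pins down the expected count for this snippet.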
I can't seem to replicate the issue with the current version of the feature in the articlequality repo. When I run this same revision through the feature extractor, I get 5 image links rather than 1.
```
$ python
Python 3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from articlequality.feature_lists import nlwiki
>>> from revscoring.extractors import api
>>> import mwapi
>>> extractor = api.Extractor(mwapi.Session("https://nl.wikipedia.org"))
Sending requests with default User-Agent. Set 'user_agent' on mwapi.Session to quiet this message.
>>> print("\n".join(str(v) for v in list(zip(nlwiki.wp10, extractor.extract(60819226, nlwiki.wp10)))))
(<feature.dutch.stemmed.revision.stems_length>, 22163)
(<feature.(dutch.stemmed.revision.stems_length / max(wikitext.revision.content_chars, 1))>, 0.8959453450297126)
(<feature.revision.image_links>, 5.0)
(<feature.(revision.image_links / max(wikitext.revision.content_chars, 1))>, 0.00020212636940615273)
(<feature.revision.category_links>, 10.0)
(<feature.(revision.category_links / max(wikitext.revision.content_chars, 1))>, 0.00040425273881230546)
(<feature.len(<datasource.dutch.dictionary.revision.dict_words>)>, 3729.0)
(<feature.(len(<datasource.dutch.dictionary.revision.dict_words>) / max(len(<datasource.wikitext.revision.words>), 1))>, 0.802453195610071)
(<feature.enwiki.revision.paragraphs_without_refs_total_length>, 3369.0)
(<feature.(enwiki.revision.paragraphs_without_refs_total_length / max(wikitext.revision.content_chars, 1))>, 0.1361927477058657)
(<feature.nlwiki.revision.cn_templates>, 0.0)
(<feature.(nlwiki.revision.cn_templates / max(wikitext.revision.content_chars, 1))>, 0.0)
(<feature.nlwiki.revision.infobox_templates>, 1.0)
(<feature.(nlwiki.revision.infobox_templates / max(wikitext.revision.content_chars, 1))>, 4.042527388123055e-05)
(<feature.wikitext.revision.chars>, 39846.0)
(<feature.wikitext.revision.content_chars>, 24737.0)
(<feature.wikitext.revision.ref_tags>, 84.0)
(<feature.(wikitext.revision.ref_tags / max(wikitext.revision.content_chars, 1))>, 0.0033957230060233656)
(<feature.wikitext.revision.wikilinks>, 269.0)
(<feature.(wikitext.revision.wikilinks / max(wikitext.revision.content_chars, 1))>, 0.010874398674051017)
(<feature.wikitext.revision.external_links>, 48.0)
(<feature.(wikitext.revision.external_links / max(wikitext.revision.content_chars, 1))>, 0.0019404131462990662)
(<feature.wikitext.revision.headings_by_level(2)>, 6.0)
(<feature.(wikitext.revision.headings_by_level(2) / max(wikitext.revision.content_chars, 1))>, 0.00024255164328738327)
(<feature.wikitext.revision.headings_by_level(3)>, 8.0)
(<feature.(wikitext.revision.headings_by_level(3) / max(wikitext.revision.content_chars, 1))>, 0.0003234021910498444)
(<feature.wikitext.revision.list_items>, 36.0)
(<feature.(wikitext.revision.list_items / max(wikitext.revision.content_chars, 1))>, 0.0014553098597242997)
```
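The image count feeds two features in this list: the raw `revision.image_links` value and its ratio to `wikitext.revision.content_chars`. The ratio printed in the output can be reproduced from the two raw values, which shows that an undercount (1 instead of 5) shifts both features. A minimal sketch; `ratio_feature` is a hypothetical helper that mirrors the `x / max(content_chars, 1)` form visible in the feature names, not revscoring's own code.

```python
def ratio_feature(value: float, content_chars: float) -> float:
    """Mirror the 'value / max(content_chars, 1)' form of the ratio features."""
    return value / max(content_chars, 1)


# wikitext.revision.content_chars from the local extraction output.
content_chars = 24737.0

# Local extraction found 5 image links; the ORES request reported only 1.
local_ratio = ratio_feature(5.0, content_chars)
ores_ratio = ratio_feature(1.0, content_chars)

# Compare local_ratio with the revision.image_links ratio printed above.
print(local_ratio, ores_ratio)
```

With the undercount, the ratio feature drops to a fifth of its correct value, on top of the raw count being wrong.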
Could it be my parser version?
```
>>> mwparserfromhell.__version__
'0.5.4'
```
That is old and we specifically push a new version to prod.
```
$ python
Python 3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from articlequality.feature_lists import nlwiki
>>> from revscoring.extractors import api
>>> import mwapi
>>> extractor = api.Extractor(mwapi.Session("https://nl.wikipedia.org"))
Sending requests with default User-Agent. Set 'user_agent' on mwapi.Session to quiet this message.
>>> print("\n".join(str(v) for v in list(zip(nlwiki.wp10, extractor.extract(60819226, nlwiki.wp10)))))
<snip>
(<feature.revision.image_links>, 5.0)
(<feature.(revision.image_links / max(wikitext.revision.content_chars, 1))>, 0.00020212636940615273)
<snip>
>>> import mwparserfromhell
>>> mwparserfromhell.__version__
'0.6.4'
```
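Since the parser version was the first suspect, a small guard can report which mwparserfromhell is installed before comparing extractions across environments. A minimal sketch using only the standard library; the `0.6.4` threshold is just the version tested in this thread, not a documented production pin, and the helper names are hypothetical.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Version tested in this thread; an assumption, not the pinned prod version.
MIN_PARSER_VERSION = "0.6.4"


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version ('0.6.4.dev0' -> (0, 6, 4)).

    Pre-release suffixes are ignored, so this is only a rough comparison.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def installed_parser_version() -> Optional[str]:
    """Return the installed mwparserfromhell version, or None if absent."""
    try:
        return metadata.version("mwparserfromhell")
    except metadata.PackageNotFoundError:
        return None


def parser_is_current(version: Optional[str],
                      minimum: str = MIN_PARSER_VERSION) -> bool:
    """True if a version string is present and at least `minimum`."""
    return version is not None and parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    found = installed_parser_version()
    print(found, parser_is_current(found))
```

As the session shows, both 0.5.4 and 0.6.4 yield 5 image links locally, so a guard like this would only rule the parser out rather than explain the discrepancy.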
Nope. That didn't do it.
That's all the time I have now. I'll do some more exploration later.
@Halfak: Removing the task assignee, as this open task has been assigned for more than two years; see the email sent to all task assignees on 2024-04-15.
Please assign this task to yourself again if you still realistically [plan to] work on this task; it would be welcome! :)
If this task has been resolved in the meantime, or should not be worked on by anybody ("declined"), please update its task status via "Add Action… 🡒 Change Status".
Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips on how to best manage your individual work in Phabricator. Thanks!