Exploration on using language-agnostic models to assess article quality in Wikipedia across languages.
Related work:
Status | Assigned | Task
---|---|---
Resolved | diego | T288333 Understanding the spread of disinformation on Wikipedia
Resolved | paramita_das | T305390 Cross-Lingual Article Quality Exploration
Just wanted to comment that I love this task and a few thoughts:
Thanks @Isaac for these inputs. There was a mistake on the title, this work is about article quality and not specifically about citations.
Oh drat -- well then a small tweak on my comments that largely repeats what I said in our meeting. The two main gaps I see in the quality model are:
Updates
@paramita_das please report your progress here every Friday.
Update of the last week-
Update of the week-
Update of the week- 18.04.2022 - 23.04.2022:
I have prepared a yearly snapshot of the quality class assignments for English Wikipedia articles from the current dump, using PySpark on the cluster data. I found that for many articles, the last quality assessment was made years ago.
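A minimal sketch of the core logic, assuming assessment records as (title, date, class) tuples (the names and sample data below are hypothetical; the actual work ran over the dumps with PySpark):

```python
from datetime import date

# Hypothetical assessment records: (article_title, assessment_date, quality_class).
# In the real pipeline these come from talk-page revision history in the dumps;
# plain Python here just illustrates the "keep the latest assessment" step.
records = [
    ("Alan Turing", date(2016, 3, 1), "B"),
    ("Alan Turing", date(2021, 7, 9), "GA"),
    ("Ada Lovelace", date(2014, 5, 20), "C"),
]

def latest_assessment(records):
    """Keep only the most recent quality assessment per article."""
    latest = {}
    for title, when, qclass in records:
        if title not in latest or when > latest[title][0]:
            latest[title] = (when, qclass)
    return latest

snapshot = latest_assessment(records)
for title, (when, qclass) in sorted(snapshot.items()):
    # Articles whose last assessment date is years old stand out here.
    print(title, when.isoformat(), qclass)
```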
FYI, if you want an alternative approach for extracting quality ratings for current articles, you can use the page assessments MySQL table as well. It is available for at least English, French, Arabic, Hungarian, and Turkish, but only contains the current state:
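A rough sketch of what querying that table looks like; the column names below follow the PageAssessments extension schema, and sqlite3 stands in for the MySQL replicas so the query shape is runnable (the sample rows are made up):

```python
import sqlite3

# In production this would run against a wiki replica (e.g. enwiki);
# here an in-memory sqlite3 database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE page_assessments (
        pa_page_id INTEGER,
        pa_project_id INTEGER,
        pa_class TEXT,
        pa_importance TEXT
    )
""")
conn.executemany(
    "INSERT INTO page_assessments VALUES (?, ?, ?, ?)",
    [
        (100, 1, "GA", "High"),
        (100, 2, "B", "Mid"),    # one page can be rated by several WikiProjects
        (200, 1, "Stub", "Low"),
    ],
)

# Current quality class per (page, project) pair -- note the table only
# holds the present state, not the assessment history.
rows = conn.execute(
    "SELECT pa_page_id, pa_class FROM page_assessments ORDER BY pa_page_id"
).fetchall()
print(rows)
```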
weekly update: 25.04.22 - 30.04.22
weekly update:
weekly update: 08.05.22 - 14.05.22
weekly update: 15.05.22 - 21.05.22
@diego resolving this as most of the work is done. Please reopen if that's not the case!