
Stats for election-related articles and edits (EU / India)
Closed, Resolved (Public)

Assigned To
Authored By
Pablo
Jan 23 2025, 3:45 PM

Description

After analyzing Wikipedia activity related to the 2024 US election (T375691), @Abhas requested exploring hypotheses 2 and 3 in the context of the following elections:

  • 2024 EU election - English, French, Italian, Romanian wikis
  • 2024 India election - English, Hindi, Marathi wikis

This task will be updated as progress is made.

Event Timeline

@NForrester @Abhas I compiled these lists of English Wikipedia articles related to:

Do you find them comprehensive? Are there any other relevant articles that you think I might be missing?

The EU election tab looks comprehensive to me. The India elections could use some further data about elected politicians, although this could simply be a gap in English Wikipedia coverage of these individuals.

@Pablo your Wikidata query for "All MEPS from Wikidata" misses the cases where no start time is present: around 350 MEPs [1] [2]. This also affects recent legislatures, with only 630 out of 705 MEPs [3] present in tab "MEPS (2019-2024)" (the true total is actually higher than 705, counting members who did not finish their mandate and the Brexit replacements). There are no red links on enwiki [3], so this must be a data issue.

[1] 4648 for a version of your query without duplicate names https://w.wiki/Crah vs 4993 in this simplified version https://w.wiki/Crap
[2] example: https://www.wikidata.org/wiki/Q65561
[3] https://en.wikipedia.org/wiki/List_of_members_of_the_European_Parliament_(2019%E2%80%932024)
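
The recall gap described above can be illustrated with a minimal sketch. It does not reproduce the actual Wikidata query; the `PositionStatement` type, the sample names, and both filter functions are hypothetical and only show how a condition on the start time silently drops statements that have no start time at all.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PositionStatement:
    """Hypothetical stand-in for one 'position held' statement on a Wikidata item."""
    person: str
    start_time: Optional[str]  # ISO date of the start time qualifier, None if absent


def strict_filter(statements: List[PositionStatement], since: str) -> List[PositionStatement]:
    """Keep statements whose start time exists and is on or after `since`."""
    return [s for s in statements if s.start_time is not None and s.start_time >= since]


def tolerant_filter(statements: List[PositionStatement], since: str) -> List[PositionStatement]:
    """Also keep statements without a start time so they can be checked another way."""
    return [s for s in statements if s.start_time is None or s.start_time >= since]


# Made-up sample data, not real MEPs.
statements = [
    PositionStatement("MEP A", "2019-07-02"),
    PositionStatement("MEP B", None),          # start time missing on the item
    PositionStatement("MEP C", "2014-07-01"),  # earlier legislature
]

strict = [s.person for s in strict_filter(statements, "2019-01-01")]
tolerant = [s.person for s in tolerant_filter(statements, "2019-01-01")]
```

With the strict filter, "MEP B" disappears even though nothing says the mandate falls outside the period. The tolerant variant keeps such entries at the cost of needing a second check for them.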

@Strainu, thank you for your feedback! On Friday, I performed a quick precision check and plan to conduct a recall check this week, and your observation about entries missing the start time property is really helpful. The issue with replacements is certainly a limitation of this approach, but conducting a more comprehensive analysis to identify all edge cases would unfortunately be too costly.

After multiple checks with Wikidata queries, I found even better results by simply leveraging the article topic prediction model to identify which wikilinks in https://en.wikipedia.org/wiki/List_of_members_of_the_European_Parliament_(2019%E2%80%932024) and https://en.wikipedia.org/wiki/List_of_members_of_the_17th_Lok_Sabha correspond to biographies.
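
A rough sketch of that filtering step, under stated assumptions: the thread does not say how the topic model was called, so the predictor is passed in as a plain function, and the label name `Culture.Biography.Biography*`, the 0.5 threshold, and the stub scores below are all illustrative rather than the values actually used.

```python
from typing import Callable, Dict, Iterable, List

# Assumed label name for biographies in the article topic taxonomy.
BIOGRAPHY_LABEL = "Culture.Biography.Biography*"


def biography_links(
    titles: Iterable[str],
    predict_topics: Callable[[str], Dict[str, float]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the thread
) -> List[str]:
    """Return the linked titles whose predicted biography score meets the threshold."""
    return [
        title
        for title in titles
        if predict_topics(title).get(BIOGRAPHY_LABEL, 0.0) >= threshold
    ]


def stub_predict(title: str) -> Dict[str, float]:
    """Hypothetical predictor with made-up scores, standing in for the real model."""
    scores = {
        "Example Politician": {BIOGRAPHY_LABEL: 0.97},
        "European Parliament": {BIOGRAPHY_LABEL: 0.02},
    }
    return scores.get(title, {})


links = ["Example Politician", "European Parliament", "Page Without Prediction"]
biographies = biography_links(links, stub_predict)
```

In practice `stub_predict` would be replaced by a call to whichever topic model endpoint is in use; keeping the predictor injectable makes the selection logic easy to check in isolation.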

Take-home messages

  • In most of the wikis analyzed, the lead-up to the election attracted IP editors, but not newcomers.
  • In the lead-up to the election, no increase in the revert rate of IP edits or newcomer edits was observed.
  • Low-quality edits were usually moderated quickly.
  • Wikis with content moderation bots reverted a larger share of low-quality edits, and reverted them more quickly.
  • The data collection process could have missed election-related articles that do not exist in English Wikipedia.
  • It is worth exploring whether wikis with few edits to election-related articles (e.g., Marathi, Romanian) contain relevant articles not present in the datasets.

2024 Indian general election on Wikipedia

Hypothesis 2: In the lead-up to the election an increasing number of IP edits and newcomer edits are reverted (❌ not supported)

  • English: Although IP and newcomer edits increased in the lead-up to the election, the rate at which these edits were reverted decreased. For IP edits, activity in mid-2021 was greater than at the end of the campaign.
  • Hindi: Revert rates decreased in the lead-up to the election for both IP and newcomer edits. Newcomer edits did not increase at the end of the campaign.
  • Marathi: Very few IP edits and newcomer edits occurred.

Hypothesis 3: During the election campaign, low-quality edits in relevant articles are quickly moderated (✅ supported)

wiki                                     | English   | Hindi     | Marathi
edits: count                             | 31723     | 1166      | 329
reverted edits: count                    | 6040      | 218       | 41
reverted edits: percentage               | 19.04     | 18.70     | 12.46
time to revert edits: median             | 4-5 hours | 2-3 hours | 1-2 hours
low quality edits: count                 | 303       | 121       | 36
low quality edits: percentage            | 0.96      | 10.38     | 10.94
low quality reverted edits: count        | 247       | 84        | 23
low quality reverted edits: percentage   | 81.52     | 69.42     | 63.89
time to revert low quality edits: median | 3 mins    | 2 hours   | 1-2 hours

  • 10% of low-quality edits in English Wikipedia were reverted by ClueBot_NG.
  • 0% of low-quality edits in Hindi and Marathi Wikipedia were reverted by bots.

⚠️ The absence of moderation bots in Hindi and Marathi Wikipedia could explain the lower percentage of low-quality edits that were reverted and the longer median time to revert them.
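
The percentage rows in the table above follow directly from the count rows. As a small worked check (a sketch only: the dictionary layout and the `summarize` helper are illustrative, while the counts are the ones reported in the table), the denominators appear to be all edits for the first two percentages and low-quality edits for the third:

```python
# Counts as reported in the table for the 2024 Indian general election.
india = {
    "English": {"edits": 31723, "reverted": 6040, "low_quality": 303, "low_quality_reverted": 247},
    "Hindi": {"edits": 1166, "reverted": 218, "low_quality": 121, "low_quality_reverted": 84},
    "Marathi": {"edits": 329, "reverted": 41, "low_quality": 36, "low_quality_reverted": 23},
}


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


def summarize(counts: dict) -> dict:
    """Derive the three percentage rows from the four count rows."""
    return {
        "reverted_pct": pct(counts["reverted"], counts["edits"]),
        "low_quality_pct": pct(counts["low_quality"], counts["edits"]),
        "low_quality_reverted_pct": pct(counts["low_quality_reverted"], counts["low_quality"]),
    }


summary = {wiki: summarize(counts) for wiki, counts in india.items()}
```

Note that the share of low-quality edits that were reverted is computed over low-quality edits only, which is why it can be high even when low-quality edits are a small fraction of all edits.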


2024 European Parliament election on Wikipedia

Hypothesis 2: In the lead-up to the election an increasing number of IP edits and newcomer edits are reverted (❌ not supported)

  • English, French and Italian: IP edits increased in the lead up to the election, with no increase in the revert rate. Newcomer edits did not increase in the lead up to the election.
  • Spanish: Both IP and newcomer edits increased in the lead up to the election, with no increase in the revert rate of these edits.
  • Romanian: Very few IP edits and newcomer edits.

Hypothesis 3: During the election campaign, low-quality edits in relevant articles are quickly moderated (✅ supported)

wiki                                     | English  | French     | Italian   | Spanish  | Romanian
edits: count                             | 12675    | 7170       | 3197      | 2380     | 422
reverted edits: count                    | 1020     | 477        | 396       | 449      | 28
reverted edits: percentage               | 8.05     | 6.65       | 12.39     | 18.87    | 6.64
time to revert edits: median             | 2 hours  | 19-20 mins | 4-5 hours | 3-4 mins | 4-5 hours
low quality edits: count                 | 90       | 38         | 29        | 94       | 8
low quality edits: percentage            | 0.71     | 0.53       | 0.91      | 3.95     | 1.90
low quality reverted edits: count        | 71       | 31         | 18        | 78       | 8
low quality reverted edits: percentage   | 78.89    | 81.58      | 62.07     | 82.98    | 100.00
time to revert low quality edits: median | 1-2 mins | 1-2 mins   | 5-6 mins  | 30 secs  | 25 secs

Note:

  • 7% of low-quality edits in French Wikipedia were reverted by Salebot.
  • 6% of low-quality edits in French Wikipedia were reverted by Salebot.
  • 0% of low-quality edits in Italian Wikipedia were reverted by bots.
  • 37% of low-quality edits in Spanish Wikipedia were reverted by SeroBOT.
  • 75% of low-quality edits in Romanian Wikipedia were reverted by PatrocleBot.

⚠️ The absence of moderation bots in Italian Wikipedia could explain the lower percentage of low-quality edits that were reverted and the longer median time to revert them.
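
For readers who want to reproduce metrics of this kind, the following is a minimal sketch of how a median time to revert and a bot revert share could be computed. The record fields (`saved_at`, `reverted_at`, `reverted_by`), the sample edits, and the choice of all given edits as the denominator for the bot share are assumptions made for illustration; the thread does not describe the actual pipeline.

```python
from datetime import datetime, timedelta
from statistics import median


def median_time_to_revert(edits):
    """Median delay between saving an edit and its revert, over reverted edits only."""
    delays = [
        (e["reverted_at"] - e["saved_at"]).total_seconds()
        for e in edits
        if e["reverted_at"] is not None
    ]
    return timedelta(seconds=median(delays)) if delays else None


def bot_revert_share(edits, bot_names):
    """Percentage of all given edits that were reverted by one of the named bots."""
    if not edits:
        return 0.0
    by_bot = sum(1 for e in edits if e["reverted_by"] in bot_names)
    return round(100 * by_bot / len(edits), 2)


# Made-up low-quality edits: three reverted (one by a bot), one never reverted.
t0 = datetime(2024, 6, 1, 12, 0, 0)
sample = [
    {"saved_at": t0, "reverted_at": t0 + timedelta(seconds=30), "reverted_by": "SeroBOT"},
    {"saved_at": t0, "reverted_at": t0 + timedelta(seconds=90), "reverted_by": "HumanPatroller"},
    {"saved_at": t0, "reverted_at": t0 + timedelta(minutes=10), "reverted_by": "HumanPatroller"},
    {"saved_at": t0, "reverted_at": None, "reverted_by": None},
]

median_delay = median_time_to_revert(sample)
bot_share = bot_revert_share(sample, {"SeroBOT"})
```

Because the median ignores edits that were never reverted, a wiki can show a very short median time to revert while still leaving a noticeable share of low-quality edits in place, so the two rows are best read together.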

Thanks for this update @Pablo ! Two questions:

  • For those ~20% of low-quality edits on English Wikipedia that aren't getting reverted, what's going on? Is that bad predictions by the model? Reverts that our revert-detection approaches are missing? I recently put together a future research task around the reasons behind reverts (T387040) so I'm curious if this will provide any information to inform that (hopefully) eventual study of the reasons behind reverts. Hopefully this check is just manually eyeballing a few edits (no need to be robust about it) but let me know if it'd take more work than that.
  • Hypothesis 2 could be interpreted as being about total # of reverted edits (greater burden on patrollers) or revert rate (I guess more noise for patrollers but not necessarily a greater burden if there are fewer). In the data, you're looking at revert rate but you do mention that IP edits were increasing in volume. Does this lead to more reverts happening as well despite the falling ratio?

Thanks for your questions!

  • ~20% of low-quality edits on English Wikipedia. This is definitely something that would be worth inspecting as there are many factors that could be influencing this finding. For example, in the US election analysis, I reviewed some low-quality edits that were not reverted and found that some were followed by edits that partially or completely removed their content. Are you imagining something that could be used for model re-training?
  • Total # of reverted edits. Yes, I actually modified the original formulation of the hypothesis because the rationale provided was: "As people are incentivised to promote or obstruct a candidate, more newcomers are making politically biased edits (or edits that violate NPOV), and thus more newcomer edits are reverted". To ensure that peaks in reverting activity are not merely a reflection of overall editing volume but instead highlight the revertability of edits (likely due to increased malicious behavior), I decided to use the revert rate rather than the absolute count of reverted edits (see figures below for the latter). However, your question indicates that further clarification is needed and suggests considering showing both metrics.
Figures (one panel per wiki: English, French, Italian, Spanish, Romanian): IP reverted edits; newcomers reverted edits.
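
The distinction between the two readings of Hypothesis 2 can be made concrete with a short sketch. The numbers are invented purely for illustration and are not taken from the analysis: they show that when edit volume grows, the absolute number of reverts can rise even while the revert rate falls.

```python
def revert_stats(edits: int, reverted: int) -> dict:
    """Absolute revert count and revert rate for one time window."""
    return {"reverted_count": reverted, "revert_rate": reverted / edits}


# Hypothetical windows: a quiet period and the weeks just before an election.
baseline = revert_stats(edits=100, reverted=20)   # rate 0.20
campaign = revert_stats(edits=200, reverted=30)   # rate 0.15

more_reverts = campaign["reverted_count"] > baseline["reverted_count"]
lower_rate = campaign["revert_rate"] < baseline["revert_rate"]
```

Here both conditions hold at once, which is why reporting the count (patroller workload) alongside the rate (how revertable the edits are) avoids ambiguity.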
Isaac triaged this task as Medium priority (Mar 5 2025, 2:15 PM).
Isaac moved this task from Backlog to In Progress on the Research board.

For outreach, these findings have been shared with colleagues and posted on Meta. I will resolve the ticket, as no further work is expected.