
Essential work - Monitoring and Stats for election-related articles and edits
Closed, Resolved (Public)

Assigned To
Authored By: XiaoXiao-WMF, Sep 25 2024, 8:14 PM
Referenced Files
F57689577: image.png (Nov 8 2024, 5:45 PM)
F57689574: image.png (Nov 8 2024, 5:45 PM)
F57664392: image.png (Oct 31 2024, 6:28 PM)
F57664390: image.png (Oct 31 2024, 6:28 PM)
F57664386: image.png (Oct 31 2024, 6:28 PM)
F57643057: image.png (Oct 25 2024, 4:53 PM)
F57643055: image.png (Oct 25 2024, 4:53 PM)
F57643052: image.png (Oct 25 2024, 4:53 PM)

Description

This work is to support Movement Insights on election-related analysis.

Support election-focused workstreams across 3 areas of work:

  1. Work with @LDickinsonWMF to define a set of election-related statistics to track, based on this Asana task
    • Next Steps: Set up a meeting with Lauren to understand data needs
  2. Join the #election-opporunities working group (latest notes) to design a body of research focused on learning how readership and editing more broadly change during the US election. Next Steps:
    • Align within the group on a set of achievable research questions
    • Conduct analyses to answer research questions
    • Prepare a report on findings and collaborate with working group members to share and socialize findings within the foundation and with broader community audiences
  3. Maintain alignment between workstreams 1 and 2, look for opportunities to share effort between them, and coordinate with ongoing work in T369325

Event Timeline

OSefu-WMF updated the task description.
OSefu-WMF added a subscriber: LDickinsonWMF.

Weekly update

  • Reviewed the existing documentation.
  • Contributed three ideas to the Election assumptions document.
  • @OSefu-WMF and I will connect early next week for coordinating next steps.

Weekly update

  • @OSefu-WMF and I connected to review the background of the project and to identify intersections with existing work.
  • We started assessing the election assumptions by data needs, complexity, and potential impact.
  • Progress was shared at the Elections working group meeting.

Weekly update

  • @OSefu-WMF and I consolidated the list of hypotheses and prioritized 6 of them.
  • I created a GitLab repo where I will start uploading notebooks next week.

Weekly update

  • I attended the elections working group meeting and shared the final list of hypotheses. It was recommended that we also include Spanish Wikipedia in our analysis.
  • I contacted Luciano Floridi's team to ask them to share the list of articles related to the 2024 US Presidential Election analyzed in https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4990973.
  • @nshahquinn-wmf and I connected to exchange ideas.
  • I started creating notebooks on potential increases in traffic from mobile devices and in registrations. As shown below, no trends are observed.
[Four charts (images not reproduced)]

Weekly update

  • I curated a list of over 1K Wikipedia articles related to the 2024 US Presidential Election: General (12), Party (105), Independent (8), Independent (withdrawn) (9), Other third-party candidates (46), Disputes (1), Elections in State (53), US Senate elections (35), US House elections (71), Governor elections (19), Attorneys general elections (11), Secretaries of state elections (8), State treasurers elections (10), Other state wide elections (17), State legislative elections (88), Mayoral elections (34), Local elections (18), States and territories (31), Ballot measures (31), Senator (100), US House representative (432).
  • I continued creating notebooks, this time on page protection and IP edits. As shown below, protection of articles related to the 2024 US Presidential Election shows an increasing trend, which could contribute to the decreasing share of IP edits and the decreasing ratio of reverts to IP edits.
[Three charts (images not reproduced)]
  • Next efforts will focus on the remaining hypotheses, which will require selecting a standardized metric for newcomers and calculating revert risk predictions for hundreds of thousands of revisions.
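The two IP-edit metrics can be computed per month from revision records. The sketch assumes a hypothetical `Revision` record with a month label, an anonymity flag, and a reverted flag (for example, derived from MediaWiki History), and interprets "ratio of reverts to IP edits" as the share of IP edits that were reverted; the notebooks may define it differently.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Revision:
    month: str          # e.g. "2024-10"
    is_anonymous: bool  # True for IP (logged-out) edits
    is_reverted: bool


def monthly_ip_stats(revisions):
    """Return {month: (share of edits made by IPs, share of IP edits reverted)}."""
    totals = defaultdict(int)
    ip_edits = defaultdict(int)
    ip_reverted = defaultdict(int)
    for rev in revisions:
        totals[rev.month] += 1
        if rev.is_anonymous:
            ip_edits[rev.month] += 1
            if rev.is_reverted:
                ip_reverted[rev.month] += 1
    stats = {}
    for month in sorted(totals):
        n_ip = ip_edits[month]
        ip_share = n_ip / totals[month]
        # Undefined (NaN) when a month has no IP edits at all.
        revert_ratio = ip_reverted[month] / n_ip if n_ip else math.nan
        stats[month] = (ip_share, revert_ratio)
    return stats


sample = [
    Revision("2024-09", True, True),
    Revision("2024-09", False, False),
    Revision("2024-10", True, False),
    Revision("2024-10", False, True),
]
print(monthly_ip_stats(sample))
```

Keeping the denominator of the revert ratio restricted to IP edits separates "fewer IP edits" (for example, due to protection) from "IP edits being reverted less often".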

Weekly update

  • Multiple notebooks have been updated in the GitLab repository:
    • h1: I checked via the API which articles are currently protected, to verify the reliability of the parser for protection log actions.
    • h4: I improved the data visualizations and found that on election day there was a spike in traffic to English Wikipedia from desktop devices.
      [Chart for h4 (image not reproduced)]
    • h5: I compared newcomers editing US-related articles to regular newcomers (since 2011). The former exhibit higher retention, more revisions in the first month, and more revisions in the second month. Also, among newcomers editing US-related articles in their first month, those who continue editing in the second month (i.e., retained) have a lower first-month revert ratio than those who stop editing in the second month.
      [Chart for h5 (image not reproduced)]
  • For h3, I ran into issues running an existing notebook to get multilingual revert risk model predictions. To work around this blocker, I am running a script against the API (I already have results for over 125K revisions from March 2024 to election day).
  • Next week is the RDS offsite, where discussions with @OSefu-WMF are expected.
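For h1, the current protection status can be read from the MediaWiki Action API (`action=query&prop=info&inprop=protection`) and compared with the set of titles the log parser considers protected. A minimal sketch, assuming the response was requested with `formatversion=2` (so `pages` is a list) and that the parser's output is available as a set of titles; the helper names and sample titles are hypothetical. Because the API reports only protections that are active now, disagreements may reflect expirations rather than parser bugs.

```python
def protected_titles(api_response: dict, action: str = "edit") -> set[str]:
    """Titles that currently have a protection entry of type `action`.

    Expects a response to action=query&prop=info&inprop=protection
    requested with formatversion=2, where "pages" is a list.
    """
    titles = set()
    for page in api_response.get("query", {}).get("pages", []):
        for entry in page.get("protection", []):
            if entry.get("type") == action:
                titles.add(page["title"])
    return titles


def compare_with_log_parser(api_titles: set[str],
                            parsed_titles: set[str]) -> dict[str, set[str]]:
    """Disagreements between current protection (API) and the parser's view."""
    return {
        "missed_by_parser": api_titles - parsed_titles,
        "only_in_parser": parsed_titles - api_titles,
    }


# Illustrative response shape (titles and levels are made up).
sample = {
    "query": {
        "pages": [
            {"title": "Article A", "protection": [
                {"type": "edit", "level": "extendedconfirmed",
                 "expiry": "infinity"}]},
            {"title": "Article B", "protection": []},
        ]
    }
}
api = protected_titles(sample)
print(compare_with_log_parser(api, parsed_titles={"Article A", "Article C"}))
```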
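For h5, retention and the first-month revert ratio can be derived from each newcomer's edits. The sketch assumes hypothetical records carrying the number of days since the user's first edit and a reverted flag, defines the first month as days 0 to 29 and the second as days 30 to 59, and pools edits within each group when computing revert ratios; none of these choices are stated in the update.

```python
from dataclasses import dataclass


@dataclass
class NewcomerEdit:
    user: str
    days_since_first_edit: int  # 0 for the user's first edit
    reverted: bool


def _pooled_revert_ratio(counts):
    """counts: list of (edits, reverted_edits) pairs, pooled over users."""
    edits = sum(n for n, _ in counts)
    reverts = sum(r for _, r in counts)
    return reverts / edits if edits else float("nan")


def newcomer_summary(edits):
    first_month = {}      # user -> (edits, reverted edits) in days 0-29
    second_month = set()  # users with any edit in days 30-59
    for e in edits:
        if e.days_since_first_edit < 30:
            n, r = first_month.get(e.user, (0, 0))
            first_month[e.user] = (n + 1, r + int(e.reverted))
        elif e.days_since_first_edit < 60:
            second_month.add(e.user)
    retained = [c for u, c in first_month.items() if u in second_month]
    not_retained = [c for u, c in first_month.items() if u not in second_month]
    return {
        "retention_rate": (len(retained) / len(first_month)
                           if first_month else float("nan")),
        "revert_ratio_retained": _pooled_revert_ratio(retained),
        "revert_ratio_not_retained": _pooled_revert_ratio(not_retained),
    }


sample = [
    NewcomerEdit("a", 0, False), NewcomerEdit("a", 5, False),
    NewcomerEdit("a", 40, False),
    NewcomerEdit("b", 0, True), NewcomerEdit("b", 2, False),
]
print(newcomer_summary(sample))
```

Pooling weights each edit equally; averaging per-user ratios instead would weight each newcomer equally, and the two can differ when edit counts are skewed.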
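For h3, scoring revisions one request at a time against a hosted model can be sketched as follows. The endpoint URL, the request body (`rev_id`, `lang`), the User-Agent value, and the response path to the revert probability are assumptions modeled on Wikimedia's Lift Wing inference service and should be checked against its documentation; `score_revisions` is a hypothetical helper that retries failures with exponential backoff and records revisions that still fail. The predictor is injectable, which also allows testing without network access.

```python
import json
import time
import urllib.request

# ASSUMPTION: endpoint, payload, and response layout modeled on Lift Wing's
# multilingual revert risk model; verify against the official docs before use.
ENDPOINT = (
    "https://api.wikimedia.org/service/lw/inference/v1/models/"
    "revertrisk-multilingual:predict"
)


def call_revert_risk_api(rev_id: int, lang: str) -> float:
    """POST one revision and return its revert probability (assumed field)."""
    body = json.dumps({"rev_id": rev_id, "lang": lang}).encode("utf-8")
    request = urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Wikimedia APIs expect a descriptive User-Agent; placeholder value.
            "User-Agent": "hypothetical-revert-risk-script/0.1",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # ASSUMPTION: the probability of being reverted is stored here.
    return float(payload["output"]["probabilities"]["true"])


def score_revisions(rev_ids, lang, predict=call_revert_risk_api,
                    max_retries=3, backoff_s=1.0, sleep=time.sleep):
    """Score revisions one by one, retrying failures with exponential backoff.

    Returns (scores by rev_id, rev_ids that failed after all retries).
    """
    scores, failed = {}, []
    for rev_id in rev_ids:
        for attempt in range(max_retries):
            try:
                scores[rev_id] = predict(rev_id, lang)
                break
            except (OSError, KeyError, TypeError, ValueError):
                # Network/HTTP errors are OSError subclasses; malformed
                # responses surface as KeyError, TypeError, or ValueError.
                if attempt == max_retries - 1:
                    failed.append(rev_id)
                else:
                    sleep(backoff_s * 2 ** attempt)
    return scores, failed
```

For a batch on the scale of the 125K revisions mentioned above, one would likely also persist partial results and respect rate limits; the sketch stays sequential for clarity.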

Weekly update

  • I updated the notebook for h3 now that the process to get multilingual revert risk model predictions has been completed.
  • I shared the results of this specific hypothesis with @OSefu-WMF and the attendees of the biweekly elections working group call:
    • Of the 142,538 revisions made from January to October 2024, we found that:
      • 13,523 revisions (9.49%) were reverted. The time to revert (in seconds) was 418 (Q1), 5,502 (Q2/median), and 70,517 (Q3).
      • Only 753 revisions (0.53%) are high risk (defined as a revert risk score greater than or equal to 0.95).
        • Of these, 578 revisions (76.76%) were reverted, and much faster: 32.25 (Q1), 111.50 (Q2/median), and 1,014.75 (Q3) seconds.
  • I started a report covering the findings from the 6 hypotheses (work in progress).
  • I will be out of office next week; @OSefu-WMF, @nshahquinn-wmf, and I will sync on December 2.
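The h3 summary can be reproduced from per-revision records. The sketch assumes hypothetical `(revert_risk_score, seconds_to_revert)` pairs, where the second element is `None` for revisions that were not reverted, and applies the 0.95 threshold from the update. Quartiles use `statistics.quantiles(..., method="inclusive")`, which interpolates linearly like the pandas and NumPy defaults; the update does not say which method was used. The last lines recompute the reported percentages from the counts given in the update.

```python
from statistics import quantiles

HIGH_RISK_THRESHOLD = 0.95  # "high risk" = revert risk score >= 0.95


def _quartiles(values):
    # Linear interpolation between order statistics; needs at least 2 values.
    if len(values) < 2:
        return None
    return quantiles(values, n=4, method="inclusive")


def revert_summary(revisions):
    """revisions: iterable of (revert_risk_score, seconds_to_revert or None)."""
    revs = list(revisions)
    if not revs:
        raise ValueError("no revisions")
    revert_times = [t for _, t in revs if t is not None]
    high_risk = [t for score, t in revs if score >= HIGH_RISK_THRESHOLD]
    high_risk_times = [t for t in high_risk if t is not None]
    return {
        "pct_reverted": 100 * len(revert_times) / len(revs),
        "revert_time_quartiles": _quartiles(revert_times),
        "pct_high_risk": 100 * len(high_risk) / len(revs),
        "pct_high_risk_reverted": (
            100 * len(high_risk_times) / len(high_risk)
            if high_risk else float("nan")
        ),
        "high_risk_revert_time_quartiles": _quartiles(high_risk_times),
    }


# Reported shares, recomputed from the counts in the update.
print(round(100 * 13_523 / 142_538, 2))  # 9.49
print(round(100 * 753 / 142_538, 2))     # 0.53
print(round(100 * 578 / 753, 2))         # 76.76
```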

Weekly update

  • I shared the findings from the 6 hypotheses at the closing meeting of the elections working group (I prepared these slides, although I did not end up using them). As a result, @Abhas requested that we explore h2 and h3 for two other elections:
    • 2024 India election - English, Hindi, Marathi wikis
    • 2024 EU election - English, French, Italian, Romanian wikis
  • I completed a first draft of the report (to be reviewed and improved by the end of Q2).

Weekly update

  • I updated the analysis of some hypotheses (and the corresponding sections of the report) now that the 2024-11 snapshot of MediaWiki History is available.
  • The report was shared with multiple teams to collect feedback (and I have been making minor edits and responding to comments accordingly).

Resolving this ticket, as all the results are included in the report.