
[FY25-WE.3.1.12] Simple summaries on Mobile and Desktop Web
Closed, ResolvedPublic

Description

Scope

  • Wikipedia
  • Languages: English

Details of scoping: (private link) Research Brief

Output
A report about the perceptions of Wikipedia readers when interacting with AI generated article summaries.

Other resources

  • Internal (private) parent Asana task

ORIGINAL REQUEST
Please provide all the following information:

  • Context. Provide a short paragraph with some background context for your request, please include links to relevant material.

From the Reading Wikipedia research, an AI-generated summary was specifically suggested as a potential area of exploration. As the Wikipedia readability/content simplification research confirmed, much of the content on Wikipedia can be complex, and qualitative research has shown users indicating that they would find value in a shorter overview. The simple summaries feature was developed as a result to address this need, and an initial round of Userlytics testing was completed.

  • Description. What is your request about?

The team would like to understand how readers perceive and understand the summaries feature within the Wikipedia context. Additionally, as the first round of testing did not provide much context for some participants' critical reactions, more exploration in this area is needed. Finally, because the generated content can vary in quality, we want to understand how it might affect Wikipedia's reputation.

  • Expected Deliverable. What is the ideal outcome or result of your request?

A report describing qualitative and quantitative (ratings/rankings) findings regarding how the variability in simple summaries content quality can affect participants' trust and perception of Wikipedia and its reputation.

I need this task resolved in:

  • 1 month.
  • 3 months.
  • 6 months.
  • Whenever you get to it :-)
  • Other. Do you have any other questions or comments?

For use by WMF Research team; please leave everything below as it is:

  1. Does the request serve one of the existing Research team's audiences? If yes, choose the primary audience. (1 of 4)

Yes. Audience 1. WMF.

  2. What is the type of work requested? At a high level: User research to understand the perception of Wikipedia readers towards article summaries generated by AI. (details: Private research brief)
  3. What is the impact of responding to this request?
    • Support a technology or policy need of one or more WM projects
    • Advance the understanding of the WM projects.
    • Something else. If you choose this option, please explain briefly the impact below.

Details

Due Date
May 5 2025, 12:00 AM

Event Timeline

Met with @MRaishWMF and discussed approaches (noted below). Will meet with the Web team next Tuesday to clarify 1) what the team's most prioritized research goal is and 2) what capacity they have to provide the testing artifacts and/or environment to meet the goal. After confirmation, we can begin work.

~~~

1: You want to know more about how the summaries are received, how participants interact and respond to them naturally, how they affect comprehension and the reading experience, and participants' general sentiments regarding trust, AI, and AI on Wikipedia (more in-depth but similar to Justin's previous testing). No additional effort is needed, as we can just use the browser extension, but you will not necessarily get direct information about the effect of bad-quality summaries.
2a: You want to know more specifically about the negative effects of seeing bad-quality summaries. Slightly higher effort, requiring static mock-ups or minimally interactive artifacts of test articles in 2-3 topic categories, with generated good/bad summaries for each.
2b: You want to know both 1 and 2a. Very high effort, requiring a built-out test/beta wiki environment with test articles and summaries.

In consultation with the Web team and Research team colleagues, we are approaching alignment on a structure for this research, as outlined in the research brief. Briefly, we are planning to use panel participants (sourced externally) to investigate what happens when people interact with good or bad summaries in more difficult or easier articles. Next steps will include aligning with the team on experimental materials, further consultation with Research colleagues, and setting up the instruments on Qualtrics as we move toward the pilot stage.

leila renamed this task from [Request] Simple summaries on Web WE 3.1 to [FY25-WE.3.1.XX] Simple summaries on Web WE 3.1.Mar 3 2025, 10:06 PM
leila assigned this task to MRaishWMF.
leila triaged this task as High priority.
leila updated the task description. (Show Details)
leila set Due Date to Apr 15 2025, 12:00 AM.
leila added a project: OKR-Work.
leila moved this task from Backlog to Staged on the Research board.
leila moved this task from Staged to In Progress on the Research board.
leila renamed this task from [FY25-WE.3.1.XX] Simple summaries on Web WE 3.1 to [FY25-WE.3.1.12] Simple summaries on Mobile and Desktop Web.Mar 4 2025, 3:44 PM
leila updated the task description. (Show Details)

(Moving the project to Backlog until we decide on the question of how many languages to include. As I have learned from Mike, this can affect the overall study scope and design and may need a few more days of iteration. I'm working with Mike on it and will be led by him to involve other folks as needed to finalize the scoping piece.)

leila changed Due Date from Apr 15 2025, 12:00 AM to May 5 2025, 12:00 AM.Mar 21 2025, 10:53 PM

Mike and I discussed and decided to keep the scope of this project focused on English. From the now-updated Research Brief: "Once this study has concluded, English may serve as a “proof of concept” for testing or evaluating this feature, and the study may be extendable to other non-English languages depending on the Web team's rollout plan." We also updated the timelines to leave sufficient room for analyses, internal review in the Design Research team, and shareout.

Justin and Olga, as the stakeholders for this project, signed off on the scope and phases/timeline.

I updated the task deadline here. The rest is in Mike's capable hands.

After ongoing consultation with Research team colleagues, this project has moved into the pilot testing phase. @JScherer-WMF has helpfully provided static mockups to be used as visual artifacts, and they are being shown to participants in a survey hosted on Qualtrics. Currently, participants entering the survey have an equal chance of being routed into the "good summary," "bad summary," or "control (no summary)" conditions, after which they have an equal chance of seeing article pairs easyA+hardA, hardA+easyA, easyB+hardB, or hardB+easyB. There is a pre-test measuring trust in and knowledge of Wikipedia, AI usage, and selected demographic variables, and a post-test measuring opinions about AI+Wikipedia, followed by a repetition of the Wikipedia trust questions shown in the pre-test.
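The routing scheme described above can be sketched as two independent uniform draws per participant. This is a hypothetical illustration only (the actual survey uses Qualtrics' built-in randomization; the function and constant names below are invented for this sketch):

```python
import random

# Survey conditions and article-pair orders as described in the pilot design.
CONDITIONS = ["good summary", "bad summary", "control (no summary)"]
ARTICLE_PAIRS = [
    ("easyA", "hardA"),
    ("hardA", "easyA"),
    ("easyB", "hardB"),
    ("hardB", "easyB"),
]

def assign_participant(rng=random):
    """Route one participant: equal chance of each condition,
    then equal chance of each article-pair order."""
    condition = rng.choice(CONDITIONS)
    pair = rng.choice(ARTICLE_PAIRS)
    return condition, pair

# Example: one simulated assignment.
condition, pair = assign_participant(random.Random(0))
```

Note that this yields a simple (unbalanced) random assignment; with small pilot samples, cell sizes can drift, which is one reason platforms like Qualtrics offer "evenly present" randomization.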

The survey appears to be functioning as intended; however, further consultations are planned to determine whether a more complete randomization of article pairings (e.g., easyB+hardB, easyB+hardA, hardA+easyA, hardA+easyB, etc.) is appropriate. Also, a small number of the static mockups currently show red spellcheck underlining beneath some of the content (e.g., Latin species names), which should ideally be removed before we go "live" with larger numbers of Prolific participants.

A more-or-less complete draft of the final report has been delivered to the Web team, and a larger readout has been scheduled for Monday: WMF-internal link to Simple Summaries report

Marking as resolved. I will follow up with Leila to get her input on whether we should create a Metawiki page for this project or not.