
Use AI to automatically generate edit summaries
Open, Needs Triage, Public, Feature

Assigned To
None
Authored By
Bugreporter2
Apr 12 2023, 1:59 PM
Tokens
"Dislike" token, awarded by TheresNoTime.
"Dislike" token, awarded by LucasWerkmeister.
"Dislike" token, awarded by Asartea.
"100" token, awarded by Lectrician1.

Description

MediaWiki should use AI to automatically generate edit summaries by analyzing the changes made by the editor and the context of the article.

The AI system should use natural language processing techniques to generate a summary that accurately describes the changes made by the editor.

This would greatly improve editing efficiency as it would save editors time and effort spent on manually entering edit summaries. This should encourage more editors to use edit summaries, resulting in better documentation of changes made to articles. It would also help reduce the number of misleading or incomplete edit summaries, leading to better communication between editors and more transparency in the editing process. Overall, the use of AI-generated edit summaries should help streamline the editing process and improve the quality of Wikipedia articles.

This needs to be based on open source AI for practical and ideological reasons.

Event Timeline

@Bugreporter2: It is up to each team to decide what goes on its workboard, so I am removing the tag.

Was interested in how many hours this could save per year and queried for a rough answer: https://quarry.wmcloud.org/query/76523

The query counts only non-bot revisions in the recentchanges table (so edits from the past 30 days) that were made through the editor (mw.edit) rather than a tool. It excludes revisions that are reverted or restored (since those have autogenerated summaries), and the character count excludes the autogenerated section names in the edit summary (the names between /* */).

Average number of human-typed revisions per day on enwiki: 124,947
Average number of typed characters (non auto-generated) per revision: 29.9631
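The character count described above can be sketched in Python. This is not the Quarry query itself (which is SQL); it only illustrates the idea of ignoring the autogenerated /* section */ marker. The function names are hypothetical, and trimming surrounding whitespace is an assumption rather than something the query is known to do.

```python
import re

# Matches the autogenerated section marker in an edit summary, e.g. "/* History */".
SECTION_MARKER = re.compile(r"/\*.*?\*/")


def typed_length(summary: str) -> int:
    """Count the characters the editor typed, ignoring /* section */ markers.

    Assumption: whitespace left around the removed marker is not counted.
    """
    return len(SECTION_MARKER.sub("", summary).strip())


def average_typed_length(summaries: list[str]) -> float:
    """Mean typed length over a batch of summaries (0.0 for an empty batch)."""
    if not summaries:
        return 0.0
    return sum(typed_length(s) for s in summaries) / len(summaries)
```

For example, typed_length("/* History */ fixed typo") counts only the 10 characters of "fixed typo".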

Assuming edit summaries are typed at an average speed of 200 characters per minute (about 40 wpm at the conventional 5 characters per word), these results work out to:

3,743,799 edit summary characters typed per day
18,719 minutes per day typed
312 hours per day spent typing edit summaries
113,873 hours per year spent typing edit summaries

And this is just for enwiki...
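The estimate above is plain arithmetic, so it is easy to rerun with other assumptions. A minimal sketch using the two query results and the 200 characters per minute figure from this comment; the function and variable names are only illustrative.

```python
# Figures taken from the comment above (enwiki, past 30 days).
REVISIONS_PER_DAY = 124_947
CHARS_PER_REVISION = 29.9631
CHARS_PER_MINUTE = 200  # assumed typing speed


def typing_time(revisions_per_day: float,
                chars_per_revision: float,
                chars_per_minute: float) -> dict[str, float]:
    """Estimate time spent typing edit summaries from per-day averages."""
    chars_per_day = revisions_per_day * chars_per_revision
    minutes_per_day = chars_per_day / chars_per_minute
    hours_per_day = minutes_per_day / 60
    hours_per_year = hours_per_day * 365
    return {
        "chars_per_day": chars_per_day,
        "minutes_per_day": minutes_per_day,
        "hours_per_day": hours_per_day,
        "hours_per_year": hours_per_year,
    }


if __name__ == "__main__":
    estimate = typing_time(REVISIONS_PER_DAY, CHARS_PER_REVISION, CHARS_PER_MINUTE)
    for name, value in estimate.items():
        print(f"{name}: {value:,.0f}")
```

Changing CHARS_PER_MINUTE shows how sensitive the yearly total is to the typing speed assumption: doubling the speed halves every time figure.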

See also this CLI that writes git commit messages for inspiration.

This proposal could work as a Community Tech wishlist item or a grant proposal from a community member (to cover the costs of the GPT-4 API).


Here are some demo queries and responses testing this with English Wikipedia revisions:

Original diff: https://en.wikipedia.org/w/index.php?title=Template:Protected_areas_of_Massachusetts&diff=prev&oldid=1174757060
Original edit summary: Added Mass Central Rail Trail Wayside to list of MA State Parks "other" which includes significant Rail Trails
AI edit summary: Added a link to "Mass Central Rail Trail - Wayside" under the "Other" section in the list of state parks and reservations.
Chat: https://chat.openai.com/share/7e9003e6-72e8-4d4f-ab69-5a5a2ca67ab8

Original diff: https://en.wikipedia.org/w/index.php?diff=1174757870&oldid=1174757609&title=Lancaster_County,_South_Carolina&curid=91880
Original edit summary: Updated county area figures
AI edit summary: Updated geographic data in the infobox, including area, land area, water area, and population density figures.
Chat: https://chat.openai.com/share/7e9003e6-72e8-4d4f-ab69-5a5a2ca67ab8

Original diff: https://en.wikipedia.org/w/index.php?diff=1174760528&oldid=1167916285&title=List_of_the_largest_automotive_suppliers&curid=73823052
Original edit summary: Fetched links; removed double links; Korea -> South Korea
AI edit summary: Added the country names for each automotive supplier.
Chat: https://chat.openai.com/share/14001bb7-68be-46ef-b271-9debd2f0b0d5

Notes:

  • The diffs picked were ones that already had edit summaries and did not depend on deeper context such as:
    • Edits made before this diff (for example, reverting vandalism, an edit war, or a style change)
    • Discussion page decisions or Wikipedia processes
  • I only used the changed lines and a short segment from the article in the prompt. When given the entire revisions, ChatGPT could not detect the differences: https://chat.openai.com/share/7e9003e6-72e8-4d4f-ab69-5a5a2ca67ab8
  • 44.5% of edits have no typed edit summary text at all (https://quarry.wmcloud.org/query/76537). AI-generated summaries for these edits could help speed up patrolling.
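
Sending only the changed lines plus a little surrounding text, as in the notes above, could be done with a plain text diff. A minimal sketch using Python's standard difflib; the demo prompts were put together by hand, so this is just one possible way to automate that step, and the function name is hypothetical.

```python
import difflib


def changed_lines(old_text: str, new_text: str, context: int = 2) -> str:
    """Return a unified diff of two revisions, keeping `context` lines around each change.

    Returns an empty string when the two revisions are identical.
    """
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="old revision",
        tofile="new revision",
        n=context,
        lineterm="",
    )
    return "\n".join(diff)


if __name__ == "__main__":
    old = "Area: 555 sq mi\nLand: 549 sq mi\nWater: 6 sq mi"
    new = "Area: 555 sq mi\nLand: 550 sq mi\nWater: 5 sq mi"
    print(changed_lines(old, new, context=1))
```

Keeping the context small keeps the prompt short, which matters here because the full revisions were too much for the model to compare.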

Things to put in future prompts:

  • Specifically what was changed and text around it
  • Summary of the article itself?
  • What user is making the edit
  • Vandalism score of edit
  • Summaries of discussion page topics and maybe the text of their most recent comments
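
The inputs listed above could be gathered into one structure and rendered into a prompt, leaving out whatever is not available for a given edit. A rough sketch only: the class, the field names, and the prompt wording are all hypothetical, and nothing here calls a real model or a MediaWiki API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EditContext:
    """Hypothetical container for the prompt inputs listed above."""
    diff: str                                 # what was changed and the text around it
    article_summary: Optional[str] = None     # summary of the article itself
    username: Optional[str] = None            # which user is making the edit
    vandalism_score: Optional[float] = None   # vandalism score of the edit
    talk_page_summary: Optional[str] = None   # summaries of discussion page topics


def render_prompt(ctx: EditContext) -> str:
    """Build a plain-text prompt, skipping any optional field that is missing."""
    parts = [
        "Write a one-sentence edit summary for the following change.",
        "",
        "Changed text and surrounding lines:",
        ctx.diff,
    ]
    optional_fields = [
        ("Article summary", ctx.article_summary),
        ("Editor", ctx.username),
        ("Vandalism score", ctx.vandalism_score),
        ("Discussion page summary", ctx.talk_page_summary),
    ]
    for label, value in optional_fields:
        if value is not None:
            parts.append(f"{label}: {value}")
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_prompt(EditContext(diff="-Korea\n+South Korea", username="Example")))
```

Making every field except the diff optional lets the same prompt builder be tested with progressively more context to see which inputs actually improve the summaries.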