Page MenuHomePhabricator

Tool to summarise sections on a wiki page
Open, LowPublic

Description

Proposed idea for Wikimedia-Hackathon-2023 (still in a WIP).

Use case:
Developing a user-script or tool that generates summaries for sections or sub-sections using available AI tools, such as chatGPT. The objective is to enhance the readability and accessibility of article pages.

Reader journey
Reader clicks on a button near the section title here:

Screenshot 2023-05-15 at 19.21.46.jpg (92×382 px, 19 KB)
on a link that says "summarise". A new dialogue/popup gets populated under the section with a generated summary.

Tech

  1. Implement a user script that sends a request to a tool hosted on Toolforge.
  2. Develop a Toolforge tool capable of interacting with the ChatGPT API. Initially, a single hardcoded key will be used for all users. In subsequent versions, integration with MW OAuth and the option for readers to use their own API key will be implemented.

Created during T335334.

V1:

IMG_3855.jpg (4×3 px, 1 MB)

Event Timeline

01tonythomas updated the task description. (Show Details)

I was planning on working on something very similar - summarizing talk page sections, preferably as a gadget and not a hosted tool (although not sure if that's feasible).

I was planning on working on something very similar - summarizing talk page sections, preferably as a gadget and not a hosted tool (although not sure if that's feasible).

Woah. Lets team up then? We have a couple of moving parts here, and I think we might have a chance to build a proxy on toolforge to store OpenAI keys for OAuth authenticated MW users if we get the time (since we cannot store that key on the user script :-(). This proxy could be used by future projects that would need OpenAI access as well, I would think.

Woah. Lets team up then?

Sounds good!

We have a couple of moving parts here, and I think we might have a chance to build a proxy on toolforge to store OpenAI keys for OAuth authenticated MW users if we get the time (since we cannot store that key on the user script :-(). This proxy could be used by future projects that would need OpenAI access as well, I would think.

They could be stored as cookies - that's slightly annoying for people who are active on several projects, but avoids the need of storing PII on Wikimedia Cloud which is technically against policy. (Not sure how granular the API keys are but access to past chats is very personal information.)

I do agree though that the getting-access-to-OpenAI part should be a separate tool/gadget/library/whatever to maximize reusability.

They could be stored as cookies - that's slightly annoying for people who are active on several projects, but avoids the need of storing PII on Wikimedia Cloud which is technically against policy. (Not sure how granular the API keys are but access to past chats is very personal information.)

Aha, interesting. For a v1, we can set the cookie manually even, I guess? And later, someone can come up with a UI to set the cookie? Is my understanding of your comment correct, @Tgr?

Interesting idea - did you have any thoughts on how long each section summary should be? Would it be proportional to the section length? Fixed length? User definable with a slider/parameter? I used to do an exercise with my students to summarize a long article in SIX WORDS and only six words, so this project certainly is intriguing.

Interesting idea - did you have any thoughts on how long each section summary should be? Would it be proportional to the section length? Fixed length? User definable with a slider/parameter?

We have been discussing this and there are many options. As a first step to keep things simple, we will keep the length fixed. We would aim for a length of a snippet, say around 5 sentences. This gives enough context but not too much information so it can be read in a short period of time. Certainly, one could think about making that customizable.

I used to do an exercise with my students to summarize a long article in SIX WORDS and only six words, so this project certainly is intriguing.

Six words sounds very little. Do you think this would be enough? I am afraid this would be jus the page title and the title of the section.

IMG_20230520_235211_227.jpg (716×1 px, 156 KB)

Working demo is ready. Now it can stream completetions from OpenAI: https://github.com/skripnik/wikipedia-section-summaries

Thanks for participating in the Hackathon! We hope you had a great time.

  • If this task was being worked on and resolved at the Hackathon: Please change the task status to resolved via the Add Action...Change Status dropdown, and make sure that this task has a link to the public codebase.
  • If this task is still valid and should stay open: Please add another active project tag to this task, so others can find this task (as likely nobody in the future will look back at the Hackathon workboard when trying to find something they are interested in).
  • In case there is nothing else to do for this task, or nobody plans to work on this task anymore: Please set the task status to declined.

Thank you,
Phabricator housekeeping service

Thanks @AxelPettersson_WMSE. We also published it on https://wmhack2023.github.io/posts/wmhack2023-report-tonythomas/.

Looking forward, I wonder if it makes sense to create a Phabricator project to track our backlog. What do you folks think? or, should I stay on a wiki Talk page?