[Session] LLMs, ChatGPT, machine learning tools, etc
Closed, Resolved · Public

Assigned To

Authored By

	kostajh
	Mar 27 2023, 8:43 AM

Description

(Please set yourself as task assignee of this session)

Title of session: LLMs, ChatGPT, machine learning tools, etc
Session description: A facilitated discussion to talk through interests/concerns/possibilities/threats around ChatGPT, large language models (LLMs), and machine learning tools in general. For this to be productive, we'll try to identify some agenda items and common themes in advance. We might also break out during the session so that there can be more focused discussions about particular topics. But one goal I'd have for the session is that people who are interested in opportunities to use LLMs in Wikimedia projects also hear concerns from, and are in dialog with, those who are focused on threats to Wikimedia projects.
Username for contact: @kostajh
Session duration (25 or 50 min): 50 minutes
Session type (presentation, workshop, discussion, etc.): Facilitated discussion
Language of session (English, Arabic, etc.): English
Prerequisites (some Python, etc.): Have done at least some background reading
- ChatGPT and large language models (LLMs)
- notes from the Community Call on AI (2023-03-23)
Any other details to share?:
- Your suggestions are welcome about how to make this session a success!
- Etherpad Link
Interested? Add your username below:
- @CristianCantoro
- @MGerlach
- @NicoleLBee
- @Bmueller
- @Husky
- @Eleni.Christopoulou
- @Tgr
- @Novem_Linguae
- @Slst2020
- @waldyrious
- @DimitriosRingas
- @MnLsVt
- @Dogogos123
- @Mathglot
- @Robertsky
- @LabDom
- @Nicholas_Perry
- @ItamarWMDE
- @saper
- @roti_WMDE
- @Jorgelhq

Related Objects

Mentioned In: T333853: [Session] Self-hosting ML models on Cloud Services
Mentioned Here: T333853: [Session] Self-hosting ML models on Cloud Services
T333974: [Session] WikiGPT - Natural Language search results based on Wikipedia knowledge and ChatGPT

Event Timeline

kostajh created this task. Mar 27 2023, 8:43 AM

Restricted Application added a subscriber: Aklapper. Mar 27 2023, 8:43 AM

Aklapper moved this task from Backlog to Proposed sessions on the Wikimedia-Hackathon-2023 board. Mar 27 2023, 9:13 AM

CristianCantoro updated the task description. Mar 27 2023, 3:43 PM

CristianCantoro subscribed.

MGerlach updated the task description. Mar 28 2023, 7:15 AM

MGerlach subscribed.

NicoleLBee updated the task description. Mar 28 2023, 9:06 AM

NicoleLBee subscribed.

Zache subscribed. Mar 28 2023, 10:17 AM

Bmueller updated the task description. Mar 28 2023, 2:24 PM

Bmueller subscribed.

@kostajh a few of us were also thinking about a session focused on some of the technical aspects of running an LLM on Toolforge/Cloud VPS infrastructure and (hopefully) demoing some LLMs that we'd set up in advance. I haven't submitted the session yet, but any thoughts on whether to combine efforts here or keep them separate?

Michael subscribed. Mar 30 2023, 2:44 PM

Lydia_Pintscher subscribed. Mar 30 2023, 2:44 PM

Relevant notes from an earlier community call about this topic: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2023-2024/Draft/External_Trends/Community_call_notes

Isaac mentioned this in T333853: [Session] Self-hosting ML models on Cloud Services. Apr 3 2023, 4:19 PM

Eleni.Christopoulou updated the task description. Apr 4 2023, 8:21 PM

Eleni.Christopoulou subscribed.

Tgr updated the task description. Apr 4 2023, 9:26 PM

Tgr subscribed.

Novem_Linguae updated the task description. Apr 5 2023, 2:49 AM

Novem_Linguae subscribed.

Slst2020 updated the task description. Apr 5 2023, 8:54 AM

Slst2020 subscribed.

waldyrious updated the task description. Apr 5 2023, 9:16 AM

waldyrious subscribed.

DimitriosRingas updated the task description. Apr 5 2023, 8:39 PM

DimitriosRingas subscribed.

MnLsVt updated the task description. Apr 9 2023, 6:35 PM

MnLsVt updated the task description.

MnLsVt subscribed.

Dogogos123 updated the task description. Apr 9 2023, 6:39 PM

Dogogos123 subscribed.

Is this session open, either on a full-participant or an observer-only basis? There is intense interest in this topic at Wikipedia.

Mathglot updated the task description. Apr 10 2023, 5:01 AM

srishakatux moved this task from Proposed sessions to Accepted sessions on the Wikimedia-Hackathon-2023 board. Apr 17 2023, 7:06 PM

Robertsky updated the task description. Apr 17 2023, 7:43 PM

Robertsky subscribed.

LabDom updated the task description. Apr 18 2023, 10:52 AM

LabDom subscribed.

Nicholas_Perry updated the task description. Apr 19 2023, 1:29 PM

Nicholas_Perry subscribed.

In T333127#8736311, @Isaac wrote:

@kostajh a few of us were also thinking about a session focused on some of the technical aspects of running an LLM on Toolforge/Cloud VPS infrastructure and (hopefully) demoing some LLMs that we'd set up in advance. I haven't submitted the session yet, but any thoughts on whether to combine efforts here or keep them separate?

@Isaac I'm sorry I missed your comment! It probably makes sense for you to have your own session, which I see you posted in T333853: [Session] Self-hosting ML models on Cloud Services. T333974: [Session] WikiGPT - Natural Language search results based on Wikipedia knowledge and ChatGPT will also be relevant to folks interested in the session mentioned in this task.

To people subscribed here: are any of you interested to co-facilitate or otherwise help with planning this session? I am envisioning this as an open, but guided, discussion; coming up with a list of prompts (sorry) in advance would help focus the conversation.

ItamarWMDE updated the task description. Apr 27 2023, 11:06 AM

ItamarWMDE subscribed.

@kostajh see below:

@Isaac I'm sorry I missed your comment! It probably makes sense for you to have your own session, which I see you posted in T333853: [Session] Self-hosting ML models on Cloud Services. T333974: [Session] WikiGPT - Natural Language search results based on Wikipedia knowledge and ChatGPT will also be relevant to folks interested in the session mentioned in this task.

Not a problem -- I didn't want to submit too many sessions around this topic but I think these three are nicely distinct (general discussion; demo of possibilities of this tech; demo of what we can currently do on our Cloud Services machines) so not too much overlap in the end.

To people subscribed here: are any of you interested to co-facilitate or otherwise help with planning this session? I am envisioning this as an open, but guided, discussion; coming up with a list of prompts (sorry) in advance would help focus the conversation.

Given that I'm already running a different session, I don't want to steer this one too much, but I love your framing in the task of connecting folks who are a bit more hesitant about this with folks who are a bit more excited. Could be fun and very useful to see what sorts of potential guardrails folks come up with to get the benefits of this new generation of AI while addressing some of the major concerns. Maybe aiming towards the start of a more technical complement to some of the draft policies such as enwiki:Large language models.

Etherpad for this session: https://etherpad.wikimedia.org/p/wmh2023-LLMs%2C_ChatGPT%2C_machine_learning_tools%2C_etc

SpyridonKokotos updated the task description. May 4 2023, 8:45 PM

saper subscribed. May 7 2023, 8:24 PM

saper updated the task description. May 7 2023, 8:26 PM

In T333127#8808800, @kostajh wrote:

To people subscribed here: are any of you interested to co-facilitate or otherwise help with planning this session? I am envisioning this as an open, but guided, discussion; coming up with a list of prompts (sorry) in advance would help focus the conversation.

@kostajh I am happy to co-facilitate; I signed up as session coordinator. I'm also happy to help with planning.

MGerlach updated the task description. May 10 2023, 7:57 AM

roti_WMDE updated the task description. May 16 2023, 3:28 PM

roti_WMDE subscribed.

Arian_Bozorg subscribed. May 17 2023, 1:30 PM

Alexey_Skripnik subscribed. May 19 2023, 2:53 PM

Jorgelhq updated the task description. May 19 2023, 7:42 PM

Jorgelhq subscribed.

Session Notes:

LLMs, ChatGPT, machine learning tools, etc.

Date & time: Saturday, May 20th at 10:30 am EEST / 7:30 am UTC

Relevant links

Phabricator task: https://phabricator.wikimedia.org/T333127
Background reading:
- ChatGPT: https://en.wikipedia.org/wiki/ChatGPT
- LLMs: https://en.wikipedia.org/wiki/Large_language_model
Notes from the community call on AI: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2023-2024/Draft/External_Trends/Community_call_notes

Participants (~20)

Notes

[Most of the activity, discussion, and note-taking happened in person on notepads]
Breaking into two groups: opportunities and threats

Opportunities:

opportunities for developers, editors, users
themes that came up repeatedly: fact-checking, finding related content to expand content
variety of projects: Wikipedia, Wikidata, Commons

Threats:

three "buckets" of points:
    1. the content itself: copyright infringement, disinformation spamming
    2. what happens to the community?  where does the authority of the tools start and stop?
    3. people no longer coming to our content directly but consuming it elsewhere

        in order to use the tools "safely" we might have to put so many checks in that they become unusable

Kosta:

"opportunities" are all about things we could build; for "threats" there were fewer actionable ideas about how to deal with the issues

Comment about use case for detecting conflicting/false information in articles
Using a tool to check/verify citations
Summarizing text: good idea or bad idea? Summarizing an article might be questionable but summarizing a discussion (e.g., the Talk page) could make it easier to follow and encourage engagement

When prototyping, the question should not be "what is the flashiest thing?" but "how do we not disrupt existing processes?"

Laws around training the models: what is the standpoint of the movement and the WMF? Is it allowed to train proprietary models on Wikipedia data? Lots of TODOs on political topics.

Wikipedia is only half articles; the rest is talk pages, community processes, mechanisms, etc. A lot of the caveats about using LLMs on content text go away when you look at using them on talk page discussions. "Constructive criticism bot" idea. Worries about accuracy/missing points are less urgent in non-content spaces.

But this assumes that a summary would not introduce biases or inaccurately reflect sentiment.

What about software development / tool development? ChatGPT helps with regex and with writing SPARQL. There are many opportunities there.

You have to know what to ask to get a good response. Create a framework for how to ask an LLM about, e.g., SPARQL queries.

Documentation generation: use it for things that are typically left behind.

We tried that for Parsoid. It was generous: it generated some correct info and some incorrect info.

Article hoax problem: creating plausible but factually incorrect content.

Is partially accurate documentation better than zero documentation?

We need the ability to annotate bot-generated content; it would make identifying and acting on this content much easier.

Project ideas:

Transcribe all podcasts from last year. Runs locally, quite promising. Works well with Dutch. That could apply to Wikimedia systems.
Session tomorrow by Slavina and Isaac about self-hosted ML on Cloud Services. Good place to get started.
C. Scott's idea (didn't catch this)
Hackathon project for a gadget that uses ChatGPT to summarize article sessions

Opportunities & Threats raised during the brainstorming session