
[Session] Generative AI: how will it change the landscape for Wikimedia projects
Closed, ResolvedPublic


  • Title of session: Generative AI: how will it change the landscape for Wikimedia projects
  • Session description: The current rise of generative AI will significantly change the technological landscape and the place of Wikimedia projects in it. Wikimedia should be prepared for the changes generative AI will bring. At the least, we need to have a vision of what to do. There are a lot of questions we have to discuss: how will we detect misinformation, to what extent could AI be used to write and illustrate articles, could chatbots (in particular those developed by the search engine companies) completely replace encyclopedias, could AI break how SEO currently works and push Wikipedia out of search engine results, etc.
  • Username for contact: @Tohaomg
  • Session duration: 50 min
  • Session type: discussion
  • Language of session: English
  • Prerequisites: none

Notes from session

Generative AI: how will it change the landscape for Wikimedia projects

Date and time: May 4, 2024, 14:00

Generative AI is not something the wiki movement is prepared to manage; there are a lot of challenges, and it requires a lot of knowledge to know how to address them.
Today you can easily fake a voice, photo or video.
Examples: Donald Trump; in Ukraine, Russians spread AI-generated videos of President Zelensky "giving up".
It requires effort to determine if something is true. This is being used by evil forces and has an influence on wiki projects.
Q: When I think of Wikimedia and AI, I always think about the challenges for people reading Wikipedia and whether they can get answers. They are more comfortable with interacting. How come we use it to edit and correct, yet haven't thought so much about it generating false sources and enabling people to add references to statements? I wasn't concerned but am now :)
The competition is a concern. Google and Apple are developing their own AIs that could give answers to some questions, so instead of going to Wikipedia people will go to those search engines, never reach Wikipedia, and we will become obsolete like a paper encyclopedia. It is much easier to take your smartphone and read Wikipedia than to open a paper encyclopedia, and I fear we may become obsolete in the very same way. Also, from my personal POV, in Ukrainian Wikipedia we have a big problem with automatic translation that is not proofread. Many editors just take the English text, paste it into Google Translate, copy the result and paste it into Ukrainian Wikipedia without editing it. This translation is not perfect.
Q: How do you know people are doing that?
First of all, I saw someone doing it at a conference, sitting behind a laptop; second, they don't hide it. Don't see
Also happening with content translations - people don't read the info they're adding int
Q: Are they doing this with ContentTranslation? What is the motivation for doing this? __ we're chasing quality instead of numbers, always wanting new articles to be more and more prolific. AI can scale dramatically.

Q: There are a lot of people coming in and going out; do we want to move to a bigger room?
<intermission as we move to ballroom 1>

  • So we are gathered here to talk about generative AI and how we should prepare for the new age, as when the Internet/electricity/steam engine became a thing...

The wiki community has a challenge to face: people can change something and no one will notice.
In some cases, as in Chinese Wikipedia, there was a whole cluster of made-up articles, fantasy, that were discovered only years later. In Polish
<those hundreds of articles in Chinese were human generated>

  • A cut-and-paste article from Google Translate looks very different from a generated article. So if the Foundation's architecture wants to recognise things that are AI-generated, we could do that. There are a lot of problems
  • The issue with ChatGPT is that it might write very plausible text with stylistic mistakes that nobody except a specialist will notice. For example, it may say the Second World War started on a certain date, but only a professional historian knows the actual date.
  • But that's not my point. The signature: you know to look more closely at a large block of text added at once than at an article that grows slowly over time.
  • We are approaching this on the basis of the problems and concerns. There are also opportunities, though; this is not all bad, and I do think that these developments come with market logic. You can argue for or against them if they meet a demand. You can see new technological inventions as an invasion ("this will destroy us"), but they also need us to respond to them. To give an example, I have a friend who is a moderator for a popular radio channel. Her voice is popular, and she creates more visual content now because she heard that her voice might be taken over by AI in future, so she should be more present with her face. But will people want this, AI-generated voices? This is one tiny argument for the point of view that it's not super negative and scary. There is a bit more to it than this.
  • The voice thing - if a voice can be generated in a way that sounds identical, it could potentially take over. Maybe voices can be copyrighted. We don't look at the legal perspective that decides the framework, in Wikimedia and in government.
  • I'm going to jump to a solution, but is there a template we can use that says "I think this is AI generated"?
  • Have any communities used that? Has anyone implemented any solution that we can look at?
  • We're humans, we question a lot of things and create a big backlog
  • What I think is that there is a problem with seeing how people use generative AI to create an article. I have only used ChatGPT, and so far that doesn't give citations. Should things be licensed and require sources? If there are no sources, should it not be allowed on Wikipedia (auto delete?)
  • Bringing more examples of generated content that is harmful to Wikimedia projects - some time ago we had a Wikipedia where one user created most of the articles. The decision was that the articles would not be created in Wikidata any more. One other topic that I'd like to bring up as a priority is for the "global south" communities to advocate for standards and protocols in AI that respect indigenous communities - AI is mostly based on the global north and English-speaking perspective, not the global south. It does not have perspectives from all communities, as it should. Work towards the benefit of the movement by putting that context into Wikipedia (with a bias for global north). Bias does not prevent more bias
  • [replicating and listening to AI voice] Reminds me of hypothetical lab-grown meat - will people want to eat it in the future? Is it ethical?
  • With all this AI, we're missing the point of what we're here to do. We are here to create knowledge that is open, truthful and of quality (not quantity). Maybe we need to put the focus on the quality of the articles, rather than focusing on having more people write the articles. Maybe we'd have more content to be shared with everyone.
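
The "signature" point raised above (a large block of text arriving in a single edit, versus an article that grows slowly over time) can be sketched as a simple size heuristic. This is only an illustration under stated assumptions: the `Revision` record, the `flag_large_additions` helper and the 5000-byte threshold are all hypothetical, not an existing Wikimedia tool, and a flag says nothing about whether text is AI-generated; it only suggests where a human might look more closely.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    rev_id: int  # hypothetical revision identifier
    size: int    # page size in bytes after this revision


def flag_large_additions(revisions, threshold_bytes=5000):
    """Return ids of revisions that grew the page by more than
    threshold_bytes in one edit. Revisions must be ordered oldest first.
    The threshold is an arbitrary assumption, not a community standard."""
    flagged = []
    previous_size = 0  # page creation counts as growth from zero
    for rev in revisions:
        if rev.size - previous_size > threshold_bytes:
            flagged.append(rev.rev_id)
        previous_size = rev.size
    return flagged


# Made-up example histories: one article that grows slowly,
# one that appears as a single large block.
slow_article = [Revision(1, 800), Revision(2, 1900), Revision(3, 3100)]
pasted_article = [Revision(10, 12000), Revision(11, 12100)]

print(flag_large_additions(slow_article))    # []
print(flag_large_additions(pasted_article))  # [10]
```

A check like this would catch unedited machine translation and long human-written drafts just as readily as generated text, so it fits best as an input to human review rather than any kind of automatic deletion.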

[L] I was curious how Commons is handling AI-generated images right now. It's really interesting looking at deletion requests and seeing the associated discussion - is it realistic, is it historically accurate, imperialist bias - and these are very valid things to be worried about, and we should also look at and explore how we can stop that. I'm a big user of DALL·E but I like to generate diverse images of different skin colors and there are very strange results when I do that. Last week, I got a very racist image of a woman of color when I typed in the prompt "women with brown skin relaxing reading a book on couch".
[M] The AI community loves our data. They use it for a lot of information but they don't know how to contribute and expand human knowledge. The market has changed; how do we react to it? We can partner with organizations that are ethical and will give funds back to our work via donations. Our job is to get information out to the world. We have the most wide-ranging languages and cultural information from across the world and it's not being fed into these models (image, audio). By contributing to these models they will become diverse, and the open source AI communities are really apt for this (talked to 50 people at the last meeting). It's about not knowing what we have, not being able to access it. We work with them to improve that. They didn't know about Wikidata for example, only Wikipedia.
[host] In my country, Ukraine, there was one case where people wanted AI and one where they didn't. In the Kiev metro, a voice actor was used to announce station names and say "be careful, the train door is closing". He did this for 50 years and then died, and the city government had to replace him. They tried different voices but none were liked by the city residents, so they proposed to put his voice into AI.
[host] Opposite case: the Ministry of Foreign Affairs of Ukraine said they had developed an AI spokeswoman and released videos of this woman commenting on recent events and foreign policy, but she was not a real person. People don't like her because psychologically they want a real person to reassure them. They don't believe the message as it's a heartless machine.
[Susana] Jose of Wikipedia knowledge - there was some discussion I overheard that some labelling / community practices for tagging conflict don't travel to these areas so it can be treated differently. This understanding of special nature would then be understood by the reusers of the knowledge. There is likely a lot of legislation to come. Right now it is a wild west/gold rush. We are in a good position to define those practices as we know about the whole thing.
[QD] there are special practices around indigenous knowledge.
[Susana] we have very few

  • We have research in Brazil about it

(Will send a link to telegram group or put it in this etherpad)

  • The discrimination in systems will be propagated. When I go to Commons I am sometimes scared, as I am not sure what I am going to see: a lot of things are unfiltered, a bit too unfiltered for my taste, for example pictures of violence. I do have situations where I wish content was arranged differently. There are ways we could ask volunteers to adjust/moderate differently. Think of using AI

The people who are able to do these things: having this time comes with privilege. It is mostly white men moderating. You could use generative AI to represent under-represented voices. It will take them less time to do it. Build more equity by equipping these people with tools to contribute.
[Host] You used to need a whole orchestra to replicate your idea, but now you can download a program on your notebook, press some keys and listen to how it sounds. Another of my concerns is that current AIs are trained on data which was generated by people: the sum of all knowledge available to people is used to train an AI. It is being reused again and again and again, and AI is not capable of creating original ideas. It may be reusing the old ideas over and over again, and we will be stuck in one point in time forever. When I write to AI I write "please" so they will spare me in the war with the robots.

  • The only way this would happen is if humans were not creating content. It is important to put in the right training data.

[HOST] What if people become too lazy to generate their own data and rely only on AI?

  • The thing at the core is that not everyone can do it. You need to have the knowledge.
  • Just curious, has anyone typed this into ChatGPT? I asked what it thinks about making Wikipedia obsolete and got 2 answers: (1) Generative AI has the ability to contribute but is unlikely to make Wikipedia obsolete. It's a dynamic platform. It is unlikely to replace the human touch and editorial processes. (2) Lots of points about legal/ethical/human touch/interpretations. The question posed and the answers follow:

Q posed to ChatGPT3.5: Will generative AI make wikipedia obsolete
A1: Generative AI certainly has the potential to contribute to information generation and synthesis. However, it's unlikely to render Wikipedia obsolete. Wikipedia is more than just a collection of information; it's a dynamic platform curated by volunteers worldwide, ensuring accuracy, neutrality, and comprehensiveness. Generative AI might assist in creating content, but it's unlikely to replace the human touch required for maintaining Wikipedia's standards and navigating its complex editorial processes.
A2: Generative AI has the potential to create content, but it's unlikely to make Wikipedia obsolete for several reasons:
Quality and Reliability: Wikipedia is known for its rigorous editorial standards and community-driven oversight. Generative AI, while capable of generating text, may not consistently produce accurate, unbiased, or well-researched content that meets Wikipedia's standards.
Contextual Understanding: Generative AI lacks true comprehension of context, nuance, and real-world knowledge. Wikipedia's articles are written by humans who understand the topics deeply and can connect related information in meaningful ways.
Dynamic and Living Community: Wikipedia is a dynamic platform with a large community of editors continuously updating and improving content. Generative AI, on the other hand, creates static outputs based on existing patterns and data.
Legal and Ethical Considerations: The use of generative AI for content creation raises concerns about plagiarism, copyright infringement, and ethical use of information. Wikipedia's content is mostly generated by individuals who adhere to legal and ethical guidelines.
Human Touch and Interpretation: Wikipedia articles often reflect the diversity of human perspectives and interpretations. Generative AI lacks this human touch and may struggle to convey complex ideas accurately.
In summary, while generative AI may automate certain aspects of content creation, it's unlikely to replace the multifaceted contributions and quality control mechanisms that characterize Wikipedia. Instead, AI can complement Wikipedia by assisting human editors in tasks like fact-checking, language translation, or identifying content gaps.

  • Thanks for bringing that up! ;) It's a good example of collaborating. We have been having several pessimistic comments, so I was driven to bring this in a more positive direction. If we stop creating, we are lost anyway.
  • It's very human to create.
  • I like the idea of using generative AI to make the moderation better, because I think many of the problems we have mentioned (lack of neutrality, lack of sources, racism) are problems we already have. The problem with AI is that it increases the volume and makes these issues worse, as it's faster to make those same mistakes. One way to attack this is to use AI for moderation as well as for the creation and quality of articles.
  • It's not about the fact we're going to stop doing this thing. There is nothing we do as a people. We had a discussion with developers yesterday: when you are using AI, you are doing it to support your creation, e.g. when you want to get started or be given a push when stuck on an idea, rather than blatantly . Most of us are not doing that. I'm evolving it. When history creates things, we use them to build on top of. E.g. what comes next, we just don't have enough practice. There's a middle ground between here and there.
  • The best thing ever is the +1 - perfectly happy with that. Humans should be doing that.
  • I think to achieve the goal of making Wikipedia - we need to add more interactivity. Maybe AI is needed to control that. We are too slow.
  • A very good point - working on small wikis - are we covering the right topics? Are there others we should be doing? We need feedback on consumption of articles. People search for things, and whether they got a response or not - it would be good to get that info on whether they found what they were looking for
  • Are we ready to ?? What could we do better so we can do more than ChatGPT?

[HOST] I gave a speech last year in the UK, with a lot of people in the audience with different interests. I started explaining about an AI bot and concluded with: we're all technical folks and understand this. But then when talking with Wikipedians, they understood the first ten minutes and then were lost; they couldn't understand. Make an effort to explain AI to your local communities so that they can understand it and use it wisely.

  • The tedious formatting of Wikipedia articles: I edit a lot of text but am not sure about lists or tables. In our group we use ChatGPT to generate information on artists in events, to determine which of those people have an article and which do not (and need to have one written).

[AA] What did you mean by interactive?
[BB] Ask questions to understand information.
[Host] Come to Helsinki to hear more about this subject :)
[Susana] Event is fully booked. Let me pitch what it's about. Intersection of GLAM and use of AI. What do institutions want from this environment and how does AI influence this. Lightning talks, workshops, hack day.


Are there already AI cleanup templates?



Event Timeline

Hello! 👋 The 2024 Hackathon Program is now open for scheduling! If you are still interested in organizing a session, you can claim a slot on a first-come, first-serve basis by adding your session to the daily program, following these instructions. We look forward to hearing your presentation!

debt triaged this task as Medium priority. Apr 17 2024, 7:25 PM

Hi @tahaomg looking forward to seeing you at hackathon! I noticed in the schedule this is scheduled for 30 minutes but in the task description here it says 50 mins. Which is correct?

I have edited the timetable so that this session takes two neighbouring slots with total duration of 1 hour.