[Session] WikiGPT - Natural Language search results based on Wikipedia knowledge and ChatGPT
Closed, Resolved, Public

Assigned To

Authored By

	isarantopoulos
	Apr 4 2023, 3:51 PM

Description

Title of session: WikiGPT - Natural Language search results based on Wikipedia knowledge and ChatGPT
Session description: The session will be a demo of an application named WikiGPT that the Machine Learning Team built as a proof of concept of what the future of Wikipedia could look like using natural language queries to interact with Wikipedia's content.
Username for contact: isarantopoulos
Session duration (25 or 50 min): 25
Session type (presentation, workshop, discussion, etc.): presentation/demo
Language of session (English, Arabic, etc.): English
Prerequisites (some Python, etc.): A general idea about what Large Language Models do (LLMs), but people without relevant knowledge can attend.
Any other details to share?:
- Etherpad Link
Interested? Add your username below:

Related Objects

Mentioned In: T333127: [Session] LLMs, ChatGPT, machine learning tools, etc

Event Timeline

isarantopoulos created this task. Apr 4 2023, 3:51 PM

Restricted Application added a subscriber: Aklapper. Apr 4 2023, 3:51 PM

isarantopoulos updated the task description. Apr 4 2023, 3:51 PM

srishakatux moved this task from Backlog to Proposed sessions on the Wikimedia-Hackathon-2023 board. Apr 5 2023, 6:26 AM

Slst2020 updated the task description. Apr 5 2023, 8:48 AM

Slst2020 subscribed.

diegodlh updated the task description. Apr 5 2023, 9:20 PM

diegodlh subscribed.

NicoleLBee updated the task description. Apr 9 2023, 6:46 PM

NicoleLBee subscribed.

MGerlach updated the task description. Apr 11 2023, 7:29 AM

MGerlach subscribed.

Tgr updated the task description. Apr 17 2023, 5:57 AM

Tgr subscribed.

kostajh updated the task description. Apr 17 2023, 7:46 AM

kostajh subscribed.

Daniel_Mietchen updated the task description. Apr 17 2023, 10:32 AM

Daniel_Mietchen subscribed.

srishakatux moved this task from Proposed sessions to Accepted sessions on the Wikimedia-Hackathon-2023 board. Apr 17 2023, 7:05 PM

LabDom updated the task description. Apr 18 2023, 10:55 AM

LabDom subscribed.

Nicholas_Perry updated the task description. Apr 19 2023, 1:15 PM

Nicholas_Perry subscribed.

Nicholas_Perry updated the task description. Apr 19 2023, 1:18 PM

kostajh mentioned this in T333127: [Session] LLMs, ChatGPT, machine learning tools, etc. Apr 26 2023, 7:03 PM

Lydia_Pintscher subscribed. Apr 27 2023, 7:49 AM

Michael subscribed. Apr 27 2023, 8:39 AM

ItamarWMDE updated the task description. Apr 27 2023, 11:01 AM

ItamarWMDE subscribed.

Etherpad for this session: https://etherpad.wikimedia.org/p/wmh2023-WikiGPT_-_NL_search_results

SpyridonKokotos updated the task description. May 4 2023, 8:37 PM

Muhammad_Yasser_Jazirahly_WMDE updated the task description. May 9 2023, 11:38 AM

Muhammad_Yasser_Jazirahly_WMDE subscribed.

adee_wmde updated the task description. May 16 2023, 7:51 AM

adee_wmde subscribed.

roti_WMDE updated the task description. May 16 2023, 3:26 PM

roti_WMDE subscribed.

Nidiah subscribed. May 16 2023, 10:04 PM

Stang subscribed. May 17 2023, 1:13 PM

akosiaris updated the task description. May 18 2023, 9:45 AM

akosiaris subscribed.

Eleni.Christopoulou subscribed. May 18 2023, 7:59 PM

01tonythomas updated the task description. May 19 2023, 8:14 AM

01tonythomas subscribed.

Alexey_Skripnik subscribed. May 19 2023, 8:24 AM

WhitePhosphorus subscribed. May 19 2023, 11:59 AM

Jorgelhq subscribed. May 19 2023, 5:58 PM

Session Notes:

WikiGPT - Natural Language search results based on Wikipedia knowledge and ChatGPT

Date & time: Friday, May 19th at 15:00 EEST / 12:00 UTC

Relevant links

Phabricator task: https://phabricator.wikimedia.org/T333974

https://wiki-gpt.toolforge.org/search

Speaker

Ilias Sarantopoulos

Participants (~35)

Notes

Problem statement:

Be able to find complex information in a more automated way
Need for natural language search interfaces
Make interaction with our search box more interactive

Problems with LLMs as a knowledge base: hallucinations, stale knowledge, ethical considerations

These can lead to the spread of misinformation.

ChatGPT example of famous dogs

Google can provide answers as good as ChatGPT's
- Google provides sources, while ChatGPT does not
- ChatGPT also returns nonexistent statues
- When we query Wikimedia sites, we get some results, though not very efficiently

Enter WikiGPT

Use Wikipedia as a knowledge base; its strength is its reliability
Use an LLM as an interface to assist with search

WikiGPT Architecture

(TBA after the session)

QUESTION -> Large Language Model -> Knowledge base (e.g. Wikipedia) -> Large Language Model -> Answer
(somewhere in the middle: ChatGPT)
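
The flow above can be sketched roughly as follows. This is only an illustration under assumptions, not the WikiGPT code: the names `answer_question`, `Article`, `Answer`, `fake_llm` and `fake_search` are hypothetical, and the LLM and the knowledge-base search are passed in as plain functions so that the sketch runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Article:
    title: str
    url: str
    text: str


@dataclass
class Answer:
    text: str
    sources: List[str]  # URLs of the articles the answer was grounded in


def answer_question(
    question: str,
    llm: Callable[[str], str],
    search: Callable[[str], List[Article]],
    max_articles: int = 3,
) -> Answer:
    # Step 1: the LLM rewrites the natural-language question as a search query.
    query = llm(f"Rewrite as a short search query: {question}").strip()

    # Step 2: the knowledge base (e.g. a Wikipedia search) returns candidates.
    articles = search(query)[:max_articles]
    if not articles:
        return Answer(text="No relevant articles found.", sources=[])

    # Step 3: the LLM answers using only the retrieved text as context.
    context = "\n\n".join(f"{a.title}: {a.text}" for a in articles)
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
    return Answer(text=llm(prompt).strip(), sources=[a.url for a in articles])


# Hypothetical stand-ins for the LLM and the search backend.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("Rewrite"):
        return "Apollo 11 crew"
    return "Apollo 11 had a crew of three."


def fake_search(query: str) -> List[Article]:
    if "Apollo" not in query:
        return []
    return [
        Article(
            title="Apollo 11",
            url="https://en.wikipedia.org/wiki/Apollo_11",
            text="Apollo 11 was crewed by Armstrong, Aldrin and Collins.",
        )
    ]
```

Calling `answer_question("Who were the astronauts on Apollo 11?", fake_llm, fake_search)` returns an `Answer` whose `sources` list carries the article URLs, which corresponds to the list of sources shown next to each answer in the demo.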

Live Demo!
wiki-gpt.toolforge.org/search (password protected) -> results are precomputed, as generating them live would normally be slow
Examples:

Who won WWII?
- The result includes a list of the sources that led to the answer we got from ChatGPT
- The list also includes Call of Duty Championship 2018

What is the Technopolis in Athens?
- Sources are related to the actual topic

Who were the 5 astronauts that landed Apollo 11 on the moon?
- ChatGPT replied with 5 astronauts, though there were only 3
- ChatGPT with GPT-4 correctly replied that there were 3 and that 2 of them walked on the moon

Where can I buy a fridge at Technopolis in Athens?
- It replied that this question is out of scope
- The links it provided, however, were articles about shops selling appliances

Future Directions:

Using a closed source application is problematic
Search using article embeddings
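
As a rough illustration of what search over article embeddings could look like, the sketch below ranks articles by cosine similarity to the question. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model (the notes do not say which model would be used), and `SAMPLE_ARTICLES` contains made-up sample text.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_articles(
    question: str, articles: Dict[str, str], top_k: int = 2
) -> List[Tuple[str, float]]:
    """Return the top_k (title, score) pairs most similar to the question."""
    q_vec = embed(question)
    scored = [(title, cosine(q_vec, embed(text))) for title, text in articles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Made-up sample texts, for illustration only.
SAMPLE_ARTICLES = {
    "Apollo 11": "apollo 11 was the first crewed mission to land on the moon",
    "Technopolis": "technopolis is an industrial museum and cultural venue in athens",
}
```

With a real embedding model, `embed` would return dense vectors and the article vectors would be computed once and stored in an index instead of being recomputed per query; the ranking step stays the same.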

Question:

Q: Can any technology for producing a list of related articles to the question be used?

A: Yes, any search engine. Even something like ChatGPT could be used, but then we'd be back at the original problem (stale knowledge: results include only data from when the model was trained)

Open Source LLMs on Lift Wing (https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing)

Open source models can help, since we can contribute to their improvement as well as gain confidence in them
Licensing is challenging, so we are looking for a model whose license reflects our values

Wikimedia Research

Has developed a ChatGPT plugin that uses Wikipedia as its source and includes Wikipedia links in answers
Code is publicly available: https://gitlab.wikimedia.org/repos/machine-learning/chatgpt-plugin/-/tree/dev
Example: "Who is the queen of England?" The answer contains links to the relevant Wikipedia articles from which the information was sourced

Discussion:

The test page is password protected because we do not want to be responsible for the content, which could be wrong
Do you have a way for the plugin to include updated content?
- The plugin searches Google, gets Wikipedia results, and parses them, so the information is more up to date
Did you consider giving only the relevant articles as a result and not a whole answer? This could improve Wikipedia search. ChatGPT can also write SPARQL queries, which could be a nice way for users to write them without having to know the language.

Many possible ways to use it
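
To make the SPARQL idea concrete, here is the kind of query a user might ask ChatGPT to write, picking up the dogs example from earlier in the session. The query is a hand-written illustration, not something shown in the session, and the helper name `build_query_url` is hypothetical. It targets the public Wikidata Query Service endpoint and only builds the request URL, without sending it.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Items that are an instance of (P31) dog (Q144), with English labels.
DOGS_QUERY = """
SELECT ?dog ?dogLabel WHERE {
  ?dog wdt:P31 wd:Q144 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
""".strip()


def build_query_url(sparql: str) -> str:
    """Build a GET URL that asks the query service for JSON results."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': sparql, 'format': 'json'})}"
```

The resulting URL can be opened in a browser or fetched with any HTTP client; generated queries should still be reviewed before use, since an LLM can produce queries that are syntactically valid but wrong.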

What to do with outdated content?

Same issue as with Wikipedia; could add a time interval

Phrasing can be really subtle, and small changes can matter a lot.

Current phrasing is basically the same as on Wikipedia. Manual fixes would be needed

Could also think about asking for direct citations when using Wikipedia

The problem we are trying to solve is not just technical
- we could end up with controversial results

@isarantopoulos: Thanks for participating in the Hackathon! We hope you had a great time.

If this session / event took place: Please change the task status to resolved via the Add Action... → Change Status dropdown.
- If there are session notes (e.g. on Etherpad or a wiki page), or if the session was recorded, please make sure these resources are linked from this task.
- If there are specific follow-up tasks from this session / event: Please create dedicated tasks and add another active project tag to those tasks, so others can find those tasks (as likely nobody in the future will look at the Hackathon workboard when trying to find something they are interested in).
If this session / event did not take place: Please set the task status to declined.

Thank you,
Phabricator housekeeping service

isarantopoulos closed this task as Resolved. Jun 13 2023, 2:59 PM

Stang unsubscribed. Jun 16 2023, 10:33 PM
