
[Session] WikiGPT - Natural Language search results based on Wikipedia knowledge and ChatGPT
Closed, Resolved (Public)

Description

  • Title of session: WikiGPT - Natural Language search results based on Wikipedia knowledge and ChatGPT
  • Session description: The session will be a demo of an application named WikiGPT that the Machine Learning Team built as a proof of concept of what the future of Wikipedia could look like using natural language queries to interact with Wikipedia's content.
  • Username for contact: isarantopoulos
  • Session duration (25 or 50 min): 25
  • Session type (presentation, workshop, discussion, etc.): presentation/demo
  • Language of session (English, Arabic, etc.): English
  • Prerequisites (some Python, etc.): A general idea of what Large Language Models (LLMs) do, but people without relevant knowledge can also attend.
  • Any other details to share?:
  • Interested? Add your username below:

Event Timeline

Link to the corresponding Etherpad: https://etherpad.wikimedia.org/p/wmh2023-WikiGPT_-_NL_search_results

Session Notes:

WikiGPT - Natural Language search results based on Wikipedia knowledge and ChatGPT

Date & time: Friday, May 19th at 15:00 EEST / 12:00 UTC

Relevant links

https://wiki-gpt.toolforge.org/search

Speaker

Ilias Sarantopoulos

Participants (~35)

Notes

Problem statement:

  • Be able to find complex information in a more automated way
  • Need for natural language search interfaces
  • Make our search box more interactive

Problems with LLMs as a knowledge base: hallucinations, stale knowledge, ethical considerations

  • Can lead to the spread of misinformation.

ChatGPT example of famous dogs

  • Google can provide answers as good as ChatGPT's
    • Google provides sources, while ChatGPT does not
    • ChatGPT also returns nonexistent statues
    • When we query Wikimedia sites, we get some results, though the search is not very efficient

Enter WikiGPT

  • use Wikipedia as a knowledge base; the strength is its reliability
  • use an LLM as an interface that assists the search engine

WikiGPT Architecture

  • (TBA after the session)

Question -> Large Language Model -> Knowledge base (e.g. Wikipedia) -> Large Language Model -> Answer
(the Large Language Model in the middle steps is ChatGPT)
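
The flow above can be sketched in a few lines of Python. This is only an illustration of the question -> LLM -> knowledge base -> LLM -> answer shape, not the WikiGPT code: every name here (`answer_question`, `toy_extract_terms`, `toy_search`, `toy_compose`) is hypothetical, the knowledge base is a two-entry list of made-up toy texts, and the two LLM steps are replaced by trivial stand-in functions so that the example runs without any network access.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Article:
    title: str
    text: str


@dataclass
class Answer:
    text: str
    sources: List[str]


def answer_question(
    question: str,
    extract_terms: Callable[[str], List[str]],
    search: Callable[[List[str]], List[Article]],
    compose: Callable[[str, List[Article]], str],
) -> Answer:
    """Question -> LLM -> knowledge base -> LLM -> answer, with sources."""
    terms = extract_terms(question)   # first LLM step (assumed): question -> search terms
    articles = search(terms)          # knowledge base lookup
    if not articles:
        return Answer("No relevant articles found.", [])
    # second LLM step (assumed): write the answer from the retrieved text only
    return Answer(compose(question, articles), [a.title for a in articles])


# Toy stand-ins (assumptions, not the real components).
KNOWLEDGE_BASE = [
    Article("Apollo 11", "Apollo 11 carried three astronauts to the Moon."),
    Article("Technopolis", "Technopolis is a cultural venue in Athens."),
]
STOP_WORDS = {"who", "what", "where", "is", "the", "in", "on", "a", "at", "were"}


def toy_extract_terms(question: str) -> List[str]:
    # Stand-in for the first LLM call: keep the non-trivial words.
    words = (w.strip("?.,!").lower() for w in question.split())
    return [w for w in words if w and w not in STOP_WORDS]


def toy_search(terms: List[str]) -> List[Article]:
    # Stand-in for the search engine: plain substring matching.
    return [
        a for a in KNOWLEDGE_BASE
        if any(t in (a.title + " " + a.text).lower() for t in terms)
    ]


def toy_compose(question: str, articles: List[Article]) -> str:
    # Stand-in for the second LLM call: just join the retrieved text.
    return " ".join(a.text for a in articles)


result = answer_question(
    "What is the Technopolis in Athens?",
    toy_extract_terms, toy_search, toy_compose,
)
print(result.text)
print(result.sources)
```

Passing the three steps in as functions keeps the sketch independent of any particular model or search backend; returning the source titles alongside the answer mirrors the list of sources shown in the demo.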

Live Demo!
wiki-gpt.toolforge.org/search (password protected) -> results are pre-calculated, as normally this would be a slow process
Examples:

  • Who won WWII?
    • In the result there is a list of sources that led to the answer we got from ChatGPT
    • The list includes Call of Duty Championship 2018
  • What is the Technopolis in Athens?
    • Sources are related to the actual topic
  • Who were the 5 astronauts that landed Apollo 11 on the moon?
    • ChatGPT replied about 5 astronauts, though there were 3
    • ChatGPT 4 correctly replied that there were 3 and that 2 of them walked on the moon
  • Where can I buy a fridge at Technopolis in Athens?
    • It replied that this question is out of scope
    • The links it provided were articles about shops selling appliances

Future Directions:

  • Using a closed source application is problematic
  • Search using article embeddings
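
As a rough illustration of what search over article embeddings means, the sketch below ranks articles by cosine similarity between a query vector and each article vector. It makes strong simplifying assumptions: `embed` is a hypothetical stand-in that uses plain word counts instead of a learned embedding model, the article texts are made-up toy data, and `top_k` is an illustrative name, not an existing API.

```python
import math
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, articles: Dict[str, str], k: int = 2) -> List[str]:
    # Rank article titles by similarity to the query, dropping non-matches.
    q = embed(query)
    scored = [(cosine(q, embed(text)), title) for title, text in articles.items()]
    scored.sort(reverse=True)
    return [title for score, title in scored[:k] if score > 0]


articles = {
    "Apollo 11": "apollo 11 landed astronauts on the moon",
    "Technopolis": "technopolis is a cultural venue in athens",
}
print(top_k("astronauts on the moon", articles, k=1))
```

With a real embedding model, only `embed` would change; the ranking step stays the same, and article vectors could be computed once and stored rather than recomputed per query.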

Question:

Q: Can any technology that produces a list of articles related to the question be used?

A: Yes, any search engine. Even something like ChatGPT could be used, but then we'd be back at the original problem (stale knowledge: results only include data from when the model was trained)

Open Source LLMs on Lift Wing (https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing)

  • Open source models can help, as we can improve them as well as have confidence in them
  • Licensing is challenging, so we are looking for a license that represents our values

Wikimedia Research

Discussion:

  • The test page is password protected because we do not want to be responsible for the content, which could be wrong
  • Do you have a way for the plugin to include updated content?
    • The plugin searches Google, gets the Wikipedia results, and parses them, so the information is more up to date
  • Did you consider only giving the relevant articles as a result and not a whole answer? This could improve Wikipedia search. ChatGPT can also write SPARQL queries, which could be a nice way for users to write them without having to know the language.

    many possible ways to use it
  • What to do with outdated content?

    same issue with Wikipedia, could add time interval
  • Phrasing can be really subtle, and small changes can matter a lot.

    current phrasing is basically the same as on wikipedia. Manual fixes would be needed

    Could also think about asking for direct citations when using Wikipedia
  • The problem we are trying to solve is not just technical
    • we could end up with controversial results
