[Session] Olympics data-thon and edit-a-thon
Closed, Resolved, Public

Description

Let's have an Olympics data-thon and edit-a-thon at the Wikimania 2021 Hackathon! The Hackathon itself is scheduled for Friday, August 13th. No specific time has been set yet.

Things we can work on:

  • Ontology of events, medals, participants and delegations
  • Lists of recipients of medals
  • Lists of participants by delegation and year
  • Uploading to Wikidata medals won by each delegation by year
  • Change Olympic discipline events to the relevant item (some of them are currently under "Olympic sporting event" rather than under discipline)
  • Add "Olympedia event ID" (P9055) to all of the 339 items of type "Olympic sporting event" for 2020
  • Add participation data to athlete items using P1344 ("participant in")
  • ...

(Feel free to add more and open subtasks)

We should be able to answer some questions using the Wikidata Query Service. Here are some of them. You can add more, along with the relevant query result when done.

  • Which Olympic delegation (2020 Summer Olympics) has the highest female/male member ratio?
  • Which events in Tokyo gave a medal to countries that were part of the Warsaw Pact?
  • Who were the youngest and oldest winners of each event on record?
  • How many medals have been won by countries that were part of the European Union?

Event Timeline

Possible basic query for all participants in the 2020 Summer Olympics:

https://w.wiki/3nMw

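# Participants in the 2020 Summer Olympics
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P31 wd:Q5 .            # item is a human
  ?item wdt:P1344 ?event .         # "participant in" some event
  ?event wdt:P361 wd:Q181278 .     # event is "part of" the 2020 Summer Olympics
  ?event wdt:P31 wd:Q26132862 .    # event is an "Olympic sports discipline event"
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}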

Something we can't currently do: use Listeria to make a list of participants by Olympic delegation. We should work on this.

FYI - I started an overview of structure of Olympic events a couple of years ago and there are still probably some open questions on that. The athletics-oriented Open Track data model may be of use for considering competition data structures.

That said, I think it's a very good plan to focus on making some specific things possible (like medallist lists) rather than getting stuck on high level ontology, as the complexity of Olympic competitions is very high.

Lea_Lacroix_WMDE renamed this task from Olympics data-thon and edit-a-thon to [Session] Olympics data-thon and edit-a-thon. Aug 5 2021, 6:57 AM
Lea_Lacroix_WMDE moved this task from Backlog to Sessions on the Wikimania-Hackathon-2021 board.

Some of the events are not under "discipline" (and they should be), so this query gives a better result:

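# Participants in the 2020 Summer Olympics
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q5;                # item is a human
        wdt:P1344 ?event.             # "participant in" some event
  ?event (wdt:P361+) wd:Q181278.      # event is, directly or transitively, "part of" the 2020 Summer Olympics
  { ?event wdt:P31 wd:Q26132862. }    # "Olympic sports discipline event"
  UNION
  { ?event wdt:P31 wd:Q18536594. }    # "Olympic sporting event"
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}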

https://w.wiki/3nU4

Some of the events are not under "discipline" (and they should be).

That is actually not correct.

For the Olympics, the only suitable values for P1344 ("participant in") are instances of Q18536594 ("Olympic sporting event"). These are the items at the finest level of detail for which we have comprehensive coverage, and thus the most specific ones available.

All "Olympic sporting event" items are supposed to be connected to instances of Q26132862 ("Olympic sports discipline event") bidirectionally via P361 ("part of") and P527 ("has part"). As far as I am aware, this should already be the case for all 339 Olympic sporting events and 40 Olympic sports discipline events of the 2020 Olympics.

There is also a noticeable number of claims of the form P1344 ("participant in"): Q181278 ("2020 Summer Olympics"). These should be refined to the "Olympic sporting event" level as well.

In case the terminology is not clear, here are some examples:

  • Q18536594 ("Olympic sporting event"): Q64809505 ("athletics at the 2020 Summer Olympics – men's 100 metres")
  • Q26132862 ("Olympic sports discipline event"): Q39080746 ("athletics at the 2020 Summer Olympics")
  • Q181278 ("2020 Summer Olympics")
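
The bidirectional P361/P527 requirement described above can be checked mechanically once the two sets of claims have been exported. The following is only a minimal sketch, not an existing tool: the function name `missing_inverse_links` and the input format (plain pairs of item IDs) are assumptions made for illustration.

```python
def missing_inverse_links(part_of, has_part):
    """Report P361/P527 links that lack their inverse (hypothetical helper).

    part_of:  iterable of (event, discipline) pairs taken from P361 ("part of") claims
    has_part: iterable of (discipline, event) pairs taken from P527 ("has part") claims
    Returns (missing_has_part, missing_part_of):
      missing_has_part: (discipline, event) pairs that need a P527 claim
      missing_part_of:  (event, discipline) pairs that need a P361 claim
    """
    part_of = set(part_of)
    has_part = set(has_part)
    # P361 claims event -> discipline without a matching discipline -> event P527 claim
    missing_has_part = sorted((d, e) for (e, d) in part_of if (d, e) not in has_part)
    # P527 claims discipline -> event without a matching event -> discipline P361 claim
    missing_part_of = sorted((e, d) for (d, e) in has_part if (e, d) not in part_of)
    return missing_has_part, missing_part_of


# Q64809505 and Q39080746 are the examples from this comment;
# "Q_OTHER_EVENT" is a placeholder, not a real item ID.
part_of_claims = [("Q64809505", "Q39080746"), ("Q_OTHER_EVENT", "Q39080746")]
has_part_claims = [("Q39080746", "Q64809505")]
need_p527, need_p361 = missing_inverse_links(part_of_claims, has_part_claims)
```

In this made-up input, the placeholder event is linked to the discipline via P361 only, so it is reported as needing a P527 claim on the discipline item.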

Every Olympian has an Olympedia.org profile, but many new/recent Olympians are still missing it on Wikidata.

OK, then we understood "discipline" the other way around. In any case, everything should be added in the same way; right now it is a bit of a mess.

Every Olympian has an Olympedia.org profile, but many new/recent Olympians are still missing it on Wikidata.

Would you be able to add that during the data-thon?

Well, if we have a (preferably FULL) list of contestants somewhere, I can write a Python script that checks, for every contestant, whether an Olympedia profile is recorded in Wikidata.
That will give us a list of Olympians that are still missing one, so we can work from there.
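
A check along these lines could be sketched as follows. This is not the script referred to in this thread, just a minimal illustration under stated assumptions: it takes the contestant list from Wikidata itself (humans with a P1344 claim on an event that is part of Q181278) rather than from an external list, and it assumes P8286 is the property holding the Olympedia athlete identifier. The names `QUERY`, `run_query` and `missing_olympedia_ids` and the User-Agent string are made up for the example.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Humans participating in an event that is (transitively) part of the
# 2020 Summer Olympics, with their Olympedia ID (assumed: P8286) if present.
# Note: P361+ does not match P1344 claims made directly on Q181278.
QUERY = """
SELECT DISTINCT ?item ?olympedia WHERE {
  ?item wdt:P31 wd:Q5 ;
        wdt:P1344 ?event .
  ?event wdt:P361+ wd:Q181278 .
  OPTIONAL { ?item wdt:P8286 ?olympedia . }
}
"""


def run_query(query, user_agent="olympedia-check-sketch/0.1 (hypothetical)"):
    """Send a SPARQL query to the Wikidata Query Service and return parsed JSON."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def missing_olympedia_ids(results):
    """Return the sorted QIDs that have no ?olympedia binding in any result row."""
    seen, with_id = set(), set()
    for row in results["results"]["bindings"]:
        qid = row["item"]["value"].rsplit("/", 1)[-1]  # strip the entity URI prefix
        seen.add(qid)
        if "olympedia" in row:  # unbound OPTIONAL variables are simply absent
            with_id.add(qid)
    return sorted(seen - with_id)
```

Calling `missing_olympedia_ids(run_query(QUERY))` would then give the list of items to work through; the parsing step is kept separate so it can be tried on saved results without network access.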

I hope that I can provide an Olympedia catalog for mix'n'match containing all 2020 Summer Olympics participants. Hopefully this evening or tomorrow.

I made a start on a script here: https://public.paws.wmcloud.org/User:Edoderoo/workitems/Olympedia.ipynb. It can list all humans still missing an Olympedia ID.
We can use that for some aftercare, maybe together with the mix'n'match process.

I dropped everything into a Google Sheet, so people can work from there: https://docs.google.com/spreadsheets/d/10H_uJ7SHdN4OLCwhk6oLZfKzFHrKa_P_D7nuaWX3lII/edit?usp=sharing
I also added a column with the number of statements; anyone who would like to work on that can pick items with just a few statements and add a few more.

The Olympedia mix'n'match catalog is available at https://mix-n-match.toolforge.org/#/catalog/4628. In particular, via https://mix-n-match.toolforge.org/#/list/4628/auto, users can review suggestions and either add missing Olympedia identifiers to Wikidata items via "Confirm" or reject the suggestion via "Remove". There is plenty to do, and there are still many easy cases to process.

The data (identifier, name, country, discipline) was scraped from Olympedia itself. It currently contains all listed 2020 Summer Olympics participants (11882 in total) except for hockey players, who are currently not available from Olympedia due to a bug. If necessary, I will update the catalog, since there are still updates on Olympedia's side.

More tasks:

  • Add "Olympedia event ID" (P9055) to all of the 339 items of type "Olympic sporting event" for 2020. Related query: https://w.wiki/3qa4. (Only ~100 or so are still missing for 2020; this can be done manually.)
  • Add participation data to athlete items using P1344 ("participant in"). We still need to discuss which qualifiers should be added alongside. This would result in very useful infobox-able statements.

I think I can automate the second task by importing from Olympedia, based on existing P9055 (events) and P8286 (athletes) identifiers. Addition of these identifiers should have high priority :-)
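
To illustrate why the identifiers matter for such an import, here is a minimal sketch of the join step, under stated assumptions: participation data is available as pairs of Olympedia athlete and event identifiers, and two lookup tables map those identifiers to Wikidata items via P8286 and P9055. The function name `participation_statements`, the input format and the choice of tab-separated QuickStatements V1 lines as output are all assumptions for the example, not the actual import, and no qualifiers are added since those are still to be discussed.

```python
def participation_statements(results, athlete_items, event_items):
    """Build tab-separated P1344 ("participant in") statements (hypothetical helper).

    results:       iterable of (olympedia_athlete_id, olympedia_event_id) pairs
    athlete_items: dict mapping Olympedia athlete ID (P8286) to a Wikidata QID
    event_items:   dict mapping Olympedia event ID (P9055) to a Wikidata QID
    Returns (lines, unmatched): statement lines for pairs where both sides are
    known, and the pairs that could not be mapped because an identifier is missing.
    """
    lines, unmatched = [], []
    for athlete_id, event_id in results:
        athlete = athlete_items.get(athlete_id)
        event = event_items.get(event_id)
        if athlete is None or event is None:
            # At least one identifier is not yet on Wikidata: nothing to import.
            unmatched.append((athlete_id, event_id))
            continue
        lines.append(f"{athlete}\tP1344\t{event}")
    return lines, unmatched


# All identifiers below are placeholders, not real Olympedia or Wikidata IDs.
lines, unmatched = participation_statements(
    [("a1", "e1"), ("a2", "e1")],
    athlete_items={"a1": "Q_ATHLETE"},
    event_items={"e1": "Q_EVENT"},
)
```

Every pair that ends up in `unmatched` corresponds to an athlete or event item still lacking its Olympedia identifier, which is the work the mix'n'match catalog and the P9055 task feed into.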

@Theklan: Thanks for participating in the Hackathon! We hope you had a great time.

  • If this session / event took place: Please change the task status to resolved via the Add Action...Change Status dropdown.
    • If there are specific follow-up tasks from this session / event: Please create dedicated tasks and add another active project tag to those tasks, so others can find those tasks (as likely nobody in the future will look at the Hackathon workboard when trying to find something they are interested in).
  • If this session / event did not take place: Please set the task status to declined.

Thank you,
your Hackathon venue housekeeping service

Partly done; there should be another project, outside the Wikimania Hackathon, to accomplish everything.