
tiny subsets from Wikidata
Open, Medium, Public

Description

Wikidata can be a great provider of lists of entities based on queries. Some lists are used again and again, but they can be a bit hard to get out of Wikidata each time. We could regularly run queries and provide CSV downloads of generally useful lists of entities.

Some lists that could be useful:

  • countries and capitals (see the example query after this list)
  • timezones
  • airports
  • languages
  • units
  • ...
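
To make the first item concrete, here is a minimal sketch of what such a stored query might look like, wrapped in Python so it can feed the runner script further down. The file path queries/countries_capitals.rq is an assumed layout, not an agreed convention; the IDs are the usual Wikidata ones (P31 "instance of", Q6256 "country", P36 "capital"), and exactly which Items should count as a "country" is the kind of definitional question raised below.

```
# Hypothetical example of one stored query: countries and their capitals.
# In the proposed repository, the SPARQL would live as its own .rq file.
from pathlib import Path

COUNTRIES_AND_CAPITALS = """\
SELECT ?country ?countryLabel ?capital ?capitalLabel WHERE {
  ?country wdt:P31 wd:Q6256 .               # instance of (P31) country (Q6256)
  OPTIONAL { ?country wdt:P36 ?capital . }  # capital (P36), where modelled
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?countryLabel
"""

if __name__ == "__main__":
    # Write the query into the (assumed) per-list file in the repository.
    Path("queries").mkdir(exist_ok=True)
    Path("queries/countries_capitals.rq").write_text(
        COUNTRIES_AND_CAPITALS, encoding="utf-8"
    )
```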

What would need to happen:

  • open a repository in the Wikidata GitHub org
  • define the SPARQL query for each required list in a file in the repository
  • run the query, generate a result CSV, and upload it to the repository (see the sketch after this list)
  • automate the above to regularly update the result CSV
  • publicize that this is available
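
For steps two and three, here is a minimal sketch of a runner, assuming the public Wikidata Query Service endpoint (which can return CSV directly when asked for text/csv) and the per-list .rq layout sketched above; the file paths and User-Agent string are illustrative, not an agreed convention.

```
#!/usr/bin/env python3
"""Run a stored SPARQL query against the Wikidata Query Service, save the CSV.

Usage: python run_query.py queries/countries_capitals.rq data/countries_capitals.csv
"""
import sys

import requests

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def run_query_to_csv(query_path: str, csv_path: str) -> None:
    with open(query_path, encoding="utf-8") as f:
        query = f.read()
    response = requests.get(
        WDQS_ENDPOINT,
        params={"query": query},
        headers={
            # WDQS content-negotiates; asking for text/csv yields CSV directly.
            "Accept": "text/csv",
            # The endpoint's usage policy asks for a descriptive User-Agent;
            # this value is just a placeholder.
            "User-Agent": "wikidata-tiny-subsets-example/0.1",
        },
        timeout=300,
    )
    response.raise_for_status()
    with open(csv_path, "w", encoding="utf-8") as f:
        f.write(response.text)


if __name__ == "__main__":
    run_query_to_csv(sys.argv[1], sys.argv[2])
```

Step four could then be as simple as running this script on a schedule (e.g. a cron-triggered CI job) and committing the refreshed CSV back to the repository.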

Why would this be useful?

  • It'd encourage more streamlined modeling for the Items in these lists.
  • It'd further position Wikidata as a provider of standard sets of entities.
  • It'd make it easier to spot changes in the data similar to what happens with Listeria.
  • It'd encourage discussion around sharpening the definitions of concepts like "country".

Event Timeline

I'd go more general and introduce "named queries". Some such queries might indeed have cached results. I'd also drop the "tiny" restriction as long as the named query is "relevant", e.g. popular, highly requested, or especially useful.

This sounds like a very good idea and shouldn't be too hard to implement (knock on wood!)… but if this becomes popular, are we going to run into any limits on GitHub?

I wouldn't cache the results on GitHub but somewhere else in the Wikimedia tool universe, as outlined by Bryan Davis (with whom I already talked about this during the hackathon last weekend).