
Wikibase Cloud data export tool
Open, Needs Triage · Public · Feature

Description

Feature summary (what you would like to be able to do and where):

I'd like to be able to download the data (Qs/Ps) in a wikibase as a dataset, and I would like this feature to be supported as part of Wikibase Cloud. There may also be a need for history, discussion pages, etc., but I'm focused on getting the statements/triples out in some kind of plain-text form (RDF, TTL, YAML, etc.).

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

I work in cultural heritage. I work on the principle that interactive sites we build as part of projects, like a wikibase, will at some point cease to be updated as projects lose momentum, team members, etc., and will eventually go offline. However, the data remains of value *after* the interactive website is gone. So my model of preservation involves creating dumps of the data produced and depositing them somewhere for reuse at a later date by somebody who wants that data.

Benefits (why should this be implemented?):

There are inefficient, compute-heavy workarounds for this problem - scraping a site with wget, or using dumpgenerator (https://github.com/WikiTeam/wikiteam/issues/395) - but they are unsupported by Wikibase Cloud. This feature would encourage Wikibase Cloud creators to carefully consider the preservation of the data they produce, and give them a supported tool for exporting their data at the end of a project (which is often a requirement of a research project).
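For context on what a supported tool would wrap: Wikibase already exposes per-entity exports via Special:EntityData in several serializations (.json, .ttl, .rdf, .nt); what's missing is a sanctioned way to do this in bulk. A minimal sketch, assuming a hypothetical instance at example.wikibase.cloud:

```python
import requests

BASE = "https://example.wikibase.cloud"  # hypothetical instance; substitute your own

def fetch_entity(entity_id, fmt="ttl"):
    """Fetch one entity via Special:EntityData; fmt can be json, ttl, rdf, or nt."""
    resp = requests.get(f"{BASE}/wiki/Special:EntityData/{entity_id}.{fmt}")
    resp.raise_for_status()
    return resp.text

print(fetch_entity("Q1"))  # Turtle serialization of item Q1
```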

Event Timeline

Addshore subscribed.

Some wbstack.com context: I always said this would be desired, and I did provide the odd JSON or RDF dump to people who requested one (creating them manually).
My past goal would have been to make this self-serve,
but without letting folks create unlimited dumps arbitrarily, as creating the dumps is not free.

Thanks for chipping in @Addshore. To amend my feature request:

  • JSON or RDF is sensible.
  • Agree on self-serve and some kind of dump limit (I'd be looking to do this roughly every 1-3 months); a rough sketch of what self-serve could wrap follows below.
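To make the self-serve idea concrete, here is a rough sketch of one way a bulk export could work under the hood: enumerate entity pages through the standard MediaWiki list=allpages API and fetch each entity's Turtle. The host is a placeholder, and the namespace numbers assume Wikibase's defaults (Item = 120, Property = 122), which should be verified per wiki; this is not a description of how Wikibase Cloud actually implements anything.

```python
import requests

BASE = "https://example.wikibase.cloud"  # hypothetical instance
API = f"{BASE}/w/api.php"
ITEM_NS, PROPERTY_NS = 120, 122  # Wikibase's default namespaces; verify on your wiki

session = requests.Session()

def entity_ids(namespace):
    """Yield entity IDs (Q.../P...) by paging through list=allpages."""
    params = {"action": "query", "list": "allpages",
              "apnamespace": namespace, "aplimit": "max", "format": "json"}
    while True:
        data = session.get(API, params=params).json()
        for page in data["query"]["allpages"]:
            yield page["title"].split(":")[-1]  # "Item:Q42" -> "Q42"
        if "continue" not in data:
            break
        params.update(data["continue"])

# Concatenating per-entity Turtle is legal: Turtle allows re-declared @prefix lines.
with open("dump.ttl", "w", encoding="utf-8") as out:
    for ns in (ITEM_NS, PROPERTY_NS):
        for eid in entity_ids(ns):
            resp = session.get(f"{BASE}/wiki/Special:EntityData/{eid}.ttl")
            resp.raise_for_status()
            out.write(resp.text + "\n")
```

A real self-serve feature would presumably run something like this server-side on a schedule (hence the 1-3 month cadence and rate limit discussed above), rather than hammering the public endpoints per entity.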

Morning. I note that since this ticket went in, a more stable Python 3 version of dumpgenerator has emerged (huge thanks to all involved): https://github.com/elsiehupp/wikiteam3/ This solution works for me for now, but I suspect there are users for whom this use case remains valid.

Sharing the data and using the exported data for import into our UI are primary use cases for us. We definitely need RDF and may need JSON as well, but the dumpgenerator @Drjwbaker cites can be used to get the JSON.
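For the JSON side specifically, the stock Wikibase API already serves entity JSON in batches via wbgetentities (up to 50 IDs per call), which a dump tool could build on. A minimal sketch, again against a hypothetical instance:

```python
import json
import requests

API = "https://example.wikibase.cloud/w/api.php"  # hypothetical instance

def get_entities(ids):
    """Fetch up to 50 entities' JSON in one wbgetentities call."""
    resp = requests.get(API, params={"action": "wbgetentities",
                                     "ids": "|".join(ids),
                                     "format": "json"})
    resp.raise_for_status()
    return resp.json()["entities"]

print(json.dumps(get_entities(["Q1", "P1"])["Q1"], indent=2))
```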