
Partial RDF dumps
Open, LowPublic

Description

Implement options for the RDF dump script to only dump a pre-defined set of entities (given on the command line or in a file, or by limiting the type of entities to dump).

In addition to the entities given explicitly, any properties used in describing the entities should be automatically included in the dump.
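One way to picture the "automatically include used properties" requirement is the rough Python sketch below. It is not the dump script's actual logic. It assumes entities are available in the Wikibase JSON layout (`claims` keyed by property ID, with optional `qualifiers` and `references[].snaks`), and the helper names `properties_used`, `expand_with_properties` and `load_entity` are hypothetical.

```python
def properties_used(entity):
    """Return the set of property IDs that an entity's statements refer to.

    Assumes the Wikibase JSON layout: "claims" maps a property ID to a list
    of statements, each of which may carry "qualifiers" and "references".
    """
    used = set()
    for prop_id, statements in entity.get("claims", {}).items():
        used.add(prop_id)  # property of the main snak
        for statement in statements:
            # Qualifiers are keyed by property ID.
            used.update(statement.get("qualifiers", {}))
            # Each reference holds snaks, again keyed by property ID.
            for reference in statement.get("references", []):
                used.update(reference.get("snaks", {}))
    return used


def expand_with_properties(requested_ids, load_entity):
    """Return the requested entity IDs plus every property they use.

    `load_entity` is a hypothetical callable mapping an entity ID to its
    JSON representation (for example, a lookup into a local store).
    """
    to_dump = set(requested_ids)
    for entity_id in requested_ids:
        to_dump |= properties_used(load_entity(entity_id))
    return to_dump
```

This sketch only adds properties. It deliberately does not follow items that appear as statement values, which is a separate design question.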


Version: unspecified
Severity: enhancement
Whiteboard: u=dev c=backend p=0

Details

Reference
bz44581

Event Timeline

bzimport raised the priority of this task from to Low.Nov 22 2014, 1:28 AM
bzimport set Reference to bz44581.
bzimport added a subscriber: Unknown Object (MLST).
daniel created this task.Feb 1 2013, 11:36 AM
Denny added a comment.Feb 7 2013, 4:54 PM

Dumping an explicit list of entities is very different from limiting the dump by entity type. Once this bug is taken up, it should first be split into two.

Lydia_Pintscher removed a subscriber: Unknown Object (MLST).

In addition to the entities given explicitly, any properties used in describing the entities should be automatically included in the dump.

What about other linked items in statements?

I wonder if this is still worth pursuing. Is there demand for it? What concrete use cases would it serve?

Lydia_Pintscher moved this task from incoming to hold on the Wikidata board.Mar 16 2015, 9:54 AM
Lydia_Pintscher closed this task as Invalid.Mar 30 2015, 11:00 AM
Lydia_Pintscher claimed this task.
hoo reopened this task as Open.Aug 14 2017, 2:52 PM
hoo added a subscriber: hoo.

Given the speed at which the dump grows, we need to look into this again.

Restricted Application added a subscriber: PokestarFan.Aug 14 2017, 2:52 PM
Lucie added a subscriber: Lucie.Oct 6 2017, 4:20 PM

I'd suggest splitting the dump (at least the nt dump, as I mostly work with those and therefore have an opinion on that side) into the following:

  1. Triples regarding properties (where the property is the subject)
  2. Triples that contain language information (aka the labels)
  3. The "pure" direct triples

I think it's reasonable to assume that anyone working with these dumps needs just one of them, or is able to combine them. I'm not sure, however, how much of that is already done (I think the property triples (1.) aren't in the nt dump at the moment anyway, are they?)
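As a rough illustration of the three-way split suggested above, the following Python sketch sorts N-Triples lines into buckets. It is only an approximation under stated assumptions: property subjects are recognised by the `http://www.wikidata.org/entity/P` IRI prefix, "language information" is taken to mean any language-tagged literal, and everything else is treated as the "pure" direct group. The function name `classify` and the bucket labels are hypothetical.

```python
import re

# Assumption: property entities use this IRI prefix followed by "P<number>".
PROPERTY_SUBJECT_PREFIX = "<http://www.wikidata.org/entity/P"

# A language-tagged literal looks like "some text"@en or "text"@zh-hant.
LANG_LITERAL = re.compile(r'^".*"@[A-Za-z]+(?:-[A-Za-z0-9]+)*$')


def classify(line):
    """Classify one well-formed N-Triples line as 'property', 'language' or 'direct'."""
    subject, _predicate, rest = line.strip().split(" ", 2)
    obj = rest.rstrip()
    if obj.endswith("."):
        obj = obj[:-1].rstrip()  # drop the terminating " ."
    if subject.startswith(PROPERTY_SUBJECT_PREFIX):
        return "property"  # group 1: the property is the subject
    if LANG_LITERAL.match(obj):
        return "language"  # group 2: labels and other language-tagged text
    return "direct"        # group 3: everything else (assumed)
```

Because the subject check runs first, a property's own label lands in the "property" bucket rather than the "language" one. Whether that is the desired behaviour is a design choice this sketch does not settle.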

Lydia_Pintscher removed Lydia_Pintscher as the assignee of this task.Jan 14 2018, 12:44 PM
Bugreporter added a subscriber: Bugreporter.EditedJan 19 2019, 9:20 PM

These types of dumps should be considered:
Terms

  1. Dump of all labels
  2. Dump of all descriptions
  3. Dump of all aliases
  4. Dumps of all labels in a specific language
  5. Dumps of all descriptions in a specific language
  6. Dumps of all aliases in a specific language
  7. Dump of all terms (optional)
  8. Dumps of all terms in a specific language (optional)

Sitelinks

  1. Dump of all sitelinks
  2. Dumps of all sitelinks in a given wiki
  3. Dumps of all entities with sitelinks in a given wiki

Statements

  1. Dump of all statements
  2. Dump of all truthy statements
  3. Dumps of all statements for a property
  4. Dumps of all truthy statements for a property
  5. Dumps of all entities with statements for a property

Other

  1. Dump of all page properties (wikibase:statements, wikibase:sitelinks)

Users could then easily build a custom dump by combining several of the dump types above.
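Since N-Triples is a line-oriented format, combining partial dumps can be as simple as concatenating them. The Python sketch below shows one possible way to do that while dropping duplicate triples. The function name `combine_dumps` is hypothetical, and the sketch assumes that blank-node labels do not clash between the input dumps.

```python
def combine_dumps(*dumps):
    """Yield the triples of several N-Triples dumps, skipping duplicates.

    Each dump is any iterable of lines (an open file, a list, ...).
    Blank lines and comment lines are dropped; order of first occurrence
    is preserved.
    """
    seen = set()
    for dump in dumps:
        for line in dump:
            triple = line.strip()
            if not triple or triple.startswith("#") or triple in seen:
                continue
            seen.add(triple)
            yield triple
```

Keeping every triple in an in-memory set is only practical for small partial dumps. For dumps at full scale, an external sort-and-deduplicate step would probably be a better fit.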