
provide incremental JSON dumps for Wikidata
Open, Low, Public

Description

We should provide incremental dumps also for the JSON dumps.

Details

Reference
bz70246

Event Timeline

bzimport raised the priority of this task from to Normal. Nov 22 2014, 3:41 AM
bzimport set Reference to bz70246.
bzimport added a subscriber: Unknown Object (MLST).
Lydia_Pintscher removed a subscriber: Unknown Object (MLST).
hoo added a comment. Jan 6 2015, 3:08 PM

This would probably be implemented like this: have a script that lists the IDs of all entities that have changed since the last incremental dump, then dump just the entities on that list.
The first script is not yet implemented, but that shouldn't be too hard.

Potential shortcomings of this (that may or may not also apply to the other incremental dumps, I have no idea): Deletions and merges (that turn things into redirects) wouldn't show up that way.
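
The two steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Wikibase dump code: `changed_entity_ids`, `write_incremental_dump`, the `(entity_id, timestamp)` change log, and the `load_entity` callback are all hypothetical names made up for the example. The output is written as a JSON array with one entity per line.

```python
import json
import sys
from datetime import datetime, timezone


def changed_entity_ids(changes, since):
    """Step 1: return the IDs of entities edited after `since`.

    `changes` is a hypothetical iterable of (entity_id, timestamp) pairs
    standing in for whatever change log a real script would read.
    Duplicates are collapsed; IDs come back in plain string order.
    """
    return sorted({entity_id for entity_id, ts in changes if ts > since})


def write_incremental_dump(entity_ids, load_entity, out):
    """Step 2: write the listed entities to `out` as a JSON array.

    `load_entity` is a hypothetical callback returning the entity as a
    dict, or None when the entity no longer exists. Missing entities are
    silently skipped, which is exactly the shortcoming noted above:
    deletions leave no trace in the output.
    """
    out.write("[\n")
    first = True
    for entity_id in entity_ids:
        entity = load_entity(entity_id)
        if entity is None:
            continue
        if not first:
            out.write(",\n")
        out.write(json.dumps(entity, sort_keys=True))
        first = False
    out.write("\n]\n")


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    last_dump = datetime(2015, 1, 1, tzinfo=timezone.utc)
    log = [
        ("Q1", datetime(2014, 12, 30, tzinfo=timezone.utc)),
        ("Q2", datetime(2015, 1, 3, tzinfo=timezone.utc)),
    ]
    store = {"Q1": {"id": "Q1"}, "Q2": {"id": "Q2"}}
    write_incremental_dump(changed_entity_ids(log, last_dump), store.get, sys.stdout)
```

Handling deletions and redirects would need extra information that this sketch does not have, for example a separate list of removed IDs.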

Nemo_bis lowered the priority of this task from Normal to Low. Apr 9 2015, 7:16 AM
Nemo_bis set Security to None.
Magnus added a comment. Apr 9 2015, 8:07 AM

I believe I originally asked for this, but the current WDQ wouldn't use these anymore, and SPARQL replacements are on the way. If I was the only customer, this task could be closed now.

JanZerebecki closed this task as Resolved. May 4 2015, 12:58 PM
JanZerebecki claimed this task.

If anyone else wants this please reopen.

Reporting here what I wrote on Wikidata:
The current compressed JSON dump is more than 6 gigabytes, so would it be possible to create JSON dumps containing only the items changed or added since the previous week's dump? That would give smaller files that take less time to download and decompress, which is useful for those with slow connections.

It would also be useful for bot operators who run periodic maintenance tasks.

ValterVB reopened this task as Open. Nov 15 2016, 6:33 PM
awight added a subscriber: awight. Dec 19 2016, 8:37 PM

I have a side project that would benefit from daily JSON dumps. Happy to look into providing this if there's anyone else who cares?

Addshore removed JanZerebecki as the assignee of this task. Aug 28 2018, 7:52 AM
Addshore updated the task description.
awight removed a subscriber: awight. Mar 21 2019, 4:07 PM