We should provide incremental dumps for the JSON dumps as well.
This would probably be implemented like this: have a script that dumps all entity IDs that have been changed since the last incremental dump, then dump all entities on that list.
The first script is not yet implemented, but that shouldn't be too hard.
A potential shortcoming of this approach (which may or may not also apply to the other incremental dumps, I have no idea): deletions and merges (which turn entities into redirects) wouldn't show up in such a dump.
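The two-step idea above (collect changed IDs, then dump those entities) can be sketched roughly as follows. This is only an illustration, not the actual implementation: the change log here is a plain list of (entity ID, timestamp) pairs, whereas a real script would read from recentchanges or a similar source. The "deleted" list is one hypothetical way to handle the deletions/merges shortcoming mentioned above.

```python
import json

def changed_ids(change_log, since):
    """Return the deduplicated set of entity IDs touched after `since`.

    `change_log` is assumed to be an iterable of (entity_id, timestamp)
    pairs; in reality this would come from the wiki's change tables.
    """
    return {eid for eid, ts in change_log if ts > since}

def incremental_dump(current_entities, change_log, since):
    """Build a JSON dump containing only entities changed after `since`.

    IDs that appear in the change log but are missing from the current
    entity data are listed under "deleted", so consumers can drop them --
    one possible answer to the deletions/merges problem.
    """
    ids = changed_ids(change_log, since)
    dump = {"entities": {}, "deleted": []}
    for eid in sorted(ids):
        if eid in current_entities:
            dump["entities"][eid] = current_entities[eid]
        else:
            dump["deleted"].append(eid)
    return json.dumps(dump)

# Toy usage: Q1 and Q3 changed after timestamp 9; Q3 no longer exists.
log = [("Q1", 10), ("Q2", 5), ("Q3", 12), ("Q1", 11)]
entities = {"Q1": {"labels": {}}, "Q2": {"labels": {}}}
print(incremental_dump(entities, log, since=9))
```

Consumers would apply such a dump on top of the previous full dump: replace everything under "entities", remove everything under "deleted".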
I believe I originally asked for this, but the current WDQ wouldn't use these dumps anymore, and SPARQL replacements are on the way. If I was the only customer, this task could be closed now.
Reporting here what I wrote on Wikidata:
The current compressed JSON dump is more than 6 gigabytes. Would it be possible to create JSON dumps containing only the items changed or added since the previous week's dump? This would produce smaller files, so less time would be needed for download and decompression. That would be useful for those with slow connections.
It would also be useful for bot operators who run periodic maintenance tasks.