
improve WD update performance
Closed, Duplicate · Public

Description

I've made 3.5M edits and I'm very frustrated with WD's update performance.
There can be no quality without the ability to easily make edits, which is why I'm posting on this board.

Wikidata updates have become frustratingly slow, especially through QS.
This demotivates data contributors and prevents us from making updates that would enrich data and improve quality.

This has been discussed many times in the past, and I calculated that WD updates are 5 MILLION times slower than a conventional semantic repository.
Even QS batches that only remove statements cause many errors, e.g. https://quickstatements.toolforge.org/#/batch/62657 has failed at least 5 times.
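
(For reference, a remove-only QS command boils down to a single wbremoveclaims call against the Action API. The sketch below is illustrative only, not the QuickStatements implementation: the claim GUID is a placeholder and a real bot would authenticate before editing.)

```
# Minimal sketch (not QuickStatements' code): a remove-only edit against the
# Wikidata Action API, roughly what each DELETE row in such a batch maps to.
import requests

API = "https://www.wikidata.org/w/api.php"
session = requests.Session()

def get_csrf_token():
    # Cookie-based token fetch; a real bot would log in first.
    r = session.get(API, params={
        "action": "query", "meta": "tokens", "type": "csrf", "format": "json",
    })
    return r.json()["query"]["tokens"]["csrftoken"]

def remove_claim(guid, token):
    # maxlag=5 asks the servers to reject the edit while replication lag is
    # high; this is the throttling mechanism discussed further down.
    r = session.post(API, data={
        "action": "wbremoveclaims",
        "claim": guid,
        "token": token,
        "maxlag": 5,
        "format": "json",
    })
    return r.json()

token = get_csrf_token()
# Placeholder GUID; real GUIDs come from the item's JSON, not from this batch.
print(remove_claim("Q42$ABCD1234-ABCD-1234-ABCD-123456789ABC", token))
```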

Vladimir Alexiev, [31.08.21 11:16]
[screenshot]
Can't reply on my own talk page? And a QS remove batch failing at least 5 times? WD is down on its knees and has become a HUGE FRUSTRATION for any data contributor.
https://quickstatements.toolforge.org/#/batch/62657 is the failing batch.

(Some say that https://github.com/maxlath/wikibase-cli works better: see https://github.com/maxlath/wikibase-cli/issues/62 and https://github.com/maxlath/wikibase-edit/issues/64 for the transition from QS to this tool, and see https://github.com/maxlath/ for the author's other tools.)

IMHO WD updating is down on its knees and has become a HUGE FRUSTRATION for any data contributor. What are WMDE's plans to ameliorate the situation?

Event Timeline

@VladimirAlexiev What's happening is that the batches you're running cannot be throttled beyond maxlag, and that is why you're unable to make other edits at the same time. The system is working as intended, apart from QuickStatements going at full speed -- but that would be a fix for Magnus to look at.

I agree, the system is working as designed. On-wiki edits are prioritized over API edits, which are throttled when WDQS cannot keep up. The worst bottleneck at the moment is Blazegraph and having to copy every edited item over to it in full on every edit.
I suggest closing this ticket.
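
(For illustration, this is roughly what maxlag throttling looks like from the client side, assuming the standard Action API behaviour: a lagged server rejects the edit with error code "maxlag" and a Retry-After header, and the client is expected to wait and retry rather than treat it as a hard failure. The helper below is a sketch, not QuickStatements' code; names and delays are made up.)

```
# Illustrative maxlag handling: when lag exceeds the maxlag value sent with an
# edit, the API returns an error with code "maxlag" plus a Retry-After header.
# Retrying later is the intended behaviour; a client that surfaces these
# responses as failures makes throttling look like breakage.
import time
import requests

API = "https://www.wikidata.org/w/api.php"
session = requests.Session()

def api_post_with_maxlag(data, max_retries=8):
    data = {**data, "format": "json", "maxlag": 5}
    for attempt in range(max_retries):
        resp = session.post(API, data=data)
        body = resp.json()
        if body.get("error", {}).get("code") == "maxlag":
            # Server is lagged: back off for the suggested time and try again.
            wait = int(resp.headers.get("Retry-After", 5))
            time.sleep(wait)
            continue
        return body
    raise RuntimeError("gave up after repeated maxlag responses")
```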

@Mohammed_Sadat_WMDE Can you explain why a QS batch that merely deletes (does not create statements with qualifiers) returns errors and has to be restarted 10 or 20 times until all deletes go through? Or is "throttling" the same as "failing"?

Half a year ago I had another painful experience trying to insert 1.5M "WorldCat Identities" statements (with references) and calculated that WD updates are 5 MILLION times slower than a conventional semantic repository.

So let me ask again: what are WMDE's plans to ameliorate the situation?

We need to ensure that our infrastructure stays intact, so, as @So9q said, there are intentional limits set by what the system as a whole can handle. Right now the bottleneck is the query service and the time it takes to update it after an edit has been made in Wikidata. Improving that situation is, for various reasons, not trivial, but it has been worked on for many months now. The Search team's work on this is tracked in T244590.