
Loading dump creates multiple wikibase:Dump schema:dateModified statements
Closed, Resolved
Public

Description

Since RDF dumps are sharded, the resulting dump contains multiple wikibase:Dump schema:dateModified statements. When the Updater starts anew from a fresh dump load, it bases its starting point on the dump's schema:dateModified. However, since there are several such statements, it can choose the wrong one (one that is too late) and miss some updates.

The fix for it can be twofold:

  1. Make Updater use only the earliest dateModified statement
  2. Make Munger filter out the extra ones (for example, remember the earliest one and drop any that are later).
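
Both options can be illustrated with a minimal Python sketch. This is not the code from the merged patch: the function names, the triple representation as plain tuples, and the timestamp format (UTC with a trailing "Z") are all assumptions made for illustration.

```python
from datetime import datetime, timezone

# Assumed prefixed names for the dump node and the predicate.
DUMP_SUBJECT = "wikibase:Dump"
DATE_MODIFIED = "schema:dateModified"


def parse_timestamp(value):
    """Parse a UTC timestamp such as '2019-07-29T20:15:40Z' (assumed format)."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def earliest_date_modified(values):
    """Option 1: pick the earliest of several dateModified values."""
    if not values:
        raise ValueError("no dateModified values found")
    return min(parse_timestamp(v) for v in values)


def keep_earliest(statements):
    """Option 2: drop all dump dateModified statements except the earliest.

    `statements` is a list of (subject, predicate, object) string tuples.
    """
    dates = [o for s, p, o in statements
             if s == DUMP_SUBJECT and p == DATE_MODIFIED]
    if not dates:
        return list(statements)
    earliest = min(dates, key=parse_timestamp)

    result = []
    kept = False
    for s, p, o in statements:
        if s == DUMP_SUBJECT and p == DATE_MODIFIED:
            # Keep exactly one copy of the earliest statement.
            if o == earliest and not kept:
                result.append((s, p, o))
                kept = True
            continue
        result.append((s, p, o))
    return result
```

Starting from the earliest timestamp means the Updater may re-apply some changes that a later shard already contains, but it should not skip any, which is the safer direction to err in.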

Event Timeline

Smalyshev triaged this task as Medium priority. Aug 1 2019, 7:58 PM
Smalyshev created this task.

Change 527225 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[wikidata/query/rdf@master] Only use earliest date-modified when loading dump

https://gerrit.wikimedia.org/r/527225

Change 527225 merged by jenkins-bot:
[wikidata/query/rdf@master] Only use earliest date-modified when loading dump

https://gerrit.wikimedia.org/r/527225