
Unrecognized subject messages in Updater
Closed, Resolved · Public

Description

Updater logs show these messages:

05:26:33.222 [update 5] INFO  o.wikidata.query.rdf.tool.rdf.Munger - Unrecognized subjects: [http://www.wikidata.org/entity/statement/Q63199357-bb5c53bf-4c82-1b40-a7bc-21396dc9332b] while processing http://www.wikidata.org/entity/Q63199357.  Expected only sitelinks and subjects starting with http://www.wikidata.org/wiki/Special:EntityData/ and http://www.wikidata.org/entity/

This should never happen while fetching Wikidata updates, so we need to investigate what's going on there. It might be a bug in Munger.

Event Timeline

Restricted Application added a project: Wikidata. · Apr 18 2019, 8:55 PM
Restricted Application added a subscriber: Aklapper.
Smalyshev added a comment. · Edited · Apr 19 2019, 7:43 PM

Curiously, this happens only on hosts where revision-fetch is enabled for T217897. I wonder whether it's related, though I am not sure how. For example, wdq24 has a lot of "Unrecognized subjects" messages while wdq21 has none.

The first error on wdq24 is from 2019-04-11 07:45:56.038. The patch was also merged on Apr 11, and the Updater was restarted with it at 07:42:21.097. I'm pretty sure it's connected, though still not sure how.

Also, errors on different servers do not seem to match, even though the content is supposed to be exactly the same - this is the whole point of caching after all! Something weird is definitely going on.

I think I know the reason... Constraint violations are always fetched for the latest version, while revision fetches may request non-current ones. This means that the constraint violation data can reference statement IDs that are not present in the revision being processed... I wonder what the best way to fix it is. Possible solutions are:

  1. Throw out constraints that relate to non-existing statements.
  2. Use some kind of If-Modified-Since protocol for constraints.
  3. Somehow mark constraints with the version they are relevant for, and return nothing if an older version is requested (eventually the Updater would catch up and ask for the right version).

I think 1 and 3 would be the best way to go; 2 may be useful too.
@Lucas_Werkmeister_WMDE, @Addshore, @WMDE-leszek, I would be happy to hear what you think about that.
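
To make options 1 and 3 concrete, here is a minimal sketch in Python. It is not the actual Munger code (which is Java) and not the WikibaseQualityConstraints API: the triple representation, the function names, and the `ex:` predicates in the usage example are all assumptions made for illustration. Only the statement URI prefix is taken from the log message above.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation: a triple is (subject, predicate, object) as plain strings.
Triple = Tuple[str, str, str]

# Prefix of statement URIs, as seen in the "Unrecognized subjects" log message.
STATEMENT_PREFIX = "http://www.wikidata.org/entity/statement/"


def known_statement_subjects(entity_triples: Iterable[Triple]) -> Set[str]:
    """Collect every statement URI mentioned in the fetched revision's triples."""
    known: Set[str] = set()
    for subject, _predicate, obj in entity_triples:
        for term in (subject, obj):
            if term.startswith(STATEMENT_PREFIX):
                known.add(term)
    return known


def drop_unknown_constraints(
    entity_triples: Iterable[Triple],
    constraint_triples: Iterable[Triple],
) -> List[Triple]:
    """Option 1: keep only constraint triples whose subject is a statement
    that exists in the revision being processed."""
    known = known_statement_subjects(entity_triples)
    return [t for t in constraint_triples if t[0] in known]


def constraints_for_revision(
    requested_revision: int,
    constraints_revision: int,
    constraint_triples: Iterable[Triple],
) -> List[Triple]:
    """Option 3: constraints are tagged with the revision they were computed for;
    return nothing when an older revision is requested."""
    if requested_revision < constraints_revision:
        return []
    return list(constraint_triples)
```

A usage example with made-up data, where the second constraint triple refers to a statement that the fetched revision does not contain:

```python
entity = [
    ("http://www.wikidata.org/entity/Q63199357", "ex:claim",
     STATEMENT_PREFIX + "Q63199357-present"),
]
constraints = [
    (STATEMENT_PREFIX + "Q63199357-present", "ex:violates", "ex:constraint1"),
    (STATEMENT_PREFIX + "Q63199357-bb5c53bf-4c82-1b40-a7bc-21396dc9332b",
     "ex:violates", "ex:constraint2"),
]
kept = drop_unknown_constraints(entity, constraints)  # only the first triple survives
```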

Smalyshev triaged this task as Normal priority. · Apr 19 2019, 9:03 PM

Change 505312 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[wikidata/query/rdf@master] Drop constraints that belong to unknown subjects.

https://gerrit.wikimedia.org/r/505312

Change 505337 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/WikibaseQualityConstraints@master] Allow revision= parameter for constraintrdf

https://gerrit.wikimedia.org/r/505337

Deploying the constraints patch on test servers seems to eliminate these messages.

Change 505312 merged by jenkins-bot:
[wikidata/query/rdf@master] Drop constraints that belong to unknown subjects.

https://gerrit.wikimedia.org/r/505312

Mentioned in SAL (#wikimedia-operations) [2019-04-23T20:42:38Z] <smalyshev@deploy1001> Started deploy [wdqs/wdqs@51b4728]: Deploy new Updater fix for cnstraints (T221407)

Mentioned in SAL (#wikimedia-operations) [2019-04-23T20:55:41Z] <smalyshev@deploy1001> Finished deploy [wdqs/wdqs@51b4728]: Deploy new Updater fix for cnstraints (T221407) (duration: 13m 03s)

Smalyshev closed this task as Resolved. · Apr 23 2019, 8:57 PM

Change 505337 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Allow revision= parameter for constraintrdf

https://gerrit.wikimedia.org/r/505337

All of the options seem like they would work. It looks like option 1 was the easiest, and it seems you've already done it?