Page MenuHomePhabricator

Provide valid rdf error response for invalid data in the database
Closed, ResolvedPublic

Description

While we should never have invalid data in our database, we should still degrade somewhat gracefully if we do and provide something syntactically valid to the user if they request RDF or ttl data.

So if we have for example invalid data like the following (notice the strange precision):

{
        "P625": [
          {
            "mainsnak": {
              "snaktype": "value",
              "property": "P625",
              "datavalue": {
                "value": {
                  "latitude": -90,
                  "longitude": 0,
                  "altitude": null,
                  "precision": 146706019195900,
                  "globe": "http://www.wikidata.org/entity/Q308"
                },
                "type": "globecoordinate",
                "error": "$precision needs to be between -360 and 360"
              },
              "datatype": "globe-coordinate"
            },
            "type": "statement",
            "id": "Q3629997$469068A1-32BD-4D93-AF94-150EEC63BD2D",
            "rank": "normal",
            "references": [...]
          }
        ],
}

Then the ttl (and RDF) could look somewhat like this:

wd:Q3629997 # no wdt:P625;
  p:P625 wds:Q3629997-469068A1-32BD-4D93-AF94-150EEC63BD2D.
wds:Q3629997-469068A1-32BD-4D93-AF94-150EEC63BD2D # no ps:P625;
  psv:P625 _:someBlankNode.
_:someBlankNode rdf:type wikibase:Error; # look for more appropriate class
  wikibase:errorMessage "$precision needs…". # TODO needed?

that way

  • queries selecting wdt:P625 ?value silently skip the value
  • queries counting p:P625 [] count the statement
  • queries looking at p:P625/psv:P625 [ wikibase:geoLongitude ?lon; wikibase:geoLatitude ?lat ] skip the value

Event Timeline

Hm, and what if a malformed datavalue appears in a qualifier or reference?

Probably the same thing: make the pqv:/prv: values error bnodes, leave out pq:/pr:.

On second thought, I’m not sure there’s anything to do here…? I thought earlier that T144248 was about a crash, but it’s just a logged warning – we already “provide something syntactically valid to the user”:

curl -s 'https://www.wikidata.org/wiki/Special:EntityData/Q3629997.ttl?nocache=1536759432200&flavor=dump' | grep -A10 P625
wd:Q3629997 p:P625 s:Q3629997-469068A1-32BD-4D93-AF94-150EEC63BD2D .

s:Q3629997-469068A1-32BD-4D93-AF94-150EEC63BD2D a wikibase:Statement,
		wikibase:BestRank ;
	wikibase:rank wikibase:NormalRank ;
	prov:wasDerivedFrom ref:d5847b9b6032aa8b13dae3c2dfd9ed5d114d21b3 .

wd:Q3629997 p:P646 s:Q3629997-757F941A-55D7-44DC-8568-2B92F4ECC16F .

s:Q3629997-757F941A-55D7-44DC-8568-2B92F4ECC16F a wikibase:Statement,
		wikibase:BestRank ;

That is, we emit a statement node with rank and references, but no main snak at all, neither ps:P625 nor psv:P625 (or wdt:P625).

I suppose the TTL proposed in the task description (attach some sort of error under psv:P625) could still be an improvement, but it hardly seems worth the effort. So maybe the only thing left to do is to make the warning more quiet so it doesn’t pollute production logstash? (I think “notice” log level would be appropriate: “normal, but significant, condition”, above “info” but below “warning”.)

On second thought, I’m not sure there’s anything to do here…? I thought earlier that T144248 was about a crash, but it’s just a logged warning – we already “provide something syntactically valid to the user”:

curl -s 'https://www.wikidata.org/wiki/Special:EntityData/Q3629997.ttl?nocache=1536759432200&flavor=dump' | grep -A10 P625
wd:Q3629997 p:P625 s:Q3629997-469068A1-32BD-4D93-AF94-150EEC63BD2D .

s:Q3629997-469068A1-32BD-4D93-AF94-150EEC63BD2D a wikibase:Statement,
		wikibase:BestRank ;
	wikibase:rank wikibase:NormalRank ;
	prov:wasDerivedFrom ref:d5847b9b6032aa8b13dae3c2dfd9ed5d114d21b3 .

wd:Q3629997 p:P646 s:Q3629997-757F941A-55D7-44DC-8568-2B92F4ECC16F .

s:Q3629997-757F941A-55D7-44DC-8568-2B92F4ECC16F a wikibase:Statement,
		wikibase:BestRank ;

That is, we emit a statement node with rank and references, but no main snak at all, neither ps:P625 nor psv:P625 (or wdt:P625).

I suppose the TTL proposed in the task description (attach some sort of error under psv:P625) could still be an improvement, but it hardly seems worth the effort.

I agree. The naming issue is apparently unrelated.

So maybe the only thing left to do is to make the warning more quiet so it doesn’t pollute production logstash? (I think “notice” log level would be appropriate: “normal, but significant, condition”, above “info” but below “warning”.)

Either that, or we could keep it an error and log it to the Wikibase channel. Because if it happens again, then we would surely want to know about it because that means we got invalid data into our DB again and need to do something about it. (Assuming the invalid data currently in the DB gets removed, as talked about elsewhere.)

So maybe the only thing left to do is to make the warning more quiet so it doesn’t pollute production logstash? (I think “notice” log level would be appropriate: “normal, but significant, condition”, above “info” but below “warning”.)

Either that, or we could keep it an error and log it to the Wikibase channel. Because if it happens again, then we would surely want to know about it because that means we got invalid data into our DB again and need to do something about it. (Assuming the invalid data currently in the DB gets removed, as talked about elsewhere.)

Fair enough. (It would be really cool if we could log this conditionally depending on whether it happens for a recent/latest revision or a historical one, but I don’t think that’s feasible.)

Change 700373 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Inject logger into DispatchingValueSnakRdfBuilder

https://gerrit.wikimedia.org/r/700373

Change 700374 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseLexeme@master] Inject logger into ValueSnakRdfBuilderFactory

https://gerrit.wikimedia.org/r/700374

Change 700375 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseMediaInfo@master] Inject logger into ValueSnakRdfBuilderFactory

https://gerrit.wikimedia.org/r/700375

Change 700373 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Inject logger into DispatchingValueSnakRdfBuilder

https://gerrit.wikimedia.org/r/700373

Change 700374 merged by jenkins-bot:

[mediawiki/extensions/WikibaseLexeme@master] Inject logger into ValueSnakRdfBuilderFactory

https://gerrit.wikimedia.org/r/700374

Change 700375 merged by jenkins-bot:

[mediawiki/extensions/WikibaseMediaInfo@master] Inject logger into ValueSnakRdfBuilderFactory

https://gerrit.wikimedia.org/r/700375

Addshore subscribed.

The logging change looks good to me