
Convert blank nodes to “unknown value”
Open, Medium, Public

Description

Take, for example, the following query:

SELECT ?type ?typeLabel WHERE {
  wd:Q302 wdt:P1853 ?type.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

The results are:

type            typeLabel
wd:Q19831455    AB
t505463773      t505463773

t505463773 is the string representation of an RDF blank node, which in this context means “unknown value”. Since we don’t use blank nodes anywhere else in the Wikibase RDF export format, I think it would be a nice idea for the label service to display this as “unknown value”.

(Slight complication: ideally, “unknown value” should of course be localized, but as far as I can tell the entire query service doesn’t yet have any i18n infrastructure.)

Event Timeline

Restricted Application added a project: Discovery. Aug 13 2017, 2:00 PM
Restricted Application added subscribers: PokestarFan, Aklapper.

I think it would be a nice idea for the label service to display this as “unknown value”.

In the GUI, certainly, but in API results I don't think so, for two reasons:

  1. Tools that process data should be able to spot this kind of data easily. Matching t\d+ is easy enough; matching "unknown value" in 200 languages may be much harder.
  2. These IDs actually have meaning, even if a rather obscure and mostly useless one. Bnodes are not all equal: different bnodes are different things. If we replace them all with the same string, we create the illusion that the values are equal, and tools that do not know that the string "unknown value" is special (in all 200 languages) may be misled into thinking two bnodes are equal, which is almost always wrong.
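The second point can be illustrated with a minimal sketch. It assumes results in the SPARQL 1.1 Query Results JSON Format, where every binding reports its term type (`uri`, `literal` or `bnode`); the rows, bnode IDs and helper names below are hypothetical.

```python
# Hypothetical rows in the SPARQL 1.1 JSON results shape. The two bnode IDs
# are invented and stand for two *different* unknown values.
ROWS = [
    {"type": {"type": "bnode", "value": "t1"},
     "typeLabel": {"type": "literal", "value": "t1"}},
    {"type": {"type": "bnode", "value": "t2"},
     "typeLabel": {"type": "literal", "value": "t2"}},
]


def distinct_by_value(rows):
    """Count distinct ?type values, keyed by (term type, value)."""
    return len({(r["type"]["type"], r["type"]["value"]) for r in rows})


def distinct_by_label(rows, replace_bnodes):
    """Count distinct labels; optionally replace every bnode label with one fixed string."""
    labels = set()
    for r in rows:
        if replace_bnodes and r["type"]["type"] == "bnode":
            labels.add("unknown value")  # every unknown value collapses to this
        else:
            labels.add(r["typeLabel"]["value"])
    return len(labels)
```

Counting by value or by the bnode-ID labels yields two distinct things, while substituting a single "unknown value" string makes the two bnodes look like one.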

Smalyshev renamed this task from Convert blank nodes to “unknown value” in label service to Convert blank nodes to “unknown value”. Aug 13 2017, 7:29 PM
Smalyshev triaged this task as Medium priority.

Isn’t the label service a kind of GUI-like thing already? Tools that process data should primarily look at the value itself, not at the label, and the same goes for tools that are actually interested in the bnode’s identity: they can all still select the regular variable (in the example query, ?type) and continue to use it as before.

In fact, any tool that looks at the label to distinguish bnodes, or to decide that t\d+ must mean “unknown value”, is broken already, since any regular entity can also have such a label.
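A sketch of the difference between guessing from the label and checking the value itself, under the same assumption of the SPARQL 1.1 JSON results shape. The rows are hypothetical; `http://example.org/entity/X` stands in for a regular entity whose label merely happens to look like a bnode ID.

```python
import re

# Hypothetical rows: a real bnode, and a regular entity with a bnode-like label.
ROWS = [
    {"type": {"type": "bnode", "value": "t505463773"},
     "typeLabel": {"type": "literal", "value": "t505463773"}},
    {"type": {"type": "uri", "value": "http://example.org/entity/X"},
     "typeLabel": {"type": "literal", "value": "t42"}},
]

BNODE_LIKE = re.compile(r"t\d+")


def unknown_by_label(row):
    """Fragile heuristic: guesses "unknown value" from the label text alone."""
    return BNODE_LIKE.fullmatch(row["typeLabel"]["value"]) is not None


def unknown_by_value(row):
    """Checks the term type the results format reports for the value itself."""
    return row["type"]["type"] == "bnode"
```

The label heuristic flags both rows, while the type check flags only the actual blank node.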

True, but an entity's label could also be "unknown value", so that would be potentially misleading too. "Special" values are always problematic...

Yes, but the point of the label isn’t to be unambiguous anyway.

I see that the response is:

		<result>
			<binding name='type'>
				<bnode>t1514691780</bnode>
			</binding>
			<binding name='typeLabel'>
				<literal>t1514691780</literal>
			</binding>
		</result>

Would it work if the API returned a blank node for the label as well, instead of trying to deal with the string?

		<result>
			<binding name='type'>
				<bnode>t1514691780</bnode>
			</binding>
			<binding name='typeLabel'>
				<bnode>t1514691780</bnode>
			</binding>
		</result>

The UI could do something special when it encounters blank nodes.
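One way a UI could handle this, sketched in Python with the standard library's `xml.etree.ElementTree`: walk a document in the SPARQL Query Results XML Format and display any `<bnode>` binding as placeholder text. The response below is a trimmed, hypothetical example shaped like the one quoted above (plus an ordinary row), and `render_rows`/`render_cell` are illustrative names, not part of the query service.

```python
import xml.etree.ElementTree as ET

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hypothetical response, assuming the label binding is returned as a <bnode>.
RESPONSE = """<sparql xmlns='http://www.w3.org/2005/sparql-results#'>
  <head><variable name='type'/><variable name='typeLabel'/></head>
  <results>
    <result>
      <binding name='type'><uri>http://www.wikidata.org/entity/Q19831455</uri></binding>
      <binding name='typeLabel'><literal>AB</literal></binding>
    </result>
    <result>
      <binding name='type'><bnode>t1514691780</bnode></binding>
      <binding name='typeLabel'><bnode>t1514691780</bnode></binding>
    </result>
  </results>
</sparql>"""


def render_cell(binding, unknown_text="unknown value"):
    """Return display text for one <binding>, showing blank nodes as unknown_text."""
    term = binding[0]  # the single child element: <uri>, <literal> or <bnode>
    if term.tag == "{%s}bnode" % NS["sr"]:
        return unknown_text
    return term.text or ""


def render_rows(xml_text):
    """Turn each <result> into a dict of variable name -> display text."""
    root = ET.fromstring(xml_text)
    return [
        {b.get("name"): render_cell(b) for b in result.iterfind("sr:binding", NS)}
        for result in root.iterfind("sr:results/sr:result", NS)
    ]
```

Keeping the placeholder as a parameter leaves room for localization, and the raw bnode ID stays in the response for tools that need to tell distinct unknown values apart.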