Add normalized predicates to Blazegraph vocabulary
Closed, Resolved · Public

Description

Currently, the Blazegraph vocabulary does not include the normalized prefixes (like psn) or the quantityNormalized predicate. As a result, these properties and predicates take more space and are slightly slower to handle than their peers. They should be added to the vocabulary.

This will require a full data reload after the merge, so it may be a good idea to time this to coincide with a planned data reload.

Event Timeline

Restricted Application added a subscriber: Aklapper.
Smalyshev triaged this task as Medium priority. Sep 14 2017, 6:36 PM

The list of the things missing:

Prefixes:

PREFIX psn: <http://www.wikidata.org/prop/statement/value-normalized/>
PREFIX pqn: <http://www.wikidata.org/prop/qualifier/value-normalized/>
PREFIX prn: <http://www.wikidata.org/prop/reference/value-normalized/>

Predicates and URIs:
schema:name
schema:isPartOf
wikibase:quantityNormalized
wikibase:TimeValue
wikibase:QuantityValue
wikibase:GlobecoordinateValue
wikibase:badge
wikibase:Property
wikibase:wikiGroup
wikibase:statements
wikibase:sitelinks

For categories:
mediawiki:Category
mediawiki:isInCategory

Maybe also (used only with properties, may not be worth it):
wikibase:statementValueNormalized
wikibase:qualifierValueNormalized
wikibase:referenceValueNormalized
wikibase:directClaimNormalized

The tricky part here is that once we deploy the change, we'd need to reload immediately (since the dictionary would be incompatible). But we don't want to take all servers down at the same time. So we need a way to run server with different configs for a while. @Gehel, do you have any ideas how to do it best?

Change 383777 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[wikidata/query/rdf@master] Add more declarations to vocabulary.

https://gerrit.wikimedia.org/r/383777

Also candidates for inlining are these third-party URIs:

2971223	P1566	[1-9]\d{0,8}	http://sws.geonames.org/$1/
145480	P662	([1-9]\d{0,8}|)	http://rdf.ncbi.nlm.nih.gov/pubchem/compound/CID$1
125422	P661	[1-9]\d*	http://rdf.chemspider.com/$1
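
A rough Python sketch of what inlining these URIs could look like: match a URI against the template, extract the numeric ID, and keep only the property and the integer. The property IDs, ID patterns and URI templates are copied from the rows above; the leading number on each row is ignored. The names `build_matcher` and `inline_uri` are hypothetical, and this does not use or represent Blazegraph's own inlining API.

```python
import re

# (property, ID pattern, URI template) copied from the rows above.
INLINE_CANDIDATES = [
    ("P1566", r"[1-9]\d{0,8}", "http://sws.geonames.org/$1/"),
    ("P662", r"([1-9]\d{0,8}|)", "http://rdf.ncbi.nlm.nih.gov/pubchem/compound/CID$1"),
    ("P661", r"[1-9]\d*", "http://rdf.chemspider.com/$1"),
]


def build_matcher(id_pattern, template):
    """Turn a '$1' URI template into a regex that captures the ID as group 1."""
    head, tail = template.split("$1")
    return re.compile(re.escape(head) + "(" + id_pattern + ")" + re.escape(tail))


MATCHERS = [(prop, build_matcher(pattern, template))
            for prop, pattern, template in INLINE_CANDIDATES]


def inline_uri(uri):
    """Return (property, integer ID) if the URI fits a template, else None.

    An empty ID (allowed by the P662 pattern) is treated as not inlinable.
    """
    for prop, matcher in MATCHERS:
        match = matcher.fullmatch(uri)
        if match and match.group(1):
            return prop, int(match.group(1))
    return None
```

URIs that do not fit any template, such as an ID with a leading zero, return `None` and would have to be stored in full.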

Change 384114 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/puppet@production] Temporarily silence noisy warnings for dictionary upgrade

https://gerrit.wikimedia.org/r/384114

> The tricky part here is that once we deploy the change, we'd need to reload immediately (since the dictionary would be incompatible). But we don't want to take all servers down at the same time. So we need a way to run server with different configs for a while. @Gehel, do you have any ideas how to do it best?

Is it different config or different code? As I read https://gerrit.wikimedia.org/r/#/c/383777/, I understand that the new vocabulary is deployed as part of the Blazegraph code. We might be using different definitions of "config"... In my vocabulary, config is something deployed externally to Blazegraph, which should be deployed with Puppet and can easily be parameterized for each server via hiera (yes, I simplify a bit).

In this case, if I understand correctly, we need something like:

for each server n:

  • depool server n
  • update code
  • reimport data
  • pool server n

And we can probably parallelize eqiad and codfw.

This should not be an issue: we can restrict scap to deploy to only a subset of servers.
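
The per-server procedure above, with the two datacenters running in parallel, could be sketched roughly as follows in Python. This is only an outline of the ordering: `run_step` is a hypothetical callback standing in for whatever actually depools, deploys, reimports and pools a host, and the server names in the usage note are made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Step order taken from the list above.
STEPS = ("depool", "update code", "reimport data", "pool")


def upgrade_datacenter(servers, run_step):
    """Upgrade servers one at a time so the others keep serving queries."""
    for server in servers:
        for step in STEPS:
            run_step(server, step)  # hypothetical callback doing the real work


def rolling_upgrade(datacenters, run_step):
    """Run the per-datacenter loops in parallel (e.g. eqiad and codfw)."""
    workers = max(1, len(datacenters))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(upgrade_datacenter, servers, run_step)
                   for servers in datacenters.values()]
        for future in futures:
            future.result()  # re-raise any failure from a worker thread
```

For example, `rolling_upgrade({"eqiad": ["eqiad-a", "eqiad-b"], "codfw": ["codfw-a"]}, print)` would print each server/step pair. Within a datacenter the steps are strictly sequential, while the two datacenters interleave.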

Change 384114 merged by Gehel:
[operations/puppet@production] Temporarily silence noisy warnings for dictionary upgrade

https://gerrit.wikimedia.org/r/384114

> In this case, if I understand correctly, we need something like:

Yes, that sounds correct.

Change 383777 merged by jenkins-bot:
[wikidata/query/rdf@master] Add more declarations to vocabulary.

https://gerrit.wikimedia.org/r/383777

Change 389759 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[wikidata/query/rdf@master] Use new dictionary (V003)

https://gerrit.wikimedia.org/r/389759

Change 389759 merged by jenkins-bot:
[wikidata/query/rdf@master] Use new dictionary (V003)

https://gerrit.wikimedia.org/r/389759

Smalyshev claimed this task.