
Provide a way to add new unit normalizations to the query service without a full reload
Closed, Resolved · Public

Description

We want to be able to add normalization for additional units without needing a full reload.

Note that we don't need to be able to change an existing conversion; that would be nice to have, but it's not a requirement.

Possible implementation strategy, option I:

  • Find all values referencing the respective unit using a SPARQL query
  • Find the statements using those values, and the items using those statements

...option II:

  • Find all values referencing the respective unit using a SPARQL query
  • Find the statements using those values
  • Compute the normalized values using the new mapping, and add them to the triple store.
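
The value-lookup step shared by options I and II could look roughly like the sketch below, assuming the standard Wikibase RDF model in which quantity value nodes carry wikibase:quantityUnit and wikibase:quantityAmount; the endpoint URL and the unit item are illustrative placeholders, not part of the proposal itself:

```python
# Sketch: find all quantity value nodes that reference a given unit.
# The endpoint and the unit item URI are example placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://query.wikidata.org/sparql"
UNIT_URI = "http://www.wikidata.org/entity/Q712226"  # example unit item

QUERY = """
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT ?value ?amount WHERE {
  ?value wikibase:quantityUnit <%s> ;
         wikibase:quantityAmount ?amount .
}
""" % UNIT_URI

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["value"]["value"], row["amount"]["value"])
```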

...option III:

  • Find all statement values referencing the respective unit while scanning a JSON dump
  • Compute the normalized values using the new mapping, and output them (as N-Triples or Turtle).
  • Bulk-load the new triples into the query service
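
A minimal sketch of the dump-scanning variant, assuming the usual Wikidata JSON dump layout (one entity per line inside a JSON array) and a hypothetical conversion factor; it only inspects main snaks and leaves building the actual value-node URIs out of scope:

```python
# Sketch: scan a Wikidata JSON dump for quantity values in a given unit and
# compute their normalized amounts. Dump path, unit items and the conversion
# factor are assumptions for illustration; only main snaks are inspected.
import gzip
import json
from decimal import Decimal

UNIT_URI = "http://www.wikidata.org/entity/Q712226"    # example source unit
TARGET_URI = "http://www.wikidata.org/entity/Q25343"   # example target (SI) unit
FACTOR = Decimal("1000000")                            # hypothetical factor

with gzip.open("wikidata-all.json.gz", "rt") as dump:
    for line in dump:
        line = line.strip().rstrip(",")
        if not line or line in ("[", "]"):
            continue  # skip the array brackets around the dump
        entity = json.loads(line)
        for statements in entity.get("claims", {}).values():
            for statement in statements:
                snak = statement.get("mainsnak", {})
                value = snak.get("datavalue", {}).get("value", {})
                if snak.get("datatype") == "quantity" and value.get("unit") == UNIT_URI:
                    normalized = Decimal(value["amount"]) * FACTOR
                    # A real tool would emit Turtle here, using the value-node
                    # URIs derived from the value hashes in the RDF dump.
                    print(entity["id"], normalized, TARGET_URI)
```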


Event Timeline

Right now I think this should be the plan:

The tool gets two .json config files: the new config and the old config (the old config is optional). Then:

  1. Diff the configs and produce list of new units
  2. For each new primary unit:
    1. Run a SPARQL query to find all values using it, and generate self-referencing normalized statements with wikibase:quantityNormalized
    2. Run a SPARQL query to find all statements using those values (if there are too many, we may have to split the query into batches), and generate parallel normalized statements for those, with the same value.
  3. For each new non-primary unit:
    1. Run a SPARQL query to find all values using it, and generate a new converted value for each one. Generate RDF for those new values, and also wikibase:quantityNormalized statements on the old values.
    2. Run a SPARQL query to find all statements using those values, and generate parallel normalized statements for those, with the new converted value.
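
A skeleton of step 1 and the primary/non-primary split could look like this; the config layout (a JSON object keyed by unit item ID, with "factor" and "unit" fields, a primary unit pointing at itself) and the file names are assumptions for illustration, not a guaranteed description of the actual unit conversion config:

```python
# Sketch: diff the old and new unit configs and classify the new units.
# The config layout (object keyed by unit item ID with "factor"/"unit"
# fields) and the file names are assumptions for illustration only.
import json

def load_config(path):
    if path is None:  # the old config is optional
        return {}
    with open(path) as f:
        return json.load(f)

def new_units(old_path, new_path):
    """Unit IDs present in the new config but not in the old one."""
    old, new = load_config(old_path), load_config(new_path)
    return {uid: conf for uid, conf in new.items() if uid not in old}

def is_primary(unit_id, conf):
    """A primary unit converts to itself (its target unit is itself)."""
    return conf.get("unit") == unit_id

if __name__ == "__main__":
    for unit_id, conf in new_units("units.old.json", "units.new.json").items():
        kind = "primary" if is_primary(unit_id, conf) else "non-primary"
        print(unit_id, kind)  # dispatch to steps 2.x or 3.x accordingly
```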

The output of the tool will be RDF/TTL that can be bulk-loaded into the instance.
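
For a rough idea of the emitted triples, a helper like the one below could format the wikibase:quantityNormalized links as N-Triples/Turtle; the value-node hashes are placeholders, the real ones would come from the query results:

```python
# Sketch: format a wikibase:quantityNormalized triple. Node hashes are
# placeholders; real value-node URIs come from the SPARQL results above.
WIKIBASE = "http://wikiba.se/ontology#"
VALUE_NS = "http://www.wikidata.org/value/"

def normalized_triple(value_hash, normalized_hash):
    return "<%s%s> <%squantityNormalized> <%s%s> ." % (
        VALUE_NS, value_hash, WIKIBASE, VALUE_NS, normalized_hash)

# For a primary unit the value node references itself:
print(normalized_triple("abc123", "abc123"))
```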

We need to see whether we will be able to hold all the values described in memory. So far the most popular unit, square kilometre, has 13398 usages; I think it should not be a problem to hold all of them in memory.

Change 312627 had a related patch set uploaded (by Smalyshev):
Script to produce RDF mappings for new normalized units

https://gerrit.wikimedia.org/r/312627

Change 312627 merged by jenkins-bot:
Script to produce RDF mappings for new normalized units

https://gerrit.wikimedia.org/r/312627

Change 319402 had a related patch set uploaded (by Smalyshev):
Script to produce RDF mappings for new normalized units

https://gerrit.wikimedia.org/r/319402

Change 319402 abandoned by Smalyshev:
Script to produce RDF mappings for new normalized units

Reason:
we can do it without backporting

https://gerrit.wikimedia.org/r/319402