Maniphest T142087

[Task] prepare bot for quantity change fixes
Closed, Resolved, Public

Assigned To

Authored By

	Lydia_Pintscher
	Aug 4 2016, 1:41 PM

Description

We are introducing "unbounded" quantities, see T115269. This allows more intuitive interaction: if no bounds are entered, no bounds are shown. If bounds were entered, they are always shown. Also, we changed the auto-detection of uncertainty based on the decimal representation to be +/-0.5 of the magnitude of the least significant digit, instead of +/-1 of that magnitude, see T140997.
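To make the change in auto-detection concrete, here is a minimal Python sketch (not Wikibase's actual implementation). It assumes the guessed margin is simply a factor times the magnitude of the least significant digit of the entered amount; `guessed_bounds` is a hypothetical helper name.

```python
from decimal import Decimal


def guessed_bounds(amount: str, factor: Decimal) -> tuple[Decimal, Decimal]:
    """Bounds guessed from the decimal representation of `amount`.

    The margin is `factor` times the magnitude of the least significant
    digit: factor=1 mimics the old rule, factor=0.5 the new one.
    """
    value = Decimal(amount)
    # Magnitude of the least significant digit: "12.3" -> 0.1, "1200" -> 1.
    step = Decimal(1).scaleb(value.as_tuple().exponent)
    margin = factor * step
    return value - margin, value + margin


old = guessed_bounds("+12.3", Decimal(1))      # old rule: +/-1 of the last digit
new = guessed_bounds("+12.3", Decimal("0.5"))  # new rule: +/-0.5 of the last digit
```

For "+12.3" the old rule gives 12.2 to 12.4, while the new one gives 12.25 to 12.35, which is exactly the interval of values that round to 12.3.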

We want a bot to address both issues:

Strip +/-0

In the past, bounds were hidden if they were the same as the nominal value (that is, +/-0 uncertainty, indicating an absolutely exact value). Users took advantage of this to suppress the display of bounds, even in situations where +/-0 was semantically inappropriate (e.g. for measurements, which are never absolutely exact).

This (ab)use of +/-0 should be fixed by replacing all quantities with +/-0 uncertainty by an unbounded quantity in statements that represent measured or estimated values. Bot runs should be done per-property.

For statements that claim exact counts (number of planets, of atoms, of members of parliament, etc), +/-0 is technically correct, but can be omitted because it is implicit. Whether such statements should be converted to using unbounded quantities is up to the community.
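A minimal sketch of the +/-0 check in Python, assuming quantity values in the Wikibase JSON form (`amount`, `unit`, and optional `upperBound`/`lowerBound` as signed decimal strings). Comparing as `Decimal` rather than as strings avoids treating "+8" and "+8.0" as different numbers. The function names are hypothetical, and which properties (for example exact counts) get this treatment remains a community decision.

```python
from decimal import Decimal


def has_zero_uncertainty(value: dict) -> bool:
    """True if both bounds equal the amount, i.e. the quantity is +/-0."""
    if "upperBound" not in value or "lowerBound" not in value:
        return False  # already unbounded, nothing to strip
    amount = Decimal(value["amount"])
    return Decimal(value["upperBound"]) == amount == Decimal(value["lowerBound"])


def make_unbounded(value: dict) -> dict:
    """Copy of the quantity value with upperBound/lowerBound dropped."""
    return {k: v for k, v in value.items() if k not in ("upperBound", "lowerBound")}
```

For example, `{"amount": "+8", "unit": "1", "upperBound": "+8", "lowerBound": "+8"}` passes the check and becomes `{"amount": "+8", "unit": "1"}`.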

Strip +/-1

In the past, when no uncertainty was explicitly entered, we estimated an uncertainty of +/-1 times the magnitude of the least significant digit, and stored this. These bounds can lead to incorrect rounding, see T95425. To resolve this, quantities with such bounds should be replaced with unbounded quantities. Note that we cannot know whether the +/-1 was entered explicitly or not, and thus may lose some legitimate bounds, and replace them by "unknown". This is however likely to be rare, at least much rarer than +/-1 actually standing for unknown bounds.

In any case, this should also only be applied per-property, after community discussion.
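Detecting the old default can be sketched the same way: the bounds must be exactly the amount plus and minus one unit of its least significant digit. This assumes the stored `amount` string still carries the digits it was originally entered with; as noted above, a check like this cannot tell an explicitly entered +/-1 from a guessed one. The helper name is hypothetical.

```python
from decimal import Decimal


def has_old_default_bounds(value: dict) -> bool:
    """True if the bounds are exactly +/-1 times the magnitude of the least
    significant digit of the amount (the old auto-detected default)."""
    if "upperBound" not in value or "lowerBound" not in value:
        return False
    amount = Decimal(value["amount"])
    # One unit of the last digit: "+12.3" -> 0.1, "+1200" -> 1.
    step = Decimal(1).scaleb(amount.as_tuple().exponent)
    return (Decimal(value["upperBound"]) == amount + step
            and Decimal(value["lowerBound"]) == amount - step)
```

Values that pass this check would then have their bounds removed, just like the +/-0 case.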

Values

This should cover values in qualifiers and statements.
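Covering both main values and qualifiers means looking in two places in each statement. Here is a sketch over entity JSON as returned by `wbgetentities` or found in the JSON dumps, assuming the usual layout: `claims` keyed by property ID, each statement with an `id`, a `mainsnak` and optional `qualifiers` grouped by property ID. `quantity_values` is a hypothetical helper.

```python
from typing import Iterator


def _is_quantity_value(snak: dict) -> bool:
    """True for snaks that carry an actual quantity (not novalue/somevalue)."""
    return snak["snaktype"] == "value" and snak["datavalue"]["type"] == "quantity"


def quantity_values(entity: dict, property_id: str) -> Iterator[tuple[str, str, dict]]:
    """Yield (statement_id, location, value) for each quantity value of
    `property_id`, whether it is a statement's main snak or a qualifier."""
    for statements in entity.get("claims", {}).values():
        for statement in statements:
            mainsnak = statement["mainsnak"]
            if mainsnak["property"] == property_id and _is_quantity_value(mainsnak):
                yield statement["id"], "mainsnak", mainsnak["datavalue"]["value"]
            # Qualifiers can belong to statements about any property.
            for qualifier in statement.get("qualifiers", {}).get(property_id, []):
                if _is_quantity_value(qualifier):
                    yield statement["id"], "qualifier", qualifier["datavalue"]["value"]
```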

Related Objects

Status	Assigned	Task
Open	None	T56318 Quantity datatype (tracking)
Open	None	T133042 Quantity datatype precision (tracking)
Resolved	Lydia_Pintscher	T115269 [Story] When a Quantity is entered with no uncertainty/bounds given, do not guess uncertainty/bounds until needed.
Resolved	Lydia_Pintscher	T142086 [Task] announce quantity changes
Resolved	Ladsgroup	T142087 [Task] prepare bot for quantity change fixes

Event Timeline

Lydia_Pintscher created this task. Aug 4 2016, 1:41 PM

daniel updated the task description. Aug 8 2016, 10:33 AM

Pasleim subscribed. Aug 9 2016, 2:23 PM

Edgars2007 subscribed. Aug 9 2016, 3:56 PM

@Ladsgroup Want to look into this?

If you want the bot to change +/-0 (for unitless quantities), and the other cases upon community consensus, to a new type called "unbounded" quantities, and if there's an API to do that (I can find the API, I just need confirmation before checking), it is super straightforward. I'm just double-checking to see if I understand the task correctly.

@Ladsgroup close. Unbounded quantities are not a separate data type. They are just quantities with the upperBound and lowerBound not set.

The bot should have two modes: convert +/-0 to unbounded, and convert the old default to unbounded (the old default was +/-1 for integers, but more generally, +/- the order of magnitude of the least significant digit). It should do this for all statements about a given property, and we'd need community consensus for each property (or for a list of properties).

The relevant statements can be found via SPARQL. For now, I think it's sufficient to do this only for main snaks, not qualifiers.

No extra API is needed for the update, you can use wbsetclaimvalue, wbsetclaim, or wbeditentity.
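As a sketch of the wbsetclaimvalue route, assuming its `claim`, `snaktype`, `value` and `token` parameters as described in the API help: an unbounded quantity is just a value JSON with no `upperBound`/`lowerBound` keys. The builder below only assembles the request; the token would come from `action=query&meta=tokens`, and the GUID used in any call is a placeholder. wbsetclaimvalue only replaces a statement's main value, so qualifier values would go through wbsetclaim or wbeditentity instead.

```python
import json


def wbsetclaimvalue_params(claim_guid: str, amount: str, unit: str, token: str) -> dict:
    """Assemble parameters that replace a statement's main value with an
    unbounded quantity: only `amount` and `unit`, no bounds."""
    value = {"amount": amount, "unit": unit}
    return {
        "action": "wbsetclaimvalue",
        "claim": claim_guid,         # statement GUID, e.g. taken from the entity JSON
        "snaktype": "value",
        "value": json.dumps(value),  # the datavalue's value, JSON-encoded
        "token": token,              # CSRF token from action=query&meta=tokens
        "bot": "1",                  # assumption: mark the edit as a bot edit
        "format": "json",
    }
```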

I guess I wasn't clear: I definitely didn't mean a new data type :D I meant a new type of bounds. It seems doable; we just need an API to do that.

Ladsgroup claimed this task. Aug 22 2016, 9:26 PM

Restricted Application added a project: User-Ladsgroup. Aug 22 2016, 9:26 PM

I hope it's okay that I take over this task.

Ladsgroup moved this task from Incoming to In progress on the User-Ladsgroup board. Aug 22 2016, 9:26 PM

The SPARQL query to find these statements times out every time. I guess we should work on reading dumps instead.
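Reading dumps can be done without loading the whole file, because the Wikidata JSON dump is a single JSON array with one entity per line. A minimal Python sketch, assuming that layout; `read_json_dump` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def read_json_dump(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one entity dict per line of a Wikidata JSON dump.

    Each line is parsed on its own after skipping the array brackets and
    stripping the trailing comma, so the dump never has to fit in memory."""
    for line in lines:
        line = line.strip()
        if line in ("[", "]", ""):
            continue
        yield json.loads(line.rstrip(","))
```

For a compressed dump, the opened file can be passed directly, e.g. `gzip.open(path, "rt", encoding="utf-8")`, where `path` is a placeholder for the downloaded dump file.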

(e.g. for measurements, which are never absolutely exact).

Measurements can be exact. Two examples:

The speed of light is exactly 299,792,458 meters per second, because the meter is defined in terms of the speed of light.
The spin of the electron is exactly half of the reduced Planck constant (ħ/2).

What I mean is that whether a value is exact depends heavily on what is being measured, and that needs to be taken into consideration.

Ladsgroup moved this task from In progress to Later on the User-Ladsgroup board. Oct 18 2016, 5:57 PM

Lydia_Pintscher moved this task from incoming to consider for next sprint on the Wikidata board. Nov 5 2016, 10:01 AM

Just a note that pywikibot currently defaults to +/-0, so it might be an idea to give that some time to be patched (T150210), including in the pip release, and announced before running a bot, to avoid pywikibot "contaminating" the cleaned data afterwards.
(This is me guessing that pywikibot is a rather large source of data added to Wikidata; if not, then no worries.)
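The distinction the pywikibot fix needs to preserve is between "no error given" and "an error of 0". The Python sketch below is not pywikibot's `WbQuantity`; it is a hypothetical serializer in which `error=None` produces an unbounded quantity, whereas a default of 0 would write +/-0 bounds into every new value.

```python
from decimal import Decimal
from typing import Optional


def _signed(d: Decimal) -> str:
    """Wikibase-style signed decimal string: "+8", "-3"."""
    return "+" + str(d) if d >= 0 else str(d)


def quantity_json(amount: Decimal, unit: str = "1",
                  error: Optional[Decimal] = None) -> dict:
    """Serialize a quantity. error=None means unbounded (no bounds at all),
    which is not the same as error=0 (an exact, +/-0 value)."""
    value = {"amount": _signed(amount), "unit": unit}
    if error is not None:
        value["upperBound"] = _signed(amount + error)
        value["lowerBound"] = _signed(amount - error)
    return value
```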

Ladsgroup added a project: WMDE-TLA-Team. Nov 8 2016, 12:26 AM

WMDE-leszek moved this task from proposed to accepted on the WMDE-TLA-Team board. Nov 8 2016, 1:51 PM

Jakob_WMDE moved this task from consider for next sprint to in progress on the Wikidata board. Nov 8 2016, 1:55 PM

Ladsgroup moved this task from Later to In progress on the User-Ladsgroup board. Nov 8 2016, 4:06 PM

In T142087#2610204, @Ladsgroup wrote:

(e.g. for measurements, which are never absolutely exact).

Measurements can be exact. Two examples:

The speed of light is exactly 299,792,458 meters per second, because the meter is defined in terms of the speed of light.

That's a definition, not a measurement.

So yes, not all values of the speed property are going to be actual measurements; there can be absolutely exact values for speed that are not measurements. But that's a rare edge case, rather than the rule. I cannot think of a property where it would be equally likely for the value to be a measurement or a definition.

The spin of the electron is exactly half of the reduced Planck constant (ħ/2).

That's a theoretical result (confirmed by inexact measurements). But spin is kind of special anyway, since it's not a continuum, as far as I know.

What I mean is that whether a value is exact depends heavily on what is being measured, and that needs to be taken into consideration.

You are right that it is not necessarily the same for all Statements for a given property: speed is not always a measurement. So stripping +/-0 from all values for speed is going to be wrong in some cases. But it's going to be correct in far more cases. The cases in which the speed is not a measurement can probably be found and managed by hand, since they are rare.

In T142087#2778009, @Lokal_Profil wrote:

Just a note that pywikibot currently defaults to +/-0

That's the default, really? Oh no :(

Lokal_Profil mentioned this in T150210: Make WbQuantity handle case without errors. Nov 10 2016, 9:17 AM

Ladsgroup moved this task from accepted to doing on the WMDE-TLA-Team board. Nov 14 2016, 11:32 AM

Here is a query that times out: https://query.wikidata.org/#SELECT%0A%3Fitem%20%3Fproperty%20%3Fstatement%0A%3FupperBound%20%3FlowerBound%0AWHERE%20%7B%0A%20%20%3FdataValueId%20wikibase%3AquantityUpperBound%20%3FupperBound%20.%0A%20%20%3FdataValueId%20wikibase%3AquantityUpperBound%20%3FlowerBound%20.%0A%20%20FILTER%20%28%3FupperBound%20%3D%20%3FlowerBound%29%20.%0A%20%20%3Fstatement%20%3FvalueRef%20%3FdataValueId%20.%0A%20%20%3Fitem%20%3Fproperty%20%3Fstatement%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22%20%7D%0A%7D%0ALIMIT%2010

@hoo @Smalyshev Is there a way we can run this internally once without the timeout?

Note that in this case you need to remove the LIMIT 10 before running it.

@Lydia_Pintscher I'm afraid querying "for all values" may be a bit tricky... What exactly are we trying to do here? I'm not sure I understand completely.

We certainly have a way to run very long queries manually without the 30s time limit.

Also, I don't see what SERVICE is doing in that query. Seems to be unnecessary.
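In the linked query, both triple patterns use `wikibase:quantityUpperBound`, so `?lowerBound` is bound to the same node's upper bound and the FILTER holds for every bounded quantity; presumably `wikibase:quantityLowerBound` was intended. Below is a Python sketch of the corrected query, without the label SERVICE and without LIMIT, plus a helper that builds a GET URL for the public endpoint. Against the public endpoint it will likely still hit the time limit, which is why running it internally was suggested.

```python
from urllib.parse import urlencode

# Same pattern as the linked query, but ?lowerBound now comes from
# wikibase:quantityLowerBound and the (unused) label SERVICE is dropped.
QUERY = """
SELECT ?item ?property ?statement ?upperBound ?lowerBound WHERE {
  ?dataValueId wikibase:quantityUpperBound ?upperBound ;
               wikibase:quantityLowerBound ?lowerBound .
  FILTER(?upperBound = ?lowerBound)
  ?statement ?valueRef ?dataValueId .
  ?item ?property ?statement .
}
"""


def query_url(query: str, endpoint: str = "https://query.wikidata.org/sparql") -> str:
    """GET URL asking the query service for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})
```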

I've made the bot; you can find the source code here.
It uses the generic pywikibot page generator, so we can feed it the result of a SPARQL query or read dumps and use that (proper DI, I guess). Right now it errors out because pywikibot doesn't accept "None" as an upperBound/lowerBound value. Fixing that would be easy. Once the changes are deployed, we can run this.
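A rough Python sketch of the dependency-injection idea: the edit planner takes any iterable of entity JSON plus a converter function, so the same code works with SPARQL results, a dump reader or test data. This is not the bot's actual source; `plan_edits` and `Converter` are hypothetical, and the sketch only looks at main snaks.

```python
from typing import Callable, Iterable, Iterator, Optional

# A converter returns the replacement quantity value, or None to keep it.
Converter = Callable[[dict], Optional[dict]]


def plan_edits(entities: Iterable[dict], property_id: str,
               convert: Converter) -> Iterator[tuple[str, dict]]:
    """Yield (statement_id, new_value) for main-snak values of
    `property_id` that `convert` wants to change.

    `entities` can come from any source that yields entity JSON, which is
    why it is injected rather than hard-coded."""
    for entity in entities:
        for statement in entity.get("claims", {}).get(property_id, []):
            snak = statement["mainsnak"]
            if snak["snaktype"] != "value":
                continue  # novalue/somevalue snaks have no bounds
            new_value = convert(snak["datavalue"]["value"])
            if new_value is not None:
                yield statement["id"], new_value
```

Swapping the converter switches between the two modes (+/-0 and the old +/-1 default) without touching the input side.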

In T142087#2798190, @Ladsgroup wrote:

I've made the bot; you can find the source code here.
It uses the generic pywikibot page generator, so we can feed it the result of a SPARQL query or read dumps and use that (proper DI, I guess). Right now it errors out because pywikibot doesn't accept "None" as an upperBound/lowerBound value. Fixing that would be easy. Once the changes are deployed, we can run this.

There is a patch at https://gerrit.wikimedia.org/r/320649 and associated discussion in T150210.

Ladsgroup moved this task from doing to done on the WMDE-TLA-Team board. Nov 17 2016, 9:15 AM

Ladsgroup moved this task from In progress to Done on the User-Ladsgroup board.

Ladsgroup edited projects, added Wikidata-Former-Sprint-Board; removed WMDE-TLA-Team. Nov 18 2016, 10:32 AM

Ladsgroup moved this task from Proposed to Done on the Wikidata-Former-Sprint-Board board. Nov 18 2016, 10:33 AM

Wesalius subscribed. Nov 21 2016, 4:21 PM

Lydia_Pintscher moved this task from Done to Doing on the Wikidata-Former-Sprint-Board board. Nov 29 2016, 1:20 PM

Under what account does the bot do the edits?

Ladsgroup moved this task from Done to In progress on the User-Ladsgroup board. Nov 29 2016, 9:15 PM

Is this doing values in qualifiers?

daniel added a project: User-Daniel. Dec 6 2016, 4:50 PM

Nikki subscribed. Dec 6 2016, 5:55 PM

Esc3300 updated the task description. Dec 7 2016, 2:29 PM

In T142087#2831356, @Wesalius wrote:

Under what account does the bot do the edits?

I'm running it, under the name Dexbot.

daniel moved this task from Inbox to Revisit on the User-Daniel board. Jan 5 2017, 6:59 PM

Ladsgroup moved this task from In progress to Blocked on others on the User-Ladsgroup board. Jan 22 2017, 3:02 AM

Ladsgroup moved this task from Blocked on others to In progress on the User-Ladsgroup board. Jan 27 2017, 2:34 PM

Jc3s5h subscribed. Jan 30 2017, 4:22 PM

Closing this, as the bot has been running for a while already and things seem to be going fine.

Ladsgroup moved this task from In progress to Done on the User-Ladsgroup board. Apr 9 2017, 9:32 AM
