[Task] prepare bot for quantity change fixes
Closed, ResolvedPublic

Description

We are introducing "unbounded" quantities, see T115269. This allows more intuitive interaction: if no bounds are entered, no bounds are shown. If bounds were entered, they are always shown. Also, we changed the auto-detection of uncertainty based on the decimal representation to be +/-0.5 of the magnitude of the least significant digit, instead of +/-1 of that magnitude, see T140997.
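To make the difference between the old and the new default concrete, here is a minimal Python sketch that derives implied bounds from a decimal representation. The function name `implied_bounds` and its `factor` parameter are made up for illustration; this is not the Wikibase implementation.

```python
from decimal import Decimal


def implied_bounds(text, factor):
    """Return (lower, upper) implied by the decimal string `text`.

    `factor` is "1" for the old default and "0.5" for the new one.
    Hypothetical helper, for illustration only.
    """
    amount = Decimal(text)
    # Exponent of the least significant digit as written, e.g. -1 for "12.3".
    exponent = amount.as_tuple().exponent
    # Magnitude of that digit, e.g. 0.1 for "12.3".
    unit = Decimal(1).scaleb(exponent)
    margin = unit * Decimal(factor)
    return amount - margin, amount + margin


old_lower, old_upper = implied_bounds("12.3", "1")    # old default: +/-0.1
new_lower, new_upper = implied_bounds("12.3", "0.5")  # new default: +/-0.05
```

For an input of "12.3" the old rule gives the interval 12.2 to 12.4, while the new rule gives 12.25 to 12.35.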

We want a bot to address both issues:

Strip +/-0

In the past, bounds were hidden if they were the same as the nominal value (that is, +/-0 uncertainty, indicating an absolutely exact value). Users took advantage of this to suppress display of bounds, even in situations where +/-0 was semantically inappropriate (e.g. for measurements, which are never absolutely exact).

This (ab)use of +/-0 should be fixed by replacing all quantities that have +/-0 uncertainty with unbounded quantities in statements that represent measured or estimated values. Bot runs should be done per-property.

For statements that claim exact counts (number of planets, of atoms, of members of parliament, etc), +/-0 is technically correct, but can be omitted because it is implicit. Whether such statements should be converted to using unbounded quantities is up to the community.

Strip +/-1

In the past, when no uncertainty was explicitly entered, we estimated an uncertainty of +/-1 times the magnitude of the least significant digit, and stored this. These bounds can lead to incorrect rounding, see T95425. To resolve this, quantities with such bounds should be replaced with unbounded quantities. Note that we cannot know whether the +/-1 was entered explicitly or not, and thus may lose some legitimate bounds, and replace them by "unknown". This is however likely to be rare, at least much rarer than +/-1 actually standing for unknown bounds.

In any case, this should also only be applied per-property, after community discussion.

Values

This should cover values in qualifiers and statements.

Event Timeline

So you want a bot to change +/-0 (for unitless quantities) and other cases, upon community consensus, to a new type called "unbounded" quantities, provided there is an API to do that (I can find the API, I just need confirmation before checking). That is super straightforward. I'm just double-checking that I understand the task correctly.

@Ladsgroup close. Unbounded quantities are not a separate data type. They are just quantities with the upperBound and lowerBound not set.

The bot should have two modes: convert +/-0 to unbounded, and convert the old default to unbounded (the old default was +/-1 for integers, but more generally, +/- the order of magnitude of the least significant digit). It should do this for all statements about a given property, and we'd need community consensus for each property (or for a list of properties).
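A rough sketch of how the two modes could tell the cases apart, working on the plain JSON form of a quantity value (`amount`, `upperBound` and `lowerBound` as signed decimal strings). The helper names `classify` and `strip_bounds` are hypothetical, and the old-default check assumes the stored `amount` still has the precision that was originally entered.

```python
from decimal import Decimal


def classify(value):
    """Classify a quantity value dict as 'unbounded', 'zero', 'old_default' or 'other'."""
    if "upperBound" not in value or "lowerBound" not in value:
        return "unbounded"
    amount = Decimal(value["amount"])
    upper = Decimal(value["upperBound"])
    lower = Decimal(value["lowerBound"])
    if upper == amount == lower:
        return "zero"  # +/-0
    # Magnitude of the least significant digit of the amount as stored
    # (assumption: the stored string keeps the entered precision).
    unit = Decimal(1).scaleb(amount.as_tuple().exponent)
    if upper - amount == unit and amount - lower == unit:
        return "old_default"  # +/-1 of the least significant digit
    return "other"


def strip_bounds(value):
    """Return a copy of the value without upperBound and lowerBound."""
    return {k: v for k, v in value.items() if k not in ("upperBound", "lowerBound")}


exact = {"amount": "+8", "unit": "1", "upperBound": "+8", "lowerBound": "+8"}
guessed = {"amount": "+12.3", "unit": "1", "upperBound": "+12.4", "lowerBound": "+12.2"}
explicit = {"amount": "+12.3", "unit": "1", "upperBound": "+12.8", "lowerBound": "+11.8"}
```

Here `exact` would be picked up by the +/-0 mode, `guessed` by the old-default mode, and `explicit` would be left alone by both.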

The relevant statements can be found via SPARQL. For now, I think it's sufficient to only do this for main snaks, not qualifiers.
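As an illustration only (this is not a query posted on this task), a query for main-snak values with +/-0 bounds on one property might be built like this. The function name `build_query`, its `limit` parameter and the example property ID are assumptions; the `psv:` and `wikibase:` terms follow the Wikibase RDF mapping as I understand it.

```python
import re

# Sketch of a query for statements whose full value has bounds equal to the amount.
QUERY_TEMPLATE = """\
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX psv: <http://www.wikidata.org/prop/statement/value/>

SELECT ?statement ?amount WHERE {
  ?statement psv:%(pid)s ?node .
  ?node wikibase:quantityAmount ?amount ;
        wikibase:quantityLowerBound ?lower ;
        wikibase:quantityUpperBound ?upper .
  FILTER(?lower = ?amount && ?upper = ?amount)
}
"""


def build_query(property_id, limit=None):
    """Return the query text for one property, optionally capped for test runs."""
    if not re.fullmatch(r"P[1-9][0-9]*", property_id):
        raise ValueError("not a property ID: %r" % property_id)
    query = QUERY_TEMPLATE % {"pid": property_id}
    if limit is not None:
        query += "LIMIT %d\n" % limit
    return query


test_query = build_query("P2048", limit=10)  # small sample while testing
full_query = build_query("P2048")            # no cap for the real run
```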

No extra API is needed for the update, you can use wbsetclaimvalue, wbsetclaim, or wbeditentity.
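For example, the request parameters for a wbsetclaimvalue call that writes an unbounded value could be assembled roughly as below. This is a sketch that builds the parameters without sending anything; the function name, the statement GUID, the edit summary and the token are placeholders.

```python
import json


def build_setclaimvalue_params(statement_guid, value, csrf_token):
    """Build POST parameters for action=wbsetclaimvalue with the bounds stripped."""
    unbounded = {k: v for k, v in value.items() if k not in ("upperBound", "lowerBound")}
    return {
        "action": "wbsetclaimvalue",
        "claim": statement_guid,         # GUID of the statement to change
        "snaktype": "value",
        "value": json.dumps(unbounded),  # quantity value as a JSON string
        "summary": "Removing +/-0 bounds",  # placeholder edit summary
        "bot": "1",
        "token": csrf_token,
        "format": "json",
    }


params = build_setclaimvalue_params(
    "Q42$00000000-0000-0000-0000-000000000000",  # placeholder GUID
    {"amount": "+8", "unit": "1", "upperBound": "+8", "lowerBound": "+8"},
    "PLACEHOLDER_TOKEN",
)
```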

I guess I wasn't clear, that's definitely not a new data type :D I meant a new type of boundary. It seems doable. We just need an API to do that.

I hope it's okay that I take over this task.

The SPARQL query to find them times out every time. I guess we should work on reading dumps.
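A minimal sketch of the dump-based route, assuming the JSON dump layout with one entity per line inside a single top-level array. The function name `iter_quantity_statements` and the sample data are invented for illustration; a real run would read from the decompressed dump file instead of a string.

```python
import io
import json


def iter_quantity_statements(lines, property_id):
    """Yield (entity id, statement id, value) for quantity main snaks of one property."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # array brackets and blank lines carry no entity
        entity = json.loads(line)
        for statement in entity.get("claims", {}).get(property_id, []):
            snak = statement.get("mainsnak", {})
            datavalue = snak.get("datavalue", {})
            if snak.get("snaktype") == "value" and datavalue.get("type") == "quantity":
                yield entity["id"], statement["id"], datavalue["value"]


# Tiny made-up dump with a single entity, only to show the expected shape.
sample_entity = {
    "id": "Q1",
    "claims": {
        "P2048": [{
            "id": "Q1$abc",
            "mainsnak": {
                "snaktype": "value",
                "datavalue": {
                    "type": "quantity",
                    "value": {"amount": "+5", "unit": "1",
                              "upperBound": "+5", "lowerBound": "+5"},
                },
            },
        }]
    },
}
sample_dump = io.StringIO("[\n" + json.dumps(sample_entity) + ",\n]\n")
found = list(iter_quantity_statements(sample_dump, "P2048"))
```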

(e.g. for measurements, which are never absolutely exact).

Measurements can be exact. Two examples:

  • The speed of light is exactly 299,792,458 meters / second, because the definition of the meter is derived from the speed of light.
  • The spin of the electron is exactly half of Planck's constant.

What I mean is that being exact or not is highly dependent on the unit measured, and that needs to be taken into consideration.

Just a note that pywikibot currently defaults to +/-0 so it might be an idea to give that some time to be patched (T150210) (also to the pip version) and announced before running a bot to avoid it "contaminating" the cleaned data afterwards.
(this is me guessing pywikibot is a rather large source of added data on Wikidata, if not then no worries).

(e.g. for measurements, which are never absolutely exact).

Measurements can be exact. Two examples:

  • The speed of light is exactly 299,792,458 meters / second, because the definition of the meter is derived from the speed of light.

That's a definition, not a measurement.

So yea, not all values of the speed property are going to be actual measurements; there can be absolutely exact values for speed that are not measurements. But that's a rare edge case, rather than the rule. I cannot think of a property where it would be equally likely for the value to be a measurement or a definition.

  • The spin of the electron is exactly half of Planck's constant.

That's a theoretical result (confirmed by inexact measurements). But spin is kind of special anyway, since it's not a continuum, as far as I know.

What I mean is that being exact or not is highly dependent on the unit measured, and that needs to be taken into consideration.

You are right that it is not necessarily the same for all Statements for a given property - speed is not always a measurement. So stripping +/-0 from all values for speed is going to be wrong in some cases. But it's going to be correct in far more cases. The cases in which the speed is not a measurement can probably be found and managed by hand, since they are rare.

Just a note that pywikibot currently defaults to +/-0

That's the default, really? Oh no :(

I must note that in this case you need to remove the LIMIT 10 before running it.

@Lydia_Pintscher I'm afraid querying "for all values" may be a bit tricky... What exactly are we trying to do here? I'm not sure I understand completely.

We certainly have a way to run very long queries manually without the 30s time limit.

Also, I don't see what SERVICE is doing in that query. Seems to be unnecessary.

I've made the bot and you can find the source code here.
It uses the generic pywikibot page generator, so we can feed it the result of a SPARQL query or read dumps and use that. (Proper DI, I guess.) Right now it errors out because pywikibot doesn't accept "None" for the upperbound/lowerbound value. Fixing it would be easy. Once the changes are deployed we can run this.

There is a patch at https://gerrit.wikimedia.org/r/320649 and associated discussion in T150210.

Ladsgroup moved this task from In progress to Done on the User-Ladsgroup board.

Under what account does the bot do the edits?

Is this doing values in qualifiers?

Under what account does the bot do the edits?

I'm running it. It's under the name Dexbot.

Lydia_Pintscher moved this task from Doing to Done on the Wikidata-Former-Sprint-Board board.

Closing this as the bot has been running for a while already and things seem to be going fine.