
Agree on properties used for parts of math expressions
Closed, Resolved; Public

Description

Currently, the "has part" property is used to determine the parts of mathematical expressions, which are then shown on special pages that explain the constituents of math expressions.
However, it seems that Wolfram Alpha needs a different format to harvest data from Wikidata. Therefore, additional properties were created, and much more data is available in the new format.

For the Math extension, it does not matter whether "has part" or another property is used to display the constituents. However, for performance reasons, the number of requests to Wikidata should ideally be limited to one.

We need to define a method to resolve this conflict of interests.

Event Timeline

Modelling of Wikidata's data is typically discussed on Wikidata. Therefore I propose to close this task and continue the discussion there.
Regarding the alleged conflict of interest I'd like to state that I worked out the proposal for and participated in the migration to the property "symbol represents" in my free time, not as part of my job at Wolfram. The way formulas are modeled on Wikidata has no influence on any operations at Wolfram.

Side Note:

Modelling of Wikidata's data is typically discussed on Wikidata. Therefore I propose to close this task and continue the discussion there.

Developing the Math extension is normally discussed here :-) Anyhow, maybe there is not too much to discuss and we just need to change some config values.

For the time being, it would be great to keep the demo values for "mass-energy equivalence" with has part properties.

Regarding the alleged conflict of interest I'd like to state that I worked out the proposal for and participated in the migration to the property "symbol represents" in my free time, not as part of my job at Wolfram. The way formulas are modeled on Wikidata has no influence on any operations at Wolfram.

That is good to hear. So we are free to change things without fear of breaking existing systems.

Side Note:

There is a ton of properties in https://www.wikidata.org/wiki/Wikidata:WikiProject_Mathematics and the discussions are quite involved as far as I can see from a first glance. At the same time @Andreg-p is currently improving the special page to actually show popups. Locally it already works (see screenshot attached)

[Attached screenshot: 2022-07-14 17.04.08.jpg]

If we can replace P527 with P9758 and everything works as it did before, can we close this issue right away?

Is this https://www.wikidata.org/wiki/Q30204 a guiding example of how the data should be modeled? If not, can you point us to one?

Looking at https://www.wikidata.org/wiki/Q35875, which has the old and the new format at the same time (which is great for this discussion, development, and testing), I see that the roles of the qualifier and the property value have been swapped. @Andreg-p would that mean that the order of the part to be defined and the definition would be swapped, i.e., instead of

E energy
m mass

it would become

energy E
mass m

or will this swap cause additional efforts with the implementation?

Moreover, it is not clear to me whether either way has advantages or disadvantages from an ontological viewpoint.

I confirm that https://www.wikidata.org/wiki/Q30204 is modeled correctly using the "new" scheme.

If we can replace P527 with P9758 and everything works as it did before, can we close this issue right away?

This seems to be possible. The only issue I see is how the identifiers are given. Currently we use P416 to define the identifier for an item in P527. For example, P527 contains Q11379 (energy) and uses P416 (quantity symbol) to set E as the representative identifier. If we use the same setup for P9758 (symbol represents), we only need to update the config and that's it.
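As a rough illustration of the setup described above, the following Python sketch reads (identifier, item) pairs from a trimmed-down copy of Wikibase statement JSON. The helper name `pairs_from_has_part` and the sample data are assumptions made for illustration; the Math extension itself is written in PHP and takes its property IDs from configuration.

```python
# Sketch only: simplified Wikibase-style statement JSON, not the Math extension's code.
HAS_PART = "P527"          # has part(s): main value is an item
QUANTITY_SYMBOL = "P416"   # quantity symbol: string qualifier

# Hypothetical, trimmed-down claims: the statement links to Q11379 (energy)
# and carries the identifier "E" as a qualifier.
claims = {
    "P527": [
        {
            "mainsnak": {"datavalue": {"value": {"id": "Q11379"}}},
            "qualifiers": {"P416": [{"datavalue": {"value": "E"}}]},
        }
    ]
}

def pairs_from_has_part(claims, main_prop=HAS_PART, symbol_prop=QUANTITY_SYMBOL):
    """Return (identifier, item_id) pairs for statements that carry a symbol qualifier."""
    pairs = []
    for statement in claims.get(main_prop, []):
        item_id = statement["mainsnak"]["datavalue"]["value"]["id"]
        for qualifier in statement.get("qualifiers", {}).get(symbol_prop, []):
            pairs.append((qualifier["datavalue"]["value"], item_id))
    return pairs

print(pairs_from_has_part(claims))  # [('E', 'Q11379')]
```

Because the property IDs are parameters, pointing the same function at a different main property with the same item-plus-symbol shape is only a change of arguments, which mirrors the "update the config" idea.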

@Andreg-p would that mean that the order of the part to be defined and the definition would be swapped, i.e., instead of

E energy
m mass

it would become

energy E
mass m

or will this swap cause additional efforts with the implementation?

No, that won't happen. What does not work is P7235 (in defining formula). This property has the datatype "mathematical expression", whereas the other two (P527 and P9758) use items. So if we use P7235 (in defining formula), we need to write new code; updating the config would not be sufficient.

I confirm that https://www.wikidata.org/wiki/Q30204 is modeled correctly using the "new" scheme.

This uses P7235 (in defining formula) and not the P9758 discussed by @Physikerwelt.
I don't really see why P7235 (in defining formula) exists. We now have two conflicting setups:

  1. P527 (has part or parts) links directly to an item and uses an extra property (currently P416 (quantity symbol)) to define the identifier.
  2. P7235 (in defining formula) links directly to an identifier (math expression) and uses an extra property (currently P9758 (symbol represents)) to link the identifier to an item.

The only advantage of 2 over 1 is that P416 (quantity symbol) is using a string and not a mathematical expression. Here P9758 (symbol represents) is clearly better.
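To make setup 2 concrete, here is a small Python sketch in which the main value of the statement is the identifier (a math expression stored as a string) and the qualifier links to the item. The function name and the sample data are illustrative assumptions, and the JSON is a trimmed-down version of what Wikibase returns.

```python
# Sketch only: simplified Wikibase-style statement JSON for the "new" scheme.
IN_DEFINING_FORMULA = "P7235"  # main value is a math expression (string)
SYMBOL_REPRESENTS = "P9758"    # qualifier value is an item

# Hypothetical, trimmed-down claims: identifier "E" represents Q11379 (energy).
claims = {
    "P7235": [
        {
            "mainsnak": {"datavalue": {"value": "E"}},
            "qualifiers": {"P9758": [{"datavalue": {"value": {"id": "Q11379"}}}]},
        }
    ]
}

def pairs_from_in_defining_formula(claims):
    """Return (identifier, item_id) pairs from P7235 statements qualified by P9758."""
    pairs = []
    for statement in claims.get(IN_DEFINING_FORMULA, []):
        symbol = statement["mainsnak"]["datavalue"]["value"]
        for qualifier in statement.get("qualifiers", {}).get(SYMBOL_REPRESENTS, []):
            pairs.append((symbol, qualifier["datavalue"]["value"]["id"]))
    return pairs

print(pairs_from_in_defining_formula(claims))  # [('E', 'Q11379')]
```

The resulting pair is the same as in setup 1; only the positions of the identifier and the item within the statement differ.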

@Physikerwelt just to clarify again:
Option 1 works with our popup. Option 2 does not and would require new code (updating the config would not be sufficient).

I confirm that https://www.wikidata.org/wiki/Q30204 is modeled correctly using the "new" scheme.

This uses P7235 (in defining formula) and not the P9758 discussed by @Physikerwelt.

It uses both. See the description of P9758:

qualifier for "in defining formula" (P7235) which indicates the quantity or operator represented by a symbol in the "defining formula" (P2534)

I don't really see why P7235 (in defining formula) exists. We now have two conflicting setups.

The only advantage of 2 over 1 is that P416 (quantity symbol) is using a string and not a mathematical expression. Here P9758 (symbol represents) is clearly better.

This is not a real advantage. In option 1, one can use P2534 instead of P416 to get math expressions. I do not feel qualified to judge which one is better. As @Toni_001 suggests, how the data in Wikidata is organized is not under the control of this discussion space. I would like to focus on how to make the popups most useful for users and to pave the way for improving the accessibility of math in Wikipedia, so that people with limited or no vision will no longer be excluded.

@Physikerwelt just to clarify again:
Option 1 works with our popup. Option 2 does not and would require new code (updating the config would not be sufficient).

How much effort would option 2 be? If we can get significantly more support for people with limited vision in Wikipedia, this would be an argument to spend some effort. Otherwise, we have the popup, but people trying to use it get into trouble and have their edits reverted, etc. This would frustrate editors, and the popup would not be used. There seems to be a very active community that quickly reverts edits made to enable popups in Wikipedia.

@Physikerwelt Supporting option 2 would not be too complicated. The issue I see is that in the near future somebody will come up with a new idea for representing the elements of a formula, and we will need to implement yet another option. This is very much a never-ending story in which we (the Math extension) are always trying to catch up with the latest trend on Wikidata to keep things running. And the people on Wikidata are not aware that our system depends on the correct setup and use.

I suggest we implement both options for now (Option 1 and 2) so that our popup works for both. We just need a decision for cases where both options are in place (I know this will usually not be the case). And we should discuss how we handle future issues, because they will almost certainly come up again.
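One possible rule for items that carry both options is sketched below in Python. It is only an assumption for discussion, not something decided in this thread: pairs from the new scheme override pairs from the old scheme when both define the same identifier. The function name and the placeholder item IDs are hypothetical.

```python
# Sketch of one possible conflict rule; not a decision made in this task.
def merge_pairs(old_scheme, new_scheme):
    """Merge (identifier, item_id) pairs; new-scheme entries win on conflict."""
    merged = dict(old_scheme)   # start with the "has part" based pairs
    merged.update(new_scheme)   # new-scheme pairs overwrite duplicates
    return merged

# Q11379 (energy) is taken from the discussion; the other IDs are placeholders.
old_scheme = [("E", "Q11379"), ("m", "Q_OLD_PLACEHOLDER")]
new_scheme = [("m", "Q_NEW_PLACEHOLDER"), ("c", "Q_C_PLACEHOLDER")]

print(merge_pairs(old_scheme, new_scheme))
```

Identifiers defined by only one of the two schemes are kept unchanged, so items that use a single option behave exactly as before.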

@Physikerwelt Supporting option 2 would not be too complicated. The issue I see is that in the near future somebody will come up with a new idea for representing the elements of a formula, and we will need to implement yet another option. This is very much a never-ending story in which we (the Math extension) are always trying to catch up with the latest trend on Wikidata to keep things running. And the people on Wikidata are not aware that our system depends on the correct setup and use.

Yes, it is a never-ending story. However, I would expect it to take 3-5 years before the next change, so I don't see this as a huge problem. We can add constraints and documentation to the properties we use to reduce the problem, but the data model will evolve and not stay constant forever.

I suggest we implement both options for now (Option 1 and 2) so that our popup works for both. We just need a decision for cases where both options are in place (I know this will usually not be the case). And we should discuss how we handle future issues, because they will almost certainly come up again.

Given the statistics provided by @Toni_001 I have the impression that it would be good enough to replace Option 1 with Option 2. Otherwise, the Math extension code will become too complex.

Just a small tip: if you wrap the ID in backticks (backtick P## backtick), you get P10 as plain text and avoid creating links to all of these Paste objects :)

@TheDJ sorry for that and thanks for the tip.

@Physikerwelt it turned out that the current implementation needed only minimal changes to support both Option 1 and Option 2, because the two are essentially the same, just in reversed order. Since the current code does not care about the order, the update only required changing two if clauses. That's it.
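The idea that the order does not matter can be sketched in Python as follows. This is not the code from the Gerrit change; the function names and the sample statements are assumptions for illustration. Each value is classified by its shape, so it is irrelevant whether the identifier or the item is the main value.

```python
# Sketch only: order-agnostic pairing of identifier and item within one statement.
def classify(value):
    """Return ('item', id) for an item reference, ('symbol', text) for a string."""
    if isinstance(value, dict) and "id" in value:
        return "item", value["id"]
    if isinstance(value, str):
        return "symbol", value
    return None, None

def pair_from_statement(statement):
    """Build a (symbol, item_id) pair regardless of which side is the qualifier."""
    values = [statement["mainsnak"]["datavalue"]["value"]]
    for snaks in statement.get("qualifiers", {}).values():
        values.extend(snak["datavalue"]["value"] for snak in snaks)
    found = {}
    for value in values:
        kind, payload = classify(value)
        if kind is not None:
            found.setdefault(kind, payload)  # keep the first value of each kind
    if "symbol" in found and "item" in found:
        return found["symbol"], found["item"]
    return None

# Hypothetical, trimmed-down statements for Q11379 (energy) with identifier "E".
option_1 = {"mainsnak": {"datavalue": {"value": {"id": "Q11379"}}},
            "qualifiers": {"P416": [{"datavalue": {"value": "E"}}]}}
option_2 = {"mainsnak": {"datavalue": {"value": "E"}},
            "qualifiers": {"P9758": [{"datavalue": {"value": {"id": "Q11379"}}}]}}

print(pair_from_statement(option_1))  # ('E', 'Q11379')
print(pair_from_statement(option_2))  # ('E', 'Q11379')
```

For brevity the sketch looks at every qualifier on the statement; real code would presumably restrict itself to the configured qualifier properties.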

I have already implemented it and pushed the update in: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Math/+/710329/

Oh, that other change is already big. Maybe you can do a separate change with just the two if statements? I could merge that independently of the popups discussion.

Change 814156 had a related patch set uploaded (by AndreG-P; author: AndreG-P):

[mediawiki/extensions/Math@master] Support new properties 'Symbol Represents' and 'In Defining Formula'

https://gerrit.wikimedia.org/r/814156

@Toni_001 FYI: You can see the proposed patch here. Even if you are not a PHP developer, feel free to add comments or questions either here or using the Gerrit web interface.

Hello.
I agree with the suggestion above to support only the "new" way to describe variables. After all, the reason to introduce the "new" system was to deprecate the previous two competing styles, and then have only one going forward. The remaining "has part"-based variable explanations are mostly there because some editors kept reintroducing them. But eventually I think they'll go away - especially if there's support from the math extension maintainers.
I also share the hope that for the next five years at least there won't be any changes to the "new" system.

Change 814156 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Support new properties 'Symbol Represents' and 'In Defining Formula'

https://gerrit.wikimedia.org/r/814156