For consistency MediaInfo serialization should use "claims" as key, rather than "statements"
Open, High · Public

Description

This should be changed in order to stay consistent with the Item/Property serialization, which both use "claims".

This topic came up during the initial development of the Lexeme (De-)Serializers.

Event Timeline

hoo created this task. Oct 28 2016, 9:29 AM
Restricted Application added a subscriber: Aklapper. Oct 28 2016, 9:29 AM
Addshore added a subscriber: Addshore.

This is in MediaInfoSerializer::getSerialized

		$serialization['statements'] = $this->statementListSerializer->serialize(
			$mediaInfo->getStatements()
		);

This is already going to cause an issue: we now have entities in the Commons DB whose stored serialization uses the "statements" key, so we will need a compat layer.
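A compat layer along these lines could normalize legacy blobs when they are read back. This is only a minimal sketch, written in Python rather than the PHP of MediaInfoSerializer; the helper name `normalize_statements_key` and the choice of "claims" as the canonical key are assumptions for illustration, not existing Wikibase code.

```python
import copy

LEGACY_KEY = "statements"   # key already stored for MediaInfo entities
CANONICAL_KEY = "claims"    # key used by the Item/Property serialization


def normalize_statements_key(serialization: dict) -> dict:
    """Return a copy of an entity serialization that uses CANONICAL_KEY.

    Hypothetical helper: accepts blobs written with either key and
    leaves the input untouched.
    """
    result = copy.deepcopy(serialization)
    if LEGACY_KEY in result:
        legacy = result.pop(LEGACY_KEY)
        # If both keys are present, keep what is already under the canonical key.
        result.setdefault(CANONICAL_KEY, legacy)
    return result
```

Blobs that already use "claims" pass through unchanged, so the same read path could serve old and new revisions.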

Magnus added a subscriber: Magnus. Jun 20 2019, 12:15 PM

Magnus ran into this and this is very confusing and inconsistent. The longer we wait with fixing this, the more effort it will cost. Example https://commons.wikimedia.org/w/api.php?action=wbgetentities&ids=M27401711 uses "statements" as the key, but we expect "claims" as the key, like https://commons.wikimedia.org/w/api.php?action=wbgetentities&ids=Q422341 . This is also consistent with the api calls like https://commons.wikimedia.org/w/api.php?action=help&modules=wbcreateclaim , https://commons.wikimedia.org/w/api.php?action=wbgetclaims&entity=M27401711 , https://commons.wikimedia.org/w/api.php?action=help&modules=wbremoveclaims and https://commons.wikimedia.org/w/api.php?action=help&modules=wbsetclaim

By the way, I noticed another difference: The imageinfo output doesn't contain the line "datatype": "wikibase-item". Not sure if this is related or not?

Another inconsistency pointed out by @Lucas_Werkmeister_WMDE is at T222159

https://commons.wikimedia.org/wiki/File:Wikidata_statement.svg might be useful for the relation between a claim and a statement.

Magnus ran into this and this is very confusing and inconsistent. The longer we wait with fixing this, the more effort it will cost.

Totally agree there, we have already put this off for years.
It's a pretty breaking change though.
But with the correct announcement and time that would all be fine, and even if people miss it, fixing their code should be trivial.

Thoughts @Lydia_Pintscher ?

Can't we do a smart trick with showing it twice (claims and statements) in the front and storing it only once in the back? A big bang is much more complicated and riskier. You would have a timeline like:

  • Switch api read and write functions to expose both claims and statements
  • Clients can start switching to "claims"
  • Switch the backend to read it as both statements and claims, but store it as "claims"
  • Do a null edit run to get rid of the "statements" and replace it with "claims"
  • Switch backend code to only use "claims"
  • Check if no code is using statements anymore in the api
  • Switch front end to only use "claims"
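The "store once, show twice" idea in the timeline above could look roughly like the following. It is a sketch under assumptions: the function names `to_storage` and `to_api_output` and the `expose_legacy` flag are hypothetical, and the real code would live in the PHP serializers.

```python
STATEMENT_KEYS = ("claims", "statements")


def to_storage(entity: dict) -> dict:
    """Storage form: keep the statement list once, under "claims".

    Re-saving an old blob through this function (the "null edit" step)
    drops the "statements" key.
    """
    stored = {k: v for k, v in entity.items() if k not in STATEMENT_KEYS}
    stored["claims"] = entity.get("claims", entity.get("statements", {}))
    return stored


def to_api_output(stored: dict, expose_legacy: bool = True) -> dict:
    """API form during the transition: both keys carry the same data."""
    output = dict(stored)
    if expose_legacy:
        output["statements"] = stored["claims"]
    return output
```

Turning `expose_legacy` off would correspond to the final step, once no client reads "statements" anymore.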

Magnus ran into this and this is very confusing and inconsistent. The longer we wait with fixing this, the more effort it will cost.

Totally agree there, we have already put this off for years.
It's a pretty breaking change though.
But with the correct announcement and time that would all be fine, and even if people miss it, fixing their code should be trivial.
Thoughts @Lydia_Pintscher ?

From my side absolutely. I'm not sure why this is different in the first place.

FWIW, I have already changed my code to work with either claims or statements. Quick thoughts:

  • IMHO this change is too significant to do it "just because it's a nicer word". No one really cares what it's called, as long as you call it the same thing every time.
  • A big problem with this change is that it was not announced anywhere I could see, and I'm pretty much subscribed to everything public. Give us poor volunteer devs some warning, at least
  • Also, it's inconsistent. Wikidata items and properties have claims, mediainfo items have statements. Is this going to change on Wikidata as well? Wikibase in general?
  • Other things have changed in the Commons/mediainfo implementation. datatype is missing, for one (tracked by some other issue here). Will it come back? Will it stay missing but only for mediainfo?

The problem is not that things are different, they are different needlessly (AFAICT), and unannounced.

Can't we do a smart trick with showing it twice (claims and statements) in the front and storing it only once in the back? A big bang is much more complicated and riskier.

We could do, but this would drastically increase the sizes of responses.
We could use a method similar to some mediawiki core api versions and introduce a new temporary param, for example statementsnotclaims=1.
When it is set to 1, send "statements"; when it is not set, send "claims". We can then slowly monitor adoption of the new format and send out warnings with the old format, trying to chase down any user agents that still use the old one before actually fully switching over?
We could also just introduce a parameter for "serializationVersion" which could basically do the same thing.
This would all be easier if we had a nice versioned API already or versioned serialization :)
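A rough sketch of the temporary-parameter approach: only the parameter name statementsnotclaims comes from the comment above. The function name, the response envelope and the warning text are assumptions made for illustration.

```python
STATEMENT_KEYS = ("claims", "statements")


def serialize_for_request(stored: dict, params: dict) -> dict:
    """Pick the statement key from a hypothetical request parameter.

    statementsnotclaims=1 -> old "statements" key plus a warning
    parameter not set     -> new "claims" key
    """
    statements = stored.get("claims", stored.get("statements", {}))
    entity = {k: v for k, v in stored.items() if k not in STATEMENT_KEYS}
    response = {"entity": entity}
    if str(params.get("statementsnotclaims", "")) == "1":
        entity["statements"] = statements
        # The warning gives remaining users of the old format a nudge.
        response["warnings"] = ['The "statements" key is deprecated; use "claims".']
    else:
        entity["claims"] = statements
    return response
```

Counting the requests that still send the parameter would give the adoption signal mentioned above; a "serializationVersion" parameter could drive the same branch.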

T92961: [Story] Versioning in JSON output

We also have T221737: REST API Infrastructure in MediaWiki to look forward to, it might make sense to hold off on a change like this until we make a "big move" to a new API.

A big problem with this change is that it was not announced anywhere I could see, and I'm pretty much subscribed to everything public. Give us poor volunteer devs some warning, at least

Hmm, you mean the fact that statements on mediainfo entities are serialized as "statements", not "claims"?
I guess it is not seen as a breaking change as media info entities are new, and they can do what they want with their serialization.

Also, it's inconsistent. Wikidata items and properties have claims, mediainfo items have statements. Is this going to change on Wikidata as well? Wikibase in general?

Everything should probably change to "statements" in the long run, as having "claims" talked about in the serialization, and only in the serialization, is confusing, and per the definition of our data model it also doesn't make sense.

Other things have changed in the Commons/mediainfo implementation. datatype is missing, for one (tracked by some other issue here). Will it come back? Will it stay missing but only for mediainfo?

Which ticket number is this one?

Can't we do a smart trick with showing it twice (claims and statements) in the front and storing it only once in the back? A big bang is much more complicated and riskier.

We could do, but this would drastically increase the sizes of responses.

Drastically? We're talking Commons here. Before I ran a bot, only 50,000 files even had claims. Nothing has references, and qualifiers were only introduced this week. Everything is still tiny compared to Wikidata.

We could use a method similar to some mediawiki core api versions and introduce a new temporary param, for example statementsnotclaims=1.
When it is set to 1, send "statements"; when it is not set, send "claims". We can then slowly monitor adoption of the new format and send out warnings with the old format, trying to chase down any user agents that still use the old one before actually fully switching over?

Commons! Not Wikidata. What adoption? I think we currently have:

  • The stuff the SDOC team wrote (front end, uploadwizard, etc.). That's easy to track and fix
  • Some stuff I wrote in which I don't have claims/statements yet
  • Some stuff Magnus wrote and he already updated his code
  • Anything else? I doubt it.

So why bother to make this more complicated than needed? Or am I missing something here?

Drastically? We're talking Commons here. Before I ran a bot, only 50,000 files even had claims. Nothing has references, and qualifiers were only introduced this week. Everything is still tiny compared to Wikidata.

I'm talking about changing this from "claims" to "statements" on wikidata.
That would be the right thing to do, as they are statements, not claims.

No no no, that's not what this task is about. The scope of this task is only mediainfo on Commons and undoing the mistake of introducing statements instead of claims. Please focus on the issue at hand.

This is partly a question for the SDOC team, namely whether SDOC will ever have references as part of their data model.
Then there is also the question for wikidata and wikibase, which is: why are we using "claims" in the serialization?

The answer to the latter is: for legacy reasons, and to avoid breaking people's tools. If we could rename our api modules to talk about statements instead of claims, and easily change the serialization without annoying people, we would, and we will at some point.
Why are we currently putting statements under a claims key? That makes no sense.

And thus, if we want to make this change in wikibase / on wikidata with items and properties, then why change sdoc to use claims, when a little way down the road we will want to then change it back to statements to be consistent with wikidata and wikibase once again?

Jdforrester-WMF renamed this task from For consistency MediaInfo serialization should use "claims" as key, rather than "statements" to For consistency, Wikibase serialization should use "statements" as key, rather than "claims", like modern Wikibase code now does. Jun 24 2019, 9:07 PM

@Jdforrester-WMF Is that an official design decision (claims=>statements)? Where was this fundamentally breaking change announced to the public?

Personally I don't care what it's called, just that it's (a) consistent and (b) announced before being changed in production.

Why are we currently putting statements under a claims key? That makes no sense.

Originally, Statement was a subclass of Claim, so all statements were claims. We chose the more general term for the key, so it could accommodate all kinds of claims. Later, we found that we really always wanted Statements; we didn't find any use case that never needed references. So Claim was dropped as a base class of Statement.
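The original relationship described here can be pictured as follows. This is an illustrative Python sketch only, not the actual Wikibase DataModel classes (which are PHP); the field layout is simplified and assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """The more general term: a main snak plus optional qualifiers."""
    mainsnak: dict
    qualifiers: dict = field(default_factory=dict)


@dataclass
class Statement(Claim):
    """In the original design, a Claim subclass that adds references and a rank."""
    references: list = field(default_factory=list)
    rank: str = "normal"
```

With this hierarchy every Statement is also a Claim, which is why the more general "claims" key could hold them.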

Multichill renamed this task from For consistency, Wikibase serialization should use "statements" as key, rather than "claims", like modern Wikibase code now does to For consistency MediaInfo serialization should use "claims" as key, rather than "statements". Jun 25 2019, 9:11 PM

Changed back the topic. This is a huge scope change and derailing things. As far as I see, everywhere in the api we use "claims", not "statements" (also in the functions). The only inconsistency right now is mediainfo; that should be fixed. If you want to change everything in the Wikibase API to use statements instead of claims (wbgetclaims -> wbgetstatements, etc.), file a new task so I can downvote that one as a huge waste of resources.

Jdforrester-WMF added a comment. Edited Jun 25 2019, 9:28 PM

Changed back the topic. This is a huge scope change and derailing things. As far as I see, everywhere in the api we use "claims", not "statements" (also in the functions). The only inconsistency right now is mediainfo; that should be fixed. If you want to change everything in the Wikibase API to use statements instead of claims (wbgetclaims -> wbgetstatements, etc.), file a new task so I can downvote that one as a huge waste of resources.

OK, then I can just Decline this task? As established above, when Wikimedia DE wrote WBMI in early 2016 they used "statements" because all new code should use that and not "claims", but haven't gone back to fix Wikidata to use the modern language.

"Established"? Where? When? Point me to the official announcement please! The one that gives everyone time to prepare, before it's released on, say, Commons. That one. Until then, this remains open.

To avoid that people confuse claims and statements in general, maybe the feature should use an entirely different name.

As established above, when Wikimedia DE wrote WBMI in early 2016 they used "statements" because all new code should use that and not "claims", but haven't gone back to fix Wikidata to use the modern language.

"Established"? Where? When? Point me to the official announcement please! The one that gives everyone time to prepare, before it's released on, say, Commons. That one.

I have no idea when exactly the Wikidata team made this decision, or whether or how it was communicated "explicitly", but for outside observers like me it's been obvious for years – for example, the Cirrus search modifier is "haswbstatement", not "haswbclaim".

Until then, this remains open.

That's not how Phabricator works.

The SDC team will follow the recommendations/decisions of the original authors (WMDE).

We do believe, as Addshore mentioned above, that there's a strong possibility that the Commons model will end up not using references in the way that Wikidata does. We'll defer to WMDE for how that should be reflected in the serialization code.

So the data model is described at https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON . Notice the usage of "claims" instead of "statements" . This is considered a stable data format, see https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy#Stable_Data_Formats . The different api functions of claims are subject to https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy#Stable_Public_APIs . Shall I continue? You want to break all these stable policies just because it looks better?

So, although not particularly clear on the https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON page, this is the JSON data model definition for items and properties.
And the stable interface policy on wikidata currently only applies to wikidata, per its own wording:

"provided by Wikibase as deployed on www.wikidata.org."

So the SIP doesn't apply to commons at all right now. This is something that we need to discuss with the StructuredDataOnCommons team once the main initial development stages are all complete.

  • Maybe the "labels" represented in JSON will be changed to "captions"?
  • "descriptions" will also probably be removed from the JSON

Anyway, the docs and links etc from the SIP and for the JSON data model docs probably need a bit of work to include things like Lexemes for the link to stable JSON representations on wikidata.org rather than only showing the item / property JSON definition on Wikibase/DataModel/JSON.

In light of all of this we need to:

  • Discuss the stable interface policy with the SDOC team: whether they want one yet, where it should live, and what it should include for mediainfo vs the rest of wikidata at this stage.
  • Improve the "wikibase json datamodel" docs to make it clear what docs are for what entities etc, and which json data models are currently covered by the SIP.
  • Decide what to do regarding claims vs statements for both wikidata entities and also within media info

Regarding that last point, in the long run we want to stop talking about "claims" everywhere, as the statements vs claims legacy only leads to more confusion.
In the long run, probably once we have a new iteration of our API, we will likely change this JSON serialization, but none of this is happening yet.
The reason that we even see claims anywhere is explained in T149410#5281355.
If we want to move towards "statements" in our JSON output, it is highly unlikely that we are going to change mediainfo to now have a "claims" key when in a year or so it will then be moved again back to "statements".

I feel like I have rambled on enough for now, but I am perfectly happy to keep discussing this. In terms of the serialization on wikidata.org and commons.wikimedia.org and the claims vs statements keys, though, I believe nothing is going to change in either place any time soon.