T149410 For consistency MediaInfo serialization should use "claims" as key, rather than "statements"

Status	Subtype	Assigned	Task
Open	Feature	None	T223820 Properly implement structured data access on Commons in Pywikibot
Open		None	T149410 For consistency MediaInfo serialization should use "claims" as key, rather than "statements"
Resolved		matthiasmullie	T230315 [XL] Create a way to see and add references to structured data on Commons (MediaInfo) statements
Resolved		Etonkovidova	T246818 Clicking on "Remove as Prominent" from a Wikimedia Commons file page result in dropping the related references
Resolved	BUG REPORT	matthiasmullie	T296616 Rendering error
Resolved		Etonkovidova	T297171 Structured data - references can be published without property value

hoo created this task. Oct 28 2016, 9:29 AM

Restricted Application added a subscriber: Aklapper. Oct 28 2016, 9:29 AM

hoo added a subscriber: matthiasmullie. Mar 16 2018, 5:04 PM

Addshore moved this task from incoming to needs discussion or investigation on the Wikidata board. Sep 18 2018, 2:34 PM

Tpt added a project: StructuredDataOnCommons. Jan 10 2019, 8:57 PM

This is in MediaInfoSerializer::getSerialized

		$serialization['statements'] = $this->statementListSerializer->serialize(
			$mediaInfo->getStatements()
		);

This is already going to cause an issue: entities stored in the Commons DB now include serializations with a "statements" key, so we will need a compat layer.
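A minimal sketch of what the reading side of such a compat layer could look like, written in Python for brevity rather than in the extension's PHP. The function name `normalize_statement_key` and the choice of "claims" as the canonical key are assumptions made for illustration; this is not the actual MediaInfo code.

```python
from typing import Any, Dict

LEGACY_KEY = "statements"   # key found in already-stored MediaInfo blobs
CANONICAL_KEY = "claims"    # assumed target key for this sketch


def normalize_statement_key(serialization: Dict[str, Any]) -> Dict[str, Any]:
    """Return a shallow copy of the serialization that uses only the canonical key."""
    result = dict(serialization)
    if LEGACY_KEY in result:
        legacy = result.pop(LEGACY_KEY)
        # If both keys are present, the canonical one wins.
        result.setdefault(CANONICAL_KEY, legacy)
    return result
```

The input is left untouched, so the same helper could be applied to blobs read from storage without rewriting them.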

Magnus subscribed. Jun 20 2019, 12:15 PM

Multichill subscribed. Jun 20 2019, 8:16 PM

Multichill added subscribers: Keegan, Ramsey-WMF. Jun 21 2019, 8:18 AM

Magnus ran into this and this is very confusing and inconsistent. The longer we wait with fixing this, the more effort it will cost. Example https://commons.wikimedia.org/w/api.php?action=wbgetentities&ids=M27401711 uses "statements" as the key, but we expect "claims" as the key, like https://commons.wikimedia.org/w/api.php?action=wbgetentities&ids=Q422341 . This is also consistent with the api calls like https://commons.wikimedia.org/w/api.php?action=help&modules=wbcreateclaim , https://commons.wikimedia.org/w/api.php?action=wbgetclaims&entity=M27401711 , https://commons.wikimedia.org/w/api.php?action=help&modules=wbremoveclaims and https://commons.wikimedia.org/w/api.php?action=help&modules=wbsetclaim

By the way, I noticed another difference: The imageinfo output doesn't contain the line "datatype": "wikibase-item". Not sure if this is related or not?
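For clients that rely on that field, a possible workaround sketch: fill in a missing "datatype" on each main snak from a caller-supplied mapping of property ID to datatype. The helper name and the mapping are hypothetical; a real client would have to obtain the datatypes itself.

```python
from typing import Any, Dict


def fill_missing_datatypes(entity: Dict[str, Any],
                           property_datatypes: Dict[str, str]) -> int:
    """Add "datatype" to main snaks that lack it; return how many were filled."""
    groups = entity.get("claims") or entity.get("statements") or {}
    if not isinstance(groups, dict):
        # Anything that is not a property -> statements mapping is skipped.
        return 0
    filled = 0
    for property_id, statements in groups.items():
        for statement in statements:
            snak = statement.get("mainsnak")
            if snak is None or "datatype" in snak:
                continue
            if property_id in property_datatypes:
                # Mutates the entity in place.
                snak["datatype"] = property_datatypes[property_id]
                filled += 1
    return filled
```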

Another inconsistency pointed out by @Lucas_Werkmeister_WMDE is at T222159

Lucas_Werkmeister_WMDE mentioned this in T222159: WikibaseMediaInfo serializes empty statements as empty array [] instead of empty object {} in JSON. Jun 21 2019, 9:12 AM
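A client-side sketch of smoothing over both differences at once: accept either key, and treat an empty list as an empty mapping. The function name `statement_groups` is made up for this example.

```python
from typing import Any, Dict, List


def statement_groups(entity: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Return the property -> statements mapping, whichever key is used."""
    for key in ("claims", "statements"):
        if key in entity:
            value = entity[key]
            # T222159: an empty set of statements may arrive as [] instead of {}.
            if isinstance(value, list) and not value:
                return {}
            return value
    return {}
```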

Serialization:

And the other way around:

https://commons.wikimedia.org/wiki/File:Wikidata_statement.svg might be useful for the relation between a claim and a statement.

Magnus ran into this and this is very confusing and inconsistent. The longer we wait with fixing this, the more effort it will cost.

Totally agree there, we have already put this off for years.
It's a pretty breaking change though.
But with the correct announcement and time that would all be fine, and even if people miss it, fixing their code should be trivial.

Thoughts @Lydia_Pintscher ?

Can't we do a smart trick with showing it twice (claims and statements) in the front and storing it only once in the back? A big bang is much more complicated and riskier. You would have a timeline like:

Switch api read and write functions to expose both claims and statements
Clients can start switching to "claims"
Switch the backend to read it as both statements and claims, but store it as "claims"
Do a null edit run to get rid of the "statements" and replace it with "claims"
Switch backend code to only use "claims"
Check if no code is using statements anymore in the api
Switch front end to only use "claims"
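The "show it twice, store it once" idea in the timeline above could be sketched roughly as follows. This is an illustration only: the `Phase` enum and all function names are assumptions, and the real backend is PHP, not Python.

```python
from enum import Enum
from typing import Any, Dict


class Phase(Enum):
    EXPOSE_BOTH = 1   # API output carries both "claims" and "statements"
    CLAIMS_ONLY = 2   # final state: only "claims" is exposed


def read_stored(blob: Dict[str, Any]) -> Dict[str, Any]:
    """Accept either key from storage and return the statement groups."""
    return blob.get("claims", blob.get("statements", {}))


def to_storage(entity_id: str, groups: Dict[str, Any]) -> Dict[str, Any]:
    """Always store under "claims", so a null edit rewrites old blobs."""
    return {"id": entity_id, "claims": groups}


def to_api_output(blob: Dict[str, Any], phase: Phase) -> Dict[str, Any]:
    """Serialize for API consumers according to the current migration phase."""
    groups = read_stored(blob)
    output = {"id": blob["id"], "claims": groups}
    if phase is Phase.EXPOSE_BOTH:
        output["statements"] = groups  # same data under the legacy key
    return output
```

Storage only ever holds one copy; the duplication exists only in the API output while `EXPOSE_BOTH` is active.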

In T149410#5275156, @Addshore wrote:

Magnus ran into this and this is very confusing and inconsistent. The longer we wait with fixing this, the more effort it will cost.

Totally agree there, we have already put this off for years.
It's a pretty breaking change though.
But with the correct announcement and time that would all be fine, and even if people miss it, fixing their code should be trivial.

Thoughts @Lydia_Pintscher ?

From my side absolutely. I'm not sure why this is different in the first place.

FWIW, I have already changed my code to work with either claims or statements. Quick thoughts:

IMHO this change is too significant to do it "just because it's a nicer word". No one really cares what it's called, as long as you call it the same thing every time.
A big problem with this change is that it was not announced anywhere that I saw, and I'm pretty much subscribed to everything public. Give us poor volunteer devs some warning, at least.
Also, it's inconsistent. Wikidata items and properties have claims, mediainfo items have statements. Is this going to change on Wikidata as well? Wikibase in general?
Other things have changed in the Commons/mediainfo implementation. datatype is missing, for one (tracked by some other issue here). Will it come back? Will it stay missing but only for mediainfo?

The problem is not that things are different, they are different needlessly (AFAICT), and unannounced.

Jep absolutely.

In T149410#5275599, @Multichill wrote:

Can't we do a smart trick with showing it twice (claims and statements) in the front and storing it only once in the back? A big bang is much more complicated and riskier.

We could do, but this would drastically increase the sizes of responses.
We could use a method similar to some mediawiki core api versions and introduce a new temporary param, for example statementsnotclaims=1.
When it is set to 1 we send "statements"; when it is not set we send "claims". We can then slowly monitor adoption of the new format and send out warnings with the old format, trying to chase down any user agents that still use the old one before actually fully switching over.
We could also just introduce a parameter for "serializationVersion" which could basically do the same thing.
This would all be easier if we had a nice versioned API already or versioned serialization :)
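A rough sketch of the temporary-parameter idea, under the assumption that the parameter is called statementsnotclaims as in the example above. No such parameter exists in the API today; the function name, the "warnings" field and its wording are likewise invented for illustration.

```python
from typing import Any, Dict, Mapping


def serialize_for_request(entity_id: str,
                          groups: Dict[str, Any],
                          params: Mapping[str, str]) -> Dict[str, Any]:
    """Pick the output key from the (proposed, not existing) request parameter."""
    use_legacy = params.get("statementsnotclaims") == "1"
    key = "statements" if use_legacy else "claims"
    response: Dict[str, Any] = {"id": entity_id, key: groups}
    if use_legacy:
        # Warn callers still on the old format so adoption can be monitored.
        response["warnings"] = ['The "statements" key is deprecated; use "claims".']
    return response
```

A "serializationVersion" parameter would work the same way, with the version number selecting the key instead of a boolean flag.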

T92961: [Story] Versioning in JSON output

We also have T221737: REST API Infrastructure in MediaWiki to look forward to, it might make sense to hold off on a change like this until we make a "big move" to a new API.

A big problem with this change is that it was not announced anywhere that I saw, and I'm pretty much subscribed to everything public. Give us poor volunteer devs some warning, at least.

Hmm, you mean the fact that statements on mediainfo entities are serialized as "statements" not "claims"?
I guess it is not seen as a breaking change as media info entities are new, and they can do what they want with their serialization.

Also, it's inconsistent. Wikidata items and properties have claims, mediainfo items have statements. Is this going to change on Wikidata as well? Wikibase in general?

Everything should probably change to "statements" in the long run, as having "claims" being talked about in the serialization and only the serialization is confusing, and per the definition of our data model it also doesn't make sense.

Other things have changed in the Commons/mediainfo implementation. datatype is missing, for one (tracked by some other issue here). Will it come back? Will it stay missing but only for mediainfo?

Which ticket number is this one?

In T149410#5275900, @Addshore wrote:

In T149410#5275599, @Multichill wrote:

Can't we do a smart trick with showing it twice (claims and statements) in the front and storing it only once in the back? A big bang is much more complicated and riskier.

We could do, but this would drastically increase the sizes of responses.

Drastically? We're talking Commons here. Before I ran a bot only 50.000 files even had claims. Nothing has references and qualifiers are only introduced this week. Everything is still tiny compared to Wikidata

We could use a method similar to some mediawiki core api versions and introduce a new temporary param, for example statementsnotclaims=1.
When it is set to 1 we send "statements"; when it is not set we send "claims". We can then slowly monitor adoption of the new format and send out warnings with the old format, trying to chase down any user agents that still use the old one before actually fully switching over.

Commons! Not Wikidata. What adoption? I think we currently have:

The stuff the SDOC wrote (front end, uploadwizard, etc.). That's easy to track and fix
Some stuff I wrote in which I don't have claims/statements yet
Some stuff Magnus wrote and he already updated his code
Anything else? I doubt it.

So why bother to make this more complicated than needed? Or am I missing something here?

Drastically? We're talking Commons here. Before I ran a bot only 50.000 files even had claims. Nothing has references and qualifiers are only introduced this week. Everything is still tiny compared to Wikidata

I'm talking about changing this from "claims" to "statements" on wikidata.
That would be the right thing to do, as they are statements, not claims.

In T149410#5275912, @Addshore wrote:

Drastically? We're talking Commons here. Before I ran a bot only 50.000 files even had claims. Nothing has references and qualifiers are only introduced this week. Everything is still tiny compared to Wikidata

I'm talking about changing this from "claims" to "statements" on wikidata.
That would be the right thing to do, as they are statements, not claims.

No no no, that's not what this task is about. The scope of this task is only mediainfo on Commons and undoing the mistake of introducing statements instead of claims. Please focus on the issue at hand.

This is partly a question for the SDOC team, and if SDOC will ever have references as part of their data model.
Then there is also the question for wikidata and wikibase, which is, why are we using "claims" in the serialization?

The answer to the latter is: for legacy reasons, and to avoid breaking people's tools. If we could rename our API modules to talk about statements instead of claims, and easily change the serialization without annoying people, we would, and we will at some point.
Why are we currently putting statements under a claims key? that makes no sense?

And thus, if we want to make this change in wikibase / on wikidata with items and properties, then why change sdoc to use claims, when a little way down the road we will want to then change it back to statements to be consistent with wikidata and wikibase once again?

Jdforrester-WMF renamed this task from For consistency MediaInfo serialization should use "claims" as key, rather than "statements" to For consistency, Wikibase serialization should use "statements" as key, rather than "claims", like modern Wikibase code now does. Jun 24 2019, 9:07 PM

@Jdforrester-WMF Is that an official design decision (claims=>statements)? Where was this fundamentally breaking change announced to the public?

Personally I don't care what it's called, just that it's (a) consistent and (b) announced before it is changed in production.

In T149410#5275921, @Addshore wrote:

Why are we currently putting statements under a claims key? that makes no sense?

Originally, Statement was a subclass of Claim, so all statements were claims. We chose the more general term for the key, so it could accommodate all kinds of claims. Later, we found that we really always wanted Statements; we didn't find any use case that never needed references. So Claim was dropped as a base class of Statement.
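To illustrate the original design described here, a toy model of the old class relationship. This is a sketch only: the field names loosely follow the JSON keys and are assumptions, not the actual Wikibase DataModel classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Claim:
    """The more general notion: a main snak plus optional qualifiers."""
    mainsnak: Dict[str, Any]
    qualifiers: Dict[str, List[Any]] = field(default_factory=dict)


@dataclass
class Statement(Claim):
    """A claim that can additionally carry references."""
    references: List[Any] = field(default_factory=list)
```

With this hierarchy every Statement is a Claim, which is why a key named "claims" could hold statements. Once nothing but Statement was ever needed, the base class had no remaining purpose.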

Multichill renamed this task from For consistency, Wikibase serialization should use "statements" as key, rather than "claims", like modern Wikibase code now does to For consistency MediaInfo serialization should use "claims" as key, rather than "statements". Jun 25 2019, 9:11 PM

Changed back the topic. This is a huge scope change and derailing things. As far as I see everywhere in the api we use "claims", not "statements" (also in the functions). The only inconsistency right now is mediainfo, that should be fixed. If you want to change everything in the Wikibase API to use statements instead of claims (wbgetclaims -> wbgetstatements, etc.), file a new task so I can down vote that one as a huge waste of resources.

In T149410#5284327, @Multichill wrote:

Changed back the topic. This is a huge scope change and derailing things. As far as I see everywhere in the api we use "claims", not "statements" (also in the functions). The only inconsistency right now is mediainfo, that should be fixed. If you want to change everything in the Wikibase API to use statements instead of claims (wbgetclaims -> wbgetstatements, etc.), file a new task so I can down vote that one as a huge waste of resources.

OK, then I can just Decline this task? As established above, when Wikimedia DE wrote WBMI in early 2016 they used "statements" because all new code should use that and not "claims", but haven't gone back to fix Wikidata to use the modern language.

In T149410#5284363, @Jdforrester-WMF wrote:

In T149410#5284327, @Multichill wrote:

Changed back the topic. This is a huge scope change and derailing things. As far as I see everywhere in the api we use "claims", not "statements" (also in the functions). The only inconsistency right now is mediainfo, that should be fixed. If you want to change everything in the Wikibase API to use statements instead of claims (wbgetclaims -> wbgetstatements, etc.), file a new task so I can down vote that one as a huge waste of resources.

OK, then I can just Decline this task? As established above, when Wikimedia DE wrote WBMI in early 2016 they used "statements" because all new code should use that and not "claims", but haven't gone back to fix Wikidata to use the modern language.

"Established"? Where? When? Point me to the official announcement please! The one that gives everyone time to prepare, before it's released on, say, Commons. That one. Until then, this remains open.

To avoid that people confuse claims and statements in general, maybe the feature should use an entirely different name.

In T149410#5285579, @Magnus wrote:

In T149410#5284363, @Jdforrester-WMF wrote:

As established above, when Wikimedia DE wrote WBMI in early 2016 they used "statements" because all new code should use that and not "claims", but haven't gone back to fix Wikidata to use the modern language.

"Established"? Where? When? Point me to the official announcement please! The one that gives everyone time to prepare, before it's released on, say, Commons. That one.

I have no idea when exactly the Wikidata team made this decision, or whether or how it was communicated "explicitly", but for outside observers like me it's been obvious for years – for example, the Cirrus search modifier is "haswbstatement" not "haswbclaim".

Until then, this remains open.

That's not how Phabricator works.

The SDC team will follow the recommendations/decisions of the original authors (WMDE).

We do believe, as Addshore mentioned above, that there's a strong possibility that the Commons model will end up not using references in the way that Wikidata does. We'll defer to WMDE for how that should be reflected in the serialization code.

In T149410#5285579, @Magnus wrote:

In T149410#5284363, @Jdforrester-WMF wrote:

In T149410#5284327, @Multichill wrote:

Changed back the topic. This is a huge scope change and derailing things. As far as I see everywhere in the api we use "claims", not "statements" (also in the functions). The only inconsistency right now is mediainfo, that should be fixed. If you want to change everything in the Wikibase API to use statements instead of claims (wbgetclaims -> wbgetstatements, etc.), file a new task so I can down vote that one as a huge waste of resources.

OK, then I can just Decline this task? As established above, when Wikimedia DE wrote WBMI in early 2016 they used "statements" because all new code should use that and not "claims", but haven't gone back to fix Wikidata to use the modern language.

"Established"? Where? When? Point me to the official announcement please! The one that gives everyone time to prepare, before it's released on, say, Commons. That one. Until then, this remains open.

So the data model is described at https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON . Notice the usage of "claims" instead of "statements" . This is considered a stable data format, see https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy#Stable_Data_Formats . The different api functions of claims are subject to https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy#Stable_Public_APIs . Shall I continue? You want to break all these stable policies just because it looks better?

So, although not particularly clear on the https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON page, this is the JSON data model definition for items and properties.
And the stable interface policy on wikidata only currently applies to wikidata.

provided by Wikibase as deployed on www.wikidata.org.

So the SIP doesn't apply to commons at all right now. This is something that we need to discuss with the StructuredDataOnCommons team once the main initial development stages are all complete.

Maybe the "labels" represented in JSON will be changed to "captions"?
"descriptions" will also probably be removed from the JSON

Anyway, the docs and links etc from the SIP and for the JSON data model docs probably need a bit of work to include things like Lexemes for the link to stable JSON representations on wikidata.org rather than only showing the item / property JSON definition on Wikibase/DataModel/JSON.

In light of all of this we need to:

Discuss with the SDOC team the stable interface policy: whether they want one yet, where it should live, and what it should include for mediainfo vs the rest of wikidata at this stage.
Improve the "wikibase json datamodel" docs to make it clear what docs are for what entities etc, and which json data models are currently covered by the SIP.
Decide what to do regarding claims vs statements for both wikidata entities and also within media info

Regarding that last point, in the long run, we want to stop talking about "claims" everywhere, as the statements vs claims legacy only leads to more confusion.
In the long run, probably once we have a new iteration on our API, we will likely change this JSON serialization, but none of this is happening yet.
The reason that we even see claims anywhere is explained in T149410#5281355
If we want to move towards "statements" in our JSON output, it is highly unlikely that we are going to change mediainfo to now have a "claims" key when in a year or so it will then be moved again back to "statements".

I feel like I have rambled on enough for now, but I am perfectly happy to keep discussing this. In terms of the serialization on wikidata.org and on commons.wikimedia.org and the claims vs statements keys, I believe nothing is going to be changing in either place any time soon.

Addshore mentioned this in T226940: Discuss a Stable Interface Policy with the SDOC team. Jun 30 2019, 7:04 PM

Addshore mentioned this in T226941: Document the MediaInfo JSON output on mediawiki.org.

matej_suchanek added a parent task: T223820: Properly implement structured data access on Commons in Pywikibot. Jul 15 2019, 10:47 AM

Lucas_Werkmeister_WMDE mentioned this in T212069: API action=wbgetentities does not handle formatversion=2. Oct 21 2019, 4:58 PM

matej_suchanek merged a task: T246743: Inconsistent "claims" vs "statements" in wbgetentities API between Wikidata and Commons. Mar 3 2020, 9:05 AM

matej_suchanek added a subscriber: valerio.bozzolan.

valerio.bozzolan awarded a token. Mar 3 2020, 5:50 PM

valerio.bozzolan mentioned this in T246809: Inconsistencies between Wikidata and Structured Data about Snak's "datatype" from wbgetentities API results. Mar 3 2020, 7:00 PM

Lucas_Werkmeister_WMDE mentioned this in T264086: No generic endpoint to download entity data. Oct 6 2020, 12:57 PM

Lucas_Werkmeister_WMDE mentioned this in T271105: wbeditentity response does not contain lemma data. Jan 26 2021, 2:41 PM

In T149410#5287189, @Ramsey-WMF wrote:

The SDC team will follow the recommendations/decisions of the original authors (WMDE).

We do believe, as Addshore mentioned above, that there's a strong possibility that the Commons model will end up not using references in the way that Wikidata does. We'll defer to WMDE for how that should be reflected in the serialization code.

Hi all (especially @Addshore and @Ramsey-WMF)

Is there any progress on adding references to Commons? Now that SDC is being widely adopted, it would be extremely helpful to be able to use references. I know that it is currently possible to add references but they are hidden.

Would someone be able to explain in plain English why Structured Data on Commons shouldn't use references in the same way Wikidata does?

Also what would be the resources needed to make references work in a way that individuals and organisations could use in the normal interface?

Thanks

John_Cummings unsubscribed. May 13 2021, 10:36 AM

John_Cummings added a subtask: T230315: [XL] Create a way to see and add references to structured data on Commons (MediaInfo) statements. May 13 2021, 10:49 AM

@John_Cummings - structured data on a commons File page is for describing the file. For example:

what an image depicts
the copyright licence associated with the file
who created the file

In this context I don't see a solid use case for references. We don't need a reference to say that an image depicts a fish, for example. An image might be a photo of a famous painting, but in that case we'd use the "digital representation of" property to point at the wikidata item for the painting itself, and referenced information about the painting (rather than the image file) would be available in wikidata.

Is there something I'm missing? Had you a specific use case in mind?

Hi @Cparle thanks for replying. I know @Fuzheado @Alicia_Fagerving_WMSE @Jopparn @Battleofalma etc will also have thoughts on this.

I'll give you my answer and let others expand on it. I'm basing this on 10 years of working as Wikimedian in Residence for cultural institutions, UN agencies and parts of the EU. The main use case, from my perspective, is any content created by external organisations, which runs to 10s of millions of files on Commons. Many of these organisations share quite extensive metadata with their content, way beyond depicts, copyright and author. The main benefits I see are the same as for references on Wikipedia: verifiability and credit.

Wikipedia
Allowing users to know that the metadata comes from an organisation creates a level of trust in the information. I think SDC could be widely used and useful on Wikipedia but without references to provide verifiability it seems unlikely it will get used, in the same way Wikidata data without references are blocked on English Wikipedia infoboxes in a lot of situations. Another benefit for Wikipedia specifically is to make creating Wikipedia articles for things depicted on Commons (eg an object in a museum) easier because the references which are collated in SDC can most probably be reused on Wikipedia.

Organisations sharing content:
Many organisations adopt an open license specifically so they can share their content on Wikimedia projects; most of my job in the UN the last 5 years has been around helping orgs adopt open licenses. Generally speaking, organisations who share content on Commons want recognition and metrics around page views, and a clear delineation between their content and Wikimedia community contributions to avoid confusion from readers. Having references in SDC will give the organisations credit for the metadata they share and reduce concerns about their content being confused with community contributions which may be incorrect. It will also encourage them to start using Wikidata and SDC on their own website, eg providing multilingual labels. There's an extra barrier to them adopting open licenses with the CC0 license for SDC statements: generally organisations are willing to share under CC BY or SA for content, but CC0 is difficult because it doesn't by its nature give them credit for their content. We get around this with Wikidata because we can say 'there will be references so people can see you added this data'. Generally speaking, 'please can you spend a significant amount of time to understand and change your license so you can share your content with us, we won't give you credit for any of it' is really not going to work.

Hope this helps

Thanks again

@Cparle @John_Cummings : Why are you discussing references in this task instead of in T230315 ? This task is about serialization of the data and the fact that we use two different keys (claims vs statements) for the same thing.

In T149410#7088952, @Multichill wrote:

@Cparle @John_Cummings : Why are you discussing references in this task instead of in T230315 ? This task is about serialization of the data and the fact that we use two different keys (claims vs statements) for the same thing.

I'll copy my answer over to the other task, sorry, didn't realise they were different

thiemowmde unsubscribed. May 17 2021, 6:36 AM

CBogen closed subtask T230315: [XL] Create a way to see and add references to structured data on Commons (MediaInfo) statements as Resolved. Jan 24 2022, 5:22 PM

Restricted Application added a project: Structured-Data-Backlog. Jan 24 2022, 5:22 PM

CBogen moved this task from Triage to For later on the Structured-Data-Backlog board. Jan 24 2022, 5:51 PM

Bugreporter mentioned this in T301089: Can the JSON format of a wikidata item and that of a Structure Data Commons be alligned? Feb 6 2022, 10:45 PM

LucasWerkmeister mentioned this in T303760: MediaInfo JSON and Lua data missing datatype information in snaks. Mar 14 2022, 6:03 PM

Lucas_Werkmeister_WMDE mentioned this in T311977: Wikimedia Commons entity dumps are lacking datatype field. Jul 4 2022, 8:36 AM

Mitar subscribed. Jul 4 2022, 10:41 AM

Addshore unsubscribed. Jun 27 2023, 12:39 PM

Maxlath subscribed. Jul 11 2023, 2:57 PM

Not only is there inconsistency between Wikidata and Commons, but it also lies within Commons itself. When you wbgetentities, you get the claims/statements under "statements". But when you wbeditentity, you must specify the claims/statements to add/remove/update under "claims"! (Try it yourself.) This inconsistency is surprising and makes the development of API-based tools, like Pywikibot, really painful.
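A sketch of the key juggling this forces on a client that reads an entity and writes statements back. The helper name `build_edit_data` is hypothetical, and the sketch assumes the edit is submitted as a flat list of statements under "claims" in the JSON `data` parameter of wbeditentity; no request is actually sent here.

```python
import json
from typing import Any, Dict


def build_edit_data(fetched_entity: Dict[str, Any]) -> str:
    """Turn a fetched MediaInfo entity into a JSON string for the `data` parameter."""
    # wbgetentities on Commons returns the groups under "statements".
    groups = fetched_entity.get("statements") or fetched_entity.get("claims") or {}
    # Flatten the per-property groups into one list of statements.
    statements = [s for group in groups.values() for s in group]
    # wbeditentity expects the same data under "claims".
    return json.dumps({"claims": statements})
```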

For consistency MediaInfo serialization should use "claims" as key, rather than "statements"
Open, High, Public