
Display C2PA metadata when available on photos uploaded to Wikimedia Commons
Open, Needs Triage, Public, Feature

Description

Feature summary
A new initiative to help determine the provenance of an image, called “Content Credentials”, has recently emerged as one way to help detect AI-generated or otherwise manipulated images. Image editing applications, AI-generation tools, and camera makers are implementing these credentials.

When an image that contains these credentials is uploaded to Wikimedia Commons, Commons should display this metadata alongside the image.

Use case(s)

One potential use case is someone looking for an image to use on a Wikimedia project who wants to be reasonably sure that the image is an actual photo of what it claims to represent. Say someone wanted to add a photo of a city, monument, or an example of something to a Wikipedia article. They search Commons and find a hundred photos of <thing>. Displaying the C2PA metadata would be one indicator of which of those hundred images show the actual <thing> and which are AI-generated or otherwise manipulated.

This basic concept would also extend to anyone wanting to use an image from Commons elsewhere, outside our projects. A student doing a report on a city and wanting to add an image of the actual city to their presentation, for instance.

Benefits

As AI-generated images proliferate and their quality improves, it is becoming harder to distinguish between a photo taken with a camera and a photo generated by a computer. Misinformation and disinformation are on the rise, and it is increasingly difficult for the average person to determine what is “real”. Displaying this metadata is one additional indicator of the provenance of an image and would help people distinguish between an unmanipulated photo and a computer-generated one.

The confusion around what is real and what is AI is leading to distrust of, well, anything, but in this instance of institutions such as Wikipedia. If a well-meaning contributor were to add an AI-generated image to an article, and a reader discovered that the image was not of the subject being described but computer-generated, that could erode trust in our projects. The same could be said of images found on Commons and used elsewhere: people will begin to distrust what they find and use on Commons.

See also

Event Timeline

So if I'm reading the linked pages correctly, this system is based on cameras, photo editing tools, and other tools involved in editing the image to cryptographically sign the file in some way. How would we decide which signatures are valid and which are not? How does this ensure that this doesn't end up promoting proprietary tools as more "trustworthy" than similar free software tools?

Hey taavi,

So if I'm reading the linked pages correctly, this system is based on cameras, photo editing tools, and other tools involved in editing the image to cryptographically sign the file in some way.

Yes, the video I linked to does a pretty good job explaining the cryptographic side of things. Even as a layperson I think I understand it. :)

How would we decide which signatures are valid and which are not?

There is an open-source API for implementing the verification check.

https://opensource.contentauthenticity.org/docs/introduction and their GitHub.
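
For a rough sense of what an integration could look like, here is a minimal sketch that shells out to the c2patool CLI from that project. It assumes c2patool is installed and prints the manifest store as JSON; the report field names (active_manifest, manifests, claim_generator) follow the published examples and may differ between tool versions.

```
"""Rough sketch: surface C2PA claim data for an uploaded file by shelling out
to the open-source `c2patool` CLI. Assumes the tool is on PATH and prints the
manifest store as JSON; field names follow the published report examples."""
import json
import subprocess


def read_c2pa_manifest(path: str) -> dict | None:
    """Return the active C2PA manifest for `path`, or None if the file
    carries no Content Credentials (c2patool exits non-zero in that case)."""
    result = subprocess.run(["c2patool", path], capture_output=True, text=True)
    if result.returncode != 0:
        return None  # no manifest embedded, or the tool could not parse it
    report = json.loads(result.stdout)
    active_label = report.get("active_manifest")
    return report.get("manifests", {}).get(active_label)


if __name__ == "__main__":
    manifest = read_c2pa_manifest("upload.jpg")
    if manifest:
        # `claim_generator` names the software that produced the claim,
        # e.g. camera firmware, an editor, or an AI image generator.
        print("Content Credentials found:", manifest.get("claim_generator"))
    else:
        print("No Content Credentials embedded in this file.")
```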

How does this ensure that this doesn't end up promoting proprietary tools as more "trustworthy" than similar free software tools?

As far as I understand this is an open initiative that is run by a non-profit.

I hope these answers are helpful.

A major question here is: do we actually validate the sigs, or do we just display the value? If we do validate and it's invalid, do we still display it, do we pretend it doesn't exist, or do we show a warning of some kind?
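
To make the options concrete, here is an illustrative sketch of the three states a file-page renderer would have to handle; the enum and function names are hypothetical, not existing MediaWiki code.

```
"""Illustrative sketch of the display decision for C2PA data on a file page.
The names here are hypothetical and only meant to frame the question."""
from enum import Enum


class C2paStatus(Enum):
    NO_MANIFEST = "no_manifest"  # file carries no Content Credentials
    VALID = "valid"              # manifest present and signature verified
    INVALID = "invalid"          # manifest present but verification failed


def render_credentials_row(status: C2paStatus, claim_generator: str | None) -> str:
    """Decide what, if anything, to show in the metadata table."""
    if status is C2paStatus.NO_MANIFEST:
        return ""  # one option: show nothing at all
    if status is C2paStatus.INVALID:
        # other options: hide it entirely, or show it with an explicit warning
        return "Content Credentials present, but they could not be verified"
    return f"Content Credentials: signed by {claim_generator}"
```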

How does this ensure that this doesn't end up promoting proprietary tools as more "trustworthy" than similar free software tools?

As far as I understand this is an open initiative that is run by a non-profit.

Being an open initiative doesn't really matter much. It seems likely that open source tools may be locked out of such an ecosystem.

[Personally I'm not sure if we should care. I'm more in the camp of display whatever metadata exists, and let users decide how to interpret it]

So if I'm reading the linked pages correctly, this system is based on cameras, photo editing tools, and other tools involved in editing the image to cryptographically sign the file in some way. How would we decide which signatures are valid and which are not?

https://opensource.contentauthenticity.org/docs/prod-cert seems to imply that any S/MIME cert is trusted using normal S/MIME PKI. Trusting everyone feels like a big gaping hole if your goal is to detect AI.

Perhaps though the goal is more to know who to blame for AI than to detect it?

I have my doubts about the effectiveness of this whole scheme. Maybe the main benefit is the non-malicious use case to detect files that are accurately labelled as containing AI.

Oh, maybe I misunderstood. https://opensource.contentauthenticity.org/docs/verify-known-cert-list/ says there is a hard-coded list.

It feels like trust management in this thing is very haphazard.
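
To illustrate what a "known certificate list" model boils down to, here is a minimal sketch that checks a signing certificate against an allow-list of fingerprints using the `cryptography` package. It assumes the certificate has already been extracted from the manifest as PEM and that the trust list is a local file of SHA-256 fingerprints; both inputs are placeholders.

```
"""Minimal sketch of an allow-list check against a known-certificate list.
Assumes the signing certificate has been extracted from the manifest as PEM
and the trust list is a local file of hex SHA-256 fingerprints (placeholders)."""
from cryptography import x509
from cryptography.hazmat.primitives import hashes


def is_known_signer(cert_pem: bytes, trust_list_path: str) -> bool:
    # Fingerprint the signing certificate and look it up in the allow-list.
    cert = x509.load_pem_x509_certificate(cert_pem)
    fingerprint = cert.fingerprint(hashes.SHA256()).hex()
    with open(trust_list_path) as f:
        known = {line.strip().lower() for line in f if line.strip()}
    return fingerprint in known
```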

There is a workshop on this topic at W3C on 12 March 2025: https://www.w3.org/events/workshops/2025/authentic-web-workshop/

(Putting my staff hat on for a second) Thanks @Shizhao, I've passed along this information and someone from the Foundation will be attending.

Hi all, I attended the Authentic Web Mini Workshop, part 1 mentioned above. The W3C is still in the very early stages of accepting proposals for tools that enable verifiability on the web. They are gathering proposals for such technologies with the goal of determining if any of them should become a web standard. The technologies currently identified are:

Proposals will be evaluated by the W3C based on a technical framework using two axes:
What role the tool plays in the web ecosystem:

  1. Content Consumer - Person receiving the information (audience, reader)
  2. Content Provider/Source - Person(s) or organizations delivering content
  3. Content Promoter - Person(s) or organizations that amplify the spread of information
  4. Credibility Facilitator - Person(s) or organizations who help the consumer decide what to trust
  5. Platform - Technological system, and by extension the person or organization who maintains and controls it

How the tool enables credibility based on the following factors:

It's worth noting that the W3C Credible Web Community Group and the framework above focus on credibility, which is a broad term that encompasses elements of trust and reputation. The proposal outlined above specifically aims to enable transparency, which is not a judgement on believability but a display of provenance/origin. This seems like a useful framework when considering this proposal and its impact/goals.

As a Commons user: it would be extremely helpful for MediaWiki to recognize and call out the presence of C2PA claims, even if the content of these claims is not fully decoded or verified. A number of AI image generation services, such as ChatGPT, use C2PA data to mark the images they generate. Even just displaying the software agent name from these claims would be of great value in identifying AI-generated images.
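
As a rough illustration of what surfacing that claim could look like, here is a sketch that pulls the claim generator and the declared digital source type out of a parsed manifest report (for example, the output of the c2patool sketch above). The assertion layout follows published c2pa examples and is an assumption here.

```
"""Sketch: derive a human-readable provenance hint from a parsed C2PA manifest.
Assumes the manifest dict follows the report layout used in published c2pa
examples (`claim_generator`, a `c2pa.actions` assertion carrying
`digitalSourceType`); real files may vary."""

# IPTC digital source type that AI generators declare in their actions assertion.
TRAINED_ALGORITHMIC_MEDIA = (
    "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"
)


def provenance_summary(manifest: dict) -> str:
    generator = manifest.get("claim_generator", "unknown software")
    source_types = [
        action.get("digitalSourceType", "")
        for assertion in manifest.get("assertions", [])
        if assertion.get("label") == "c2pa.actions"
        for action in assertion.get("data", {}).get("actions", [])
    ]
    if TRAINED_ALGORITHMIC_MEDIA in source_types:
        return f"Declared as AI-generated (created with {generator})"
    return f"Content Credentials signed by {generator}"
```

Even this minimal summary would let a file page distinguish "created with an image editor" from "declared as AI-generated", without yet taking any position on signature trust.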