Page MenuHomePhabricator

[L] Add schema.org license data to file pages
Open, Needs TriagePublic

Description

As a first step, we'll add a JSON-LD script to all file pages on Commons with the license data that will soon be required by Google.

Sample script:

<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "ImageObject",
  "contentUrl": "https://example.com/photos/1x1/black-labrador-puppy.jpg",
  "license": "https://example.com/license",
  "acquireLicensePage": "https://example.com/how-to-use-my-images"
}
</script>

We should also inspect the page size (see this task; if the article script only added 0.5 - 1KB I don't expect this script will be a problem)


Acceptance criteria:

  • A JSON-LD script is present in the source code of file pages containing license and acquireLicensePage attributes
  • license links to the actual license, or to a page with information about the public domain, whichever applies to that image
  • acquireLicensePage should link to the file page
  • File pages on production sites are valid when tested with the rich results testing tool

COVID-19 Deployment Criteria

  • Can you roll back this change without lasting impact?
    1. A recovery plan is required as this will help identify our capacity for recovering from the failure
    2. THIS IS A KEY QUESTION, if you can’t answer it, you shouldn’t deploy
  • Is specialized knowledge required to support this change in production? If so, are there multiple people with this knowledge?

No specialized knowledge needed

  • Is there a way to increase confidence about the correctness of this change?
    1. Reviews (Design, Code, etc)
    2. Testing coverage (unit tests, integration tests)
    3. Manual testing (e.g. Beta, vagrant, docker)

All of the above. We should definitely manual test on beta Commons after merge.

Event Timeline

AnneT created this task.May 29 2020, 8:51 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMay 29 2020, 8:51 PM
CBogen renamed this task from Add schema.org license data to file pages on Commons to [L] Add schema.org license data to file pages on Commons.Jun 3 2020, 4:19 PM
AnneT added a comment.Jun 4 2020, 8:08 PM

Hey @Ramsey-WMF, I have some questions about this one based on comments in the parent task:

  1. What should we include for acquireLicensePage? This attribute is optional but recommended. It seems to me that we would want to link to the File page, which includes a description of the license. What do you think? As a reminder:

acquireLicensePage
A URL to a page where the user can find information on how to license that image. Here are some examples:

  • A check-out page for that image where the user can select specific resolutions or usage rights
  • A general page that explains how to contact you
  1. Can you confirm that the following fall under public domain and, therefore, we can expect not to get a licenseUrl back for them? (They all have PD in the license template so I'm assuming they are.) If so, does this also mean there will be no acquireLicensePage value?
    • Original work of the US Federal Government
    • Orignal work of NASA
    • First published in the US before 1925
    • Faithful reproduction of a painting in the public domain because the artist died more than 70 years ago
  1. On a similar note, if an item has no licenseUrl or acquireLicensePage, do we want to add the schema script at all or omit it?
  1. See this comment – if we're using the URL of the actual image as the ImageObject URL, do we also need to include the thumbnail attribute since the actual image doesn't appear on the File page? It makes sense, but we might want to double-check with Google.

There are a lot of remaining questions about how to handle this on Article pages, but this is a good start...

Change 604143 had a related patch set uploaded (by Anne Tomasevich; owner: Anne Tomasevich):
[mediawiki/extensions/WikibaseMediaInfo@master] [WIP] Add JSON-LD script with schema.org license info to filepages

https://gerrit.wikimedia.org/r/604143

AnneT added a comment.Jun 11 2020, 8:19 PM

Answers to the above questions:

What should we include for acquireLicensePage?

File page. This attribute has 2 purposes: to serve as a layperson's explanation of a license (as opposed to the potentially legalese-y license itself) and to tell users how they can license an image, e.g. checking out on a stock photo website. Obviously the latter doesn't apply to us but it seems including a link to the file page, which offers a summary of the license, would be appropriate and helpful.

Can you confirm that the following fall under public domain and, therefore, we can expect not to get a licenseUrl back for them?

We're still working out what to do about images in the public domain since there is no license; we may link to an existing page on Commons with info about the public domain or create a new page specifically for this purpose. This should probably involve Keegan.

On a similar note, if an item has no licenseUrl or acquireLicensePage, do we want to add the schema script at all or omit it?

Correct, don't include it

See this comment – if we're using the URL of the actual image as the ImageObject URL, do we also need to include the thumbnail attribute since the actual image doesn't appear on the File page? It makes sense, but we might want to double-check with Google.

The only thing Google needs for the licensable badge is the main ImageObject url, which should be the URL indexed by Google (which appears to be the original file). A reference to the thumbnail that actually appears on the page isn't needed, although we could include one as an organization-wide standard if we want to.

AnneT updated the task description. (Show Details)Jun 11 2020, 8:21 PM

For the initial launch of this feature, we'll use https://commons.wikimedia.org/wiki/Help:Public_domain for the license attribute for images in the public domain.

Change 606428 had a related patch set uploaded (by Anne Tomasevich; owner: Anne Tomasevich):
[mediawiki/extensions/CommonsMetadata@master] Add JSON-LD script with schema.org license info to filepages

https://gerrit.wikimedia.org/r/606428

Change 604143 abandoned by Anne Tomasevich:
[WIP] Add JSON-LD script with schema.org license info to filepages

Reason:
Abandoning in favor of https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CommonsMetadata/ /606428

https://gerrit.wikimedia.org/r/604143

AnneT updated the task description. (Show Details)Jun 24 2020, 9:05 PM

Change 606428 merged by jenkins-bot:
[mediawiki/extensions/CommonsMetadata@master] Add JSON-LD script with schema.org license info to filepages

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CommonsMetadata/ /606428

AnneT renamed this task from [L] Add schema.org license data to file pages on Commons to [L] Add schema.org license data to file pages.Jul 9 2020, 1:51 PM
AnneT updated the task description. (Show Details)

Everything looks right on Commons File pages, although I'd like to have the Google team confirm that the data looks good on their end. However, the script isn't present on wikipedia File pages (I've checked Catalan, Hebrew, and English), so I'll look into that.

AnneT removed AnneT as the assignee of this task.Sep 9 2020, 10:34 PM

Based on feedback from people at the Foundation and at Google, this has been successfully completed for Commons filepages. It doesn't yet apply to Wikipedia filepages because the code only supports local files. This functionality would have been extended to files in foreign repos by this patch, but it was abandoned because the overall approach was rejected.

There are two approaches to resolving this:

  1. Update the code to support foreign files on filepages
  2. Incorporate support of foreign files in the resolution to T256666, which includes files on article pages (this is what the abandoned patch aimed to do)

If solution 2 is going to take a long time, it might be worth going for solution 1 in the meantime.

CBogen moved this task from Triage to SDoC on the Structured-Data-Backlog board.Jan 19 2021, 10:20 PM