
CommonsMetadata cannot differentiate between license of the image and other licenses
Open, Low · Public

Description

Some images use license templates for works other than the actual image. There is no way to select the right license currently.

Examples:

When CommonsMetadata selects the wrong license, that could result in a copyright violation.

See this proposal for a possible way of representing license targets in HTML.

Event Timeline

Tgr raised the priority of this task from to Needs Triage.
Tgr updated the task description.
Tgr added projects: CommonsMetadata, WMF-Legal.
Tgr subscribed.
Restricted Application added a subscriber: Aklapper.

A little background for those not that familiar with Commons and copyright licenses:

  • License templates (aka copyright tags) are required for each author
    • In the case of derivative works, such as photographs of artworks, we require a license template for the artist and one for the photographer. For 2D artworks (like paintings) we do not require a copyright tag for the photographer because of Bridgeman Art Library v. Corel Corp. However, all sculptures and other 3D artworks should have separate copyright statements for each author.
    • Some media, like recordings of music, might also have multiple authors (composer, lyrics author, performers, etc.), and we could have copyright licenses for each
  • Media is supposed to be in the Public Domain (PD) or freely licensed in both the US and the country of origin, often resulting in separate copyright tags for the US and the country of origin
  • Some media is multi-licensed, where the user can pick one of the licenses, like GFDL and CC. See Commons:Multi-licensing

As a result, a single image can have a lot of separate license templates. Some licenses should be treated as if they were connected by an "AND" operator, where you have to meet the legal requirements of both licenses, and some need to be treated as if they were connected by an "OR" operator, where you can pick one. As long as all license templates belong to the Public Domain "family", Media Viewer can usually deal with them properly, but when some licenses are PD and some are not (as in the case of a CC photo of a PD sculpture), many things go wrong:

  • Media Viewer might be giving incorrect attributions
  • All PD templates call {{Cc-pd-mark-footer}}, which adds the statement "This file has been identified as being free of known restrictions under copyright law, including all related and neighboring rights." to the template. That statement is not correct if CC licenses are present
  • Each license adds a tracking category, so people looking for PD images might be tempted to use those categories to find them, which would be wrong if the image is also CC.
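The AND/OR relationship described above can be sketched as a small expression tree. This is only an illustration, not how Commons or CommonsMetadata stores licenses: the class names `License`, `AllOf` and `AnyOf`, and the function `is_public_domain`, are hypothetical. It shows why a file should only count as PD when every required ("AND") part is PD.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class License:
    """A single license template (hypothetical model)."""
    name: str
    public_domain: bool


@dataclass(frozen=True)
class AllOf:
    """Every part must be satisfied (the "AND" case)."""
    parts: Tuple["Expr", ...]


@dataclass(frozen=True)
class AnyOf:
    """The reuser may pick any one part (the "OR" case)."""
    parts: Tuple["Expr", ...]


Expr = Union[License, AllOf, AnyOf]


def is_public_domain(expr: Expr) -> bool:
    """Return True only if the work can be reused as PD."""
    if isinstance(expr, License):
        return expr.public_domain
    if isinstance(expr, AllOf):
        # All requirements apply, so all of them must be PD.
        return all(is_public_domain(p) for p in expr.parts)
    if isinstance(expr, AnyOf):
        # One option is enough, so a single PD option suffices.
        return any(is_public_domain(p) for p in expr.parts)
    raise TypeError(f"unsupported expression: {expr!r}")


pd_old = License("PD-old", True)
pd_us = License("PD-US", True)
cc_by_sa = License("CC BY-SA 4.0", False)
gfdl = License("GFDL", False)

cc_photo_of_pd_sculpture = AllOf((pd_old, cc_by_sa))
pd_in_us_and_origin = AllOf((pd_us, pd_old))
multi_licensed = AnyOf((gfdl, cc_by_sa))
```

Under this model the CC photo of a PD sculpture is not PD, while a file that is PD in both the US and the country of origin is.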

Discussions related to this issue:

Looking at this, there are a few possible solutions:

  • Use the most restrictive licence. This is probably the best option, as at least it would result in users following correct licensing terms instead of incorrectly interpreting something as PD
  • Showing multiple licences in Media Viewer. Much more difficult to implement but would probably work the best in this scenario
  • Not declaring the copyright status. If the licensing is ambiguous, then show some text like "Multiple licences (more info)" or "Unclear copyright status"

Whatever the solution, the current method of determining the licence is insufficient. Almost 50,000 files use {{copyright information}} alone, which is a structured way of displaying multiple licences, and pretty much all of these are affected. At the very least, a file shouldn't be declared as PD if there is a more restrictive licence on the page.

CommonsMetadata (the extension that parses file description pages into machine-readable metadata for MediaViewer) has license coalescing logic, which just takes the most permissive license (the correct behavior for normal multi-licensing). Flipping that around would be very simple, although we'd still show an incorrect license. Forcing the user to click through to see the license information would also be relatively easy and more correct, although less user-friendly, and TBH I'm pretty skeptical about the average user being able to interpret a file description page with multiple licenses correctly. But it would make it not MediaViewer's problem, which I guess you can see as an improvement.
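To make the "flip" concrete, here is a rough sketch of the two coalescing strategies. It is not the actual CommonsMetadata code; the `PERMISSIVENESS` ranking and the function names are assumptions made up for illustration, and unknown licenses are treated as the most restrictive to stay on the safe side.

```python
from typing import Iterable, Optional

# Hypothetical ranking: a higher number means a more permissive license.
PERMISSIVENESS = {
    "pd": 3,
    "cc-by": 2,
    "cc-by-sa": 1,
}


def _rank(license_name: str) -> int:
    # Unknown licenses get rank 0, i.e. they count as most restrictive.
    return PERMISSIVENESS.get(license_name.lower(), 0)


def most_permissive(licenses: Iterable[str]) -> Optional[str]:
    """Pick the license a reuser would prefer under normal multi-licensing."""
    return max(licenses, key=_rank, default=None)


def most_restrictive(licenses: Iterable[str]) -> Optional[str]:
    """The flipped behavior: never show something more permissive than required."""
    return min(licenses, key=_rank, default=None)
```

For a page carrying both a PD tag and a CC BY-SA tag, `most_permissive` would pick the PD tag and `most_restrictive` the CC BY-SA one. Neither is fully correct for a mixed-license file, but the second errs on the safe side.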

almost 50,000 files use {{copyright information}} alone, which is a structured way of displaying multiple licences

Not a very structured way, unfortunately - it still doesn't tell you what the actual license for reuse is. Still, it at least contextualizes what each license is for. It wouldn't be hard to extract that information, and have CommonsMetadata return all licenses along with what each license stands for, and maybe there's a reasonable way to display that in MediaViewer. Not sure what fraction of mixed-license file pages are covered by that template though.
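As a sketch of what "all licenses along with what each license stands for" might look like as data: the snippet below assumes the (target, license) pairs have already been extracted from the page somehow, and it does not model the real parameters of {{copyright information}}. The function names and the output format are hypothetical.

```python
from typing import Dict, List, Tuple


def licenses_by_target(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group license names by the thing they apply to, keeping page order."""
    grouped: Dict[str, List[str]] = {}
    for target, license_name in pairs:
        grouped.setdefault(target, []).append(license_name)
    return grouped


def summarize(grouped: Dict[str, List[str]]) -> str:
    """Build a one-line summary that a viewer could display."""
    return "; ".join(
        f"{target}: {', '.join(names)}" for target, names in grouped.items()
    )


pairs = [
    ("sculpture", "PD-old"),
    ("photograph", "CC BY-SA 4.0"),
    ("photograph", "GFDL"),
]
summary = summarize(licenses_by_target(pairs))
# summary == "sculpture: PD-old; photograph: CC BY-SA 4.0, GFDL"
```

The summary deliberately does not say whether licenses for the same target are alternatives or cumulative, since that information is not in the pairs.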

(FWIW the initial version of MediaViewer tried to play it safe and didn't display any license information at all, just a "click here to see the license" link that took users to the file page. Some Commons community members claimed that was in violation of CC licenses. I don't think there were good reasons to think that, and I don't think that argument would resurface, but it's something to be aware of.)

Flipping that around would be very simple, although we'd still show an incorrect license.

Sounds sensible to me. If multiple licenses are detected, a note like "Other licenses may be available, click for details" could be displayed.

Sounds sensible to me. If multiple licenses are detected, a note like "Other licenses may be available, click for details" could be displayed.

Seconding this, sounds more than sensible and would at least mean we're not displaying licenses that are more permissive than the ones which should actually be displayed.
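A minimal sketch of the note suggested above, assuming the license to show has already been chosen and the list of detected licenses is available. The function name `license_label` and the way the note is appended are hypothetical, not MediaViewer behavior.

```python
from typing import Sequence

NOTE = "Other licenses may be available, click for details"


def license_label(shown: str, detected: Sequence[str]) -> str:
    """Return the text to display, adding the note when several licenses exist."""
    if len(set(detected)) > 1:
        return f"{shown} ({NOTE})"
    return shown
```

With a single detected license the label is unchanged; with two or more distinct licenses the note is appended in parentheses.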