
CommonsMetadata should consider an image non-free if any of the license metadata blocks is nonfree
Closed, Resolved (Public)


English Wikipedia uses two templates on nonfree images: a copyright tag and a nonfree rationale. The copyright tag contains correct license metadata including the nonfree flag; the nonfree rationale contains partial metadata (e.g. "Fair use" as license name but no nonfree tag). As a result, sometimes (depending on which template comes first) the metadata block for the rationale template is preferred, and the image is considered free. (Example: Blade_Runner_poster.jpg)

One way to fix it would be to remove copyright metadata from rationale templates, or make sure they include the nonfree flag; according to this discussion, that's problematic. The other solution is for CommonsMetadata to consider an image nonfree if any one of its license metadata blocks has the nonfree flag.
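The second option can be sketched as follows. This is a minimal Python illustration, not the extension's actual PHP code; the block layout and the key names `LicenseShortName` and `NonFree` are assumptions made for the example.

```python
def is_nonfree(license_blocks):
    """Return True if any license metadata block carries the nonfree flag."""
    return any(block.get("NonFree") for block in license_blocks)


# Hypothetical blocks, as they might be parsed from an English Wikipedia file page.
rationale = {"LicenseShortName": "Fair use"}  # partial metadata, no flag
copyright_tag = {"LicenseShortName": "Fair use", "NonFree": True}

# The result no longer depends on which template comes first.
print(is_nonfree([rationale, copyright_tag]))
print(is_nonfree([copyright_tag, rationale]))
```

Checking every block instead of only the preferred one removes the dependence on template order for this one field.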

Event Timeline

Change 282142 had a related patch set uploaded (by Gergő Tisza):
Check all license metadata blocks for nonfree flag

If I understand correctly, this patch fixes one issue, but the license returned by selectLicense would still differ from page to page depending on the order in which the editor adds the templates. That looks like a nest of bugs to me. Shouldn't selectLicense merge all licenses into one object and return that, so that it always returns as much data as possible?

Preferably a set of defaults would also be merged in (if the parser doesn't do that already), so that a license is consistent whatever an editor decides to do or not to do.

The patch merges a single field (nonfree). Not sure how one would merge an author or a license name, for example.

If you reversed the $sortedLicenses array and then reduced it using array_reduce with array_merge, you would end up with one license in which:

  • Keys present in the most important license are set by that license
  • Keys missing from the most important license are set by the other licenses, with more important licenses prevailing over less important ones

This guards against the (perhaps rare) cases where the function returns incomplete license data because a field is not set by the most important license. It would also pick up NonFree keys from lesser licenses, but it is not limited to just that case.
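The proposal above can be sketched roughly as follows. It is written in Python rather than PHP, with dict unpacking standing in for array_merge on string keys. The function name `merge_licenses` and the sample field values are hypothetical.

```python
from functools import reduce


def merge_licenses(sorted_licenses):
    """Merge license dicts; sorted_licenses is ordered most important first."""
    # Walk from the least important license to the most important one, so that
    # later (more important) entries overwrite earlier ones, much like PHP's
    # array_merge does for string keys.
    return reduce(
        lambda merged, lic: {**merged, **lic},
        reversed(sorted_licenses),
        {},
    )


# Hypothetical input: the most important license lacks the NonFree key.
sorted_licenses = [
    {"LicenseShortName": "Fair use"},
    {"LicenseShortName": "Non-free poster", "NonFree": True},
]

merged = merge_licenses(sorted_licenses)
print(merged["LicenseShortName"])
print(merged["NonFree"])
```

In this sketch the short name comes from the most important license, while the NonFree key is filled in from the lesser one.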

If you agree with this approach I could make a patch, but bear with me as it would be my first.

It's better to ignore all but one license than to make up a license that is not real (e.g. one with mismatching short and long names). The nonfree field can be merged safely, so that should be fixed; other fields generally can't. Eventually tools should learn to handle a collection of licenses, but we probably want to wait for the CommonsData project to finish first, to avoid doing the work twice. (Also, the Commons community has so far shown no interest in providing enough information to handle multiple licenses. See e.g. the discussion link in T89692. Again, probably no point in poking now that CommonsData is close.)
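A minimal sketch of that distinction, in Python and not taken from the actual patch: the most important license is kept intact and only the nonfree flag is merged in from the others. The function name `select_license` and the key names `LicenseShortName` and `UsageTerms` are assumptions for the example.

```python
def select_license(sorted_licenses):
    """Keep the most important license intact; merge only the nonfree flag."""
    if not sorted_licenses:
        return None
    selected = dict(sorted_licenses[0])  # copy, so the input is not mutated
    if any(lic.get("NonFree") for lic in sorted_licenses):
        selected["NonFree"] = True
    return selected


# Hypothetical input: a full merge would pair the short name "Fair use" with
# the long name of a different license.
licenses = [
    {"LicenseShortName": "Fair use"},
    {
        "LicenseShortName": "Non-free poster",
        "UsageTerms": "Non-free film poster",
        "NonFree": True,
    },
]

selected = select_license(licenses)
print(selected)
```

Here the selected license never gains a `UsageTerms` value from a different license, so its short and long names cannot end up mismatched.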

Thank you for clarifying. I was under the assumption that one file would have one license and that editors use the different blocks only for visual reasons. It sounds like a great idea to wait for CommonsData before making more changes. I think it would be good to merge this in the meantime, as many file pages split the NonFree information over multiple templates. Let me know if I can help; otherwise I presume I'll just see the merge notification at some point in the future.

dr0ptp4kt triaged this task as Medium priority. (Aug 21 2017, 4:38 PM)
dr0ptp4kt moved this task from Untriaged to Triaged on the Multimedia board.
dr0ptp4kt subscribed.

This is probably pertinent to Structured Data on Commons.

Ramsey-WMF moved this task from Next up to Triaged on the Multimedia board.

Whoops. Mistake.

Change 282142 merged by jenkins-bot:
[mediawiki/extensions/CommonsMetadata@master] Check all license metadata blocks for nonfree flag