Page MenuHomePhabricator

[M] Change haslicense keywords
Closed, ResolvedPublic

Description

Background information

Some files, e.g. https://commons.wikimedia.org/wiki/File:Sarajevo_Cathedral_Rose_Window.jpg, have a copyright licence = attribution only license statement.
That's https://www.wikidata.org/wiki/Q98923445 on wikidata.

We allow filtering based on licenses with special search keyword. They can either be:

  • haslicense:cc-by-sa
  • haslicense:cc-by
  • haslicense:unrestricted
  • haslicense:other (for whetever does have a license, but didn't fit above)

The Special:MediaSeach UI, however, hides this search keyword complexity behind a simple filter with these options

  • Use with attribution
  • Use with attribution and same license
  • No restrictions
  • Other

AFAICT, the "attribution only license" statement in question is not a creative commons license.
Logically, I'd expect it belongs under "Use with attribution" in our UI, but given that it's not a CC license, it probably doesn't belong under "haslicense:cc-by-sa" (which is what drives the results for the "Use with attribution" option.

There's a minor disconnect between the search license keyword option names, and how we describe them in the UI. This causes some licenses to not fit within our model.

Acceptance criteria:

Note that this last criterium implies that there's total flexibility on the part of the community to come up with new license groups. This is acceptable.
There's 1 inherent problem there, though: they will not have i18n messages by default. It is possible to create messages on-wiki, though, so that would then also be in the community's control.
As long as we provide default messages for the existing 4 options, that's fine. Should the community ever decide to want to add another class of licenses, they then have all the tools needed to do so, without requiring an intervention from us.

Event Timeline

Adding @Keegan & @CBogen, who I suspect can best evaluate this. If the task description is not clear, LMK.

IMO to make things clearer the best thing to do would be to change the names of the haslicense keyword options to reflect the names in the UI. Would that have any negative impact/break anything?

e.g.

haslicense:attribution
haslicense:attributionsame
haslicense:unrestricted
haslicense:other

IMO to make things clearer the best thing to do would be to change the names of the haslicense keyword options to reflect the names in the UI. Would that have any negative impact/break anything?

Nope, should be fine as long as we properly coordinate the multiple places where those keywords are already in use.
(We should probably make some changes there anyway - we should probably simply read from the on-wiki config instead of having it hardcoded in multiple places)

(that's an answer from a technical perspective - I don't have an opinion on, or much knowledge about licenses, and how to best categorize them in the first place)

Nope, should be fine as long as we properly coordinate the multiple places where those keywords are already in use.
(We should probably make some changes there anyway - we should probably simply read from the on-wiki config instead of having it hardcoded in multiple places)

(that's an answer from a technical perspective - I don't have an opinion on, or much knowledge about licenses, and how to best categorize them in the first place)

I don't think we've heard any feedback that the categorization itself is problematic, so I think we should stick with what we have (unless @Keegan has another take on it).

@matthiasmullie any objections to repurposing this ticket from a spike to actually making the change?

@matthiasmullie any objections to repurposing this ticket from a spike to actually making the change?

Not at all; I'm happy to make the required changes to the ticket title/description and make it more concrete (unless @Keegan disagrees)
Nitpick: is attributionsame descriptive enough? Personally, I think something like attribution-license or attribution-same-license might be easier to understand (for those who want to use the keyword terms directly instead of the MediaSearch UI)

matthiasmullie renamed this task from [Spike] Investigate disconnect between search keyword license options and their UI descriptive names to Change haslicense keywords.Jan 28 2021, 3:58 PM
matthiasmullie updated the task description. (Show Details)

I have updated the ticket. Please look over the changes (esp. the last acceptance criterium, which would grant community better control)

Personally, I think something like attribution-license or attribution-same-license might be easier to understand (for those who want to use the keyword terms directly instead of the MediaSearch UI)

I'm fine with attribution-same-license. I'll update the AC.

I have updated the ticket. Please look over the changes (esp. the last acceptance criterium, which would grant community better control)

I think this is okay. Does it require a certain user permission to edit?

I think this is okay. Does it require a certain user permission to edit?

Yes, sysop (administrators)

CBogen renamed this task from Change haslicense keywords to [M] Change haslicense keywords.Feb 9 2021, 5:46 PM
CBogen updated the task description. (Show Details)

I'm still trying to understand how the license mapping works here.

We currently have a MediaWiki:Wikibasecirrus-license-mapping page that overrides the translation messages for wikibasecirrus-license-mapping in any given language.

The content of this page is structured like this:

cc-by-sa|
    P275=Q98755364,
    P275=Q98755344,
    P275=Q98755369,
    ...
cc-by|
    P275=Q19125117,
    P275=Q30942811,
    P275=Q18810333,
    ....
unrestricted|
    P275=Q98341313,
    P275=Q98592850,
    ...

I assume this is some kind of data object (PHP associative array?), where cc-by-sa, cc-by, etc. are the keys and the statements (P275=Q98755364, etc) are the values? What does the | mean? Also, why not just do this in JSON?

As far as I can tell, actually retrieving this message in PHP via wfMessage( 'wikibasecirrus-license-mapping' )->parse() will result in a plain string containing the above text. If we want to remove the hard-coded JSON data that populates the license dropdown menu and use the keys here instead, we need to parse the contents of this message first. It looks like WikibaseCirrusSearch does something similar in the HasLicenseFeature.php file. Do we need to introduce similar logic into SpecialMediaSearch.php so that we can read the contents of this file at runtime (picking up any new license types that may have been introduced by the community)?

As far as internationalizing any new license types, ideally message names would follow the same conventions we have already (I assume we'd want to loop through the keys in this file and look for messages defined under "prefix" + "key").

The label for license type cc-by lives under the name
wikibasemediainfo-special-mediasearch-filter-license-cc-by

If the community were to add a new license group called "foo", messages for it should be filed under
wikibasemediainfo-special-mediasearch-filter-license-foo.

But since this message was community-defined, it won't live in the i18n/ folder in our source code – instead it will live at
https://commons.wikimedia.org/w/index.php?title=MediaWiki:Wikibasemediainfo-special-mediasearch-filter-license-foo/en, I assume.

Is this more or less correct? If so: how do we let community admins know about where they should define labels for any new license groupings they might want to add in the future?

As a follow-up to the above, it looks like the CirrusSearch code already has a Util::parseSettingsInMessage() utility function for the purposes of parsing i18n messages into arrays.

Would it be appropriate to just add a use CirrusSearch\Util; line to SpecialMediaSearch.php in order to avoid duplicating this logic, or is introducing this kind of cross-extension dependency something we want to avoid?

I assume this is some kind of data object (PHP associative array?), where cc-by-sa, cc-by, etc. are the keys and the statements (P275=Q98755364, etc) are the values? What does the | mean? Also, why not just do this in JSON?

This is a totally arbitrary format that already exists for several of Cirrus' on-wiki configurable code (e.g. cirrussearch-boost-templates; see CirrusSearch\Util::parseSettingsInMessage)
The only deviation from the other existing messages is that the values can be split up over multiple lines, because the lists are expected to be long.

As far as I can tell, actually retrieving this message in PHP via wfMessage( 'wikibasecirrus-license-mapping' )->parse() will result in a plain string containing the above text. If we want to remove the hard-coded JSON data that populates the license dropdown menu and use the keys here instead, we need to parse the contents of this message first. It looks like WikibaseCirrusSearch does something similar in the HasLicenseFeature.php file. Do we need to introduce similar logic into SpecialMediaSearch.php so that we can read the contents of this file at runtime (picking up any new license types that may have been introduced by the community)?

The current i18n-to-array parsing happens in Wikibase\Search\Elastic\Hooks::onCirrusSearchAddQueryFeatures (this snippet). It would probably make sense to move that bit of code into its own public static function, so that it can then also be called from WikibaseMediaInfo/MediaSearch to retrieve the map.

As far as internationalizing any new license types, ideally message names would follow the same conventions we have already (I assume we'd want to loop through the keys in this file and look for messages defined under "prefix" + "key").
If the community were to add a new license group called "foo", messages for it should be filed under
wikibasemediainfo-special-mediasearch-filter-license-foo.

Correct.
Since these messages will be dynamic (based on whatever the contents of the license map are), we can't expose them to JS via the modules' messages list in extension.json, but will have to iterate them in PHP and expose their results to JS as JsConfigVar, I expect.
I'd prepackage the default set of license messages in en.json/qqq.json.

But since this message was community-defined, it won't live in the i18n/ folder in our source code – instead it will live at
https://commons.wikimedia.org/w/index.php?title=MediaWiki:Wikibasemediainfo-special-mediasearch-filter-license-foo/en, I assume.

The English (or rather: content language) version doesn't need the en subpage. But that's otherwise correct.

Is this more or less correct? If so: how do we let community admins know about where they should define labels for any new license groupings they might want to add in the future?

I don't know whether the license map has already really been announced yet (I think we're waiting for this to be complete before we do). We can add information about it there.
This change is as much for ourselves as it is for community, though: it'll ensure that, if the license map ever needs to change again, the fallout can be addressed at the same time by the same people (community or staff) on the same platform (on-wiki), rather than having to be synchronized across content & code (in multiple places)

Would it be appropriate to just add a use CirrusSearch\Util; line to SpecialMediaSearch.php in order to avoid duplicating this logic, or is introducing this kind of cross-extension dependency something we want to avoid?

CirrusSearch is already a dependency of WikibaseMediaInfo (and will also be for MediaSearch), so we can definitely use its code.

Is this more or less correct? If so: how do we let community admins know about where they should define labels for any new license groupings they might want to add in the future?

I don't know whether the license map has already really been announced yet (I think we're waiting for this to be complete before we do). We can add information about it there.

T272000 is assigned to @Keegan to get the info to the community. I didn't think we needed to wait for this to be done first; if so I'll make sure that holds off. Please let me know.

Change 665458 had a related patch set uploaded (by Eric Gardner; owner: Eric Gardner):
[mediawiki/extensions/WikibaseMediaInfo@master] Get license groups from community-defined wikipage

https://gerrit.wikimedia.org/r/665458

I have a patch up for review which should address the last two acceptance criteria:

  • MediaSearch must change so that it does haslicense:attribution and haslicense:attribution-same-license search calls, instead of the old keywords
  • MediaSearch must read the haslicense keywords from config instead of having them hardcoded

Once this is merged and deployed, I can go and reorganize the mappings page (which will address the rest of the criteria). Let me know if I misunderstand anything about how this all fits together, though.

I think that T275579 will need to be figured out before this task can be completed (without introducing duplicate code in our own extension and making a mess of things, at any rate).

Change 665458 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Programmatically generate search options in PHP

https://gerrit.wikimedia.org/r/665458

Once this patch has gone out to production (by latter half of next week, most likely), I can go in and make changes to the on-wiki interface pages. Users should notice no differences beyond the changing of keywords in the dropdown items and in the URL parameters when they filter by license; behavior will remain essentially unchanged.

Change 668363 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/extensions/WikibaseMediaInfo@master] Make WikibaseCirrusSearch a soft dependency

https://gerrit.wikimedia.org/r/668363

Change 668363 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Make WikibaseCirrusSearch a soft dependency

https://gerrit.wikimedia.org/r/668363

Now that this patch is live on production commons, I have updated the on-wiki license mapping page (https://commons.wikimedia.org/w/index.php?title=MediaWiki%3AWikibasecirrus-license-mapping&type=revision&diff=541481775&oldid=528845059).

I can confirm that the new "attribution", and "attribution-same-license" filter params are present and working on Special:MediaSearch on production.

Etonkovidova added a subscriber: Etonkovidova.

Checked on commons wmf.35 - looks good.

Now that this patch is live on production commons, I have updated the on-wiki license mapping page (https://commons.wikimedia.org/w/index.php?title=MediaWiki%3AWikibasecirrus-license-mapping&type=revision&diff=541481775&oldid=528845059).

I can confirm that the new "attribution", and "attribution-same-license" filter params are present and working on Special:MediaSearch on production.