Improving PageImages filtering of non-free images
Open, Medium, Public

Assigned To

None

Authored By

	Alsee
	Oct 1 2017, 2:54 AM

Description

In T177122, a short-term fix was made for a specific instance of PageImages incorrectly returning non-free images. There appears to be some agreement that the fix was inadequate in the broader view. PageImages appears to filter based on whether the file page has a particular <span> value applied to it. I would describe this method as unreliable, invisible, redundant, and apparently undocumented (at least on EnWiki).

Can PageImages replace (or supplement) its non-free filtering with a check for [[Category:All non-free media]]? This is how the community tracks non-free files. I believe the category would be a far more reliable mechanism. It is also the obvious means the community would use when attempting to clean up incorrect usage of a non-free file.

P.S. Wikidata has a list of the equivalent category on 35 wikis. I am unsure whether that covers all wikis that allow non-free media.
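A minimal sketch of the category check proposed above, assuming the filter only needs the list of categories on the file page and a per-wiki name for the tracking category. PageImages itself is a PHP extension, so this Python function only illustrates the decision logic; `NON_FREE_TRACKING_CATEGORY` and `is_non_free` are hypothetical names. Only the English Wikipedia category name comes from this task; the other entries would have to be filled in from the Wikidata item and are deliberately not guessed here.

```python
from __future__ import annotations

# Hypothetical mapping: wiki database name -> local name of the
# "All non-free media" tracking category. Only the enwiki entry is
# taken from this task; the rest would come from the Wikidata item.
NON_FREE_TRACKING_CATEGORY: dict[str, str] = {
    "enwiki": "All non-free media",
}


def is_non_free(wiki: str, file_categories: set[str]) -> bool | None:
    """Return True/False when the wiki has a known tracking category,
    or None when the wiki is not in the mapping (status unknown)."""
    category = NON_FREE_TRACKING_CATEGORY.get(wiki)
    if category is None:
        # Unknown wiki: let the caller decide, e.g. fall back to the
        # existing markup-based filter instead of assuming "free".
        return None
    return category in file_categories


if __name__ == "__main__":
    print(is_non_free("enwiki", {"All non-free media"}))   # True
    print(is_non_free("enwiki", {"Some other category"}))  # False
    print(is_non_free("xxwiki", {"All non-free media"}))   # None
```

Returning `None` rather than `False` for wikis missing from the mapping reflects the uncertainty noted above about whether the Wikidata list covers every wiki that allows non-free media; it also fits the "supplement" option, where the existing filter remains the fallback.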

Related Objects

Mentioned In: T177122: Non-free images incorrectly appearing in RelatedPages
Mentioned Here: T131896: CommonsMetadata should consider an image non-free if any of the license metadata blocks is nonfree
T177122: Non-free images incorrectly appearing in RelatedPages

Event Timeline

Alsee created this task. Oct 1 2017, 2:54 AM

Restricted Application added a subscriber: Aklapper. Oct 1 2017, 2:54 AM

Alsee updated the task description. Oct 1 2017, 3:00 AM

Alsee mentioned this in T177122: Non-free images incorrectly appearing in RelatedPages. Oct 1 2017, 3:04 AM

Using the category is easy (there is already a category parser in CommonsMetadata, although it doesn't really do anything useful) but pointless: the category and the invisible markup are generated by the same template ({{non-free media}}), so it wouldn't make any difference. (Except that, as was requested, the template does not set the metadata in some cases. You can find that discussion here.) As I said elsewhere, the problem is that information coming from that template can be overridden by information coming from other templates (there is no reason other templates should do that, but apparently it does happen). This is tracked in T131896. It is also easy to fix, would fix the issue on other wikis as well, and already has a patch prepared, so really all that's needed is for someone to take an interest and merge it.
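T131896 proposes that an image be treated as non-free if any of its license metadata blocks is non-free. The sketch below contrasts that rule with one assumed way the override described above could happen: a later block overwriting the flag set by an earlier one. `LicenseBlock`, `non_free_last_wins`, and `non_free_any` are hypothetical names written in Python purely for illustration; CommonsMetadata is written in PHP and its actual merging logic may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LicenseBlock:
    """Hypothetical stand-in for one machine-readable license block."""
    template: str   # template that emitted the block (illustrative only)
    non_free: bool  # whether this block marks the file as non-free


def non_free_last_wins(blocks: list[LicenseBlock]) -> bool:
    # Assumed failure mode: each block overwrites the flag, so a
    # later free-license block clears an earlier non-free marker.
    flag = False
    for block in blocks:
        flag = block.non_free
    return flag


def non_free_any(blocks: list[LicenseBlock]) -> bool:
    # Rule proposed in T131896: non-free if ANY block is non-free.
    return any(block.non_free for block in blocks)


if __name__ == "__main__":
    blocks = [
        LicenseBlock("non-free media", True),
        LicenseBlock("other license template", False),
    ]
    print(non_free_last_wins(blocks))  # False: the non-free marker is lost
    print(non_free_any(blocks))        # True
```

Under the "any" rule the order of templates on the file page no longer matters, which is why the fix would apply on every wiki that uses the same markup.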

As for local documentation, feel free to create it, although given that it works the same way on every wiki, I'm not sure what the point would be. The central documentation is at https://commons.wikimedia.org/wiki/Commons:Machine-readable_data . (Note that this is a temporary measure. The structured data project will eventually provide a less hacky means of tracking license metadata.)

Restricted Application added a project: Multimedia. Oct 1 2017, 5:21 AM

As has already been demonstrated, running two parallel systems to track the same information is unreliable. And you acknowledge that more work would be needed (T131896) to fix the span method. Given that "Using the category is easy", I'm puzzled why you would even bother advocating an invisible, unreliable, duplicate, currently broken method that needs repairs.

Maintaining multiple alternative means for the same thing is unnecessary code complexity and confusing UX. A solution already exists; it is cross-wiki and conforms to Commons standards. To the extent it is broken, it needs to be fixed anyway, because it is used on many other wikis. Adding another method is unnecessary complexity and unnecessary extra work.

(FWIW the whole license parsing thing is fairly pointless; I wrote it against my better judgement at the time.)

Jdlrobson added a project: Web-Team-Backlog (Tracking). Oct 2 2017, 6:55 PM

Jdlrobson moved this task from Untriaged to Untag on the Web-Team-Backlog (Tracking) board. Oct 3 2017, 3:49 PM

Jdlrobson moved this task from Untag to Move to Backlog on the Web-Team-Backlog (Tracking) board.

As Tgr mentioned, this is related to T131896, which has a patch that we'll review soon. That should fix the problem temporarily. Long term, this is an issue we'll address as part of Structured Licenses in the Structured Data on Commons project.

Ramsey-WMF triaged this task as Medium priority. Nov 24 2017, 5:52 PM

Jdlrobson moved this task from Backlog to Selection algorithm on the PageImages board. Jun 18 2020, 5:05 PM

Jdlrobson moved this task from Move to Backlog to Untriaged on the Web-Team-Backlog (Tracking) board. Sep 7 2021, 9:05 PM

Jdlrobson moved this task from Untriaged to Untag on the Web-Team-Backlog (Tracking) board. Sep 8 2021, 3:25 PM

LGoto removed a project: Web-Team-Backlog (Tracking). Nov 9 2021, 4:23 PM
