
Improving PageImages filtering of non-free images
Open, Medium, Public

Description

In T177122, a short-term fix was made for a specific instance of PageImages incorrectly returning non-free images. There appears to be some agreement that the fix was inadequate in the broader view. PageImages appears to filter based on whether the file page carries a particular <span> marker. I would describe this method as unreliable, invisible, redundant, and apparently undocumented (at least on EnWiki).
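To make the marker mechanism concrete, here is a minimal Python sketch of how a consumer might detect such a hidden marker in rendered file-page HTML. It assumes the marker is an element with the class `licensetpl_nonfree` (a name taken as an assumption from Commons:Machine-readable_data; check there before relying on it), and `has_nonfree_marker` is a hypothetical helper, not PageImages or CommonsMetadata code.

```python
from html.parser import HTMLParser

# Assumption: the non-free marker is any element whose class list contains
# this name. Treat it as illustrative; real markup may differ.
NONFREE_CLASS = "licensetpl_nonfree"


class NonFreeMarkerFinder(HTMLParser):
    """Scans rendered file-page HTML for a (usually hidden) non-free marker."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "class" and value and NONFREE_CLASS in value.split():
                self.found = True


def has_nonfree_marker(html: str) -> bool:
    """Hypothetical helper: True if any element carries the marker class."""
    finder = NonFreeMarkerFinder()
    finder.feed(html)
    finder.close()
    return finder.found


page = '<div><span class="licensetpl_nonfree" style="display:none">true</span></div>'
print(has_nonfree_marker(page))                    # True
print(has_nonfree_marker("<p>Public domain</p>"))  # False
```

The sketch also shows why the marker is "invisible": nothing in it is displayed to readers, so an editor has no visual cue when it is missing.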

Can PageImages replace (or supplement) its non-free filtering with a check for [[Category:All non-free media]]? This is how the community tracks non-free files, and I believe the category would be a far more reliable mechanism. It is also the obvious means the community would use when attempting to clean up incorrect usage of a non-free file.
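A category-based check could look roughly like the following Python sketch. It is hypothetical throughout: `is_nonfree_by_category` and the lookup table are invented for illustration, only the English Wikipedia category name comes from this task, and a real table would be built from the per-wiki equivalents rather than hard-coded.

```python
from typing import List, Optional

# Hypothetical per-wiki table of non-free tracking categories.
# Only the enwiki entry comes from this task.
NONFREE_TRACKING_CATEGORY = {
    "enwiki": "All non-free media",
}


def normalize_category(name: str) -> str:
    # Strip an English "Category:" prefix (other wikis localize it) and
    # treat underscores as spaces, as MediaWiki titles do.
    if name.startswith("Category:"):
        name = name[len("Category:"):]
    return name.replace("_", " ").strip()


def is_nonfree_by_category(wiki: str, categories: List[str]) -> Optional[bool]:
    """Return True/False when the wiki has a known tracking category,
    or None when the categories alone cannot decide."""
    tracking = NONFREE_TRACKING_CATEGORY.get(wiki)
    if tracking is None:
        return None
    return tracking in {normalize_category(c) for c in categories}


cats = ["Category:All non-free media", "Category:Non-free album covers"]
print(is_nonfree_by_category("enwiki", cats))                        # True
print(is_nonfree_by_category("enwiki", ["Category:Own work"]))       # False
print(is_nonfree_by_category("xxwiki", cats))                        # None
```

Returning `None` for wikis without a known category lets a caller fall back to another signal instead of treating files on those wikis as free.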

P.S. Wikidata lists the equivalent category on 35 wikis. I am unsure whether that covers all wikis that allow non-free media.

Event Timeline

Tgr subscribed.

Using the category is easy (there is already a category parser in CommonsMetadata, although it doesn't do anything useful) but pointless: the category and the invisible markup are generated by the same template ({{non-free media}}), so switching would make no difference. (The one exception is that it was requested that the template not set the metadata in some cases, so it doesn't. You can find that discussion here.) As I said elsewhere, the problem is that information coming from that template can be overridden by information coming from other templates (there is no reason other templates should do that, but apparently it does happen). This is tracked in T131896, which is also easy to fix, would fix the issue on other wikis as well, and already has a patch prepared, so all that's needed is for someone to take an interest and merge it.
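The override problem can be illustrated with a hypothetical Python sketch: if the metadata contributed by several templates on one file page is merged so that later templates overwrite earlier fields, a non-free flag set by {{non-free media}} can be silently cleared. The field names are loosely modeled on license metadata but are assumptions here, and both merge functions are invented; this is not the T131896 patch, just one way to keep the flag from being lost.

```python
from typing import Dict, List

# Each dict stands for the metadata one template contributes, in page order.
# Field names are hypothetical.
TemplateData = Dict[str, str]


def merge_last_wins(templates: List[TemplateData]) -> TemplateData:
    """Naive merge: a later template silently overwrites earlier fields."""
    merged: TemplateData = {}
    for data in templates:
        merged.update(data)
    return merged


def merge_nonfree_sticky(templates: List[TemplateData]) -> TemplateData:
    """Same merge, except that once any template marks the file non-free,
    no later template can clear that flag."""
    merged = merge_last_wins(templates)
    if any(t.get("NonFree") == "true" for t in templates):
        merged["NonFree"] = "true"
    return merged


templates = [
    {"LicenseShortName": "Fair use", "NonFree": "true"},            # {{non-free media}}
    {"LicenseShortName": "Some other license", "NonFree": "false"},  # a later template
]
print(merge_last_wins(templates)["NonFree"])       # false
print(merge_nonfree_sticky(templates)["NonFree"])  # true
```

A sticky flag errs toward treating a file as non-free, which is the safer failure mode for a feature that should not surface non-free images.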

As for local documentation, feel free to create it, although given that it works the same way on every wiki I'm not sure what the point would be. The central documentation is at https://commons.wikimedia.org/wiki/Commons:Machine-readable_data . (Note that this is a temporary measure; the structured data project will eventually provide less hacky means of tracking license metadata.)

As has already been demonstrated, running two parallel systems to track the same information is unreliable. And you acknowledge that more work (T131896) would be needed to fix the span method. Given that "Using the category is easy", I'm puzzled why you would even bother advocating an invisible, unreliable, duplicate, currently broken method that needs repairs.

Maintaining multiple alternative means for the same thing adds unnecessary code complexity and a confusing UX. A solution already exists, it is cross-wiki, and it conforms to Commons standards; to the extent it is broken, it needs to be fixed anyway, because it is used on many other wikis. Adding another method is unnecessary complexity and unnecessary extra work.

(FWIW the whole license parsing thing is fairly pointless; I wrote it against my better judgement at the time.)

Ramsey-WMF subscribed.

As Tgr mentioned, this is related to T131896, which has a patch that we'll review soon. That should fix the problem temporarily. In the long term, we'll address this issue as part of Structured Licenses in the Structured Data on Commons project.