Missing file (red link) tracking category support
Closed, ResolvedPublic

Description

I was chatting in #mediawiki about the best method for adding a hidden tracking category to pages in order to assist in identifying and tracking pages that embed missing/deleted files. http://en.wikipedia.org/w/index.php?title=Counties_of_Uganda&oldid=340942829 is one example of a page which was badly in need of cleanup. Adding built-in support for this type of tracking would make maintaining content a lot easier. Platonides suggested implementing this as part of the parser, which makes perfect sense since it's used in every purge/refresh/save process.


Version: unspecified
Severity: enhancement

Details

Reference
bz23816

Event Timeline

bzimport raised the priority of this task from to Low. Nov 21 2014, 11:08 PM
bzimport set Reference to bz23816.
bzimport added a subscriber: Unknown Object (MLST).

conrad.irwin wrote:

There's [[Special:WantedFiles]] but it seems to be horribly broken at the moment - I imagine that's due to the difficulty of keeping the information up to date automatically with remote file repos

That uses a database query which does not scale well on large wikis, and it tracks files, not the articles where they are used. It's a lot easier to work from a list of pages to fix than it is to track down each file's usage.

(In reply to comment #1)

There's [[Special:WantedFiles]] but it seems to be horribly broken at the
moment - I imagine that's due to the difficulty of keeping the information up
to date automatically with remote file repos

It's bug 6220.

I think bug #16112 would take care of this, but I'm not sure. Putting it as a blocker on this.

Not really. Fixing bug 16112 would make the missing-files special page easier to work with, but this request serves a different purpose. I find that these types of tracking categories (similar to what happens with Special:Cite errors) make problems like this easier to fix.

Created attachment 8432
patch to add a tracking category for broken images.

This is fairly trivial to do if we really want to add a tracking category for this.

So do we really want to add a tracking category for this sort of thing?

(In reply to comment #6)

Created attachment 8432 [details]
patch to add a tracking category for broken images.

This is fairly trivial to do if we really want to add a tracking category for
this.

So do we really want to add a tracking category for this sort of thing?

btw, just to clarify - I can commit said patch, I'm just not sure if this is a bug we want to fix, or if it's a wontfix type thing due to duplication (with the rather borked) special:wantedfiles.

(In reply to comment #7)

btw, just to clarify - I can commit said patch, I'm just not sure if this is a
bug we want to fix, or if its a wontfix type thing due to duplication (with the
rather borked) special:wantedfiles.

Perhaps we could do this, and then also have wantedpages display the results of the category (basically fixing two things in one, because I would assume a category would be easier for any bots that wanted to access it).

(In reply to comment #7)

btw, just to clarify - I can commit said patch, I'm just not sure if this is a
bug we want to fix, or if its a wontfix type thing due to duplication (with the
rather borked) special:wantedfiles.

Would this have less brokenness than special:wantedfiles? Or would it be similar?

If I understand this (no guarantee that I do), then a hidden category is going to have more consistency than the clean-up that WantedFiles relies on.

(In reply to comment #9)

Would this have less brokenness than special:wantedfiles? Or would it be
similar?

(Without looking at how the patch works) Well if a remote file existed it shouldn't produce a red link so it shouldn't add it to the category.

(In reply to comment #10)

Well if a remote file existed it shouldn't produce a red link so it shouldn't
add it to the category.

So, then, that means "less brokenness", right?

Not being able to read PHP, I would assume that this modifies the parser or a similar component so that, while parsing a page, it checks whether each file being used exists (both locally and remotely) and, if it does not exist, inserts a tracking category. This would solve the brokenness of wanted files because it does not use a single query (which causes too much stress on large projects) and is constantly updated whenever a page is re-parsed. (I assume that this is being added via a MediaWiki namespace message, which would give finer-grained control over which pages get added to it, so that different namespaces could be given different categories.)
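To make the idea in the previous comment concrete, here is a toy Python model of "check each embedded file at parse time and tag the page if one is missing". It is only a sketch and not the attached patch or MediaWiki's parser: the regular expression, the function name `tracking_categories`, and the `existing_files` set (standing in for a lookup across local and remote file repos) are all assumptions made for illustration. The category name is the English one linked later in this thread.

```python
import re

# English category name as linked later in this thread.
TRACKING_CATEGORY = "Pages with broken file links"

# Deliberately simplistic: matches [[File:Name.ext]] or [[Image:Name.ext|options]].
# A real parser handles many more cases (galleries, templates, namespace aliases).
FILE_EMBED = re.compile(r"\[\[(?:File|Image):([^\]|]+)", re.IGNORECASE)


def tracking_categories(wikitext, existing_files):
    """Return the tracking categories one parse of `wikitext` would add.

    `existing_files` is a hypothetical stand-in for "does this file exist
    in the local repo or any remote repo?".
    """
    categories = set()
    for match in FILE_EMBED.finditer(wikitext):
        name = match.group(1).strip()
        if name not in existing_files:
            # One missing file is enough to tag the page.
            categories.add(TRACKING_CATEGORY)
            break
    return categories


page = "Intro [[File:Map.png|thumb]] and [[Image:Gone.jpg]]"
print(tracking_categories(page, {"Map.png"}))
```

Because the check runs on every parse, the category membership follows the page content without any wiki-wide query, which is the property the comment above is pointing at.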

ok, fixed in r86534.

Note this is slightly different from special:wantedfiles, since wantedfiles is ordered by how many broken links there are to the file.

(In reply to comment #14)

Fix was reverted in trunk

It's been marked fixme, not reverted...

(In reply to comment #15)

(In reply to comment #14)

Fix was reverted in trunk

It's been marked fixme, not reverted...

And the issues have now been resolved, so re-marking this bug as fixed.

cheers.

I think this needs to be better documented. Is it already somewhere where I overlooked it? Please add some documentation on mediawiki.org - I am not sure where, but

Release notes only mentions:

  • (bug 23816) A tracking category is now added for any pages with broken images.

I figured out that the attachment uses Mediawiki:broken-file-category which needs to be executed on local wiki to find the local string. Some results:

http://www.mediawiki.org/wiki/Category:Pages_with_broken_file_links
in German:
http://www.mediawiki.org/wiki/Kategorie:Seiten mit defekten Dateilinks
in German Wikipedia changed to:
http://de.wikipedia.org/wiki/Kategorie:Wikipedia:Defekter_Dateilink
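The three results above suggest that the category name is resolved per wiki from the broken-file-category message. A rough Python sketch of that lookup follows; the resolution order (local override first, then the shipped translation for the wiki's content language, then English) is my assumption about how it behaves, and `SHIPPED`, `category_name` and `local_overrides` are hypothetical names used only here. The strings themselves are taken from the links above.

```python
MESSAGE_KEY = "broken-file-category"

# Shipped translations keyed by language code (values from the links above).
SHIPPED = {
    "en": "Pages with broken file links",
    "de": "Seiten mit defekten Dateilinks",
}


def category_name(content_lang, local_overrides=None):
    """Resolve the tracking category name for one wiki.

    `local_overrides` models on-wiki edits to MediaWiki:Broken-file-category
    (an assumption about where the German Wikipedia name comes from).
    """
    overrides = local_overrides or {}
    if MESSAGE_KEY in overrides:
        return overrides[MESSAGE_KEY]
    # Fall back to English when no translation exists for the language.
    return SHIPPED.get(content_lang, SHIPPED["en"])


print(category_name("en"))
print(category_name("de"))
print(category_name("de", {MESSAGE_KEY: "Wikipedia:Defekter Dateilink"}))
```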

(In reply to comment #17)

I think this needs to be better documented. Is it already somewhere where I
overlooked it? Please add some documentation on mediawiki.org - I am not sure
where, but

AFAIK tracking categories are not documented on mww (or anywhere else, actually), they're left to local categorization. Which is not very good, but that's bug 1 I guess.

I figured out that the attachment uses Mediawiki:broken-file-category which
needs to be executed on local wiki to find the local string.

Not really, you just have to follow the usual system: [[:translatewiki:Special:PrefixIndex/MediaWiki:Broken-file-category/]].

Not really, you just have to follow the usual system:
[[:translatewiki:Special:PrefixIndex/MediaWiki:Broken-file-category/]].

(translatewiki interwiki link is not a feature of mediawiki, although present on WMF sites).

Yes, rather than checking only your own translation, you can find all applicable translations under:

http://translatewiki.net/w/i.php?title=Special%3APrefixIndex&prefix=Broken-file-category%2F&namespace=8

However, if logged in, you need to look under the "native" language of the wiki, which may differ from the language in which the Wiki user interface is displayed.

AFAIK tracking categories are not documented on mww (or anywhere else,
actually), Which is not very good, but that's bug 1 I guess.

I agree; created Bug 33448.

(In reply to comment #8)

(In reply to comment #7)

btw, just to clarify - I can commit said patch, I'm just not sure if this is a
bug we want to fix, or if its a wontfix type thing due to duplication (with the
rather borked) special:wantedfiles.

Perhaps we could do this, and then also have wantedpages display the results of
the category (basically fixing two things in one then.... because a category
would be easier if any bots wanted to access it I would assume).

No, bots can access querypages just as easily as categorymembers. Both have APIs:
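For illustration, a small Python sketch of what the two requests might look like through the MediaWiki action API, using `list=categorymembers` and `list=querypage`. It only builds the URLs and makes no network call. The endpoint URL, the helper names and the limits are assumptions for this example, not something stated in this thread.

```python
from urllib.parse import urlencode

# Assumed endpoint for the example; substitute the api.php of the wiki in question.
API = "https://www.mediawiki.org/w/api.php"


def category_members_url(category, limit=50, api=API):
    """URL listing pages in a (tracking) category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    return f"{api}?{urlencode(params)}"


def query_page_url(page="Wantedfiles", limit=50, api=API):
    """URL listing results of a query page such as Special:WantedFiles."""
    params = {
        "action": "query",
        "list": "querypage",
        "qppage": page,
        "qplimit": limit,
        "format": "json",
    }
    return f"{api}?{urlencode(params)}"


print(category_members_url("Pages with broken file links"))
print(query_page_url())
```

Either URL can be fetched with any HTTP client. The difference noted earlier in the thread still applies: the category lists pages that embed a missing file, while the query page lists the missing files themselves.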

(In reply to comment #18)

(In reply to comment #17)

I think this needs to be better documented. Is it already somewhere where I
overlooked it? Please add some documentation on mediawiki.org - I am not sure
where, but

AFAIK tracking categories are not documented on mww (or anywhere else,
actually), they're left to local categorization. Which is not very good, but
that's bug 1 I guess.

I've added [[mw:Help:Tracking_categories]] which should hopefully help somewhat