Allow Special:MIMESearch to work under miser mode
Closed, Resolved (Public)

Description

Proposed patch

Special:MIMESearch will work efficiently on Wikimedia if we add an index on (img_major_mime, img_minor_mime).
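As a rough illustration of the proposal above, the following sketch uses Python's built-in sqlite3 module to compare the query plan before and after adding such an index. The simplified `image` table, the index name `img_mime`, and the `plan` helper are assumptions made for this example; SQLite's planner is not the MySQL planner used in production, so this only shows the general idea.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the EXPLAIN QUERY PLAN detail text for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of every plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Simplified stand-in for the image table (assumption: only three columns).
conn.execute(
    "CREATE TABLE image ("
    " img_name TEXT PRIMARY KEY,"
    " img_major_mime TEXT NOT NULL,"
    " img_minor_mime TEXT NOT NULL)"
)

query = (
    "SELECT img_name FROM image "
    "WHERE img_major_mime = ? AND img_minor_mime = ?"
)

before = plan(conn, query, ("image", "png"))

# The index suggested in the task description (the name is hypothetical).
conn.execute("CREATE INDEX img_mime ON image (img_major_mime, img_minor_mime)")

after = plan(conn, query, ("image", "png"))

print("before:", before)
print("after: ", after)
```

Without the index the plan is a full table scan; with it, the lookup becomes an index search on both MIME columns.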


Version: unspecified
Severity: enhancement
URL: http://en.wikipedia.org/wiki/Special:MIMESearch
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=34969

attachment patch-mimesearch.patch ignored as obsolete

bzimport set Reference to bz13438.
vvv created this task. Via Legacy, Mar 19 2008, 10:07 PM
brion added a comment. Via Conduit, Mar 19 2008, 10:09 PM

You should be able to use the generic index updater function instead of writing a custom one here.

demon added a comment. Via Conduit, Jun 26 2008, 8:42 PM

Created attachment 5021
Updated patch, use add_index()

Updated the previous patch to use add_index() per Brion's comments.

attachment test.patch ignored as obsolete

bzimport added a comment. Via Conduit, Jun 26 2008, 8:48 PM

ayg wrote:

You have unrelated changes to User.php there. Also, I suggest you commit the reformatting of the img_sha1 lines separately and recreate the patch, because they just distract from the patch's actual content.

demon added a comment. Via Conduit, Jul 7 2008, 4:28 PM

Created attachment 5057
With fixes

Updated previous patch. Removed unrelated changes.

Attached: img_mime.patch

brion added a comment. Via Conduit, Jul 7 2008, 5:25 PM

This schema tweak will do the job of making the query faster, but IMHO it's not a super great system to begin with. If we have to make a schema change, it might be good to consider the basic issues first:

  • There's no secondary sorting or filtering, which means it's going to be of very limited utility unless you're searching for a particularly exotic type.

The results list will appear in semi-random order, and paging through results will be very slow.

A secondary index on name would at least allow for basic ordering and index-based paging.
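To make the ordering point concrete, here is a small sketch, again using SQLite from Python as a stand-in, that compares a two-column MIME index with one that also includes the name. The table layout, the index name `img_mime`, and the helper functions are hypothetical; the only thing being shown is whether the planner needs a separate sort step for `ORDER BY img_name`.

```python
import sqlite3

QUERY = (
    "SELECT img_name FROM image "
    "WHERE img_major_mime = ? AND img_minor_mime = ? "
    "ORDER BY img_name"
)


def plan(conn, sql, params=()):
    """Return the EXPLAIN QUERY PLAN detail text for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


def make_db(index_columns):
    """Create an empty, simplified image table with one index (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE image ("
        " img_name TEXT NOT NULL,"
        " img_major_mime TEXT NOT NULL,"
        " img_minor_mime TEXT NOT NULL)"
    )
    conn.execute(f"CREATE INDEX img_mime ON image ({index_columns})")
    return conn


params = ("application", "ogg")
two_col = plan(make_db("img_major_mime, img_minor_mime"), QUERY, params)
three_col = plan(make_db("img_major_mime, img_minor_mime, img_name"), QUERY, params)

print("two columns:  ", two_col)
print("three columns:", three_col)
```

With only the MIME columns indexed, the plan includes a temporary B-tree to sort by name. With the name appended to the index, matching rows come out of the index already ordered, which is what makes index-based paging possible.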

brion added a comment. Via Conduit, Dec 28 2008, 9:46 PM

This index *might* be useful sometimes for bulk statistics, but the core case (a sensible way of searching based on mime type) would probably be better served by making some image metadata (including mime type) available to the fulltext search index.

If appropriately integrated, you could then do a search for something like:

image:moon landing mime:application/ogg

and get a sensible result of files with a text match for "moon landing" and a MIME type matching application/ogg.
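The query syntax above is only a suggestion in this comment, not an existing feature. As a purely hypothetical sketch of how such a string could be split into a namespace, free-text terms, and a MIME filter, one might write something like the following; `parse_query` and the dictionary it returns are invented for this example.

```python
def parse_query(query):
    """Split a search string into namespace, text terms, and a MIME filter.

    Hypothetical syntax, modelled on the example in the comment above:
      image:<text>        restricts the search to the image namespace
      mime:<major>/<minor> filters on MIME type
    Everything else is treated as a free-text term.
    """
    namespace = None
    mime = None
    terms = []
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and key == "mime" and "/" in value:
            major, _, minor = value.partition("/")
            mime = (major, minor)
        elif sep and key == "image" and namespace is None:
            namespace = "image"
            if value:
                # The text directly after "image:" is the first search term.
                terms.append(value)
        else:
            terms.append(token)
    return {"namespace": namespace, "terms": terms, "mime": mime}


parsed = parse_query("image:moon landing mime:application/ogg")
print(parsed)
```

The text terms would then go to the fulltext index, while the MIME pair would be matched against indexed metadata.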

bzimport added a comment. Via Conduit, Apr 11 2011, 6:08 AM

brettz9 wrote:

I'm building a tool (at http://brettz9.github.com/xqueryeditor/ ) to allow Ajax browsing of MediaWiki articles, currently for the purpose of performing XQueries against XML stored on wikis, and hopefully for optional local IndexedDB storage as well. It is very unsafe to make these queries at the moment (working on that), but once I get that resolved, I'd want to be able to point people by default to logical starting points for browsing XML documents on any given MediaWiki wiki.

Currently, when the user chooses a MediaWiki wiki, I'm supplying its root category by default, but it would be great if the API could return only those categories belonging to a particular MIME type (or at least if the MIME search worked) so I could avoid my users seeing non-XML pages (though I could parse a page fully into XHTML and expose that once I can figure out how to do that properly through the API). It would also be nice if this did not require users to manually add categories for these file format types.

(Incidentally, would be great to have the ability to directly edit XML files such as SVG (and TEI--my main interest) with the benefit of diffs and all, rather than needing to treat them as images on the one hand, or to put them directly within articles without the choice of whether to disable wiki markup.)

Catrope added a comment. Via Conduit, May 14 2011, 4:35 PM

The attached patch will probably still apply (almost; the updaters.inc part would have to be done manually I guess), but the index should probably be on (major_mime, minor_mime, name) to facilitate paging. Also, Special:MIMESearch's queries should be looked at to see what kind of index we'd actually need, and maybe tweaked to be more reasonable (like, use proper paging instead of OFFSET). I'd also like to change it to no longer be a QueryPage, because parameterized QueryPages don't really make sense.
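For readers unfamiliar with the difference, "proper paging instead of OFFSET" usually means continuing from the last name seen rather than skipping a row count. A minimal sketch of that pattern, assuming a simplified table with an index on (img_major_mime, img_minor_mime, img_name) and using SQLite from Python purely for illustration, could look like this; `fetch_page`, `PAGE_SIZE`, and the sample file names are made up.

```python
import sqlite3

PAGE_SIZE = 3


def fetch_page(conn, major, minor, after_name=""):
    """Return up to PAGE_SIZE names sorting strictly after after_name."""
    rows = conn.execute(
        "SELECT img_name FROM image "
        "WHERE img_major_mime = ? AND img_minor_mime = ? AND img_name > ? "
        "ORDER BY img_name LIMIT ?",
        (major, minor, after_name, PAGE_SIZE),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image ("
    " img_name TEXT NOT NULL,"
    " img_major_mime TEXT NOT NULL,"
    " img_minor_mime TEXT NOT NULL)"
)
conn.execute(
    "CREATE INDEX img_mime ON image (img_major_mime, img_minor_mime, img_name)"
)

# Sample data: seven SVG files plus two PNG files that must not show up.
files = [(f"File_{i:02d}.svg", "image", "svg+xml") for i in range(7)]
files += [("Photo_a.png", "image", "png"), ("Photo_b.png", "image", "png")]
conn.executemany("INSERT INTO image VALUES (?, ?, ?)", files)

# Walk through all pages, using the last name of each page as the cursor.
pages = []
cursor = ""
while True:
    page = fetch_page(conn, "image", "svg+xml", cursor)
    if not page:
        break
    pages.append(page)
    cursor = page[-1]

print(pages)
```

Each page starts with an index seek to the cursor position, so the cost of fetching a page does not grow with how deep into the result set it is, unlike OFFSET.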

And of course we should also expose this functionality in the API :)

bzimport added a comment. Via Conduit, Nov 10 2011, 6:56 AM

sumanah wrote:

+reviewed since folks have given Chad code review

Bawolff added a comment. Via Conduit, Jun 7 2013, 5:41 PM

I made another attempt at this. With a little more complexity on the PHP side, I believe it is possible to do this efficiently without adding any more indices.

I agree that searching for MIME types can be done in a much better way, but this sort of simple use still has its uses. Thus if we can make it work without messing with the indices, I think we should. (That said, we should still attempt to do something better for searching by MIME type in the mysterious future. Fixing this doesn't mean we can't have both.)
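This thread does not spell out how the query is made efficient without a new index. One plausible technique, shown here strictly as an assumption and not as a description of the actual change, is to reuse an existing index whose leading column is something other than the MIME type, by enumerating every possible value of that leading column in the query. The column `img_media_type`, the index `img_media_mime`, and the list of media types below are all assumed for this sketch, which again uses SQLite from Python.

```python
import sqlite3

# Assumed, illustrative list of every value the leading index column can take.
MEDIA_TYPES = (
    "UNKNOWN", "BITMAP", "DRAWING", "AUDIO", "VIDEO",
    "MULTIMEDIA", "OFFICE", "TEXT", "EXECUTABLE", "ARCHIVE",
)


def plan(conn, sql, params=()):
    """Return the EXPLAIN QUERY PLAN detail text for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image ("
    " img_name TEXT NOT NULL,"
    " img_media_type TEXT,"
    " img_major_mime TEXT NOT NULL,"
    " img_minor_mime TEXT NOT NULL)"
)
# Assumed pre-existing index: media type first, then the MIME columns.
conn.execute(
    "CREATE INDEX img_media_mime "
    "ON image (img_media_type, img_major_mime, img_minor_mime)"
)

mime_only = (
    "SELECT img_name FROM image "
    "WHERE img_major_mime = ? AND img_minor_mime = ?"
)
placeholders = ", ".join("?" * len(MEDIA_TYPES))
# Listing every media type lets the planner seek on all three index columns.
# Caveat: rows whose img_media_type is NULL would not match the IN list.
with_media_types = mime_only + f" AND img_media_type IN ({placeholders})"

plan_mime_only = plan(conn, mime_only, ("image", "png"))
plan_with_types = plan(conn, with_media_types, ("image", "png", *MEDIA_TYPES))

print("MIME only:       ", plan_mime_only)
print("with media types:", plan_with_types)
```

The extra condition does not change which rows are wanted; it only gives the planner a usable prefix for the existing index, at the cost of the application having to know the full list of values.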

Please see Gerrit change 67468 (Where art thou, Gerrit notification bot?)

gerritbot added a comment. Via Conduit, Jul 25 2013, 3:58 AM

Change 67468 merged by jenkins-bot:
Make Special:MIMESearch a non-expensive special page.

https://gerrit.wikimedia.org/r/67468

Umherirrender added a comment. Via Conduit, Jul 25 2013, 5:02 PM

successfully merged

Gilles added a project: Multimedia. Via Web, Dec 4 2014, 10:53 AM
Gilles moved this task to Closed on the Multimedia workboard.
