I think the current patch doesn't work for all cases yet. The gadget replaces certain search terms with their "expanded" variations. Users can combine several subcategory searches with other search terms and with each other.
For instance, this search string 'Music deepcat:Cello deepcat:Person' would be transformed by the gadget to something like this:
Music incategory:id:123|incategory:id:456[...] incategory:id:432|incategory:id:765[...]
The search result would then be the intersection of the categories 'Cello' and 'Person', including subcategories, restricted to pages which contain the word 'Music'.
If I understand the patch correctly, the current regexp wouldn't allow such a search string to exceed the length limit. Can that be fixed?
mediawiki/extensions/CirrusSearch (master): Bypass max query length if query contains incategory operator.
To avoid a cat-and-mouse situation, where we change the query length exception only to find out afterwards that the gadget can emit another kind of query the exception does not cover, please provide an exhaustive list of all query signatures that your DeepCat gadget can send to the API. Thanks!
For any search string given by a user, the gadget replaces all occurrences of 'deepcat:CATEGORY' with the list of the page_ids of CATEGORY and its subcategories recursively, in the form incategory:id:ID1|id:ID2|id:ID3 (see the example below).
A query can contain one or more such category lists. Each list has one or more entries. All other search terms and operators are untouched and fed to Cirrus unmodified. The order of search terms is preserved as the user typed them.
User types this search string:
Music +deepcat:Cello -deepcat:Person
Modified search string passed to Cirrus:
Music +incategory:id:123|id:5|id:234 -incategory:id:432|id:7|id:4321
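The substitution described above can be sketched roughly as follows. This is only an illustration in Python, not the gadget's actual code: the `CATEGORY_IDS` table and the `expand_deepcat` function are hypothetical names, the page IDs are the ones from the example, and category names containing spaces or quotes are not handled.

```python
import re

# Hypothetical lookup: category name -> page_ids of the category and all of
# its subcategories. The real gadget resolves these recursively; the IDs
# below are simply the ones used in the example above.
CATEGORY_IDS = {
    "Cello": [123, 5, 234],
    "Person": [432, 7, 4321],
}

# Matches deepcat:CATEGORY; a leading + or - is left outside the match,
# so it is preserved in the output.
DEEPCAT = re.compile(r"deepcat:(\S+)")


def expand_deepcat(query, lookup=CATEGORY_IDS):
    """Replace each deepcat:CATEGORY term with an incategory:id:... list."""
    def replace(match):
        ids = lookup[match.group(1)]
        return "incategory:" + "|".join(f"id:{i}" for i in ids)

    # All other terms and operators pass through untouched, in order.
    return DEEPCAT.sub(replace, query)


print(expand_deepcat("Music +deepcat:Cello -deepcat:Person"))
```

With the table above, this prints the same modified search string as in the example.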
It's probably easiest to just allow the maximum query length to be bypassed if there's an incategory operator anywhere in it, then. In the interests of resolving this once and for all, now would be the time to let me know if that doesn't work. :-)
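As a rough sketch of that idea (Python for illustration only, not the CirrusSearch code): `is_query_allowed`, the regular expression, and the value of `MAX_QUERY_LENGTH` are assumptions; 300 is the low limit mentioned elsewhere in this thread.

```python
import re

MAX_QUERY_LENGTH = 300  # assumed value of the normal limit

# Matches the incategory operator at the start of a term, optionally
# preceded by + or -.
INCATEGORY = re.compile(r"(?:^|\s)[+-]?incategory:")


def is_query_allowed(query, limit=MAX_QUERY_LENGTH):
    """Bypass the length limit if incategory appears anywhere in the query."""
    if INCATEGORY.search(query):
        return True
    return len(query) <= limit
```

Note that under this rule a single incategory term lifts the limit for the whole query, including its plain-text part.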
Stas proposed a two-pass solution:
- limit the query with a rather large value (btw, isn't it already limited by GET?)
- extract all the special syntax (incategory, intitle, insource...) as we do already
- check the resulting query size after extraction with the low limit (300)
This would allow users to run long queries with special syntax but a "normal query" would be limited to 300 chars.
- normal queries without special syntax will be limited to 300 chars.
- random garbage queries will be limited to 300 chars.
- a query like incategory:id:1|id:2|...id:100 plus some words would be hard-limited to the GET size, and maybe by another hard limit in Cirrus (3000?), but the "plus some words" part would be limited to 300 chars.
Would that solution work?
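A minimal sketch of the two passes, in Python for illustration only. The names `check_query`, `HARD_LIMIT`, `SOFT_LIMIT` and the `SPECIAL` regex are assumptions: the regex is a simplified stand-in for Cirrus' real special-syntax extraction (it ignores quoted values), and 3000 is just the tentative figure floated above.

```python
import re

HARD_LIMIT = 3000  # tentative hard limit from the discussion; an assumption
SOFT_LIMIT = 300   # limit applied to what remains after extraction

# Simplified stand-in for the special-syntax extraction: an optional + or -,
# one of the operators, and its unquoted value up to the next whitespace.
SPECIAL = re.compile(r"[+-]?\b(?:incategory|intitle|insource):\S+")


def check_query(query):
    """Return True if the query passes both length checks."""
    # Pass 1: reject anything over the large hard limit.
    if len(query) > HARD_LIMIT:
        return False
    # Pass 2: strip the special syntax, then apply the low limit
    # to the remaining plain-text part only.
    remainder = " ".join(SPECIAL.sub(" ", query).split())
    return len(remainder) <= SOFT_LIMIT
```

Under these assumptions, a query such as Music incategory:id:1|id:2|...|id:100 passes because only "Music" counts towards the 300-char limit, while 301 characters of plain text are rejected.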