
Add ability to restrict number of search terms
Open, Medium, Public

Description

Author: afeldman

Description:

Steps to reproduce:

  1. Go to MediaWiki
  2. In the upper right corner, click on the search field.
  3. Make a search query with a string so long that the whole search URL has a length of 8912 bytes.

I. Observed: The search is accepted. This shows that MediaWiki applies no limit of its own; the only restriction is Apache's standard URL length limit, which does not process URLs longer than 8912 bytes.
II. Expected: Bots frequently submit huge spam documents as search queries. To limit the impact on the system, there should be a config option that sets the maximum number of terms used in a search. Terms beyond that limit should be stripped transparently.


Version: 1.20.x
Severity: normal

Details

Reference
bz37223

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 12:29 AM
bzimport added a project: MediaWiki-Search.
bzimport set Reference to bz37223.
bzimport added a subscriber: Unknown Object (MLST).

Number of terms (which means tokenizing the query and deciding whether operators count as terms), or is the raw length of the search query sufficient?
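
To make the question concrete, here is a minimal Python sketch of what a term-based limit would involve. It is not MediaWiki code: the tokenizer, the operator set (`AND`, `OR`, `NOT`), and the names `count_terms` and `limit_terms` are all assumptions chosen for illustration. It shows that the term count depends on whether operators are counted, and that excess terms can be dropped without the user noticing.

```python
import re

# Assumption: these are the tokens treated as operators. A real search
# backend may recognise a different set.
OPERATORS = {"AND", "OR", "NOT"}


def tokenize(query: str) -> list[str]:
    """Split a query into quoted phrases and whitespace-delimited words."""
    return re.findall(r'"[^"]*"|\S+', query)


def count_terms(query: str, count_operators: bool = False) -> int:
    """Count terms, optionally treating operators as terms too."""
    tokens = tokenize(query)
    if count_operators:
        return len(tokens)
    return sum(1 for token in tokens if token not in OPERATORS)


def limit_terms(query: str, max_terms: int, count_operators: bool = False) -> str:
    """Keep at most max_terms terms and silently drop the rest."""
    kept = []
    used = 0
    for token in tokenize(query):
        is_term = count_operators or token not in OPERATORS
        if is_term and used >= max_terms:
            break
        kept.append(token)
        if is_term:
            used += 1
    # Do not leave a dangling operator at the end of the shortened query.
    while kept and kept[-1] in OPERATORS:
        kept.pop()
    return " ".join(kept)


if __name__ == "__main__":
    q = 'foo AND "bar baz" OR qux'
    print(count_terms(q), count_terms(q, count_operators=True))
    print(limit_terms(q, 2))
```

Even this small version has to settle how quoted phrases and trailing operators are handled, which is the extra cost of counting terms compared with checking raw length.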

afeldman wrote:

I think raw query length would be a better approach, though we can set it based on the (most OR expensive OR query OR we) AND "want to" OR support.

With this approach, perhaps it could be implemented in the core SearchEngine class so as to be agnostic to the search backend or extension used?
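
A rough sketch of that idea, written in Python for illustration only. The class `SearchBackend`, its `max_query_bytes` attribute, and the method names are hypothetical and do not correspond to the actual SearchEngine API. The point is that the length check lives in the shared base class, so every backend gets it without implementing anything itself.

```python
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Hypothetical stand-in for a backend-agnostic search base class."""

    # Assumption: a site-wide config value, in bytes. The number is arbitrary.
    max_query_bytes = 300

    def search(self, query: str) -> list[str]:
        """Clamp the query once here, then hand it to the concrete backend."""
        return self._do_search(self.clamp_query(query))

    def clamp_query(self, query: str) -> str:
        """Truncate the query to at most max_query_bytes of UTF-8."""
        raw = query.encode("utf-8")
        if len(raw) <= self.max_query_bytes:
            return query
        # errors="ignore" drops a multibyte character that the cut split in half.
        return raw[: self.max_query_bytes].decode("utf-8", errors="ignore")

    @abstractmethod
    def _do_search(self, query: str) -> list[str]:
        """Backend-specific search; always receives an already clamped query."""


class EchoBackend(SearchBackend):
    """Toy backend that returns the query it was given."""

    def _do_search(self, query: str) -> list[str]:
        return [query]


if __name__ == "__main__":
    backend = EchoBackend()
    backend.max_query_bytes = 5
    print(backend.search("héllo wörld"))
```

Truncating on a byte boundary keeps the check cheap, at the cost of possibly cutting the last word in half.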

This task is still unresolved; no restrictions have been added yet. This was tested with Mozilla Firefox version 34.0.5 on Microsoft Windows 8.1.
How to recheck:

  1. Go to MediaWiki
  2. In the upper right corner, click on the search field.
  3. Make a search query with a string so long that the whole search URL has a length of 8912 bytes.
  4. The search is accepted.
  5. This shows that MediaWiki applies no limit of its own; the only restriction is Apache's standard URL length limit, which does not process URLs longer than 8912 bytes.
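
For anyone rechecking, step 3 can be scripted. The following Python sketch pads the `search` parameter until the URL reaches an exact length. The base URL is a placeholder for a local test wiki, the helper name `build_search_url` is made up, and the length is measured over the full URL string, which is one possible reading of "the whole search URL".

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, target_len: int, filler: str = "a") -> str:
    """Return a search URL that is exactly target_len bytes long.

    filler must be a single URL-safe ASCII character so that one character
    adds exactly one byte and needs no percent-encoding.
    """
    if len(filler) != 1 or not filler.isascii() or not filler.isalnum():
        raise ValueError("filler must be one ASCII letter or digit")
    prefix = base_url + "?" + urlencode({"search": ""})  # ends with "search="
    padding = target_len - len(prefix)
    if padding < 1:
        raise ValueError("target_len is too small for this base URL")
    return prefix + filler * padding


if __name__ == "__main__":
    # Placeholder address; point this at your own test installation.
    url = build_search_url("http://localhost/w/index.php", 8912)
    print(len(url))
```

The resulting URL can be pasted into the browser or sent with any HTTP client to see whether the wiki accepts the query.
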
Nemo_bis set Security to None.