Category sort keys can exceed the capacity of cl_sortkey, yet that is not correctly handled
Open, Needs Triage, Public

Description

The cl_sortkey column can hold only 230 bytes, but sort keys generated by the subclasses of Collation can be longer than that, even for the simple "identity" and "uppercase" collations. MediaWiki does not handle this correctly: updating the DB results in either an error or silent truncation of the sort key. Specifically:

  • LinksUpdate only truncates the sort key prefixes stored in $this->mCategories (and confusingly referred to as "sort keys") to 255 bytes. It does not truncate actual sort keys at all.
  • MovePage resets sort keys, and it also does not truncate them before sending them to the DB.
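
A minimal sketch of the missing truncation step, written in Python for illustration rather than as MediaWiki code. The function name `truncate_sortkey` and the constant are hypothetical; the 230-byte limit comes from this report. The sketch assumes the sort key is UTF-8 text, as with the "identity" and "uppercase" collations; a binary sort key would more likely be cut at the byte limit directly.

```python
CL_SORTKEY_MAX_BYTES = 230  # capacity of cl_sortkey as stated in this report


def truncate_sortkey(sortkey: bytes, limit: int = CL_SORTKEY_MAX_BYTES) -> bytes:
    """Cut a UTF-8 sort key to at most `limit` bytes without splitting a character.

    Hypothetical helper: assumes `sortkey` is valid UTF-8.
    """
    if len(sortkey) <= limit:
        return sortkey
    cut = limit
    # If the first dropped byte is a continuation byte (10xxxxxx), the cut
    # would land inside a multi-byte character, so back up to its lead byte.
    while cut > 0 and (sortkey[cut] & 0xC0) == 0x80:
        cut -= 1
    return sortkey[:cut]
```

Applying the same function both when writing the row and when building the comparison value for a read query would keep the two sides consistent.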

Also, there are issues when performing read queries:

CategoryViewer's pagination code does not take truncation (or even the possibility of duplicate sort keys) into account. Truncation could cause pages to be skipped going forward or repeated going backward, because the truncated sort key sorts before the untruncated sort key. Duplicate sort keys could cause pages to be repeated going forward or skipped going backward.
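
The duplicate-key case can be reproduced with a simplified model. This Python sketch is not CategoryViewer's actual code: `naive_page` is a hypothetical function that pages forward using only the sort key as the cursor, on rows represented as (sort key, page ID) pairs.

```python
def naive_page(rows, start_key, limit):
    """Return (page, next_start_key), using only the sort key as the cursor."""
    ordered = sorted(rows)
    if start_key is not None:
        # Resume at the first row whose sort key is >= the cursor.
        ordered = [row for row in ordered if row[0] >= start_key]
    page = ordered[:limit]
    # The cursor for the next page is the sort key of the first row not shown.
    next_key = ordered[limit][0] if len(ordered) > limit else None
    return page, next_key


# Pages 2 and 3 share the sort key "B", and the page boundary falls between them.
rows = [("A", 1), ("B", 2), ("B", 3), ("C", 4)]
first, next_key = naive_page(rows, None, 2)
second, _ = naive_page(rows, next_key, 2)
# Page ID 2 appears on both pages, because the cursor "B" cannot
# distinguish the row already shown from the one that follows it.
```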

One way to address this would be to truncate the generated sort key before performing the query and, in addition, add the page ID to the next/previous page URL (e.g. as a third part of the {page,subcat,file}{from,until} value, separated by a pipe character, tab, or other character that is disallowed in titles and is not a line feed).
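
A rough sketch of that idea in Python, under stated assumptions: the helper names are hypothetical, the pipe is used as the separator, and rows are modelled as (sort key, page ID) pairs. In SQL terms the forward condition would be roughly `cl_sortkey > :key OR (cl_sortkey = :key AND cl_from > :id)`, assuming cl_from holds the page ID.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, int]  # (truncated sort key, page ID)


def encode_offset(sortkey: str, page_id: int) -> str:
    """Build the URL value; the pipe cannot occur in a title."""
    return f"{sortkey}|{page_id}"


def decode_offset(value: str) -> Tuple[str, Optional[int]]:
    """Split the URL value; values without a page ID are still accepted."""
    sortkey, sep, page_id = value.rpartition("|")
    if sep and page_id.isascii() and page_id.isdigit():
        return sortkey, int(page_id)
    return value, None


def page_forward(rows: List[Row], after: Optional[Row], limit: int) -> List[Row]:
    """Return the next `limit` rows strictly after the (sort key, page ID) cursor."""
    ordered = sorted(rows)
    if after is not None:
        # Tuple comparison breaks ties between equal sort keys by page ID.
        ordered = [row for row in ordered if row > after]
    return ordered[:limit]
```

Because the cursor identifies a single row, passing the last row of one page as `after` yields a following page that neither repeats nor skips members, even when several members share a sort key.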

Category::getMembers() accepts an $offset argument, which is just the sort key without the page ID, though I did not find any callers that actually supply an offset.

Some extensions probably also have similar issues.