
For table categorylinks, add cl_from to the cl_sortkey index
Closed, Resolved (Public)

Description

When the API enumerates the pages in a given category, it needs a way to resume the query. The sortkey is a good point from which to continue, but it has one drawback: more than one page may have the same sortkey, which can lead to the following bug scenario:

Assume there are 20 pages in a category, pages 10 and 11 have identical sortkeys, and the user's query requests 10 pages at a time. The sortkey to continue from would be the value of #11, but since it is the same as that of #10, #10 will be returned twice: in both the first and the second result set. This can even result in an infinite loop: requesting one item at a time would reach #10 and never advance to #11.
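The scenario above can be reproduced with a small sketch. It is not the API's actual code: it uses an in-memory SQLite table in place of the MySQL one, and the helper names (make_category, fetch_batch), the category name "Example", and the "PageNN" sortkeys are all made up for illustration. cl_from is added to ORDER BY only to make the demo deterministic; the continuation still uses the sortkey alone.

```python
import sqlite3

def make_category(conn):
    """Toy categorylinks table: 20 pages, pages 10 and 11 share a sortkey."""
    conn.execute(
        "CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_sortkey TEXT)"
    )
    rows = []
    for page_id in range(1, 21):
        # Page 11 deliberately reuses page 10's sortkey.
        key_source = 10 if page_id == 11 else page_id
        rows.append((page_id, "Example", f"Page{key_source:02d}"))
    conn.executemany("INSERT INTO categorylinks VALUES (?, ?, ?)", rows)

def fetch_batch(conn, limit, continue_key=None):
    """Sortkey-only continuation: resume at rows with sortkey >= continue_key."""
    sql = "SELECT cl_from, cl_sortkey FROM categorylinks WHERE cl_to = ?"
    params = ["Example"]
    if continue_key is not None:
        sql += " AND cl_sortkey >= ?"
        params.append(continue_key)
    # cl_from in ORDER BY is only here to keep the demo deterministic.
    sql += " ORDER BY cl_sortkey, cl_from LIMIT ?"
    params.append(limit + 1)  # fetch one extra row to learn where to continue
    rows = conn.execute(sql, params).fetchall()
    next_key = rows[limit][1] if len(rows) > limit else None
    return [r[0] for r in rows[:limit]], next_key

conn = sqlite3.connect(":memory:")
make_category(conn)

first, key = fetch_batch(conn, 10)       # pages 1..10, continue from page 11's key
second, _ = fetch_batch(conn, 10, key)   # starts at page 10 again
print("page 10 in both batches:", 10 in first and 10 in second)

# One item at a time: once the continuation key is page 10's sortkey,
# the same page and the same key come back, so the loop never advances.
stuck_ids, stuck_key = fetch_batch(conn, 1, "Page10")
print("stuck:", stuck_ids, stuck_key)
```

Because the continuation value carries no information about which of the tied rows was already returned, no choice of >= or > on the sortkey alone fixes this: >= repeats page 10, and > would skip page 11.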

Solution: sort by cl_sortkey and then cl_from, and store both the sortkey and cl_from as the point to continue from.
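A minimal sketch of that solution, under the same assumptions as any toy example: an in-memory SQLite table stands in for the MySQL one, and the helper names (make_category, fetch_batch, enumerate_all), the category name "Example", and the "PageNN" sortkeys are hypothetical. The continuation point is the pair (cl_sortkey, cl_from) of the first row not yet returned.

```python
import sqlite3

def make_category(conn):
    """Toy categorylinks table: 20 pages, pages 10 and 11 share a sortkey."""
    conn.execute(
        "CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_sortkey TEXT)"
    )
    rows = []
    for page_id in range(1, 21):
        key_source = 10 if page_id == 11 else page_id
        rows.append((page_id, "Example", f"Page{key_source:02d}"))
    conn.executemany("INSERT INTO categorylinks VALUES (?, ?, ?)", rows)

def fetch_batch(conn, limit, cont=None):
    """Resume from a (sortkey, cl_from) pair instead of the sortkey alone."""
    sql = "SELECT cl_from, cl_sortkey FROM categorylinks WHERE cl_to = ?"
    params = ["Example"]
    if cont is not None:
        sortkey, page_id = cont
        # Rows strictly after the sortkey, or tied on it with cl_from >= page_id.
        sql += " AND (cl_sortkey > ? OR (cl_sortkey = ? AND cl_from >= ?))"
        params += [sortkey, sortkey, page_id]
    sql += " ORDER BY cl_sortkey, cl_from LIMIT ?"
    params.append(limit + 1)  # fetch one extra row to learn where to continue
    rows = conn.execute(sql, params).fetchall()
    next_cont = (rows[limit][1], rows[limit][0]) if len(rows) > limit else None
    return [r[0] for r in rows[:limit]], next_cont

def enumerate_all(conn, limit):
    """Walk the whole category in batches of `limit` pages."""
    seen, cont = [], None
    while True:
        batch, cont = fetch_batch(conn, limit, cont)
        seen.extend(batch)
        if cont is None:
            return seen

conn = sqlite3.connect(":memory:")
make_category(conn)
by_ten = enumerate_all(conn, 10)
by_one = enumerate_all(conn, 1)
print(len(by_ten), len(set(by_ten)))  # every page once, no duplicates
print(by_one == by_ten)
```

Since cl_from is unique within a category, the pair (cl_sortkey, cl_from) gives a total order, so each continuation point identifies exactly one row and the enumeration always advances.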

To optimize query execution, the cl_sortkey index needs to be extended by appending cl_from as its last column:

-- Rebuild the cl_sortkey index with cl_from appended as the last column.
ALTER TABLE wikidb.categorylinks
  DROP INDEX cl_sortkey,
  ADD INDEX cl_sortkey USING BTREE (cl_to, cl_sortkey, cl_from),
  ENGINE = MyISAM;


Version: unspecified
Severity: enhancement

Details

Reference
bz10280

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 9:52 PM)
bzimport set Reference to bz10280.
bzimport added a subscriber: Unknown Object (MLST).

Checked in the schema update as r23016. The update on the servers is still pending.

Another check-in, r23228: the SQL table scripts.

jon.1234 wrote:

This seems to contribute to the common http://bugzilla.wikimedia.org/show_bug.cgi?id=4445 bug (index key too long). See my comments there. I've "reopened" this bug but apologise if this was not the correct course of action. Best wishes.

ayg wrote:

Not relevant, that's a separate issue. Re-resolving.