Page MenuHomePhabricator

Avoid MySQL's ENUM type, which makes keyset pagination difficult
Open, Needs TriagePublic

Description

The MediaWiki database schema contains several ENUM columns (e.g. categorylinks.cl_type). ENUM columns can make keyset pagination difficult, as I have illustrated below using an enwiki DB dump from October 2015:

1MariaDB [enwiki_subset]> select cl_type, cl_sortkey, cl_from from categorylinks where cl_to = 'A.F.C._Emley' order by cl_type, cl_sortkey, cl_from\G
2*************************** 1. row ***************************
3 cl_type: page
4cl_sortkey:
5A.F.C. EMLEY
6 cl_from: 5664196
7*************************** 2. row ***************************
8 cl_type: subcat
9cl_sortkey: PLAYERS
10A.F.C. EMLEY PLAYERS
11 cl_from: 18585943
12*************************** 3. row ***************************
13 cl_type: file
14cl_sortkey: AFCEMLEY.PNG
15 cl_from: 19575733
163 rows in set (0.00 sec)
17
18MariaDB [enwiki_subset]> select cl_type, cl_sortkey, cl_from from categorylinks where cl_to = 'A.F.C._Emley' order by cl_type, cl_sortkey, cl_from limit 1\G
19*************************** 1. row ***************************
20 cl_type: page
21cl_sortkey:
22A.F.C. EMLEY
23 cl_from: 5664196
241 row in set (0.00 sec)
25
26MariaDB [enwiki_subset]> select cl_type, cl_sortkey, cl_from from categorylinks where cl_to = 'A.F.C._Emley' and cl_type > 'page' order by cl_type, cl_sortkey, cl_from limit 1\G
27*************************** 1. row ***************************
28 cl_type: subcat
29cl_sortkey: PLAYERS
30A.F.C. EMLEY PLAYERS
31 cl_from: 18585943
321 row in set (0.00 sec)
33
34MariaDB [enwiki_subset]> select cl_type, cl_sortkey, cl_from from categorylinks where cl_to = 'A.F.C._Emley' and cl_type > 'subcat' order by cl_type, cl_sortkey, cl_from limit 1\G
35Empty set (0.00 sec)
36
37MariaDB [enwiki_subset]> select cl_type, cl_sortkey, cl_from from categorylinks where cl_to = 'A.F.C._Emley' and cl_type > 'file' order by cl_type, cl_sortkey, cl_from\G
38*************************** 1. row ***************************
39 cl_type: page
40cl_sortkey:
41A.F.C. EMLEY
42 cl_from: 5664196
43*************************** 2. row ***************************
44 cl_type: subcat
45cl_sortkey: PLAYERS
46A.F.C. EMLEY PLAYERS
47 cl_from: 18585943
482 rows in set (0.00 sec)

In that example, the comparison in the WHERE clause apparently is using the string value, yet the ORDER BY clause is using the internal ID, so the category link from the file page gets missed. In fact, ApiQueryCategoryMembers has a workaround for this involving multiple queries. I noticed this after having started working on a fix for T58041: updateCollation.php script prohibitively slow for very large wikis. So now I realize the change that likely caused that issue not only severely hurt the performance of the script but also its correctness.

For this reason, along with the usual reasons for not using ENUM columns, perhaps we should stop using them, at least for indexed columns, and instead favor tinyint, varbinary, or binary(1).

In core, there are eight ENUM columns in five different tables: categorylinks.cl_type, image.img_media_type, image.img_major_mime, oldimage.oi_media_type, oldimage.oi_major_mime, filearchive.fa_media_type, filearchive.fa_major_mime, and uploadstash.us_media_type.

There are also some ENUM columns in extension tables:

  • ArticleFeedbackv5: aft_feedback.aft_discuss
  • CentralAuth: globaluser.gu_enabled_method, localuser.lu_attached_method, wikiset.ws_type, renameuser_status.ru_status, renameuser_queue.rq_status,
  • CentralNotice: cn_notice_log.notlog_action, cn_template_log.tmplog_action
  • ContentTranslation: cx_translations.translation_status
  • OAI: updates.up_action
  • SacredText: sacredtext_verses.st_religious_text
  • WikiLog: wikilog_comments.wlc_status

I'm not sure whether changing the data types of any of these columns (whether in core or extensions) could ever happen given the backlog of schema changes at WMF, though it's worth considering.

Event Timeline

PleaseStand raised the priority of this task from to Needs Triage.
PleaseStand updated the task description. (Show Details)
PleaseStand added subscribers: PleaseStand, aaron.
Restricted Application added subscribers: StudiesWorld, Aklapper. · View Herald TranscriptNov 20 2015, 7:37 AM