The MediaWiki database schema contains several ENUM columns (e.g. categorylinks.cl_type). ENUM columns can make keyset pagination difficult, as I have illustrated below using an enwiki DB dump from October 2015:
In that example, the comparison in the WHERE clause apparently is using the string value, yet the ORDER BY clause is using the internal ID, so the category link from the file page gets missed. In fact, ApiQueryCategoryMembers has a workaround for this involving multiple queries. I noticed this after having started working on a fix for T58041: updateCollation.php script prohibitively slow for very large wikis. So now I realize the change that likely caused that issue not only severely hurt the performance of the script but also its correctness.
For this reason, along with the usual reasons for not using ENUM columns, perhaps we should stop using them, at least for indexed columns, and instead favor tinyint, varbinary, or binary(1).
In core, there are eight ENUM columns in five different tables: categorylinks.cl_type, image.img_media_type, image.img_major_mime, oldimage.oi_media_type, oldimage.oi_major_mime, filearchive.fa_media_type, filearchive.fa_major_mime, and uploadstash.us_media_type.
There are also some ENUM columns in extension tables:
- ArticleFeedbackv5: aft_feedback.aft_discuss
- CentralAuth: globaluser.gu_enabled_method, localuser.lu_attached_method, wikiset.ws_type, renameuser_status.ru_status, renameuser_queue.rq_status,
- CentralNotice: cn_notice_log.notlog_action, cn_template_log.tmplog_action
- ContentTranslation: cx_translations.translation_status
- OAI: updates.up_action
- SacredText: sacredtext_verses.st_religious_text
- WikiLog: wikilog_comments.wlc_status
I'm not sure whether changing the data types of any of these columns (whether in core or extensions) could ever happen given the backlog of schema changes at WMF, though it's worth considering.