- Affected components: MediaWiki core and WMF-maintained extensions.
- Policy steward: DBA
- Stakeholders:
Motivation
The MediaWiki database schema contains several ENUM columns (see below in Current uses).
Issues:
- ENUM columns can make keyset pagination difficult, as illustrated in the Oct 2015 example below.
- Not all DBMSes support ENUMs.
- Changing an ENUM requires a schema change, which slows down development.
Requirements
- Agreement that ENUM() is rarely needed in WMF-deployed MediaWiki database schemas.
- Document some of the things it is currently used for and what we recommend instead.
- Document some of the typical use cases for it in the industry, and what we recommend instead.
- Decide whether to allow it at all for new schemas, and if so what the process for that should be.
Exploration
Oct 2015 example
Below is from an enwiki DB dump from October 2015:
The comparison in the WHERE clause apparently is using the string value, yet the ORDER BY clause is using the internal ID, so the category link from the file page gets missed. In fact, ApiQueryCategoryMembers has a workaround for this involving multiple queries. This was noticed as part of T58041. The change that likely caused that issue not only severely hurt the performance of the script, but also its correctness.
Current uses
In core, there are eight ENUM columns in five different tables (see tables.sql):
- categorylinks.cl_type
- image.img_media_type
- image.img_major_mime
- oldimage.oi_media_type
- oldimage.oi_major_mime
- filearchive.fa_media_type
- filearchive.fa_major_mime
- uploadstash.us_media_type.
There are also some ENUM columns in extension tables:
- ArticleFeedbackv5: aft_feedback.aft_discuss
- CentralAuth: globaluser.gu_enabled_method, localuser.lu_attached_method, wikiset.ws_type, renameuser_status.ru_status, renameuser_queue.rq_status,
- CentralNotice: cn_notice_log.notlog_action, cn_template_log.tmplog_action
- ContentTranslation: cx_translations.translation_status
- OAI: updates.up_action
- SacredText: sacredtext_verses.st_religious_text
- WikiLog: wikilog_comments.wlc_status
I'm not sure whether changing the data types of any of these columns (whether in core or extensions) could ever happen given the backlog of schema changes at WMF, though it's worth considering separately from this RFC.
Alternatives
Indexed columns could use tinyint, varbinary, or binary(1) (or more appropriate type).
Encourage normalization of schemas, with justification needed for exceptions.
Virtual columns.