Page MenuHomePhabricator

Error with categorylinks table on update from 1.44.2 to 1.45.1
Open, Needs TriagePublicBUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Follow normal upgrade procedure.
  • php maintenance/run.php update

What happens?:

  • Error message.
  • Modifying primary key on table categorylinks...Wikimedia\Rdbms\DBQueryError from line 1225 of ...w/includes/libs/rdbms/database/Database.php: Error 1091: Can't DROP 'primary'; check that column/key exists Function: Wikimedia\Rdbms\Database::sourceFile( ...w/sql/mysql/patch-categorylinks-pk.sql ) Query: DROP INDEX primary ON `categorylinks

What should have happened instead?:

  • No error, finish update.

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

  • 1.44.2 to 1.45.1

Other information (browser name/version, screenshots, etc.):

Event Timeline

I do not think it is a categorylinks specific issue. It appears that for some reasons wikis ended up without primary keys over time, see T324646 for an example of the same issue with templatelinks.

I fixed the table and got the update to run with no errors, but then cl_target_id turned into this:

| cl_target_id      | bigint unsigned              | NO   | PRI | NULL              |

I then changed cl_target_id to match the documentation on Manual:categorylinks table like so:

| cl_target_id      | bigint unsigned              | YES  | MUL | NULL              |

Rerunning the updater automatically changed it back to the PRI version again. Am I correct to leave it this way?

https://phabricator.wikimedia.org/source/mediawiki/browse/master/sql/mysql/patch-categorylinks-pk.sql

DROP INDEX `primary` ON /*_*/categorylinks;

ALTER TABLE /*_*/categorylinks
  CHANGE cl_target_id cl_target_id BIGINT UNSIGNED NOT NULL;

ALTER TABLE /*_*/categorylinks
  ADD PRIMARY KEY (cl_from, cl_target_id)

In 1.44 your categorylinks table should look like the following. Note the primary key is (cl_from, cl_to) and there already is the new cl_target_id column, but it is unused and nullable.

+-------------------+------------------------------+------+-----+---------------------+-------------------------------+
| Field             | Type                         | Null | Key | Default             | Extra                         |
+-------------------+------------------------------+------+-----+---------------------+-------------------------------+
| cl_from           | int(10) unsigned             | NO   | PRI | 0                   |                               |
| cl_to             | varbinary(255)               | NO   | PRI |                     |                               |
| cl_sortkey        | varbinary(230)               | NO   |     |                     |                               |
| cl_sortkey_prefix | varbinary(255)               | NO   |     |                     |                               |
| cl_timestamp      | timestamp                    | NO   |     | current_timestamp() | on update current_timestamp() |
| cl_collation      | varbinary(32)                | NO   |     |                     |                               |
| cl_type           | enum('page','subcat','file') | NO   |     | page                |                               |
| cl_collation_id   | smallint(5) unsigned         | NO   |     | 0                   |                               |
| cl_target_id      | bigint(20) unsigned          | YES  | MUL | NULL                |                               |
+-------------------+------------------------------+------+-----+---------------------+-------------------------------+

When upgrading from 1.44 to 1.45 there are multiple schema changes happening to categorylinks.

The first one is not really important, it only adds a missing index for the new column.

CREATE INDEX cl_timestamp_id ON /*_*/categorylinks (cl_target_id, cl_timestamp);

Next, the migrateLinksTable script is executed to populate the cl_target_id column.

Then, we run the schema change already mentioned here to make cl_target_id non-nullable and change the primary key from (cl_from, cl_to) to (cl_from, cl_target_id).

DROP INDEX `primary` ON /*_*/categorylinks;

ALTER TABLE /*_*/categorylinks
  CHANGE cl_target_id cl_target_id BIGINT UNSIGNED NOT NULL;

ALTER TABLE /*_*/categorylinks
  ADD PRIMARY KEY (cl_from, cl_target_id);

Finally, we drop the old cl_to column.

DROP INDEX cl_sortkey ON /*_*/categorylinks;

DROP INDEX cl_timestamp ON /*_*/categorylinks;

ALTER TABLE /*_*/categorylinks
  DROP cl_to,
  DROP cl_collation;

Note, there is also a migration happening from cl_collation to cl_collation_id in the same way without it being part of the primary key.

After the upgrade the table should looks like the following in my mariadb. I think the onwiki documentation is not correct, I will try to fix it.

+-------------------+------------------------------+------+-----+---------+-------+
| Field             | Type                         | Null | Key | Default | Extra |
+-------------------+------------------------------+------+-----+---------+-------+
| cl_from           | int(10) unsigned             | NO   | PRI | 0       |       |
| cl_sortkey        | varbinary(230)               | NO   |     |         |       |
| cl_sortkey_prefix | varbinary(255)               | NO   |     |         |       |
| cl_timestamp      | timestamp                    | NO   |     | NULL    |       |
| cl_type           | enum('page','subcat','file') | NO   |     | page    |       |
| cl_collation_id   | smallint(5) unsigned         | NO   |     | 0       |       |
| cl_target_id      | bigint(20) unsigned          | NO   | PRI | NULL    |       |
+-------------------+------------------------------+------+-----+---------+-------+

Regardsless, your original error was due to no primary key being present in the 1.44 table, which should not be the case.

We could maybe change the DROP to DROP IF EXISTS, but this might just result in issues like T332333.

Thank you for your help. Couple more questions:

What is the default for cl_timestamp supposed to be? My table looks like this:

describe categorylinks;
+-------------------+------------------------------+------+-----+-------------------+-----------------------------------------------+
| Field             | Type                         | Null | Key | Default           | Extra                                         |
+-------------------+------------------------------+------+-----+-------------------+-----------------------------------------------+
| cl_from           | int unsigned                 | NO   | PRI | 0                 |                                               |
| cl_sortkey        | varbinary(230)               | NO   |     |                   |                                               |
| cl_timestamp      | timestamp                    | NO   |     | CURRENT_TIMESTAMP | DEFAULT_GENERATED on update CURRENT_TIMESTAMP |
| cl_sortkey_prefix | varbinary(255)               | NO   |     |                   |                                               |
| cl_type           | enum('page','subcat','file') | NO   |     | page              |                                               |
| cl_collation_id   | smallint unsigned            | NO   |     | 0                 |                                               |
| cl_target_id      | bigint unsigned              | NO   | PRI | NULL              |                                               |
+-------------------+------------------------------+------+-----+-------------------+-----------------------------------------------+
7 rows in set (0.01 sec)

What does it mean that the default for cl_target_id is NULL but Null is NO?

Thank you for your help. Couple more questions:

What is the default for cl_timestamp supposed to be? My table looks like this:

describe categorylinks;
+-------------------+------------------------------+------+-----+-------------------+-----------------------------------------------+
| Field             | Type                         | Null | Key | Default           | Extra                                         |
+-------------------+------------------------------+------+-----+-------------------+-----------------------------------------------+
| cl_from           | int unsigned                 | NO   | PRI | 0                 |                                               |
| cl_sortkey        | varbinary(230)               | NO   |     |                   |                                               |
| cl_timestamp      | timestamp                    | NO   |     | CURRENT_TIMESTAMP | DEFAULT_GENERATED on update CURRENT_TIMESTAMP |
| cl_sortkey_prefix | varbinary(255)               | NO   |     |                   |                                               |
| cl_type           | enum('page','subcat','file') | NO   |     | page              |                                               |
| cl_collation_id   | smallint unsigned            | NO   |     | 0                 |                                               |
| cl_target_id      | bigint unsigned              | NO   | PRI | NULL              |                                               |
+-------------------+------------------------------+------+-----+-------------------+-----------------------------------------------+
7 rows in set (0.01 sec)

The default for cl_timestamp in your table is the correct one.

What does it mean that the default for cl_target_id is NULL but Null is NO?

NULL is just the standard default value shown in DESCRIBE if no other default value was set for that column. You don't need to worry about it, the important thing is that the column is set to be non-nullable.

Thank you and happy new year.