Page MenuHomePhabricator

Find a solution for cl_timestamp
Closed, ResolvedPublic

Description

I mistakenly turned this field from timestamp type in mysql to binary(14). This needs fixing but how the data can be handled?

Currently it's like this:

wikiadmin@10.64.48.109(fawiki)> select cl_timestamp from categorylinks limit 5;
+---------------------+
| cl_timestamp        |
+---------------------+
| 2020-05-13 17:38:42 |
| 2020-04-20 12:02:14 |
| 2020-07-03 21:42:41 |
| 2020-07-19 22:34:03 |
| 2020-06-20 17:57:30 |
+---------------------+
5 rows in set (0.00 sec)

While if it turns to binary(14), the data should be like this:

wikiadmin@10.64.48.109(fawiki)> select pt_timestamp from protected_titles limit 5;
+----------------+
| pt_timestamp   |
+----------------+
| 20080114200342 |
| 20080618090959 |
| 20080626174006 |
| 20080627221357 |
| 20080627222913 |
+----------------+
5 rows in set (0.00 sec)

I couldn't find a maintenance script to do the change and if it exists, how the migration should look like? Unfortunately I'm not 100% sure how mediawiki handles timestamps, looking at the ConvertibleTimestamp class, it seems that it can just read the value and find out the format and return the correct type, so it should be fine by just turning it a string first and then fixing the values. Has anything similar was done before?

Event Timeline

This might give an idea what to do:

MariaDB [client]> select * from categorylinks limit 5;
+---------+-------+------------+-------------------+---------------------+--------------+---------+
| cl_from | cl_to | cl_sortkey | cl_sortkey_prefix | cl_timestamp        | cl_collation | cl_type |
+---------+-------+------------+-------------------+---------------------+--------------+---------+
|       1 | WTF   | MAIN PAGE  |                   | 2020-12-13 17:02:26 | uppercase    | page    |
+---------+-------+------------+-------------------+---------------------+--------------+---------+
1 row in set (0.001 sec)

MariaDB [client]> ALTER TABLE /*_*/categorylinks MODIFY cl_timestamp BINARY(14) NOT NULL;
ERROR 1406 (22001): Data too long for column 'cl_timestamp' at row 1
MariaDB [client]> ALTER TABLE /*_*/categorylinks MODIFY cl_timestamp BINARY(20) NOT NULL;
Query OK, 1 row affected (0.065 sec)               
Records: 1  Duplicates: 0  Warnings: 0

MariaDB [client]> select * from categorylinks limit 5;
+---------+-------+------------+-------------------+----------------------+--------------+---------+
| cl_from | cl_to | cl_sortkey | cl_sortkey_prefix | cl_timestamp         | cl_collation | cl_type |
+---------+-------+------------+-------------------+----------------------+--------------+---------+
|       1 | WTF   | MAIN PAGE  |                   | 2020-12-13 17:02:26  | uppercase    | page    |
+---------+-------+------------+-------------------+----------------------+--------------+---------+
1 row in set (0.002 sec)

MariaDB [client]> select * from categorylinks limit 5;
+---------+-------+------------+-------------------+----------------------+--------------+---------+
| cl_from | cl_to | cl_sortkey | cl_sortkey_prefix | cl_timestamp         | cl_collation | cl_type |
+---------+-------+------------+-------------------+----------------------+--------------+---------+
|       1 | WTF   | MAIN PAGE  |                   | 2020-12-13 17:02:26  | uppercase    | page    |
|      43 | WTF   | WTF        |                   | 20201213170509       | uppercase    | page    |
+---------+-------+------------+-------------------+----------------------+--------------+---------+
2 rows in set (0.001 sec)

Basically first change it to varbinary(20), then changing the values somehow (writing a new maintenance script?) and then reducing the size to binary(14)

Change 651696 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/core@master] [WIP] Bring back timestamp time of cl_timestamp instead of binary(14)

https://gerrit.wikimedia.org/r/651696

This might give an idea what to do:

MariaDB [client]> select * from categorylinks limit 5;
+---------+-------+------------+-------------------+---------------------+--------------+---------+
| cl_from | cl_to | cl_sortkey | cl_sortkey_prefix | cl_timestamp        | cl_collation | cl_type |
+---------+-------+------------+-------------------+---------------------+--------------+---------+
|       1 | WTF   | MAIN PAGE  |                   | 2020-12-13 17:02:26 | uppercase    | page    |
+---------+-------+------------+-------------------+---------------------+--------------+---------+
1 row in set (0.001 sec)

MariaDB [client]> ALTER TABLE /*_*/categorylinks MODIFY cl_timestamp BINARY(14) NOT NULL;
ERROR 1406 (22001): Data too long for column 'cl_timestamp' at row 1

This is a strict-mode error, so it will not happen in production (T108255), although the values (existing ones) would be badly truncated to fit the column. For instance 2020-12-13 17:02:26 would become 2020-12-13 17: which is clearly invalid timestamp and so would lead to errors in timestamp-related functions. So yeah I agree reverting the change is necessary now.

Basically first change it to varbinary(20), then changing the values somehow (writing a new maintenance script?) and then reducing the size to binary(14)

varbinary(19), would do too as no value should be longer than that, and yes this intermediary seems necessary too. Also whatever is decided here, it applies to the exptime on objectcache if we're to convert it from datetime to mw timestamp too.

Also whatever is decided here, it applies to the exptime on objectcache if we're to convert it from datetime to mw timestamp too.

But exptime is for a cache, meaning no data loss (and update.php truncates the table anyway) so that should be straightforward (drop everything, do the alter table, leave it). It would also not affect production given that we mostly use memcached for that.

Also whatever is decided here, it applies to the exptime on objectcache if we're to convert it from datetime to mw timestamp too.

But exptime is for a cache, meaning no data loss

No data loss, but there would be invalid values in db and the result of that will manifest where such values are used, see below.

(and update.php truncates the table anyway)

That cannot be assumed to be true always. It's configurable, the user can decide not truncate it.

so that should be straightforward (drop everything, do the alter table, leave it).

I don't think so, if the user decides not truncate the table and this proposed schema change went ahead to invalidate the exptime values ( by forcing reducing them to 14 char) that'd still be problematic for MediaWiki even if it were not for WMF.

MariaDB [wikidb]> select keyname, exptime from objectcache limit 1\G
*************************** 1. row ***************************
keyname: wikidb:MWSession:ujr1d46hg7pn0v0fuuf4nmkt2mdojb6r
exptime: 2020-12-31 07:17:37
1 row in set (0.00 sec)

If the above exptime is truncated to 2020-12-31 07:, there would be undesired effect elsewhere. For instance ConvertibleTimestamp::convert( TS_UNIX, $exptime ) would evaluate to false for all existing values. It would not, if the change to binary(14) is not done. The comparison here would also be affected. I agree, it might not be a big problem if these values are no longer accurate/valid, but we should make this change knowing upfront that it's not problem-free.

It would also not affect production given that we mostly use memcached for that.

But it would affect installations not using that?

Change 651696 merged by jenkins-bot:
[mediawiki/core@master] Bring back timestamp time of cl_timestamp instead of binary(14)

https://gerrit.wikimedia.org/r/651696

Ladsgroup claimed this task.

The short-term solution has been found. For long-term work, I'll create a ticket or continue in the standardize ticket.