Page MenuHomePhabricator

fa_deleted_timestamp and fa_timestamp are binary(14) in code but varbinary(14) in production
Closed, ResolvedPublic

Description

Showed up in drift report using abstract schema:

"filearchive fa_deleted_timestamp field-type-mismatch": {
    "s1": [
        "db1083:enwiki",
        "db1084:enwiki",
        "db1169:enwiki",
        "db1118:enwiki",
        "db1163:enwiki",
        "db1106:enwiki",
        "db1164:enwiki",
        "db1119:enwiki",
        "db1134:enwiki"
    ],
    "s2": [
        "db1122:svwiki",
        "db1129:svwiki",
        "db1076:svwiki",
        "db1074:svwiki"
    ],
    "s3": [
        "db1123:aswikibooks",
        "db1166:aswikibooks",
        "db1157:aswikibooks",
        "db1175:aswikibooks",
        "db1112:aswikibooks"
    ],
    "s6": [
        "db1131:ruwiki",
        "db1168:ruwiki",
        "db1085:ruwiki",
        "db1173:ruwiki"
    ],
    "s7": [
        "db1086:cawiki",
        "db1127:cawiki",
        "db1136:cawiki",
        "db1079:cawiki"
    ]
},

At first, I thought it was a mistake in abstraction (T42626: Standardise type of timestamp database fields (MySQL) T230428: Migrate tables.sql to abstract schema) but in the old code it was binary(14) and hasn't changed since 2007.

In production it's

wikiadmin@10.64.32.77(enwiki)> desc filearchive;
+----------------------+-------------------------------------------------------------------------------------------------------------+------+-----+---------+----------------+
| Field                | Type                                                                                                        | Null | Key | Default | Extra          |
+----------------------+-------------------------------------------------------------------------------------------------------------+------+-----+---------+----------------+
| fa_deleted_timestamp | varbinary(14)                                                                                               | YES  | MUL |         |                |

While in the code it's fa_deleted_timestamp BINARY(14) DEFAULT NULL, (and it should be this). The same situation applies to fa_timestamp field as well

ALTER TABLE /*_*/filearchive
  MODIFY fa_deleted_timestamp BINARY(14) DEFAULT NULL,
  MODIFY fa_timestamp BINARY(14) DEFAULT NULL;

Schema change progress:

  • s1
    • eqiad
    • codfw
  • s2
    • eqiad
    • codfw
  • s3
    • eqiad
    • codfw
  • s4
    • eqiad
    • codfw
  • s5
    • eqiad
    • codfw
  • s6
    • eqiad
    • codfw
  • s7
    • eqiad
    • codfw
  • s10
  • labtestwiki

Confirmed that s8 is not affected.

Event Timeline

Ladsgroup renamed this task from fa_deleted_timestamp is binary(14) in code but varbinary(14) in production to fa_deleted_timestamp and fa_timestamp are binary(14) in code but varbinary(14) in production.Mar 10 2021, 11:52 PM
Ladsgroup updated the task description. (Show Details)
Marostegui triaged this task as Medium priority.Mar 11 2021, 7:26 AM
Marostegui moved this task from Triage to Refine on the DBA board.

@Ladsgroup does the DEFAULT ' ' need to stay or it should be DEFAULT NULL as you mentioned?

If that is the case, alter needed:

alter table filearchive modify fa_deleted_timestamp BINARY(14) DEFAULT NULL;  alter table filearchive modify fa_timestamp BINARY(14) DEFAULT NULL;

As we have a s7 failover scheduled for 23rd, I will try to get s7 fixed so the new master has this (this is a not DDL change)

It should be DEFAULT NULL. Empty string is not a valid timestamp (mediawiki core accepts it but not Postgres).

What is needed to move this out of Refine column?

Come up with the alter table statement and identify which hosts really need it in all the sections apart from s1 (as far as I remember your script only checked certain hosts but not all?)

Come up with the alter table statement

The alter table is similar to T268392: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14):

ALTER TABLE /*_*/filearchive
  MODIFY fa_deleted_timestamp BINARY(14) DEFAULT NULL,
  fa_timestamp BINARY(14) DEFAULT NULL;

(Haven't tested it though)

and identify which hosts really need it in all the sections apart from s1 (as far as I remember your script only checked certain hosts but not all?)

It checks all hosts but not all wikis, one wiki (randomly selected) per section. This shows we have s1,s2,s3,s4,s6,s7. I included s4 because it very likely tested it on testcommonswiki and not commonswiki.

Mentioned in SAL (#wikimedia-operations) [2021-07-12T14:58:06Z] <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: Deploying schema change T277116

Mentioned in SAL (#wikimedia-operations) [2021-07-12T14:58:13Z] <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: Deploying schema change T277116

Mentioned in SAL (#wikimedia-operations) [2021-07-12T15:08:16Z] <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T277116

Mentioned in SAL (#wikimedia-operations) [2021-07-12T15:08:22Z] <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T277116

Mentioned in SAL (#wikimedia-operations) [2021-07-12T15:15:54Z] <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T277116

Mentioned in SAL (#wikimedia-operations) [2021-07-12T15:16:01Z] <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T277116

Kormat updated the task description. (Show Details)

Mentioned in SAL (#wikimedia-operations) [2021-07-12T15:28:24Z] <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T277116

Mentioned in SAL (#wikimedia-operations) [2021-07-12T15:28:31Z] <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T277116

Mentioned in SAL (#wikimedia-operations) [2021-07-12T15:31:49Z] <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s4 T277116

Mentioned in SAL (#wikimedia-operations) [2021-07-12T15:31:55Z] <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s4 T277116

Mentioned in SAL (#wikimedia-operations) [2021-07-12T16:01:04Z] <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T277116 - extending downtime

Mentioned in SAL (#wikimedia-operations) [2021-07-12T16:01:11Z] <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T277116 - extending downtime

Mentioned in SAL (#wikimedia-operations) [2021-07-13T09:00:10Z] <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T277116

Mentioned in SAL (#wikimedia-operations) [2021-07-13T09:00:16Z] <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T277116

Kormat updated the task description. (Show Details)

Mentioned in SAL (#wikimedia-operations) [2021-07-13T10:21:35Z] <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 T277116

Mentioned in SAL (#wikimedia-operations) [2021-07-13T10:21:42Z] <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 T277116

Kormat changed the task status from Open to Stalled.Jul 13 2021, 11:35 AM
Kormat updated the task description. (Show Details)
Kormat moved this task from Refine to Blocked on the DBA board.

Stalling this until we switch back to eqiad.

Marostegui changed the task status from Stalled to Open.Oct 19 2021, 5:36 AM

This can now proceed

Mentioned in SAL (#wikimedia-operations) [2021-10-20T08:50:12Z] <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 T277116

Mentioned in SAL (#wikimedia-operations) [2021-10-20T08:50:20Z] <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 T277116

Mentioned in SAL (#wikimedia-operations) [2021-10-20T09:04:58Z] <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 T277116

Mentioned in SAL (#wikimedia-operations) [2021-10-20T09:05:05Z] <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 T277116

Kormat moved this task from Blocked to In progress on the DBA board.

Mentioned in SAL (#wikimedia-operations) [2021-10-20T09:52:34Z] <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 T277116

Mentioned in SAL (#wikimedia-operations) [2021-10-20T09:52:43Z] <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 T277116

Mentioned in SAL (#wikimedia-operations) [2021-10-20T09:59:23Z] <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 T277116

Mentioned in SAL (#wikimedia-operations) [2021-10-20T09:59:32Z] <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 T277116

Mentioned in SAL (#wikimedia-operations) [2021-10-20T10:13:01Z] <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Schema change s4 T277116

Mentioned in SAL (#wikimedia-operations) [2021-10-20T10:13:11Z] <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Schema change s4 T277116

Mentioned in SAL (#wikimedia-operations) [2021-10-20T12:36:58Z] <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 3:00:00 on 14 hosts with reason: Schema change s1 T277116

Mentioned in SAL (#wikimedia-operations) [2021-10-20T12:37:09Z] <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 14 hosts with reason: Schema change s1 T277116

Mentioned in SAL (#wikimedia-operations) [2021-10-20T13:11:31Z] <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 3:00:00 on 7 hosts with reason: Schema change s3 T277116

Mentioned in SAL (#wikimedia-operations) [2021-10-20T13:11:37Z] <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 7 hosts with reason: Schema change s3 T277116

Kormat updated the task description. (Show Details)

All done.