Execute the schema change for Partial Blocks
Closed, ResolvedPublic

Description

The Partial Blocks MVP release requires a schema change to function correctly.

  • The ALTER TABLEs to run: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/440871/21/maintenance/archives/patch-ipb_sitewide.sql
  • Where to run those changes: all.dblist
  • When to run those changes: any time
  • If the schema change is backwards compatible: Yes
  • If the schema change has been tested already on some of the test/beta wikis: This is the request to apply it to test wiki.
  • If it involves new columns or tables: There are new tables. No migration of data or deletion of data is required. Data can and should be made available in labs replicas and/or dumps.

More info:

This change was discussed and approved by TechCom in T199917.

The work around the change is in T197144 while the actual patch is here: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/440871

The core change is the addition of a field ipb_sitewide on the ipblocks table and the addition of a table ipblocks_restrictions.

alter table progress:

s1

s2

s3

s4

  • codfw
  • eqiad

s5

s6

s7

  • codfw
  • eqiad

s8

  • codfw
  • eqiad

wikitech

  • labsdb
  • labtestwiki
There are a very large number of changes, so older changes are hidden. Show Older Changes

@Marostegui No problem. I'll get it fixed up. Sorry for the confusion.

No worries at all :-)

aezell updated the task description. (Show Details)Sep 11 2018, 4:13 PM

Thanks for amending the task. I still have some questions.

  1. Why you want this change only deployed to testwiki?
  2. The table creation isn't done by DBAs - as you said it can be replicated to labs, you should be good to proceed with their creation during a normal SWAT slots: https://wikitech.wikimedia.org/wiki/Schema_changes#What_is_not_a_schema_change
  1. We'd like the change on all production databases but I know "all databases" isn't a valid answer for your work. However, I'm new and don't know all of the possible databases. I'm consulting with my team to get the right answer. In the short term, testwiki and betawiki would be a good start, I suppose.
  2. Cool. We will take care of that and I'll remove that from the description.
aezell updated the task description. (Show Details)Sep 11 2018, 4:26 PM
  1. We'd like the change on all production databases but I know "all databases" isn't a valid answer for your work. However, I'm new and don't know all of the possible databases. I'm consulting with my team to get the right answer. In the short term, testwiki and betawiki would be a good start, I suppose.

You should probably use all.dblist then :-)
That includes testwiki, but keep in mind that we don't normally start with testwiki, as it is part of a larger section. You should test your changes (I assume they are testable) in beta first to make sure that what you are requesting is valid.
Doing a schema change in complex and can take several months. We have around 900 wikis to ALTER, and we normally start for those which are smaller but yet have some traffic, so we can discover things before they hit larger wikis, as sometimes changes on smaller wikis (or without traffic like testwiki really) do not reveal issues, but as soon as the traffic starts to increase we can see things.

So keep in mind that your change will probably hit s5 (dewiki) or s6 (frwiki, jawiki, ruwiki) before hitting testwiki. So please make sure you've tested your changes on beta (again, if they are testable) (you might want to join #wikimedia-tech or ask someone in Release-Engineering-Team to get some help about beta, as we, DBAs, do not maintain beta).

  1. Cool. We will take care of that and I'll remove that from the description.

Thanks!

aezell updated the task description. (Show Details)Sep 12 2018, 7:10 PM

@Marostegui I've updated the description to include all.dblist and the team will SWAT the create table that is required for this project.

Marostegui triaged this task as Normal priority.Sep 17 2018, 2:38 PM

I will try to get this deployed on eqiad before we switch back to it.

Marostegui moved this task from Backlog to Next on the DBA board.Sep 19 2018, 10:17 AM

Change 461531 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Depool db2066

https://gerrit.wikimedia.org/r/461531

Marostegui updated the task description. (Show Details)Sep 20 2018, 7:49 AM

Change 461531 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Depool db2066

https://gerrit.wikimedia.org/r/461531

I have deployed the schama change on db2066 (s5) - and I will leave it like that for some hours to make sure nothing breaks before going ahead and deploying it on eqiad (passive DC)

Marostegui added a comment.EditedSep 20 2018, 7:57 AM

s5 progress:

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore2001
  • dbstore1002
  • db2094
  • db2089
  • db2084
  • db2075
  • db2066
  • db2059
  • db2052
  • db2038
  • db1124
  • db1113
  • db1110
  • db1102
  • db1100
  • db1097
  • db1096
  • db1082
  • db1070
Marostegui updated the task description. (Show Details)Sep 20 2018, 7:57 AM
Marostegui moved this task from Next to In progress on the DBA board.Sep 20 2018, 8:02 AM
Marostegui moved this task from Backlog to In progress on the Blocked-on-schema-change board.

Confirmed this is backwards compatible. I fully altered s5 on codfw and nothing broke on eqiad (which is the situation we will have once we witch back to eqiad - eqiad will have the change and codfw will not for the rest of sections) and the table is getting new records.

Mentioned in SAL (#wikimedia-operations) [2018-09-20T14:22:42Z] <marostegui> Deploy schema change on s5 eqiad master with replication T204006

Marostegui updated the task description. (Show Details)Sep 20 2018, 2:24 PM
Marostegui added a comment.EditedSep 24 2018, 5:07 AM

I am going to do one more compatibility test on s6, just to be completely sure

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore2001
  • dbstore1002
  • db2095
  • db2089
  • db2087
  • db2076
  • db2067
  • db2060
  • db2053
  • db2046
  • db2039
  • db1125
  • db1113
  • db1098
  • db1096
  • db1093
  • db1088
  • db1085
  • db1061
Marostegui updated the task description. (Show Details)Sep 24 2018, 5:40 AM
Marostegui updated the task description. (Show Details)Sep 24 2018, 6:58 AM
Marostegui updated the task description. (Show Details)Sep 24 2018, 7:18 AM

Mentioned in SAL (#wikimedia-operations) [2018-09-24T15:46:50Z] <marostegui> Deploy schema change on s8 eqiad master with replication - might generate lag - T204006

Marostegui updated the task description. (Show Details)Sep 24 2018, 3:50 PM

Mentioned in SAL (#wikimedia-operations) [2018-09-25T06:42:38Z] <marostegui> Deploy schema change on s4 eqiad master (db1068), might generate lag on s4 eqiad - T204006

Marostegui updated the task description. (Show Details)Sep 25 2018, 6:43 AM

Mentioned in SAL (#wikimedia-operations) [2018-09-25T06:46:11Z] <marostegui> Deploy schema change on s7 eqiad master (db1062), might generate lag on s4 eqiad - T204006

Marostegui updated the task description. (Show Details)Sep 25 2018, 6:47 AM

Mentioned in SAL (#wikimedia-operations) [2018-09-25T06:49:00Z] <marostegui> Deploy schema change on s2 eqiad master (db1066), might generate lag on s2 eqiad - T204006

Marostegui updated the task description. (Show Details)Sep 25 2018, 6:50 AM

Mentioned in SAL (#wikimedia-operations) [2018-09-25T06:52:12Z] <marostegui> Deploy schema change on s1 eqiad master (db1067), might generate lag on s1 eqiad - T204006

Marostegui updated the task description. (Show Details)Sep 25 2018, 6:55 AM

Mentioned in SAL (#wikimedia-operations) [2018-09-25T06:58:50Z] <marostegui> Deploy schema change on labtestwiki - T204006

Marostegui updated the task description. (Show Details)Sep 25 2018, 6:59 AM

Mentioned in SAL (#wikimedia-operations) [2018-09-25T07:00:05Z] <marostegui> Deploy schema change on labswiki (wikitech) m5 master db1073 - T204006

Marostegui updated the task description. (Show Details)Sep 25 2018, 7:00 AM

Mentioned in SAL (#wikimedia-operations) [2018-09-25T07:13:25Z] <marostegui> Deploy schema change on s3 eqiad master (db1075), might generate lag on s3 eqiad - T204006

Marostegui updated the task description. (Show Details)Sep 25 2018, 7:20 AM
Marostegui updated the task description. (Show Details)Sep 25 2018, 7:28 AM
Marostegui added a comment.EditedSep 25 2018, 7:43 AM

All eqiad is done now.
Once we have failed over back to eqiad the 10th October, I will start doing the pending codfw ones and get this over with :-)

Mentioned in SAL (#wikimedia-operations) [2018-09-25T07:54:41Z] <marostegui> Deploy schema change on s4 codfw - T204006

Marostegui updated the task description. (Show Details)Sep 25 2018, 7:54 AM
Marostegui changed the task status from Open to Stalled.Sep 26 2018, 5:18 AM

@Marostegui was the falloever to eqiad completed?

Yes, it was done correctly.
We will see if we can finish this schema change by the end of next week

We have had unexpected fires that might last long, so this might be delayed more than next week. We will do whatever we can, but I cannot promise this will get done next week or the week after.
Sorry.

mark added a subscriber: mark.Oct 11 2018, 2:59 PM

@Marostegui - I've heard from multiple people about unexpected fires delaying Ops/DBA work, but no additional information. Is there a phab ticket we can follow for these fires? These schema changes have been delayed multiple times, so obviously we want to push them along as quickly as possible, but it would probably work better if we can poke you when we know the current fires are out.

@Marostegui - I've heard from multiple people about unexpected fires delaying Ops/DBA work, but no additional information. Is there a phab ticket we can follow for these fires?

Mostly: T206743 T206740

These schema changes have been delayed multiple times, so obviously we want to push them along as quickly as possible, but it would probably work better if we can poke you when we know the current fires are out.

Not sure what was the cause of those delays, this ticket was created the 11th Sept and was missing lots of information and wasn't ready to be taken by the DBAs till 13th Sept.
We had the schema change fully deployed on eqiad by the 25th of Sept and we had to wait for the failover back which happened the 10th. So 50% of the task was done in 12 days, and taking into account that a schema change can take up to 5-6 months, I believe we are in a good shape here.

We were lucky we had time to get this done while eqiad is passive as we had a pretty long list of things to do T189107: DB meta task for next DC failover issues, but we still got able to do it, so now, we "only" have to do it on codfw all at once. Otherwise, and as I said above, this could have taken months, as we'd need to do it slave by slave.

Unfortunately we have had fires that have a lot more priority than this, but I still believe this task will not take months to be completed anyways, as I said, we are 50% done with it.

Now that I see it we are even more than 50% done, as we have also done lots of sections in both DCs already, so only pending 4 sections in eqiad. So I don't think we have delayed this that much considering the ticket was ready for us on the 13th. Sept.

Now that I see it we are even more than 50% done, as we have also done lots of sections in both DCs already, so only pending 4 sections in eqiad. So I don't think we have delayed this that much considering the ticket was ready for us on the 13th. Sept.

Thank you, @Marostegui for anything you can help us with to get this work done quickly. We have some wikis that are very much looking forward to this new Special:Block functionality!

Yes, thanks @Marostegui for providing some context about how these changes normally happen. This is my first one and so I didn't know what to expect.

Marostegui added a comment.EditedOct 19 2018, 5:18 AM

s1 codfw:

  • dbstore2002
  • db2094
  • db2092
  • db2088
  • db2085
  • db2072
  • db2071
  • db2070
  • db2062
  • db2055
  • db2048

Mentioned in SAL (#wikimedia-operations) [2018-10-19T05:25:28Z] <marostegui> Deploy schema change on s1 codfw host by host without replication - T204006

Marostegui updated the task description. (Show Details)Oct 19 2018, 5:46 AM
Marostegui added a comment.EditedOct 19 2018, 5:52 AM

s2 codfw:

  • dbstore2002
  • dbstore2001
  • db2095
  • db2091
  • db2088
  • db2063
  • db2056
  • db2049
  • db2041
  • db2035
Marostegui updated the task description. (Show Details)Oct 19 2018, 5:53 AM
Marostegui updated the task description. (Show Details)

Mentioned in SAL (#wikimedia-operations) [2018-10-19T05:58:19Z] <marostegui> Deploy schema change on s2 codfw host by host without replication - T204006

Marostegui updated the task description. (Show Details)Oct 19 2018, 6:08 AM
Marostegui added a comment.EditedOct 19 2018, 6:12 AM

s7 codfw:

  • dbstore2001
  • db2095
  • db2087
  • db2086
  • db2077
  • db2068
  • db2061
  • db2054
  • db2047
  • db2040

Mentioned in SAL (#wikimedia-operations) [2018-10-19T06:13:46Z] <marostegui> Deploy schema change on s7 codfw host by host without replication - T204006

Marostegui updated the task description. (Show Details)Oct 19 2018, 6:20 AM
Marostegui added a comment.EditedOct 22 2018, 5:30 AM

s3 codfw:

  • dbstore2002
  • db2094
  • db2074
  • db2057
  • db2050
  • db2036
  • db2043
Marostegui updated the task description. (Show Details)Oct 22 2018, 5:30 AM

Mentioned in SAL (#wikimedia-operations) [2018-10-22T05:31:49Z] <marostegui> Deploy schema change on dbstore2002:3313 - T204006

Mentioned in SAL (#wikimedia-operations) [2018-10-22T05:47:35Z] <marostegui> Deploy schema change on s3 db2074 (and db2094 sanitarium) - T204006

Mentioned in SAL (#wikimedia-operations) [2018-10-22T06:00:28Z] <marostegui> Deploy schema change on db2057 - T204006

Mentioned in SAL (#wikimedia-operations) [2018-10-22T06:11:41Z] <marostegui> Deploy schema change on db2050 - T204006

Mentioned in SAL (#wikimedia-operations) [2018-10-22T06:34:44Z] <marostegui> Deploy schema change on db2036 - T204006

Mentioned in SAL (#wikimedia-operations) [2018-10-22T08:27:44Z] <marostegui> Deploy schema change on db2043 (s3 master) without replication - T204006

Marostegui updated the task description. (Show Details)Oct 22 2018, 8:53 AM
Marostegui closed this task as Resolved.

This is all done

This is all done

THANK YOU!

Mentioned in SAL (#wikimedia-operations) [2018-11-02T07:59:18Z] <jynus> performing alter table on db2050 T208462 T204006

Mentioned in SAL (#wikimedia-operations) [2018-11-02T08:19:38Z] <jynus> performing alter table on dbstore2002 s3 and reducing consistency to improve recovery time T208462 T204006