Page MenuHomePhabricator

Execute the schema change for Partial Blocks
Closed, ResolvedPublic

Description

The Partial Blocks MVP release requires a schema change to function correctly.

  • The ALTER TABLEs to run: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/440871/21/maintenance/archives/patch-ipb_sitewide.sql
  • Where to run those changes: all.dblist
  • When to run those changes: any time
  • If the schema change is backwards compatible: Yes
  • If the schema change has been tested already on some of the test/beta wikis: This is the request to apply it to test wiki.
  • If it involves new columns or tables: There are new tables. No migration of data or deletion of data is required. Data can and should be made available in labs replicas and/or dumps.

More info:

This change was discussed and approved by TechCom in T199917.

The work around the change is in T197144 while the actual patch is here: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/440871

The core change is the addition of a field ipb_sitewide on the ipblocks table and the addition of a table ipblocks_restrictions.

alter table progress:

s1

s2

s3

s4

  • codfw
  • eqiad

s5

s6

s7

  • codfw
  • eqiad

s8

  • codfw
  • eqiad

wikitech

  • labsdb
  • labtestwiki

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@Marostegui No problem. I'll get it fixed up. Sorry for the confusion.

No worries at all :-)

Thanks for amending the task. I still have some questions.

  1. Why you want this change only deployed to testwiki?
  2. The table creation isn't done by DBAs - as you said it can be replicated to labs, you should be good to proceed with their creation during a normal SWAT slots: https://wikitech.wikimedia.org/wiki/Schema_changes#What_is_not_a_schema_change
  1. We'd like the change on all production databases but I know "all databases" isn't a valid answer for your work. However, I'm new and don't know all of the possible databases. I'm consulting with my team to get the right answer. In the short term, testwiki and betawiki would be a good start, I suppose.
  2. Cool. We will take care of that and I'll remove that from the description.
  1. We'd like the change on all production databases but I know "all databases" isn't a valid answer for your work. However, I'm new and don't know all of the possible databases. I'm consulting with my team to get the right answer. In the short term, testwiki and betawiki would be a good start, I suppose.

You should probably use all.dblist then :-)
That includes testwiki, but keep in mind that we don't normally start with testwiki, as it is part of a larger section. You should test your changes (I assume they are testable) in beta first to make sure that what you are requesting is valid.
Doing a schema change in complex and can take several months. We have around 900 wikis to ALTER, and we normally start for those which are smaller but yet have some traffic, so we can discover things before they hit larger wikis, as sometimes changes on smaller wikis (or without traffic like testwiki really) do not reveal issues, but as soon as the traffic starts to increase we can see things.

So keep in mind that your change will probably hit s5 (dewiki) or s6 (frwiki, jawiki, ruwiki) before hitting testwiki. So please make sure you've tested your changes on beta (again, if they are testable) (you might want to join #wikimedia-tech or ask someone in Release-Engineering-Team to get some help about beta, as we, DBAs, do not maintain beta).

  1. Cool. We will take care of that and I'll remove that from the description.

Thanks!

@Marostegui I've updated the description to include all.dblist and the team will SWAT the create table that is required for this project.

I will try to get this deployed on eqiad before we switch back to it.

Change 461531 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Depool db2066

https://gerrit.wikimedia.org/r/461531

Change 461531 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Depool db2066

https://gerrit.wikimedia.org/r/461531

I have deployed the schama change on db2066 (s5) - and I will leave it like that for some hours to make sure nothing breaks before going ahead and deploying it on eqiad (passive DC)

s5 progress:

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore2001
  • dbstore1002
  • db2094
  • db2089
  • db2084
  • db2075
  • db2066
  • db2059
  • db2052
  • db2038
  • db1124
  • db1113
  • db1110
  • db1102
  • db1100
  • db1097
  • db1096
  • db1082
  • db1070

Confirmed this is backwards compatible. I fully altered s5 on codfw and nothing broke on eqiad (which is the situation we will have once we witch back to eqiad - eqiad will have the change and codfw will not for the rest of sections) and the table is getting new records.

Mentioned in SAL (#wikimedia-operations) [2018-09-20T14:22:42Z] <marostegui> Deploy schema change on s5 eqiad master with replication T204006

I am going to do one more compatibility test on s6, just to be completely sure

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore2001
  • dbstore1002
  • db2095
  • db2089
  • db2087
  • db2076
  • db2067
  • db2060
  • db2053
  • db2046
  • db2039
  • db1125
  • db1113
  • db1098
  • db1096
  • db1093
  • db1088
  • db1085
  • db1061

Mentioned in SAL (#wikimedia-operations) [2018-09-24T15:46:50Z] <marostegui> Deploy schema change on s8 eqiad master with replication - might generate lag - T204006

Mentioned in SAL (#wikimedia-operations) [2018-09-25T06:42:38Z] <marostegui> Deploy schema change on s4 eqiad master (db1068), might generate lag on s4 eqiad - T204006

Mentioned in SAL (#wikimedia-operations) [2018-09-25T06:46:11Z] <marostegui> Deploy schema change on s7 eqiad master (db1062), might generate lag on s4 eqiad - T204006

Mentioned in SAL (#wikimedia-operations) [2018-09-25T06:49:00Z] <marostegui> Deploy schema change on s2 eqiad master (db1066), might generate lag on s2 eqiad - T204006

Mentioned in SAL (#wikimedia-operations) [2018-09-25T06:52:12Z] <marostegui> Deploy schema change on s1 eqiad master (db1067), might generate lag on s1 eqiad - T204006

Mentioned in SAL (#wikimedia-operations) [2018-09-25T06:58:50Z] <marostegui> Deploy schema change on labtestwiki - T204006

Mentioned in SAL (#wikimedia-operations) [2018-09-25T07:00:05Z] <marostegui> Deploy schema change on labswiki (wikitech) m5 master db1073 - T204006

Mentioned in SAL (#wikimedia-operations) [2018-09-25T07:13:25Z] <marostegui> Deploy schema change on s3 eqiad master (db1075), might generate lag on s3 eqiad - T204006

All eqiad is done now.
Once we have failed over back to eqiad the 10th October, I will start doing the pending codfw ones and get this over with :-)

Mentioned in SAL (#wikimedia-operations) [2018-09-25T07:54:41Z] <marostegui> Deploy schema change on s4 codfw - T204006

Yes, it was done correctly.
We will see if we can finish this schema change by the end of next week

We have had unexpected fires that might last long, so this might be delayed more than next week. We will do whatever we can, but I cannot promise this will get done next week or the week after.
Sorry.

@Marostegui - I've heard from multiple people about unexpected fires delaying Ops/DBA work, but no additional information. Is there a phab ticket we can follow for these fires? These schema changes have been delayed multiple times, so obviously we want to push them along as quickly as possible, but it would probably work better if we can poke you when we know the current fires are out.

@Marostegui - I've heard from multiple people about unexpected fires delaying Ops/DBA work, but no additional information. Is there a phab ticket we can follow for these fires?

Mostly: T206743 T206740

These schema changes have been delayed multiple times, so obviously we want to push them along as quickly as possible, but it would probably work better if we can poke you when we know the current fires are out.

Not sure what was the cause of those delays, this ticket was created the 11th Sept and was missing lots of information and wasn't ready to be taken by the DBAs till 13th Sept.
We had the schema change fully deployed on eqiad by the 25th of Sept and we had to wait for the failover back which happened the 10th. So 50% of the task was done in 12 days, and taking into account that a schema change can take up to 5-6 months, I believe we are in a good shape here.

We were lucky we had time to get this done while eqiad is passive as we had a pretty long list of things to do T189107: DB meta task for next DC failover issues, but we still got able to do it, so now, we "only" have to do it on codfw all at once. Otherwise, and as I said above, this could have taken months, as we'd need to do it slave by slave.

Unfortunately we have had fires that have a lot more priority than this, but I still believe this task will not take months to be completed anyways, as I said, we are 50% done with it.

Now that I see it we are even more than 50% done, as we have also done lots of sections in both DCs already, so only pending 4 sections in eqiad. So I don't think we have delayed this that much considering the ticket was ready for us on the 13th. Sept.

Now that I see it we are even more than 50% done, as we have also done lots of sections in both DCs already, so only pending 4 sections in eqiad. So I don't think we have delayed this that much considering the ticket was ready for us on the 13th. Sept.

Thank you, @Marostegui for anything you can help us with to get this work done quickly. We have some wikis that are very much looking forward to this new Special:Block functionality!

Yes, thanks @Marostegui for providing some context about how these changes normally happen. This is my first one and so I didn't know what to expect.

s1 codfw:

  • dbstore2002
  • db2094
  • db2092
  • db2088
  • db2085
  • db2072
  • db2071
  • db2070
  • db2062
  • db2055
  • db2048

Mentioned in SAL (#wikimedia-operations) [2018-10-19T05:25:28Z] <marostegui> Deploy schema change on s1 codfw host by host without replication - T204006

s2 codfw:

  • dbstore2002
  • dbstore2001
  • db2095
  • db2091
  • db2088
  • db2063
  • db2056
  • db2049
  • db2041
  • db2035

Mentioned in SAL (#wikimedia-operations) [2018-10-19T05:58:19Z] <marostegui> Deploy schema change on s2 codfw host by host without replication - T204006

s7 codfw:

  • dbstore2001
  • db2095
  • db2087
  • db2086
  • db2077
  • db2068
  • db2061
  • db2054
  • db2047
  • db2040

Mentioned in SAL (#wikimedia-operations) [2018-10-19T06:13:46Z] <marostegui> Deploy schema change on s7 codfw host by host without replication - T204006

s3 codfw:

  • dbstore2002
  • db2094
  • db2074
  • db2057
  • db2050
  • db2036
  • db2043

Mentioned in SAL (#wikimedia-operations) [2018-10-22T05:31:49Z] <marostegui> Deploy schema change on dbstore2002:3313 - T204006

Mentioned in SAL (#wikimedia-operations) [2018-10-22T05:47:35Z] <marostegui> Deploy schema change on s3 db2074 (and db2094 sanitarium) - T204006

Mentioned in SAL (#wikimedia-operations) [2018-10-22T06:00:28Z] <marostegui> Deploy schema change on db2057 - T204006

Mentioned in SAL (#wikimedia-operations) [2018-10-22T06:11:41Z] <marostegui> Deploy schema change on db2050 - T204006

Mentioned in SAL (#wikimedia-operations) [2018-10-22T06:34:44Z] <marostegui> Deploy schema change on db2036 - T204006

Mentioned in SAL (#wikimedia-operations) [2018-10-22T08:27:44Z] <marostegui> Deploy schema change on db2043 (s3 master) without replication - T204006

Marostegui updated the task description. (Show Details)

This is all done

Mentioned in SAL (#wikimedia-operations) [2018-11-02T07:59:18Z] <jynus> performing alter table on db2050 T208462 T204006

Mentioned in SAL (#wikimedia-operations) [2018-11-02T08:19:38Z] <jynus> performing alter table on dbstore2002 s3 and reducing consistency to improve recovery time T208462 T204006