Prepare a database test for m3
Closed, Resolved (Public)

Description

For temporary testing of a Phabricator version upgrade.
Let's grab a host in eqiad and clone it for m3.

Event Timeline

Marostegui claimed this task.
Marostegui triaged this task as Medium priority.
Marostegui moved this task from Triage to Ready on the DBA board.

@brennen can you describe in a bit more detail exactly what you need?
Do you need just a database to access from any phabX host to perform the schema change upgrade? Would you perform that manually? I am thinking about just cloning the users as well, but I'd need the IP from where you'll be connecting as this host won't go via proxy like the production hosts.

brennen added a subscriber: Dzahn.

I think we could test running the upgrade from phab2002 (the backup server), and yeah I think we'd just run the migration by hand from that host. (cc: @Dzahn for awareness, but I'm pretty sure that's close to what we did last time.)

(Apologies for my slowness in replying! Lost track of this task in the shuffle of other phab notifications.)

Hi @Marostegui could we use this IP, please: 10.64.16.125 (2620:0:861:102:10:64:16:125). This host is called phab1005.eqiad.wmnet and there is no Phabricator on it just yet, not in production.

But this would be a good candidate because then we can test all this without touching production at all. Plus we need to set it up anyways (with a newer distro version) and in the future we want to failover to it.

It would be great if the user can be the same as production but with a different password. Then we can feel safe that nothing can go wrong and later just change the password and db host when we make this host production and don't have to bother you again.

Also it would be in eqiad, as the ticket says it should be.

Thank you both for the information. I will get a host ready, hopefully next week. It is most likely going to be db1176.
A quick check shows that there are no firewalls in between or anything like that at the moment, so that's good:

root@phab1005:~# telnet db1176 3306
Trying 10.64.0.143...
Connected to db1176.
Escape character is '^]'.
]
5.5.5-10.6.21-MariaDB-log [binary handshake data] mysql_native_password
Connection closed by foreign host.
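
For anyone puzzled by the telnet output: the stray `]` and the version string are the start of the server's initial handshake packet. Below is a rough sketch of decoding it, assuming the standard MySQL/MariaDB protocol-10 layout (3-byte little-endian payload length, 1-byte sequence id, 1-byte protocol version, then a NUL-terminated version string). The names `parse_greeting`, `read_greeting` and `SAMPLE_PACKET` are made up for this example, and the sample bytes are reconstructed by hand, not captured from db1176.

```python
import socket


def parse_greeting(packet: bytes) -> dict:
    """Decode the leading fields of a MySQL/MariaDB handshake packet."""
    if len(packet) < 6:
        raise ValueError("packet too short to be a handshake")
    payload_length = int.from_bytes(packet[0:3], "little")
    sequence_id = packet[3]
    protocol_version = packet[4]
    # The server version is a NUL-terminated string right after the
    # protocol version byte.
    end = packet.index(b"\x00", 5)
    server_version = packet[5:end].decode("ascii", errors="replace")
    return {
        "payload_length": payload_length,
        "sequence_id": sequence_id,
        "protocol_version": protocol_version,
        "server_version": server_version,
    }


def read_greeting(host: str, port: int = 3306, timeout: float = 5.0) -> dict:
    """Connect over TCP and decode whatever greeting the server sends.

    Needs network access to the database host; a single recv() is
    assumed to be enough for the small greeting packet.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return parse_greeting(sock.recv(4096))


# Hand-built sample: 0x5d is the "]" telnet printed, 0x0a (protocol 10)
# is what telnet rendered as a line break before the version string.
SAMPLE_PACKET = bytes([0x5D, 0x00, 0x00, 0x00, 0x0A]) + b"5.5.5-10.6.21-MariaDB-log\x00"

if __name__ == "__main__":
    print(parse_greeting(SAMPLE_PACKET))
```

MariaDB 10.x servers advertise themselves with a `5.5.5-` compatibility prefix, so the version behind this greeting is 10.6.21.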

*************************** 1. row ***************************
Slave_IO_State:
Master_Host: db1213.eqiad.wmnet
Master_User: repl2024
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: db1213-bin.000327
Read_Master_Log_Pos: 349590531
Relay_Log_File: db2234-relay-bin.000709
Relay_Log_Pos: 349590831
Relay_Master_Log_File: db1213-bin.000327
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 349590531
Relay_Log_Space: 349591189
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: Yes
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 171970643
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 171970573-171970573-2689732,0-171970643-1926831053
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: optimistic
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Slave_DDL_Groups: 4225
Slave_Non_Transactional_Groups: 0
Slave_Transactional_Groups: 162679255

*************************** 1. row ***************************
File: db1176-bin.000001
Position: 176176045
Binlog_Do_DB:
Binlog_Ignore_DB:
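
In the replication status dump above, both replication threads are stopped and the read and executed master positions are equal. That is the state one would normally want before recording coordinates for a clone (this is my reading of the dump, not something stated in the task). A minimal sketch of checking this programmatically, where `parse_vertical`, `replication_cleanly_stopped` and the shortened `SAMPLE` are illustrative names and data:

```python
def parse_vertical(output: str) -> dict:
    """Parse `Key: value` lines from the mysql client's vertical output."""
    fields = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "*** 1. row ***" separator.
        if not line or line.startswith("*"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def replication_cleanly_stopped(status: dict) -> bool:
    """True if both threads are stopped and everything fetched was applied."""
    return (
        status.get("Slave_IO_Running") == "No"
        and status.get("Slave_SQL_Running") == "No"
        and status.get("Master_Log_File") == status.get("Relay_Master_Log_File")
        and status.get("Read_Master_Log_Pos") == status.get("Exec_Master_Log_Pos")
    )


# A shortened copy of the fields that matter for this check.
SAMPLE = """
*************************** 1. row ***************************
Master_Log_File: db1213-bin.000327
Read_Master_Log_Pos: 349590531
Relay_Master_Log_File: db1213-bin.000327
Slave_IO_Running: No
Slave_SQL_Running: No
Exec_Master_Log_Pos: 349590531
"""

if __name__ == "__main__":
    print(replication_cleanly_stopped(parse_vertical(SAMPLE)))
```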

@brennen @Dzahn I have set up the host.
The host is db1176; it is replicating from the m3 master, so you can test things with real data once you make the changes.
It has the production users with a different password (the password has been left at cumin1002:/home/dzahn/phab1005test).

I've tested it from phab1005 and it works:

root@phab1005:~# for i in phabricatorphd phadmin phstats phuser; do echo $i; mysql -u$i -p --host db1176 -e "select user()" ; done
phabricatorphd
Enter password:
+-----------------------------+
| user()                      |
+-----------------------------+
| phabricatorphd@10.64.16.125 |
+-----------------------------+
phadmin
Enter password:
+----------------------+
| user()               |
+----------------------+
| phadmin@10.64.16.125 |
+----------------------+
phstats
Enter password:
+----------------------+
| user()               |
+----------------------+
| phstats@10.64.16.125 |
+----------------------+
phuser
Enter password:
+---------------------+
| user()              |
+---------------------+
| phuser@10.64.16.125 |
+---------------------+
root@phab1005:~#
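
The `user()` values above also confirm that the server sees the connections as coming straight from phab1005 (10.64.16.125) rather than from a proxy. If someone wanted to script that check, a small sketch could look like the following; the helper names and the `ACCOUNTS` tuple are made up for illustration, and the strings are the ones from the output above.

```python
def split_identity(user_at_host: str) -> tuple:
    """Split a USER() result such as 'phadmin@10.64.16.125'."""
    user, sep, host = user_at_host.rpartition("@")
    if not sep:
        raise ValueError(f"not a user@host value: {user_at_host!r}")
    return user, host


def seen_from(user_at_host: str, expected_user: str, expected_host: str) -> bool:
    """True if the server reports the expected account and client host."""
    return split_identity(user_at_host) == (expected_user, expected_host)


# The four accounts tested in the loop above.
ACCOUNTS = ("phabricatorphd", "phadmin", "phstats", "phuser")
PHAB1005_IP = "10.64.16.125"

if __name__ == "__main__":
    for account in ACCOUNTS:
        reported = f"{account}@{PHAB1005_IP}"  # what user() printed above
        print(account, seen_from(reported, account, PHAB1005_IP))
```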

Is there anything else needed here or can I close this?

Change #1134666 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1176: Add note

https://gerrit.wikimedia.org/r/1134666

Change #1134666 merged by Marostegui:

[operations/puppet@production] db1176: Add note

https://gerrit.wikimedia.org/r/1134666

Thank you @Marostegui ! I can confirm I can connect from phab1005 to db1176 to a phab DB using the password on cumin1002.

I think we can close this.

Let me just do one more thing and create phab shell users on phab1005 so I can properly share the password there.

Change #1134736 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] hieradata: add phab shell user groups on phab1005

https://gerrit.wikimedia.org/r/1134736

Change #1134736 merged by Dzahn:

[operations/puppet@production] hieradata: add phab shell user groups on phab1005

https://gerrit.wikimedia.org/r/1134736

@brennen @Aklapper I added "phab server shell users" to the host phab1005. So while there is no Phabricator installed there yet, you do have shell access now (along with anyone else who already had shell access on existing prod phab hosts).

I dumped the password Manuel created into /home/aklapper/phab_db_test and /home/brennen/phab_db_test.

You could connect like this, as an example:

mysql -h db1176.eqiad.wmnet -u phadmin -p

with the various users and database names as in production, but all with the same password.

Actually installing Phabricator on this machine will be T377889.

@brennen @Aklapper @Dzahn what is the status of this? Any ETA on when the tests will be finished?

Hi! @brennen is working on this now and hopes to complete testing soon. Recently, I dropped some things on his plate that have eaten away at his time, sorry for that. If we are unable to complete testing in the next day or so, then it will carry over to the week of Mon 9 Jun.

Thanks for working with us on this, I'm sorry for the delay here. I really appreciate your help and understanding (as do, unknowingly, all phab users :)).

Any news on this? We are now trying to schedule some GTID testing and we'd need to know when these hosts will be released.

Thanks!

Gah, sorry. Attempting to get this moving today. Once we have a working scap deploy to phab1005 (T377889#10866191), it should be quick.

Update: scap deploy is working; currently waiting on a run of Phab's storage utility to get through a bunch of ANALYZE TABLE stuff and then we should be able to test the migration.

Change #1161048 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator::migration: puppetize password for testdb in script-vars

https://gerrit.wikimedia.org/r/1161048

Change #1161051 had a related patch set uploaded (by Dzahn; author: Dzahn):

[labs/private@master] add fake password for phab test db admin user

https://gerrit.wikimedia.org/r/1161051

Change #1161051 merged by Dzahn:

[labs/private@master] add fake password for phab test db admin user

https://gerrit.wikimedia.org/r/1161051

Change #1161048 merged by Dzahn:

[operations/puppet@production] phabricator::migration: puppetize password for testdb in script-vars

https://gerrit.wikimedia.org/r/1161048

Mentioned in SAL (#wikimedia-operations) [2025-06-18T22:14:20Z] <brennen@deploy1003> Started deploy [phabricator/deployment@6af4bb7]: merge-phorge-2024.35 deploy to phab1005 (T390034)

Mentioned in SAL (#wikimedia-operations) [2025-06-18T22:14:46Z] <brennen@deploy1003> Finished deploy [phabricator/deployment@6af4bb7]: merge-phorge-2024.35 deploy to phab1005 (T390034) (duration: 00m 26s)

Ok, this is finally in a state where we can test the migration, but I get:

Applying patch "phabricator:20230917.fileattachment.01.delete.sql" to host "db1176.eqiad.wmnet:3306"...
[2025-06-18 22:16:49] EXCEPTION: (AphrontQueryException) #1290: The MariaDB server is running with the --read-only option so it cannot execute this statement at [<phorge>/src/infrastructure/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:396]

@Marostegui could we get this in a read/write state? We need to be able to delete a bunch of rows (and I think possibly alter a table).
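
Error #1290 is the server's "option prevents statement" error (`ER_OPTION_PREVENTS_STATEMENT`), raised here because `read_only` is on. A pre-flight check before starting a long migration could catch this earlier. The sketch below assumes a DB-API style cursor from whichever MySQL/MariaDB driver is in use; `server_is_read_only` and `FakeCursor` are hypothetical names, and `FakeCursor` only exists so the example runs without a database.

```python
ER_OPTION_PREVENTS_STATEMENT = 1290  # the error number seen in the log above


def server_is_read_only(cursor) -> bool:
    """Ask the server whether the global read_only flag is set.

    `cursor` is assumed to follow the Python DB-API (execute/fetchone).
    """
    cursor.execute("SELECT @@global.read_only")
    row = cursor.fetchone()
    return bool(int(row[0]))


class FakeCursor:
    """Stand-in cursor for demonstration only; not a real driver class."""

    def __init__(self, read_only_value):
        self.read_only_value = read_only_value
        self.queries = []

    def execute(self, query):
        self.queries.append(query)

    def fetchone(self):
        return (self.read_only_value,)


if __name__ == "__main__":
    if server_is_read_only(FakeCursor(1)):
        print("server is read-only; migration would fail with error",
              ER_OPTION_PREVENTS_STATEMENT)
```

Against a real server, the same function would be called with a cursor opened on db1176.eqiad.wmnet before kicking off the storage upgrade.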

Thanks! Running test migration now.

US holiday today, but just noting that the test migration on phab1005 completed in about an hour yesterday, so I think we have a reasonable upper bound for how long this upgrade will take in production. It's OK from my perspective if the db goes away at this point. If it's still around during US work hours on Friday, I might figure out how to do some spot checks on the data, but I don't think that's an urgent need.

cc: @Aklapper

@brennen I will remove them on Tuesday EU morning, so you'd still have a few more days.