Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4]
Open, Normal, Public

Description

These new proxies are ready to be productionized.

eqiad:

  • dbproxy1012 (rack A5): will replace dbproxy1001
  • dbproxy1013 (rack A6): will replace dbproxy1002 (non-primary)
  • dbproxy1014 (rack B1): will replace dbproxy1006 (non-primary)
  • dbproxy1015 (rack B8): will replace dbproxy1007
  • dbproxy1016 (rack D1): will replace dbproxy1003 (non-primary)
  • dbproxy1017 (rack D3): will replace dbproxy1005
  • dbproxy1018 (cloud VLAN, rack C5): will replace dbproxy1010
  • dbproxy1019 (cloud VLAN, rack C5): will replace dbproxy1011
  • dbproxy1020 (rack C5): will replace dbproxy1008
  • dbproxy1021 (rack C8): will go to m5 (currently only has one proxy)

codfw:

  • dbproxy2001 - m1
  • dbproxy2002 - m2
  • dbproxy2003 - m3
  • dbproxy2004 - spare

Event Timeline

This is going in the right direction, except that we already have assigned ports for misc services:

m1: 3321
m2: 3322
m3: 3323
m5: 3325

See https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/profile/manifests/mariadb/misc/multiinstance.pp

Then the mapping should be:

dbproxy1012:
  3321: (same services as currently on dbproxy1001)
  3322: (same services as currently on dbproxy1002)
dbproxy1013:
  3322: (same services as currently on dbproxy1002)
  3323: (same services as currently on dbproxy1003)
dbproxy1014:
  3323: (same services as currently on dbproxy1003)
  3306: (same services as currently on dbproxy1004)
dbproxy1015:
  3306: (same services as currently on dbproxy1004)
  3325: (same services as currently on dbproxy1005)
dbproxy1016:
  3325: (same services as currently on dbproxy1005)
  3321: (same services as currently on dbproxy1001)
dbproxy1017:
  3306: spare, nothing configured
  3307: spare, nothing configured

Why didn't we have 3324 assigned for m4?
I'll now go and check in Puppet what is needed to have two services per host.
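The proposed layout is effectively a ring. Each proxy carries two section ports, and each port lands on two different proxies, so losing any single proxy still leaves one proxy for every affected section. Below is a minimal Python sketch that encodes the mapping proposed above and checks that property. The dictionary and function names are hypothetical. Port 3306 is labelled only as "dbproxy1004 services", as in the proposal, and the spare dbproxy1017 is left out.

```python
from collections import defaultdict

# Section ports quoted in the thread; 3306 carries "the services currently
# on dbproxy1004" in the proposal, so it is labelled that way here.
SECTION_PORTS = {3321: "m1", 3322: "m2", 3323: "m3", 3325: "m5",
                 3306: "dbproxy1004 services"}

# Proposed proxy -> ports layout (spare dbproxy1017 omitted).
PROPOSED = {
    "dbproxy1012": [3321, 3322],
    "dbproxy1013": [3322, 3323],
    "dbproxy1014": [3323, 3306],
    "dbproxy1015": [3306, 3325],
    "dbproxy1016": [3325, 3321],
}


def proxies_per_port(layout):
    """Invert a proxy -> ports layout into port -> sorted list of proxies."""
    inverted = defaultdict(list)
    for proxy, ports in layout.items():
        for port in ports:
            inverted[port].append(proxy)
    return {port: sorted(proxies) for port, proxies in inverted.items()}


def under_replicated(layout, copies=2):
    """Return the ports not served by exactly `copies` distinct proxies."""
    return [port for port, proxies in proxies_per_port(layout).items()
            if len(set(proxies)) != copies]


def survives_any_single_failure(layout):
    """True if removing any one proxy still leaves every port with a proxy."""
    ports = set(proxies_per_port(layout))
    for failed in layout:
        remaining = {p: ps for p, ps in layout.items() if p != failed}
        if set(proxies_per_port(remaining)) != ports:
            return False
    return True


if __name__ == "__main__":
    for port, proxies in sorted(proxies_per_port(PROPOSED).items()):
        print(port, SECTION_PORTS[port], proxies)
    print("under-replicated ports:", under_replicated(PROPOSED))
    print("survives any single failure:", survives_any_single_failure(PROPOSED))
```

The same check could be run against the alternative of putting every service on every proxy. There, each port would be served by all production proxies rather than exactly two.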

jcrespo added a comment (edited). Nov 27 2018, 3:09 PM

Why didn't we have 3324 assigned for m4?

m4 was EventLogging, but for architectural reasons specific to it, it was asked to stop using automatic switchover.

Then the mapping should be:

This is fine with me, except don't reserve anything for m4, as it is not needed at the moment. I wonder if it wouldn't be easier to have all services available (but not in use) on all proxies, and use 4 hosts for production and 2 for wikireplicas, so we minimize costs and optimize usage. After all, the proxy load is very low, and having all hosts be the same will simplify setup. The labsdb proxies would need a physical move to put them on the cloud support network.

I'll now go and check in Puppet what is needed to have two services per host.

Please do.

BTW, we may need to set up an m6 for T202889; not sure yet.

Banyek moved this task from Backlog to In progress on the User-Banyek board. Dec 3 2018, 5:23 PM
Marostegui removed Banyek as the assignee of this task. Jan 11 2019, 8:37 PM
Marostegui added a subscriber: Banyek.
RobH mentioned this in Unknown Object (Task). Jan 14 2019, 10:21 PM

I think that, for now, we should try to use these proxies to replace the current ones on a 1:1 basis (as proposed in T202367#4770331), or at least get the active ones running on new hosts. These will obviously not be enough, as we have more to replace, but at least we start using them and decommissioning some old ones (which were scheduled to be decommissioned in 2016).
Realistically, the Puppet refactor will probably not happen anytime soon.

Marostegui renamed this task from Productionize dbproxy101[2-7].eqiad.wmnet to Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4]. May 29 2019, 2:28 AM
Marostegui removed a project: User-Banyek.
Marostegui updated the task description.
jcrespo edited subscribers, added: Dzahn; removed: Banyek. May 29 2019, 7:21 AM

CCing @Dzahn as he expressed interest in this happening in the past, and the hardware is finally here (no action needed from you at the moment).

@jcrespo what did you have in mind for those 4 new proxies in codfw?
Maybe we can allocate 1 of them for m3 so we can unblock T190572?

Right now misc services actively using the proxy are:
m1
m2
m3

And m4 in eqiad (Eventlogging) will most likely be gone next quarter if everything goes fine.

Yes, we can migrate away now and re-architect later, leaving the 4th as a spare. We also need 2 (eventually 3) physical servers for labs, and we shouldn't mix labs and production-misc.

Dzahn added a comment. May 29 2019, 9:32 PM

CCing @Dzahn as he expressed interest in this happening in the past, and the hardware is finally here (no action needed from you at the moment).

Yep. This was nice to see. Thank you!

As per @elukey's comments today, m4 will be gone soon, so there is no need to replace dbproxy1004 and dbproxy1009 (they are not even in use today).

m4-master.eqiad.wmnet is an alias for db1107.eqiad.wmnet.
db1107.eqiad.wmnet has address 10.64.0.214

+1

To recap: with the number of proxies currently in use in eqiad, we would be able to replace them 1:1 with these new ones, as m5 doesn't use a proxy at the moment.
m1 uses proxy
m2 uses proxy
m3 uses proxy
m4 will go away
m5 does not use a proxy

The _initial_ idea for codfw was described at T202367#5220374.

Marostegui added a subtask: Unknown Object (Task). Jun 13 2019, 8:43 AM
RobH closed subtask Unknown Object (Task) as Resolved. Jun 13 2019, 9:29 AM
Marostegui updated the task description. Jun 17 2019, 1:23 PM
Marostegui updated the task description. Jun 21 2019, 7:30 AM
Marostegui updated the task description. Jun 21 2019, 9:14 AM
Marostegui updated the task description.

Change 518251 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Provision dbproxy2001 into codfw m1

https://gerrit.wikimedia.org/r/518251

Change 518251 merged by Marostegui:
[operations/puppet@production] mariadb: Provision dbproxy2001 into codfw m1

https://gerrit.wikimedia.org/r/518251

Mentioned in SAL (#wikimedia-operations) [2019-07-22T06:47:18Z] <marostegui> Stop MySQL on db2062 to test dbproxy2001 notification T202367

Marostegui updated the task description. Jul 22 2019, 7:04 AM

I have provisioned dbproxy2001 into m1 codfw, with notifications disabled, as it is not an active proxy (or even an active service).

Marostegui updated the task description. Jul 22 2019, 8:10 AM

Change 524729 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Replace dbproxy1006 with dbproxy1012

https://gerrit.wikimedia.org/r/524729

Mentioned in SAL (#wikimedia-operations) [2019-07-22T09:40:15Z] <marostegui> Deploy grants on m1 to allow connections from dbproxy1014 - T202367

Change 524729 merged by Marostegui:
[operations/puppet@production] mariadb: Replace dbproxy1006 with dbproxy1014 in m1

https://gerrit.wikimedia.org/r/524729

Mentioned in SAL (#wikimedia-operations) [2019-07-22T12:58:29Z] <marostegui> Stop MySQL on db1117:3321 to test dbproxy1014 (replacement for dbproxy1006) on m1 - T202367

Marostegui updated the task description. Jul 22 2019, 1:14 PM

dbproxy1014 is now ready in m1 (as standby) as a replacement for dbproxy1006, which can be decommissioned.

Marostegui moved this task from Next to In progress on the DBA board. Jul 22 2019, 1:24 PM

Change 524963 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Productionize dbproxy2002 into m2-codfw

https://gerrit.wikimedia.org/r/524963

Mentioned in SAL (#wikimedia-operations) [2019-07-23T07:31:23Z] <marostegui> Deploy grants for dbproxy2002 on m2 - T202367

Change 524963 merged by Marostegui:
[operations/puppet@production] mariadb: Productionize dbproxy2002 into m2-codfw

https://gerrit.wikimedia.org/r/524963

Mentioned in SAL (#wikimedia-operations) [2019-07-23T08:08:29Z] <marostegui> Stop MySQL on db2044 to test dbproxy2002 notifications - T202367

Marostegui updated the task description. Jul 23 2019, 8:13 AM

Change 525042 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] dbproxy2001: Enable notifications

https://gerrit.wikimedia.org/r/525042

Change 525042 merged by Marostegui:
[operations/puppet@production] dbproxy2001: Enable notifications

https://gerrit.wikimedia.org/r/525042

Marostegui updated the task description. Jul 23 2019, 8:18 AM

Change 525213 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Provision dbproxy1013

https://gerrit.wikimedia.org/r/525213

Mentioned in SAL (#wikimedia-operations) [2019-07-24T07:10:00Z] <marostegui> Deploy grants for dbproxy1013 in m2 - T202367

Change 525213 merged by Marostegui:
[operations/puppet@production] mariadb: Provision dbproxy1013

https://gerrit.wikimedia.org/r/525213

Mentioned in SAL (#wikimedia-operations) [2019-07-24T07:21:20Z] <marostegui> Stop MySQL on db1117:3322 to check dbproxy1013 notifications - T202367

Marostegui updated the task description. Jul 24 2019, 7:27 AM
Marostegui updated the task description.
Marostegui updated the task description. Jul 25 2019, 4:41 PM

Change 527114 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/dns@master] wmnet: Point m2-master.codfw to dbproxy2002

https://gerrit.wikimedia.org/r/527114

Paladox added a subscriber: Paladox. Thu, Aug 1, 9:07 PM

Change 527114 merged by Marostegui:
[operations/dns@master] wmnet: Point m2-master.codfw to dbproxy2002

https://gerrit.wikimedia.org/r/527114

Change 527462 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/dns@master] wmnet: Add m1-master for codfw

https://gerrit.wikimedia.org/r/527462

Marostegui updated the task description. Fri, Aug 2, 12:18 PM
Marostegui updated the task description.
Marostegui updated the task description. Fri, Aug 2, 12:23 PM

Change 527462 merged by Dzahn:
[operations/dns@master] wmnet: Add m1-master for codfw

https://gerrit.wikimedia.org/r/527462

Change 528734 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Productionize dbproxy2003 into m3-codfw

https://gerrit.wikimedia.org/r/528734

Mentioned in SAL (#wikimedia-operations) [2019-08-07T13:48:00Z] <marostegui> Apply grants for dbproxy1003 on m3 - T202367

This was for dbproxy2003; I have manually corrected that entry in the SAL: https://wikitech.wikimedia.org/w/index.php?title=Server_Admin_Log&type=revision&diff=1834845&oldid=1834844

elukey removed a subscriber: elukey. Wed, Aug 7, 2:53 PM

Change 528734 merged by Marostegui:
[operations/puppet@production] mariadb: Productionize dbproxy2003 into m3-codfw

https://gerrit.wikimedia.org/r/528734

Dzahn removed a subscriber: Dzahn. Thu, Aug 8, 11:23 PM

Change 529847 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/dns@master] wmnet: Point m3-master codfw to dbproxy2003

https://gerrit.wikimedia.org/r/529847

Change 529847 merged by Marostegui:
[operations/dns@master] wmnet: Point m3-master codfw to dbproxy2003

https://gerrit.wikimedia.org/r/529847

Marostegui updated the task description. Wed, Aug 14, 6:02 AM

Change 530025 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] dbproxy2003: Enable notifications

https://gerrit.wikimedia.org/r/530025

Change 530025 merged by Marostegui:
[operations/puppet@production] dbproxy2003: Enable notifications

https://gerrit.wikimedia.org/r/530025

Change 531598 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] dbproxy1019: Provision dbproxy1019 to replace dbproxy1011

https://gerrit.wikimedia.org/r/531598

Mentioned in SAL (#wikimedia-operations) [2019-08-22T07:46:47Z] <marostegui> Deploy grants on labsdb1009-labsdb1012 to allow connections for haproxy from dbproxy1019 - T202367

Change 531598 merged by Marostegui:
[operations/puppet@production] dbproxy1019: Provision dbproxy1019 to replace dbproxy1011

https://gerrit.wikimedia.org/r/531598

Change 531660 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Allow re-image dbproxy1018,dbproxy1019

https://gerrit.wikimedia.org/r/531660

Change 531660 merged by Marostegui:
[operations/puppet@production] install_server: Allow re-image dbproxy1018,dbproxy1019

https://gerrit.wikimedia.org/r/531660

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

dbproxy1019.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201908220848_marostegui_249574_dbproxy1019_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['dbproxy1019.eqiad.wmnet']

and were ALL successful.

dbproxy1019 is ready to take over dbproxy1011:

root@dbproxy1019:~# echo "show stat" | socat /run/haproxy/haproxy.sock stdio
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,agent_status,agent_code,agent_duration,check_desc,agent_desc,check_rise,check_fall,check_health,agent_rise,agent_fall,agent_health,addr,cookie,mode,algo,conn_rate,conn_rate_max,conn_tot,intercepted,dcon,dses,
mariadb,FRONTEND,,,0,1,5000,1,189,171,0,0,0,,,,,OPEN,,,,,,,,,1,2,0,,,,0,0,0,1,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,,,,,,,,,,,,,,tcp,,0,1,1,,0,0,
mariadb,labsdb1009,0,0,0,1,,1,189,171,,0,,0,0,0,0,UP,1,1,0,0,0,447,0,,1,2,1,,1,,2,0,,1,L7OK,0,0,,,,,,,,,,,0,0,,,,,381,5.5.5-10.1.39-MariaDB,,0,0,0,1,,,,Layer7 check passed,,99999999,20,100000018,,,,,,tcp,,,,,,,,
mariadb,labsdb1010,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,447,0,,1,2,2,,0,,2,0,,0,L7OK,0,0,,,,,,,,,,,0,0,,,,,-1,5.5.5-10.1.39-MariaDB,,0,0,0,0,,,,Layer7 check passed,,99999999,20,100000018,,,,,,tcp,,,,,,,,
mariadb,BACKEND,0,0,0,1,500,1,189,171,0,0,,0,0,0,0,UP,2,2,0,,0,447,0,,1,2,0,,1,,1,0,,1,,,,,,,,,,,,,,0,0,0,0,0,0,381,,,0,0,0,1,,,,,,,,,,,,,,tcp,,,,,,,,
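The `show stat` output above is CSV whose header line starts with `# `. Reading it by column name is less error-prone than counting commas. Below is a minimal Python sketch of that check. It reuses the socket path from the command above, and assumes HAProxy closes the connection after answering a single non-interactive command. The function names are hypothetical.

```python
import csv
import io
import socket


def fetch_stats(sock_path="/run/haproxy/haproxy.sock"):
    """Send 'show stat' to the HAProxy stats socket and return the CSV text."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        sock.sendall(b"show stat\n")
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # assumed: HAProxy closes after one command
                break
            chunks.append(chunk)
    return b"".join(chunks).decode()


def parse_stats(text):
    """Parse 'show stat' CSV; the '# ' prefix on the header line is dropped."""
    lines = text.strip().splitlines()
    lines[0] = lines[0].lstrip("# ")
    return list(csv.DictReader(io.StringIO("\n".join(lines))))


def server_status(rows, proxy="mariadb"):
    """Map each real server of `proxy` to its status (UP, DOWN, ...)."""
    return {row["svname"]: row["status"]
            for row in rows
            if row["pxname"] == proxy
            and row["svname"] not in ("FRONTEND", "BACKEND")}
```

Applied to the output above, `server_status(parse_stats(text))` would give `{'labsdb1009': 'UP', 'labsdb1010': 'UP'}`. On the proxy itself, `fetch_stats()` could supply `text` directly instead of going through `socat`.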
root@cumin1001:~# telnet dbproxy1019.eqiad.wmnet 3306
Trying 10.64.37.28...
Connected to dbproxy1019.eqiad.wmnet.
Escape character is '^]'.
Y
5.5.5-10.1.39-MariaDB[binary handshake data]mysql_native_password
Connection closed by foreign host.
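The `Y` and the line break that telnet prints before the version string are the start of the MariaDB/MySQL initial handshake packet. The packet opens with a 3-byte little-endian payload length (0x59 = 89, shown as `Y`; the zero bytes are not displayed) and a sequence byte. Next comes protocol version 10 (0x0a, a newline), then the null-terminated server version, followed by binary fields such as the auth scramble. Below is a minimal Python sketch that does the same check as telnet and decodes those fields, following the protocol-10 handshake layout. It assumes the server sends the auth plugin name, as this one does (`mysql_native_password`). The function names are hypothetical.

```python
import socket
import struct


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full packet")
        buf += chunk
    return buf


def parse_handshake(packet):
    """Decode a protocol-10 initial handshake packet (header included)."""
    length = int.from_bytes(packet[:3], "little")
    payload = packet[4:4 + length]  # packet[3] is the sequence id
    if payload[0] != 10:
        raise ValueError(f"unsupported protocol version {payload[0]}")
    end = payload.index(b"\x00", 1)
    server_version = payload[1:end].decode()
    pos = end + 1
    (connection_id,) = struct.unpack_from("<I", payload, pos)
    # Fixed fields after the connection id: scramble part 1 (8), filler (1),
    # capability flags low (2), charset (1), status (2), capability high (2),
    # auth data length (1), reserved (10); then scramble part 2.
    auth_len = payload[pos + 20]
    plugin_start = pos + 31 + max(13, auth_len - 8)
    plugin_end = payload.find(b"\x00", plugin_start)
    if plugin_end == -1:
        plugin_end = len(payload)
    return {
        "payload_length": length,
        "server_version": server_version,
        "connection_id": connection_id,
        "auth_plugin": payload[plugin_start:plugin_end].decode(),
    }


def read_greeting(host, port=3306, timeout=5.0):
    """Connect like the telnet test above and decode the server greeting."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        header = recv_exact(sock, 4)
        length = int.from_bytes(header[:3], "little")
        return parse_handshake(header + recv_exact(sock, length))
```

For example, `read_greeting("dbproxy1019.eqiad.wmnet")` run from cumin1001 should report `5.5.5-10.1.39-MariaDB` and `mysql_native_password`, which confirms that the proxy forwards to a live backend.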

From the tools bastion I cannot connect, so I am going to open a ticket with netops to check the ACLs.

marostegui@tools-sgebastion-07:~$ telnet dbproxy1019.eqiad.wmnet 3306
Trying 10.64.37.28...
^C
marostegui@tools-sgebastion-07:~$ telnet dbproxy1011.eqiad.wmnet 3306
Trying 10.64.37.15...
Connected to dbproxy1011.eqiad.wmnet.
Escape character is '^]'.
Y
5.5.5-10.1.39-MariaDB[binary handshake data]mysql_native_password
Connection closed by foreign host.
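The two telnet runs fail in different ways. The connection to dbproxy1019 hangs at `Trying 10.64.37.28...` until it is interrupted, which is typical of packets being silently dropped, for example by an ACL. A host with nothing listening would normally answer with an immediate `Connection refused` instead. Below is a minimal Python sketch that makes this distinction explicit. The `probe` name and the 3-second timeout are illustrative choices.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connection attempt to host:port.

    'open'    -> the TCP handshake completed (something is listening)
    'refused' -> the host answered with a reset (nothing listening, or REJECT)
    'timeout' -> no answer within `timeout` (often a filtering ACL/firewall)
    Other socket errors (no route, DNS failure, ...) are returned as text.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Run from the tools bastion while the ACL is missing, `probe("dbproxy1019.eqiad.wmnet", 3306)` would be expected to return `timeout`, while `probe("dbproxy1011.eqiad.wmnet", 3306)` should return `open`.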