Place m5 proxies in codfw and eqiad
Closed, Resolved (Public)

Description

Wikitech will be moved out from m5 (T167973), and we should restore the proxies to avoid connecting directly to the database.

The eqiad proxies are in place:
dbproxy1017
dbproxy1021

We just need to switch DNS to point to them (and possibly review all the grants).

The codfw proxy isn't in place yet, but it is ready to be set up:
dbproxy2004 is set as spare.
A DNS record needs to be created for m5-master.codfw.wmnet pointing to it.

We need to review those grants too.

  • proxy in eqiad
  • fully reviewed proxy grants in eqiad
  • proxy in codfw
  • fully reviewed proxy grants in codfw

Event Timeline

Marostegui triaged this task as Medium priority. Aug 4 2021, 12:36 PM
Marostegui moved this task from Triage to Ready on the DBA board.

We could set up the codfw proxy at any time: the service is active-passive, so nothing is using the codfw side at the moment.

@bd808 @Andrew @Bstorm I don't think there's anything that will affect your services on m5 codfw at the moment, but just a heads up.
The eqiad one might be a different story if something is not using m5-master.eqiad.wmnet and is connecting to db1128.eqiad.wmnet directly. Anyway, that change won't happen before the wikitech move (which is scheduled for after the DC switch in September).

Change 710266 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: Productionize dbproxy2004

https://gerrit.wikimedia.org/r/710266

Change 710266 merged by Marostegui:

[operations/puppet@production] mariadb: Productionize dbproxy2004

https://gerrit.wikimedia.org/r/710266

Change 710267 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] production-m5.sql: Add codfw proxy user

https://gerrit.wikimedia.org/r/710267

Change 710267 merged by Marostegui:

[operations/puppet@production] production-m5.sql: Add codfw proxy user

https://gerrit.wikimedia.org/r/710267

Change 710268 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] dbproxy2004: Fix port

https://gerrit.wikimedia.org/r/710268

Change 710268 merged by Marostegui:

[operations/puppet@production] dbproxy2004: Fix port

https://gerrit.wikimedia.org/r/710268

dbproxy2004 is now in place but not active yet (as in, there is no CNAME pointing to it). First I need to review all m5 grants, as for now I have only created the haproxy user to get haproxy up and running:

root@dbproxy2004:~# echo "show stat" | socat /run/haproxy/haproxy.sock stdio
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,agent_status,agent_code,agent_duration,check_desc,agent_desc,check_rise,check_fall,check_health,agent_rise,agent_fall,agent_health,addr,cookie,mode,algo,conn_rate,conn_rate_max,conn_tot,intercepted,dcon,dses,
mariadb,FRONTEND,,,0,0,5000,0,0,0,0,0,0,,,,,OPEN,,,,,,,,,1,2,0,,,,0,0,0,0,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,,,,,,,,,,,,,,tcp,,0,0,0,,0,0,
mariadb,db2135,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,5,0,,1,2,1,,0,,2,0,,0,L7OK,0,0,,,,,,,,,,,0,0,,,,,-1,5.5.5-10.4.18-MariaDB-log,,0,0,0,0,,,,Layer7 check passed,,99999999,20,100000018,,,,,,tcp,,,,,,,,
mariadb,db2078:3325,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,0,1,0,0,5,0,,1,2,2,,0,,2,0,,0,L7OK,0,0,,,,,,,,,,,0,0,,,,,-1,5.5.5-10.4.18-MariaDB-log,,0,0,0,0,,,,Layer7 check passed,,2,3,4,,,,,,tcp,,,,,,,,
mariadb,BACKEND,0,0,0,0,500,0,0,0,0,0,,0,0,0,0,UP,1,1,1,,0,5,0,,1,2,0,,0,,1,0,,0,,,,,,,,,,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,,,,,,,,,,,,,,tcp,,,,,,,,
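
The `show stat` output above is CSV, so the columns that matter for this check (`status`, `act`, `bck`, `check_status`) can be pulled out with a few lines of Python. This is only a minimal sketch: the sample is a hand-trimmed subset of the columns shown above, and the function name `backend_roles` is made up for illustration.

```python
import csv
import io

# Hand-trimmed subset of the `show stat` columns shown above (illustrative only).
SAMPLE = """# pxname,svname,status,weight,act,bck,check_status,
mariadb,FRONTEND,OPEN,,,,,
mariadb,db2135,UP,1,1,0,L7OK,
mariadb,db2078:3325,UP,1,0,1,L7OK,
mariadb,BACKEND,UP,1,1,1,,
"""


def backend_roles(stat_csv):
    """Return {server: (role, status, check_status)} for individual servers."""
    # HAProxy prefixes the header line with "# "; drop it so the column names are clean.
    text = stat_csv[2:] if stat_csv.startswith("# ") else stat_csv
    roles = {}
    for row in csv.DictReader(io.StringIO(text)):
        if row["svname"] in ("FRONTEND", "BACKEND"):
            continue  # aggregate rows, not individual servers
        if row["act"] == "1":
            role = "active"
        elif row["bck"] == "1":
            role = "backup"
        else:
            role = "unknown"
        roles[row["svname"]] = (role, row["status"], row["check_status"])
    return roles


if __name__ == "__main__":
    for server, info in backend_roles(SAMPLE).items():
        print(server, *info)
```

Read this way, the output above shows db2135 as the active server and db2078:3325 as the backup, both UP with a passing layer 7 check.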

Change 710489 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] production-m5.sql.erb: Add dbproxy2004 grants

https://gerrit.wikimedia.org/r/710489

Change 710489 merged by Marostegui:

[operations/puppet@production] production-m5.sql.erb: Add dbproxy2004 grants

https://gerrit.wikimedia.org/r/710489

Change 710526 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] production-m5.sql: Add more grants to dbproxy2004

https://gerrit.wikimedia.org/r/710526

Change 710526 merged by Marostegui:

[operations/puppet@production] production-m5.sql: Add more grants to dbproxy2004

https://gerrit.wikimedia.org/r/710526

Change 711437 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/dns@master] wmnet: Add dbproxy2004 as m5-master in codfw

https://gerrit.wikimedia.org/r/711437

Change 711437 merged by Marostegui:

[operations/dns@master] wmnet: Add dbproxy2004 as m5-master in codfw

https://gerrit.wikimedia.org/r/711437

I have merged and deployed m5-master.codfw.wmnet.

# host m5-master.codfw.wmnet
m5-master.codfw.wmnet is an alias for dbproxy2004.codfw.wmnet.
dbproxy2004.codfw.wmnet has address 10.192.48.47
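
If this check needs to be repeated after each DNS change, the `host` output above is easy to parse. A minimal sketch, assuming the two-line output format shown above; `parse_host_output` is a hypothetical helper name, not an existing tool.

```python
import re

# Output copied from the `host` command above.
HOST_OUTPUT = """m5-master.codfw.wmnet is an alias for dbproxy2004.codfw.wmnet.
dbproxy2004.codfw.wmnet has address 10.192.48.47
"""


def parse_host_output(output):
    """Return (alias_target, [addresses]) parsed from `host` output."""
    alias = None
    addresses = []
    for line in output.splitlines():
        # CNAME line; the trailing dot of the fully qualified name is dropped.
        m = re.match(r"^(\S+) is an alias for (\S+?)\.?$", line)
        if m:
            alias = m.group(2)
            continue
        # A record line.
        m = re.match(r"^(\S+) has address (\S+)$", line)
        if m:
            addresses.append(m.group(2))
    return alias, addresses


if __name__ == "__main__":
    target, addrs = parse_host_output(HOST_OUTPUT)
    print(target, addrs)
```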

Next: keep reviewing all grants.

Change 712085 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] production-m5.sql: Add mailman grants from dbproxies

https://gerrit.wikimedia.org/r/712085

Change 712085 merged by Marostegui:

[operations/puppet@production] production-m5.sql: Add mailman grants from dbproxies

https://gerrit.wikimedia.org/r/712085

Grants for dbproxy2004 added (also added for the eqiad proxies for mailman).

I have reviewed all the eqiad grants for m5 (it is a bit of a pain) and I think they are all in place, so I am going to mark that item as done. I have not reviewed the labswiki ones, as the proxy will only be turned on once labswiki has been moved out.

Change 713384 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] dbproxy2004: Enable notifications

https://gerrit.wikimedia.org/r/713384

Change 713384 merged by Marostegui:

[operations/puppet@production] dbproxy2004: Enable notifications

https://gerrit.wikimedia.org/r/713384

wikitech was moved to s6. A few cleanups are still needed, but this can happen in a couple of weeks.

Scheduled for Wednesday 27th Oct at 14:00 UTC

I am going to decrease the TTL to 1M (one minute) on Tuesday.

Change 734449 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/dns@master] wmnet: Decrease TTL for m5-master

https://gerrit.wikimedia.org/r/734449

Change 734449 merged by Marostegui:

[operations/dns@master] wmnet: Decrease TTL for m5-master

https://gerrit.wikimedia.org/r/734449

Change 734806 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/dns@master] wmnet: Replace m5-master with dbproxy1017

https://gerrit.wikimedia.org/r/734806

Active databases in m5:

+---------------------+
| Database            |
+---------------------+
| labsdbaccounts      |
| mailman3            |
| mailman3web         |
| striker             |
| test_labsdbaccounts |
| toolhub             |
+---------------------+

Mentioned in SAL (#wikimedia-operations) [2021-10-27T14:00:04Z] <marostegui> Replace m5-master so it points to dbproxy1017 - T288093

Change 734806 merged by Marostegui:

[operations/dns@master] wmnet: Replace m5-master with dbproxy1017

https://gerrit.wikimedia.org/r/734806

Change 734976 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/puppet@production] mariadb: Allow lists1001 to talk to m5's dbproxy

https://gerrit.wikimedia.org/r/734976

Change 734976 merged by Legoktm:

[operations/puppet@production] mariadb: Allow lists1001 to talk to m5's dbproxy

https://gerrit.wikimedia.org/r/734976

We had to revert, as we found 3 issues.

root@lists1001:~# telnet dbproxy1017.eqiad.wmnet 3306
Trying 10.64.48.43...
Connected to dbproxy1017.eqiad.wmnet.
Escape character is '^]'
  • Striker was missing a grant (it was fixed on the fly and confirmed it was working)
  • Toolhub was missing egress rules as well, created T294437 to follow up.
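
The telnet session above only checks that a TCP connection to the proxy can be opened. The same check can be scripted so it is easy to run from each client host before the next attempt. A minimal sketch; `can_connect` is a hypothetical helper, and the host and port in the usage comment are the ones from the telnet session.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


# Usage, run from a client such as lists1001:
#   can_connect("dbproxy1017.eqiad.wmnet", 3306)
```

A successful connection only shows that the firewall and egress rules allow the traffic. It says nothing about whether the database grants are correct, which is why the two kinds of issue had to be fixed separately.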

I will review all the grants again, especially those not under 10.% rules (striker was using 208.%).
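
To illustrate why a grant outside the 10.% rules can break once traffic goes through the proxy, here is a rough sketch of how an account host pattern is matched against the connecting address. It assumes HAProxy runs in TCP mode without the PROXY protocol, so the database sees the proxy's address rather than the client's. It covers only the `%` and `_` wildcards, not netmask notation or hostname lookups. `host_matches` is a hypothetical helper, and 10.64.48.43 is the dbproxy1017 address from the telnet output above.

```python
import re


def host_matches(pattern, client_host):
    """Approximate MySQL/MariaDB account host matching.

    '%' matches any run of characters and '_' matches exactly one character.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, client_host, flags=re.IGNORECASE) is not None


# Address the database sees once clients connect through dbproxy1017.
PROXY_IP = "10.64.48.43"

if __name__ == "__main__":
    for pattern in ("10.%", "208.%"):
        print(pattern, host_matches(pattern, PROXY_IP))
```

Under these assumptions, an account defined only for `208.%` no longer matches when the connection arrives from the proxy, so it needs an additional grant for the proxy host.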

Thanks @Andrew @bd808 and @Legoktm for all the help before, during and after the deployment (and revert!)

I have reviewed the grants and they should all be in place. There are some users which I think aren't in use (e.g. ceilometer) that will need to be reviewed, because we might want to do a massive cleanup once we've moved to the proxy.
So I am ready to try this migration again. How about Tuesday 2nd at 14:00 UTC again? @Andrew @bd808 @Legoktm would that work?

Change 735092 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] production-m5.sql: Add striker GRANTS

https://gerrit.wikimedia.org/r/735092

Change 735092 merged by Marostegui:

[operations/puppet@production] production-m5.sql: Add striker GRANTS

https://gerrit.wikimedia.org/r/735092

I have reviewed the grants and they should all be in place. There are some users which I think aren't in use (e.g. ceilometer) that will need to be reviewed, because we might want to do a massive cleanup once we've moved to the proxy.

Feel free to wipe out any references to ceilometer that you find, it's a long-abandoned service.

So I am ready to try this migration again. How about Tuesday 2nd at 14:00 UTC again? @Andrew @bd808 @Legoktm would that work?

That time works for me!

How about Tuesday 2nd at 14:00 UTC again? @Andrew @bd808 @Legoktm would that work?

On the staff calendar that's listed as a US holiday because of election day.

Ah ups! Let's go for Thursday 11th Nov then?

This is going to make you giggle. That date is Armistice Day and also US Veterans Day (WMF US holiday).

!!!!!
How about Wednesday 10th 14:00 UTC? I wanted to avoid it as I have the s8 master switch pretty early in the EU morning, but I don't want to delay this a lot more.

How about Wednesday 10th 14:00 UTC?

Works for me :)

Thanks @Legoktm - let's wait to see if this also works for @bd808 and if it does I will send a calendar invite again :)

Yes, 2021-11-10T14:00Z works for me.

Change 737837 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/dns@master] wmnet: Replace m5-master with dbproxy1017

https://gerrit.wikimedia.org/r/737837

Change 737837 merged by Marostegui:

[operations/dns@master] wmnet: Replace m5-master with dbproxy1017

https://gerrit.wikimedia.org/r/737837

This has been done and so far everything has worked fine.
Going to call this resolved! If for any reason this needs to be reverted during the EU night, it is as simple as:

Thank you @bd808, @Andrew and @Legoktm for your patience and testing!