cxserver: Section Mapping Database (m5) not accessible by certain region
Closed, Resolved | Public

Description

cxserver's section mapping database (m5) appears to be unreachable from certain regions (e.g. Europe), while it works well from India.

Logstash has some entries of the error:

Error: Could not connect to section mapping databaseError: connect ETIMEDOUT
    at Connection._handleConnectTimeout (/srv/service/node_modules/mysql/lib/Connection.js:409:13)
    at Object.onceWrapper (node:events:627:28)
    at Socket.emit (node:events:513:28)
    at Socket._onTimeout (node:net:526:8)

To test,

https://cxserver.wikimedia.org/v2/suggest/sections/United_States/en/de

should return sections.
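The check above can be scripted. The following is a minimal sketch, not cxserver tooling: it only assumes that a healthy endpoint answers with HTTP 200 and a non-empty JSON body, since the exact response schema is not shown in this task. The names `build_url`, `check_suggestions` and the injectable `fetch` parameter are hypothetical helpers introduced for illustration.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import quote

BASE_URL = "https://cxserver.wikimedia.org/v2/suggest/sections"


def build_url(title, source_lang, target_lang):
    """Build the section suggestion URL used in the task description."""
    return f"{BASE_URL}/{quote(title, safe='')}/{source_lang}/{target_lang}"


def _http_get(url, timeout):
    """Return (status, body) for a GET request, including HTTP error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as exc:
        return exc.code, exc.read()


def check_suggestions(title, source_lang, target_lang, fetch=None, timeout=10.0):
    """Return (ok, reason). `fetch` can be swapped out for testing (assumption)."""
    url = build_url(title, source_lang, target_lang)
    fetch = fetch or _http_get
    try:
        status, body = fetch(url, timeout)
    except OSError as exc:  # covers timeouts and connection failures
        return False, f"request failed: {exc}"
    if status != 200:
        return False, f"HTTP {status}"
    try:
        payload = json.loads(body)
    except ValueError:
        return False, "response is not JSON"
    if not payload:
        return False, "empty JSON response"
    return True, "ok"
```

Calling `check_suggestions("United_States", "en", "de")` from clients in different regions would show whether the failure depends on where the request is served.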

Event Timeline

Is it active/active? We don't have anything in other datacenters, only in eqiad and codfw. I highly doubt it's a database issue.

Ladsgroup added a subscriber: akosiaris.

The databases are working fine; I think k8s can't reach the new proxies again. cc @akosiaris
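One way to tell a filtered network path apart from a database that is actually down is to classify a plain TCP connection attempt, since the logged error is a connect timeout (ETIMEDOUT) rather than a refusal. The sketch below is illustrative only: `probe_tcp` is a hypothetical helper, and the host name and port to pass in (for example the proxy's address and MariaDB's default port 3306) are assumptions that are not stated in this task.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify a TCP connect attempt as 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer at all: typical of packets being dropped on the way.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. DNS resolution failure or unreachable network.
        return f"error: {exc}"
```

A "timeout" from inside the pod, together with "open" from a host that is known to be allowed through, would be consistent with filtering in between rather than a problem on the database side.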

This might be related to T337812. I have failed back to dbproxy1017 until we have investigated which firewall (FW) rules are needed for the new host.
Please revert https://gerrit.wikimedia.org/r/935692 once it is fixed.

@KartikMistry allow 5-10 minutes for the DNS change to propagate.

Confirmed this works now: https://cxserver.wikimedia.org/v2/suggest/sections/United_States/en/de
@akosiaris could you figure out what is needed to get dbproxy1027 to work too? Thanks

Working fine in the Europe region now.

Cool. Once the FW has been changed, we'll need to revert that patch and confirm it keeps working as expected. Thanks for the heads up, and sorry for the inconvenience!

Change 935746 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/deployment-charts@master] modules: Add a new networkpolicy for base modules

https://gerrit.wikimedia.org/r/935746

Change 935748 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/deployment-charts@master] cxserver: Bump to networkpolicy_1.1.0.tpl

https://gerrit.wikimedia.org/r/935748

Change 935749 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/deployment-charts@master] cxserver: Migrate to the new MariaDB egress functionality

https://gerrit.wikimedia.org/r/935749

Change 935746 merged by jenkins-bot:

[operations/deployment-charts@master] modules: Add a new networkpolicy for base modules

https://gerrit.wikimedia.org/r/935746

Change 935748 merged by jenkins-bot:

[operations/deployment-charts@master] cxserver: Bump to networkpolicy_1.1.0.tpl

https://gerrit.wikimedia.org/r/935748

Change 935749 merged by jenkins-bot:

[operations/deployment-charts@master] cxserver: Migrate to the new MariaDB egress functionality

https://gerrit.wikimedia.org/r/935749

Mentioned in SAL (#wikimedia-operations) [2023-08-29T16:09:43Z] <akosiaris> deploy cxserver mariadb egress functionality. T341117

Fix merged and deployed. Some hiccups aside, it works fine across all 3 environments (staging, production eqiad, production codfw). I'll resolve this one.