Maintain-dbusers should handle failures due to replicas being in maintenance
Closed, Resolved, Public

Assigned To

Authored By

	• madhuvishy
	Mar 1 2018, 11:24 PM

Description

When one of the labsdbs is in maint mode, maintain-dbusers errors out. We should probably have the script handle this gracefully.

See also T188508#4011548 and T188508#4013420

Details

	Subject	Repo	Branch	Lines +/-
	wiki replicas: maintain-dbusers to skip offline labsdb servers	operations/puppet	production	+10 -4
	wiki replicas: refactor some python and systemd stuff for maintain-dbusers	operations/puppet	production	+39 -22


Related Objects

Mentioned Here: T188508: MySQL access not working for wmde-inline-movedparagraphs on tools

Event Timeline

• madhuvishy created this task. Mar 1 2018, 11:24 PM

Restricted Application added a subscriber: Aklapper. Mar 1 2018, 11:24 PM

• madhuvishy triaged this task as Medium priority. Mar 1 2018, 11:24 PM

• madhuvishy added a project: Data-Services.

Quiddity renamed this task from "Maintain-dbusers should handle failures due to replicas being in maintanence" to "Maintain-dbusers should handle failures due to replicas being in maintenance". Mar 1 2018, 11:35 PM

bd808 moved this task from Backlog to Wiki replicas on the Data-Services board. Mar 2 2018, 1:42 PM

• Bstorm subscribed. Mar 7 2018, 6:35 PM

• Bstorm claimed this task. Mar 12 2018, 6:34 PM

bd808 added a project: cloud-services-team (Kanban). Apr 9 2018, 9:26 PM

I have improved the script's handling of bad connections, but I am of the opinion that unless the script can be made to read the puppet configuration of another server, it cannot actually detect maintenance. The only true indicator is depooling, which is a setting on the dbproxy servers (which is not where this script runs).

edit - actually, I have an idea. I'll start committing things while I consider it.
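
For illustration only, here is a minimal sketch of what skipping an unreachable replica could look like. It is not the code from the merged patches: the function name `reachable_hosts`, the port `3306`, the timeout, and the injectable `connect` parameter are all assumptions made for this example. Note that it can only detect a host that refuses or times out on connection; it cannot detect depooling, which is the limitation described above.

```python
import socket


def reachable_hosts(hosts, port=3306, timeout=2.0, connect=socket.create_connection):
    """Split hosts into those that accept a TCP connection and those that do not.

    Hypothetical helper: `connect` is injectable so the logic can be exercised
    without a real database server.
    """
    online, skipped = [], []
    for host in hosts:
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError as exc:
            # Connection refused, timeout, DNS failure, etc.: record and move on
            # instead of letting one offline host abort the whole run.
            skipped.append((host, str(exc)))
            continue
        conn.close()
        online.append(host)
    return online, skipped
```

A caller would then run the account maintenance only against the `online` list and log the `skipped` entries for a human to review.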

Change 436328 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: refactor some python and systemd stuff for maintain-dbusers

https://gerrit.wikimedia.org/r/436328

gerritbot added a project: Patch-For-Review. May 30 2018, 5:23 PM

Change 436353 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: maintain-dbusers to skip offline labsdb servers

https://gerrit.wikimedia.org/r/436353

Change 436328 merged by Bstorm:
[operations/puppet@production] wiki replicas: refactor some python and systemd stuff for maintain-dbusers

https://gerrit.wikimedia.org/r/436328

Change 436353 merged by Bstorm:
[operations/puppet@production] wiki replicas: maintain-dbusers to skip offline labsdb servers

https://gerrit.wikimedia.org/r/436353

bd808 moved this task from Inbox to Doing on the cloud-services-team (Kanban) board. Jun 4 2018, 12:00 AM

At this point, I think I've done everything that can practically be done for managing maintenance. Since maintenance is indicated by changes on a database proxy, there is really no way to inform maintain-dbusers of the problem without setting up a service that watches for changes in the haproxy configuration files and either commits changes to puppet or sends some kind of feedback to the service. What we should do is simply pay attention to maintenance and remove the server from the config when a labsdb server is going to be locked up or offline. Offline is something the script can now manage at least, and it will fail eventually instead of doing weird loops (totally dead is easier to watch for than loopy).
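
The "fail eventually instead of looping" behaviour can be sketched roughly as a bounded retry. This is an assumption-laden illustration, not the script's actual implementation: `run_with_retries`, `HostUnavailable`, the attempt count, and the delay are hypothetical names and values chosen for the example.

```python
import time


class HostUnavailable(Exception):
    """Raised once the retry budget is exhausted (hypothetical exception type)."""


def run_with_retries(action, attempts=3, delay=1.0, sleep=time.sleep):
    """Call `action` up to `attempts` times, then give up loudly.

    Only OSError (connection-level failures) is retried; anything else
    propagates immediately. `sleep` is injectable to keep tests fast.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # wait before the next try, but never loop forever
    # A hard failure is easier to alert on than a process stuck retrying.
    raise HostUnavailable(
        f"gave up after {attempts} attempts: {last_error}"
    ) from last_error
```

Raising a distinct exception once the budget is spent lets the surrounding service exit non-zero, so that monitoring sees a dead unit rather than one that is silently spinning.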

• Bstorm closed this task as Resolved. Jun 29 2018, 4:12 PM
