Review labsdb1005 MariaDB configuration against prod standards
Closed, Resolved · Public

Assigned To

Authored By

	• GTirloni
	Feb 14 2019, 7:11 PM

Description

MariaDB is not maintained for Jessie and labsdb1005 is Jessie.

We're afraid we might be missing some settings that are used in Stretch, which could be causing the latest outages we've seen.

This task is to sync up with DBAs and identify potential improvements (including evaluating upgrade to Stretch).
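
As a rough illustration of the kind of comparison this task asks for, here is a minimal Python sketch that diffs the [mysqld] section of two my.cnf-style files. The function names (parse_mycnf, diff_section) and the sample settings are hypothetical and are not taken from the labsdb1005 or production configuration. It assumes flat INI-style files and simply skips "!include" directives rather than following them.

```python
import configparser


def parse_mycnf(text):
    """Parse my.cnf-style text into {section: {option: value}}."""
    # Drop "!include" / "!includedir" directives; configparser cannot read them.
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("!")]
    parser = configparser.ConfigParser(
        allow_no_value=True,   # flags such as "skip-name-resolve" have no value
        strict=False,          # tolerate repeated sections or options
        interpolation=None,    # treat "%" literally
    )
    parser.read_string("\n".join(lines))
    result = {}
    for section in parser.sections():
        # MariaDB accepts "-" and "_" interchangeably in option names.
        result[section] = {
            key.replace("-", "_"): value for key, value in parser.items(section)
        }
    return result


def diff_section(left_cfg, right_cfg, section="mysqld"):
    """Report options missing on either side and options whose values differ."""
    left = left_cfg.get(section, {})
    right = right_cfg.get(section, {})
    return {
        "only_left": sorted(set(left) - set(right)),
        "only_right": sorted(set(right) - set(left)),
        "changed": {
            key: (left[key], right[key])
            for key in sorted(set(left) & set(right))
            if left[key] != right[key]
        },
    }


# Hypothetical sample inputs, for illustration only.
jessie_cnf = "[mysqld]\nmax_connections = 1024\nskip-name-resolve\n"
stretch_cnf = "[mysqld]\nmax-connections = 800\ninnodb_buffer_pool_size = 4G\n"

report = diff_section(parse_mycnf(jessie_cnf), parse_mycnf(stretch_cnf))
print(report)
```

A file-level diff like this only shows what is written in the config files; comparing the effective values reported by the running servers would still be needed to catch differing defaults between versions.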

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		• Bstorm	T216208 ToolsDB overload and cleanup
		Resolved		• Bstorm	T216168 Review labsdb1005 MariaDB configuration against prod standards

Event Timeline

• GTirloni created this task. · Feb 14 2019, 7:11 PM

Restricted Application added a subscriber: Aklapper. · Feb 14 2019, 7:11 PM

Worse, we may very well have the settings that are used in Stretch, and the settings may not apply well to our version of MariaDB.

• Bstorm added a project: Data-Services. · Feb 14 2019, 10:13 PM

If our hardware issues clear up, the highest priority is moving this DB to Stretch and friends (T216173).
Until then...

Besides evaluating against typical standards, a general health check, if possible, from @jcrespo or @Marostegui might help things until we can get this moved. Moving things and possibly adding per-user limits, as in T216170, are likely ways forward. However, extreme slowness and some odd timeouts were reported before the outage, and I don't know what might have been causing them.

Our MariaDB configuration is essentially the same across MariaDB versions. Our concern with Jessie is that, without upgrades, the MariaDB server stays exposed to vulnerabilities.

labsdb1005_2019-02-15.png (927×1 px, 236 KB)

labsdb1005_2019-02-15_2.png (916×1 px, 253 KB)

At this moment, I've been getting reports of ridiculously slow query times even against unique indexes. We are still running table repair (against very large tables), so we are hoping that's what's up. However, I'm wondering if the performance of this thing is the real cause of the connection pileup. Hard to tell right now with repairs underway.

There were reports of this happening earlier, but none prove whether the slowness is the chicken or the egg. There are definitely "timeout" errors where there shouldn't be any in some places before the connections filled up.

We also found T216202, but it's just one disk...

I've stopped the repair; it is not possible to run it with so many ongoing queries, as that will create metadata locking. I have also killed some queries.
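
For readers unfamiliar with this kind of triage, the following is a hedged Python sketch of how one might pick out candidate queries from SHOW PROCESSLIST output. It is not the procedure actually used here. The helper names (find_suspects, kill_statements), the 300-second threshold, and the sample rows are all assumptions made for illustration. Rows are modeled as dictionaries keyed by the standard processlist column names.

```python
def find_suspects(processlist, max_seconds=300):
    """Split SHOW PROCESSLIST-style rows into long-running queries and
    queries stuck waiting on a metadata lock. Returns two lists of thread ids."""
    long_running, lock_waiters = [], []
    for row in processlist:
        if row["Command"] != "Query":
            continue  # ignore sleeping connections, replication threads, etc.
        state = row.get("State") or ""
        if "metadata lock" in state:
            # e.g. "Waiting for table metadata lock"
            lock_waiters.append(row["Id"])
        elif row["Time"] >= max_seconds:
            long_running.append(row["Id"])
    return long_running, lock_waiters


def kill_statements(thread_ids):
    """Build KILL QUERY statements for review; nothing is executed here."""
    return [f"KILL QUERY {int(thread_id)};" for thread_id in thread_ids]


# Hypothetical sample rows, for illustration only.
rows = [
    {"Id": 11, "Command": "Query", "Time": 900, "State": "Sending data"},
    {"Id": 12, "Command": "Query", "Time": 40,
     "State": "Waiting for table metadata lock"},
    {"Id": 13, "Command": "Sleep", "Time": 5000, "State": ""},
    {"Id": 14, "Command": "Query", "Time": 3, "State": None},
]

long_running, lock_waiters = find_suspects(rows)
for statement in kill_statements(long_running):
    print(statement)
```

The sketch only prints statements so that a person can review them first; which queries are safe to kill remains a judgment call for whoever is operating the server.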

Ok thanks!

bd808 added a parent task: T216208: ToolsDB overload and cleanup. · Feb 15 2019, 12:17 AM

bd808 moved this task from Backlog to ToolsDB on the Data-Services board. · Feb 19 2019, 1:09 AM

The question around this issue appears to have been answered, and the master is now on Stretch, which helps.
