Fri, Apr 13
Tue, Apr 10
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCZwhGWhhv+9QdjhhShbLdSZSV349oFxPH73CfvI0jRsQFXsQIlPQaSeKcFqw+kjhUoxvfgCw3YWoExHTT6jxHUxrOswI6ZVPeicHNBQ4kiRRY4uKE0xpqbdnkbLRSNWyru8zG1aB/uxpkhsQhwnUZ9fpGtDkXzX1In8NZ7X9jMQB6yrHFxqK/549WELGnpscL79lX7uKM2Ri/+v61th7kuDyn6VjsIMSLdt46dKoW9WgQ2UgkjEh67HOZd1FYt4V+OaQcNr2JtHj7nSI6YsXx9TQnBrQVqWQXk63AFNxw4uD7xFVByc4FIqefIYjHqHANRWpRmaNOcj6LaBTqXZUBSmtYRiLkXUhqhr1Tf1NiE75UjGKhknucpywXTYI02HaTdEcdxfN4C9guI+ojxwUKrIMEk9Wz3qcYzyN0QZmCL/6EcRxjEUzYDpEt0tMBRsRqE5Qp0TLPuDsK5trY1rtdzy/HckqmSik9N1p2WQ941SWs2EEiFji1jiCM4N8gwy1r6mf9xo5LWRVY/LtNYbCf/2EfW3mjreP9MaOGI+vedcS8I4sd6O3VP8WPpXZtoBU1+EKLhEHvfp/E9qYYr6iWIltCFySi67fWlv83cUNezJ6uMrDR++g8ANkFJKEWSHJzdVyrtf2fiwNNyIPrkEawHAcKHsZsVGdzkP9Xr8eBb7Q== sean@laptop
Thu, Apr 5
Updated to a non-wmf email in phabricator profile settings.
Wed, Apr 4
Nov 2 2016
Aug 28 2015
Aug 24 2015
Regarding implementing this on replicas: nothing special is needed. Change on the master will propagate (but rate-limit everything, to avoid replag).
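To illustrate the rate-limiting point, a minimal sketch, assuming the change can be split into primary-key ranges and applied on the master one range at a time. The names batched_ranges, run_throttled and apply_batch, and the batch size and pause values, are hypothetical and not taken from any existing tooling:

```python
import time
from typing import Callable, Iterator, Tuple


def batched_ranges(min_id: int, max_id: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) key ranges covering min_id..max_id."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield start, end
        start = end + 1


def run_throttled(
    apply_batch: Callable[[int, int], None],
    min_id: int,
    max_id: int,
    batch_size: int = 1000,        # assumed value, tune per table
    pause_seconds: float = 1.0,    # assumed value, tune against observed lag
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Apply the change range by range on the master, pausing between ranges.

    apply_batch is a caller-supplied function (hypothetical) that runs the
    write for one key range. The pause gives replicas time to catch up.
    Returns the number of batches applied.
    """
    batches = 0
    for start, end in batched_ranges(min_id, max_id, batch_size):
        apply_batch(start, end)
        sleep(pause_seconds)
        batches += 1
    return batches
```

A fixed pause is the simplest form of throttling; checking replication lag between batches and waiting until it drops would be a stricter variant.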
A white list is fine. No real difference from DBA perspective, so +1.
@mforns, yes, feel free.
Aug 13 2015
Aug 7 2015
It's correct in that MediaWiki still supports MySQL 5.0.2 with a default SQL_MODE setting. Previous discussions about enforcing strict mode haven't gone far, mainly due to questions about MW extensions.
Jul 30 2015
Just to clarify, I don't think we've seen actual OOM killer on s[1-7], right? The only front-line production concern is swapping on s3 db1035?
Also, all bot-driven, e.g. tide543.microsoft.com
Continuing on db1042 right now and queries regularly hitting 5min limit. The LIMIT 10405000,501 is most of the problem here; shouldn't be possible to generate such a search.
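One way to make such a search impossible to generate is to validate paging parameters before the query is built. A minimal sketch, assuming a hypothetical check_paging helper on the application side; the cap values are illustrative assumptions, not existing settings:

```python
from typing import Tuple

MAX_OFFSET = 10_000   # hypothetical cap on how deep a search may page
MAX_COUNT = 500       # hypothetical cap on rows per page


def check_paging(
    offset: int,
    count: int,
    max_offset: int = MAX_OFFSET,
    max_count: int = MAX_COUNT,
) -> Tuple[int, int]:
    """Return a safe (offset, count) pair for a LIMIT clause.

    Rejects offsets beyond max_offset instead of letting the database
    scan and discard millions of rows, and clamps count to max_count.
    """
    if offset < 0 or count < 0:
        raise ValueError("offset and count must be non-negative")
    if offset > max_offset:
        raise ValueError(f"offset {offset} exceeds cap {max_offset}")
    return offset, min(count, max_count)
```

With these assumed caps, a request equivalent to LIMIT 10405000,501 is rejected before reaching the database.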
Jul 28 2015
At Yuvi's request on IRC I added 'labsdbadmin'@'10.64.37.7' with the same permissions/password as @jcrespo added for 'labsdbadmin'@'10.64.37.6', since the catastrophic failure of the latter.
Jul 27 2015
+1 to the provisioning.
Jul 24 2015
@jcrespo, no, I did not out-of-band change or use skip counter. I found the machine exactly as you described on IRC, and only did research by dumping logs to get the query examples shown in ticket description.
Jul 23 2015
Jul 22 2015
Jul 20 2015
All s1-7 slaves are now 10.0.
@jcrespo, ouch, so maybe something is broken in our physical backup process (was it xtrabackup?) that required the SQL_SLAVE_SKIP_COUNTER after reinstall on 2015-06-29? Scary...
Jul 17 2015
Jul 15 2015
For now, I will depool db1022.
So, I need to get some more information from Jaime about what occurred on the weekend (he is unwell and on leave since then), but looking over the logs I see:
Jul 14 2015
Jul 13 2015
Also, yes, time to plan another batch.
Jul 10 2015
Jul 9 2015
Jul 8 2015
Jul 7 2015
Jul 6 2015
A bunch of "unauthenticated user" in processlist still makes me suspect the thread pool, since that symptom has been seen on prod slaves with thread_pool_size=16 (but not the immediate all-connections-fail, which is indeed odd).
Wasn't sure if you wanted to change that process :)
Jul 4 2015
Doing this to most production DBs seems straightforward. Pain points, due just to complexity, will be on M[1-4].
Jul 3 2015
(4) == EINTR on connect. Presumably caused by the max_connections you observed, which in turn possibly has something to do with:
Jul 2 2015
Tried a restart of dbstore2002, but s7 replication behavior was unchanged: Yes/Yes for replication threads, master exec position advancing, yet no changes appearing.
Jul 1 2015
Jun 30 2015
Jun 24 2015
Jun 19 2015
Jun 18 2015
Oops, sorry, this was fixed a while back.
Schema change done on x1-master flowdb.
Jun 17 2015
Jun 16 2015
Jun 15 2015
Jun 11 2015
We'll use something other than !log, to maintain -operations supremacy. This is more about quickly posting notes on database hosts -- and tendril is just an obvious place, for us -- since db maintenance tasks tend to be slow, have multiple stages, and are likely to be handed over to the next shift.
Jun 10 2015
So, to be clear, this isn't intended to:
This would need a bit of downtime to migrate data, but in the order of hours, not days. Not especially difficult, and would slightly simplify the sanitarium/labs setup (though most of the complexity there is not the private wikis, which are easy to blanket-filter). +1
Jun 8 2015
If the box is taken down, keep the list in the loop.
Jun 7 2015
I did indeed do a similar fix to get s6 going, and then sinned slightly by setting slave_exec_mode=idempotent to keep it alive for the weekend.
Jun 6 2015
- Move vslow/dump to db1037 with load 0
- Reinstall db1022 as trusty (or try out Jessie? we should do this eventually, and wmf-mariadb10 package is fine on Jessie)
- Check which s6 slaves have small ibdata global tablespaces. If they're all still large (from pre file-per-table days), also consider a fresh s6 dump/reload on db1022 from which we can eventually reclone the others.