
labsdb chat july 18/19

Authored By
chasemp
Jul 19 2016, 9:15 PM


Labs MySQL infrastructure
3 replica dbs, 2 toolsdbs
Problems:
1. Load
- Cores usually at 100%, plus swapping: https://grafana.wikimedia.org/dashboard/db/server-board?from=1468766304245&to=1468852464245&var-server=labsdb1003&var-network=eth0
- Crashy
2. No HA solution
3. RAID 0 - if one disk goes, the data and everything else is gone (happened to labsdb1002)
4. Current HA workaround is to change what the DNS entries point to
Problematic because:
1. Not transparent to users, because user dbs are not replicated
5. A few tools/users take up the majority of resources
6. TokuDB, used for labs, crashes frequently. It was chosen because it compresses better, which we needed because of the large number of shards on single machines. It also sometimes causes lag / bogus results.
7. Lag spikes on things like updating tables that don't have indexes, and replicas getting 'stuck' and needing a restart. Sometimes corruption (but that is probably a MediaWiki issue)
8. Having lots of accounts with separate grants makes auditing difficult.
9. Users can't run EXPLAIN queries to check the theoretical efficiency of their SQL
10. Sanitizing needs to be both more secure and more automatic
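
To put rough numbers on problem 3, here is a minimal sketch comparing the chance of losing the whole array under RAID 0 and RAID 10. It assumes disks fail independently with the same probability and ignores rebuild windows; the 3% failure rate and 12-disk layout are hypothetical, not measurements from the labsdb hosts.

```python
def raid0_loss_probability(disk_failure_prob: float, disks: int) -> float:
    """RAID 0 stripes without redundancy: any single disk failure loses the array."""
    return 1 - (1 - disk_failure_prob) ** disks


def raid10_loss_probability(disk_failure_prob: float, disks: int) -> float:
    """RAID 10 stripes over mirrored pairs: data is lost only if both disks
    of at least one pair fail (simplified, no rebuild modelled)."""
    if disks < 2 or disks % 2:
        raise ValueError("RAID 10 needs an even number of disks (at least 2)")
    pairs = disks // 2
    pair_loss = disk_failure_prob ** 2  # both halves of one mirror fail
    return 1 - (1 - pair_loss) ** pairs


if __name__ == "__main__":
    p, n = 0.03, 12  # hypothetical: 3% failure chance per disk, 12 disks
    print(f"RAID 0 : {raid0_loss_probability(p, n):.4f}")
    print(f"RAID 10: {raid10_loss_probability(p, n):.4f}")
```

The trade-off is capacity: with the same disks, RAID 10 offers half the usable space of RAID 0.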
Proposed solutions:
1. Switch to InnoDB compressed, ditch TokuDB. Needs more disk, but we have it now. Will make re-imports from prod easier too
2. RAID 10 not RAID 0
3. Possibly use HAProxy, but might need L7 proxying instead. How to handle user dbs on replicas (*large pain point* for HA)?
4. Use MariaDB 10.1 "roles" to manage common permissions (<https://mariadb.com/kb/en/mariadb/roles-overview/>)
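
As an illustration of solution 4, the sketch below generates the MariaDB statements that would replace per-account grants with one shared role. The role name, database name and account names are hypothetical placeholders, and the `'%'` host part is an assumption; the real grants would need to match whatever the labs accounts use today.

```python
import re


def quote_ident(name: str) -> str:
    """Backtick-quote a database identifier, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_account(user: str, host: str = "%") -> str:
    """Render 'user'@'host', doubling embedded single quotes."""
    def esc(s: str) -> str:
        return s.replace("'", "''")
    return f"'{esc(user)}'@'{esc(host)}'"


def role_grant_statements(role, databases, users):
    """Build SQL that creates one role, grants it read access to each
    database, and assigns it as the default role of each user."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", role):
        raise ValueError(f"unsupported role name: {role!r}")
    statements = [f"CREATE ROLE {role};"]
    for db in databases:
        statements.append(f"GRANT SELECT ON {quote_ident(db)}.* TO {role};")
    for user in users:
        account = quote_account(user)
        statements.append(f"GRANT {role} TO {account};")
        # Without a default role the user must run SET ROLE in every session.
        statements.append(f"SET DEFAULT ROLE {role} FOR {account};")
    return statements


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for stmt in role_grant_statements(
        "labsdb_reader", ["examplewiki_p"], ["u1234", "u5678"]
    ):
        print(stmt)
```

With the grants attached to the role, an audit (problem 8) would only need to review the role's privileges plus the list of accounts holding it.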
