once provisioned, we should migrate graphite there
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Eevans | T134016 RESTBase Cassandra cluster: Increase instance count to 3
Invalid | | fgiunchedi | T85451 scale graphite deployment (tracking)
Resolved | | fgiunchedi | T85909 migrate graphite to new hardware
Restricted Task | | |
Restricted Task | | |
Restricted Task | | |
Event Timeline
Change 187663 had a related patch set uploaded (by Filippo Giunchedi):
introduce graphite raid10-lvm configuration
Change 187664 had a related patch set uploaded (by Filippo Giunchedi):
provision graphite[12]001
Change 187663 merged by Filippo Giunchedi:
introduce graphite raid10-lvm configuration
Change 187683 had a related patch set uploaded (by Filippo Giunchedi):
graphite: explicit install python-twisted-core
Change 187690 had a related patch set uploaded (by Filippo Giunchedi):
graphite: format /var/lib/carbon
currently running rsync to transfer metrics changed in the last month to graphite1001. There are ~380k metrics changed in the last 30d and a parallel rsync is churning through ~3 metrics/s, so ETA for the initial sync is ~1.5 days
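For reference, the ETA above follows directly from the two figures quoted. A minimal sketch of the arithmetic, assuming the ~3/s rate is metrics (files) per second and stays constant for the whole run; the function name is made up for illustration:

```python
SECONDS_PER_DAY = 86_400


def initial_sync_eta_days(metric_count: int, metrics_per_second: float) -> float:
    """Estimate how long a sync takes at a constant transfer rate, in days."""
    seconds = metric_count / metrics_per_second
    return seconds / SECONDS_PER_DAY


# Figures from the comment above: ~380k metrics at ~3 metrics/s.
eta = initial_sync_eta_days(380_000, 3)
print(f"{eta:.2f} days")  # prints "1.47 days"
```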
Change 187683 merged by Filippo Giunchedi:
graphite: explicit install python-twisted-core
Change 188035 had a related patch set (by Filippo Giunchedi) published:
graphite: move to graphite1001
Change 188036 had a related patch set (by Filippo Giunchedi) published:
graphite: move to graphite1001
changes 188035 and 188036 should be enough to switch traffic over from tungsten to graphite1001. The plan is to merge both, then wait for DNS and puppet to propagate so that traffic moves to graphite1001
backfilling data is trickier, however: carbonate doesn't lock whisper files by default, so there's a chance of corruption if both carbonate and carbon-cache want to update the same file, see also https://github.com/jssjr/carbonate/issues/19
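To illustrate the locking concern, here is a generic sketch of an exclusive advisory lock held around an in-place file update, using Python's standard `fcntl.flock` on Linux. It is not carbonate's or carbon-cache's actual code; `locked_update` and the demo file are hypothetical, and an advisory lock only helps if every writer to the file takes it.

```python
import fcntl
import os
import tempfile


def locked_update(path: str, offset: int, payload: bytes) -> None:
    """Write payload at offset while holding an exclusive advisory lock."""
    with open(path, "r+b") as fh:
        # Blocks until any other process holding the lock releases it.
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX)
        try:
            fh.seek(offset)
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())
        finally:
            fcntl.flock(fh.fileno(), fcntl.LOCK_UN)


# Demo on a throwaway 16-byte file (a stand-in, not a real whisper file).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 16)
    demo_path = tmp.name

locked_update(demo_path, 4, b"abcd")

with open(demo_path, "rb") as fh:
    result = fh.read()
os.unlink(demo_path)
```

If only one of the two writers takes the lock, the other can still interleave its writes, which is the corruption scenario described above.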
note also that gdash won't be migrated at the moment: I've run into some (I think) ruby 1.8 -> 1.9 and rubygems issues which I don't want to get blocked by:
root@graphite1001:/var/log/upstart# tail -15 /var/log/upstart/uwsgi_app-gdash.log
ruby 1.9.3p484 (2013-11-22 revision 43786) [x86_64-linux]
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 161968 bytes (158 KB) for 1 cores
*** Operational MODE: single process ***
/usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require': cannot load such file -- gdash (LoadError)
from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
from /etc/gdash/config.ru:2:in `block in <main>'
from /usr/lib/ruby/vendor_ruby/rack/builder.rb:55:in `instance_eval'
from /usr/lib/ruby/vendor_ruby/rack/builder.rb:55:in `initialize'
from /etc/gdash/config.ru:in `new'
from /etc/gdash/config.ru:in `<main>'
from /usr/lib/ruby/vendor_ruby/rack/builder.rb:49:in `eval'
from /usr/lib/ruby/vendor_ruby/rack/builder.rb:49:in `new_from_string'
from /usr/lib/ruby/vendor_ruby/rack/builder.rb:40:in `parse_file'
Change 188069 had a related patch set uploaded (by Filippo Giunchedi):
Make gdash's uWSGI config.ru Ruby 1.9-compatible
Change 188539 had a related patch set uploaded (by Filippo Giunchedi):
graphite: move to graphite1001
Change 188563 had a related patch set uploaded (by Filippo Giunchedi):
gdash: move from tungsten to graphite1001
Change 188567 had a related patch set uploaded (by Filippo Giunchedi):
webperf: handle missing 'duration' in schema
Change 188788 had a related patch set uploaded (by Filippo Giunchedi):
graphite: move gdash performance to graphite1001
Change 188788 merged by Filippo Giunchedi:
graphite: move gdash performance to graphite1001
Change 188069 merged by Filippo Giunchedi:
Make gdash's uWSGI config.ru Ruby 1.9-compatible
Change 189504 had a related patch set uploaded (by Filippo Giunchedi):
gdash: fix graphite disk dashboard sda->md1
Change 189504 merged by Filippo Giunchedi:
gdash: fix graphite disk dashboard sda->md1
graphite1001 is in service at the moment, waiting for graphite2001 to come online before resolving this
also pending is backfill of metrics from tungsten via carbonate, but see https://github.com/jssjr/carbonate/issues/47 on why we can't do it straight away (or without shutting carbon-cache down anyway)
resolving this: graphite2001 has been deployed and I've moved the backfilling of metrics from tungsten to T90591, where it belongs