Fri, Sep 20
Thu, Sep 19
To the best of my knowledge, this is the only remaining blocker to fully moving sessionstore to production.
Wed, Sep 18
restbase2012 is decommissioned and can be reimaged at any time.
After startup, Kask doesn't log much outside of exceptional errors; to invoke log output, try issuing a HEAD request:
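A minimal sketch of such a request (the host, port, and key here are placeholders, not the actual sessionstore endpoint):

```shell
# Hypothetical host/port/key: a HEAD request to a Kask key endpoint
# returns headers only, but should still produce a request log line.
curl -I http://localhost:8081/sessions/v1/some-key
```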
Tue, Sep 17
restbase2011 is fully decommissioned and ready to be reimaged.
The p99 looks suspiciously close to the cross-DC latency; is there any way it's related?
Mon, Sep 16
restbase2010 is ready to be reimaged.
Fri, Sep 13
restbase2009 is fully decommissioned and ready to be reimaged.
Thu, Sep 12
restbase1018 is decommissioned and ready to be reimaged.
Tue, Sep 10
restbase-dev1005 has been decommissioned and is ready to be reimaged.
Part of what has made this so odd is that it had only ever occurred on this one instance. So, good news/bad news: this has now been observed on 2009-b as well.
I've started the decommission of -dev1005-b quite late in my evening; it should be complete by EU morning. If running ssh restbase-dev1004.eqiad.wmnet -- c-any-nt status -r | grep 1005 produces no output, the node can be taken down for reimage.
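The check above could be scripted as a simple poll (a sketch only; the hostname and c-any-nt wrapper are as given above, and the 60-second interval is an arbitrary choice):

```shell
# Poll ring status until node 1005 no longer appears;
# grep -q exits non-zero on no match, ending the loop.
while ssh restbase-dev1004.eqiad.wmnet -- c-any-nt status -r | grep -q 1005; do
    sleep 60
done
echo "restbase-dev1005-b no longer in the ring; safe to reimage"
```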
Mon, Sep 9
Fri, Sep 6
Thu, Sep 5
restbase-dev1004 has been decommissioned and can come down for a re-image at any time.
Wed, Sep 4
Tue, Sep 3
Fri, Aug 30
Thu, Aug 29
The timeline of events goes like this:
This instance has gone down again; we should dig deeper.
Wed, Aug 28
This imbalance seems to have resolved itself.
I do not know how it came to pass that machines are getting set up without reserved space, but given how long this issue has been open (and since I'm still unsure how best to Puppetize this), I think we should accept this as a gift and close the issue.
Interestingly, the main data volumes in the production cluster already have zero reserved blocks.
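For reference, the reserved-block setting on an ext4 volume can be inspected and zeroed with tune2fs (the device path below is a placeholder):

```shell
# Show the current reserved block count for the volume
tune2fs -l /dev/sdb1 | grep -i 'reserved block count'
# Set the reserved-blocks percentage to zero, matching what the
# production data volumes already show
tune2fs -m 0 /dev/sdb1
```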
Host has been rebooted and there are no ACPI errors present.
Tue, Aug 27
Mon, Aug 26
OK, time for our yearly update of this ticket!
We're up to 3.5.0 of the driver at this time; let's close this unless we can establish that there is more to do.
Let's just do this already.
I used some 10% time to have a (cursory) look at the Datastax PHP driver. Some observations:
Fri, Aug 23
Aug 22 2019
I've only looked at this briefly, but some observations:
- The JVM seems to have spontaneously and unceremoniously self-destructed:
  - No logged (fatal) exceptions
  - No crash log or heap dump
  - None of the usual signs of distress preceding the event:
    - StatusLogger frequency
    - Major GCs
    - Load avg, CPU, I/O
- High latency immediately preceding the event, shared by 2009-b