Eevans (Eric Evans)
Senior Software Engineer

Projects (13)

User Details

User Since
Feb 27 2015, 10:47 PM (164 w, 1 d)
Availability
Available
IRC Nick
urandom
LDAP User
Eevans
MediaWiki User
Unknown

Recent Activity

Fri, Apr 20

Eevans added a comment to T192689: Unchecked storage growth(?).

OK, I've started mining some random SSTables in the wikipedia_T_parsoid__ng_html keyspace, and I've found some examples that definitely seem to support the prevailing theory.

Fri, Apr 20, 9:37 PM · User-mobrovac, User-Eevans, Cassandra, Services (doing)
Eevans updated the task description for T192689: Unchecked storage growth(?).
Fri, Apr 20, 9:06 PM · User-mobrovac, User-Eevans, Cassandra, Services (doing)
Eevans updated the task description for T192689: Unchecked storage growth(?).
Fri, Apr 20, 9:05 PM · User-mobrovac, User-Eevans, Cassandra, Services (doing)
Eevans triaged T192689: Unchecked storage growth(?) as High priority.
Fri, Apr 20, 9:03 PM · User-mobrovac, User-Eevans, Cassandra, Services (doing)
Eevans created T192689: Unchecked storage growth(?).
Fri, Apr 20, 9:02 PM · User-mobrovac, User-Eevans, Cassandra, Services (doing)

Thu, Apr 19

Eevans added a comment to T189822: Replace 5 Samsung SSD 850 devices w/ 4 1.6T Intel or HP SSDs.

The decommission of restbase1010-c was discontinued after other instances in the rack began to fail (1016-{a,b,c}, 1010-b & 1007-a). The failures in question all seem to be the result of a JVM out of memory (as opposed to an application OutOfMemory exception).

Thu, Apr 19, 7:36 PM · ops-eqiad, Services (blocked), Operations, hardware-requests, Cassandra, User-Eevans

Wed, Apr 18

Eevans edited P7013 Masterwork From Distant Lands.
Wed, Apr 18, 7:58 PM
Eevans edited P7012 Masterwork From Distant Lands.
Wed, Apr 18, 7:57 PM
Eevans updated subscribers of T192456: Prometheus metrics missing for some hosts.

For the machines affected, executing curl against the exporter URL just hangs indefinitely. I attempted to restart 1011-a to no avail. I then live-hacked cassandra-env.sh to roll back the exporter jar to the 0.8 version we used before, and it is now working. More investigation is needed.

Wed, Apr 18, 3:08 PM · Services (doing), Cassandra
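
The hang described above (curl against the exporter URL never returning) is the kind of failure that a probe with an explicit timeout can surface. A minimal sketch in Python, standard library only; the function name `probe_exporter`, its return strings, and the 5-second default are assumptions made for illustration and do not come from the task:

```python
import socket
import urllib.error
import urllib.request


def probe_exporter(url, timeout=5.0):
    """Fetch a metrics URL, giving up after `timeout` seconds.

    Returns "ok", "timeout", or "error: <detail>". The name and the
    return values are hypothetical, chosen for this sketch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            return "ok"
    except urllib.error.HTTPError as exc:
        # The exporter answered, but with an error status.
        return "error: HTTP %d" % exc.code
    except urllib.error.URLError as exc:
        # Connection-phase failures arrive wrapped in URLError.
        if isinstance(exc.reason, socket.timeout):
            return "timeout"
        return "error: %s" % exc.reason
    except socket.timeout:
        # A server that accepts the connection but never answers ends up here.
        return "timeout"
    except OSError as exc:
        return "error: %s" % exc
```

Run against each affected instance, a "timeout" result would distinguish a hung exporter from one that is down or refusing connections.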
Eevans triaged T192456: Prometheus metrics missing for some hosts as High priority.
Wed, Apr 18, 2:36 PM · Services (doing), Cassandra
Eevans created T192456: Prometheus metrics missing for some hosts.
Wed, Apr 18, 2:36 PM · Services (doing), Cassandra

Tue, Apr 17

Eevans added a comment to T189822: Replace 5 Samsung SSD 850 devices w/ 4 1.6T Intel or HP SSDs.

@mobrovac The 5 ssds arrived for restbase1010. Do you need to schedule down time to replace?

We'll need to decommission the 3 instances running there first. After the disks are swapped, we'll need someone to do the re-image before we can bootstrap. I'm guessing that will be @fgiunchedi, so we should probably confirm his availability before we begin.

Tue, Apr 17, 4:34 PM · ops-eqiad, Services (blocked), Operations, hardware-requests, Cassandra, User-Eevans
Eevans added a comment to T189822: Replace 5 Samsung SSD 850 devices w/ 4 1.6T Intel or HP SSDs.

@mobrovac The 5 ssds arrived for restbase1010. Do you need to schedule down time to replace?

Tue, Apr 17, 4:31 PM · ops-eqiad, Services (blocked), Operations, hardware-requests, Cassandra, User-Eevans

Fri, Apr 13

Eevans updated the task description for T191660: Script to collect forensic data from Cassandra hosts.
Fri, Apr 13, 4:28 PM · Services (next), Cassandra, User-Eevans
Eevans renamed T186751: Restablish RESTBase dev environment with Cassandra 3.11.2 from Reset RESTBase dev environment to Restablish RESTBase dev environment with Cassandra 3.11.2.
Fri, Apr 13, 1:40 PM · Services (doing), User-Eevans
Eevans updated the task description for T186751: Restablish RESTBase dev environment with Cassandra 3.11.2.
Fri, Apr 13, 1:30 PM · Services (doing), User-Eevans
Eevans added a comment to T186751: Restablish RESTBase dev environment with Cassandra 3.11.2.

At this point I'm fairly certain that this isn't a memory leak in the conventional sense. A bug in change-propagation had prevented sampling from working, and the dev cluster (with < 20% of the capacity of production) was seeing throughput levels in excess of 5x what we have in production. The excessive heap utilization would seem to be the result of additional per-thread state associated with this higher throughput. This is still worth pursuing upstream, since this is not how an application should degrade in the face of high load, but since it affects 3.11.0 as well, I think we can remove this as a blocker to a 3.11.2 upgrade.

Fri, Apr 13, 1:30 PM · Services (doing), User-Eevans
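
The comparison in the comment above can be made concrete: a cluster with under 20% of production's capacity handling more than 5x production's throughput is carrying more than 25x the load per unit of capacity. A small sketch of that arithmetic; the function name is hypothetical, and 0.2 and 5 are the bounds quoted in the comment, not measurements:

```python
def relative_load(throughput_ratio, capacity_ratio):
    """Load per unit of capacity, relative to production (production = 1.0)."""
    if capacity_ratio <= 0:
        raise ValueError("capacity_ratio must be positive")
    return throughput_ratio / capacity_ratio


# Bounds from the comment: > 5x the throughput on < 20% of the capacity,
# so the true figure is strictly greater than the value computed here.
lower_bound = relative_load(5.0, 0.2)
```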
Eevans renamed T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2 from Evaluate new upstream Cassandra release: 3.11.2 to Upgrade RESTBase cluster to Cassandra release: 3.11.2.
Fri, Apr 13, 1:29 PM · User-Eevans, Services (next), Cassandra

Thu, Apr 12

Eevans added a comment to T192112: Consider using default JVM G1GC settings in the RESTBase Cassandra cluster.

Let's start with a couple of canaries in restbase1010 and restbase2003 respectively.

Thu, Apr 12, 11:39 PM · Patch-For-Review, Cassandra, Services (doing), User-Eevans
Eevans triaged T192112: Consider using default JVM G1GC settings in the RESTBase Cassandra cluster as Normal priority.

Let's start with a couple of canaries in restbase1010 and restbase2003 respectively.

Thu, Apr 12, 8:57 PM · Patch-For-Review, Cassandra, Services (doing), User-Eevans
Eevans created T192112: Consider using default JVM G1GC settings in the RESTBase Cassandra cluster.
Thu, Apr 12, 8:56 PM · Patch-For-Review, Cassandra, Services (doing), User-Eevans

Wed, Apr 11

Eevans added a comment to T186751: Restablish RESTBase dev environment with Cassandra 3.11.2.

fyi: T189050 / T189050#4124163 means you should not have to worry anymore about scheduling downtimes for these services when on the dev environment.

Wed, Apr 11, 7:33 PM · Services (doing), User-Eevans

Fri, Apr 6

mobrovac awarded T191660: Script to collect forensic data from Cassandra hosts a 100 token.
Fri, Apr 6, 8:52 PM · Services (next), Cassandra, User-Eevans
Eevans triaged T191662: Document RESTBase cluster severity / response information as Normal priority.
Fri, Apr 6, 7:53 PM · Services (next), Cassandra, RESTBase, User-Eevans
Eevans created T191662: Document RESTBase cluster severity / response information.
Fri, Apr 6, 7:53 PM · Services (next), Cassandra, RESTBase, User-Eevans
Eevans triaged T191660: Script to collect forensic data from Cassandra hosts as Low priority.
Fri, Apr 6, 7:43 PM · Services (next), Cassandra, User-Eevans
Eevans created T191660: Script to collect forensic data from Cassandra hosts.
Fri, Apr 6, 7:43 PM · Services (next), Cassandra, User-Eevans
Eevans triaged T191659: Configure a threshold for earlier notification of /srv/cassandra/instance-data as Low priority.
Fri, Apr 6, 7:36 PM · User-fgiunchedi, Operations, Services (next), RESTBase-Cassandra, User-Eevans, Cassandra
Eevans created T191659: Configure a threshold for earlier notification of /srv/cassandra/instance-data.
Fri, Apr 6, 7:36 PM · User-fgiunchedi, Operations, Services (next), RESTBase-Cassandra, User-Eevans, Cassandra
Eevans triaged T191627: Remove Cassandra 2.2.6 packages from jessie-wikimedia/thirdparty apt repo as Normal priority.
Fri, Apr 6, 1:28 PM · Services (watching), Operations, User-Eevans, Discovery, Maps, Cassandra
Eevans added a project to T191627: Remove Cassandra 2.2.6 packages from jessie-wikimedia/thirdparty apt repo: User-Eevans.
Fri, Apr 6, 1:01 PM · Services (watching), Operations, User-Eevans, Discovery, Maps, Cassandra
Eevans created T191627: Remove Cassandra 2.2.6 packages from jessie-wikimedia/thirdparty apt repo.
Fri, Apr 6, 1:01 PM · Services (watching), Operations, User-Eevans, Discovery, Maps, Cassandra

Wed, Apr 4

Eevans added a comment to T186751: Restablish RESTBase dev environment with Cassandra 3.11.2.

Status update:

Wed, Apr 4, 4:58 PM · Services (doing), User-Eevans

Tue, Apr 3

Eevans added a comment to T186751: Restablish RESTBase dev environment with Cassandra 3.11.2.

The cluster is now running 3.11.2 (release).

Tue, Apr 3, 6:32 PM · Services (doing), User-Eevans
Eevans added a comment to T191315: Cassandra Graphite metrics space usage audit and cleanup.

FWIW: The RESTBase cluster has been disabled for some time. I just disabled the RESTBase Dev cluster as well.

Tue, Apr 3, 4:56 PM · User-fgiunchedi, Services (watching), Graphite, Operations

Thu, Mar 29

Eevans added a comment to T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2.

Opened CASSANDRA-14355 to track a memory leak encountered in the dev environment (reproduced on 3.11.0 and 3.11.2).

Thu, Mar 29, 9:24 PM · User-Eevans, Services (next), Cassandra

Tue, Mar 27

Eevans closed T190869: restbase2007 crash as Resolved.

I'm not sure what else we can do to troubleshoot this; hopefully this is a one-time freak occurrence. If this recurs, we can reopen this ticket and try again.

Tue, Mar 27, 7:32 PM · Cassandra, Services (done), User-Eevans
Eevans triaged T190869: restbase2007 crash as Normal priority.
Tue, Mar 27, 7:30 PM · Cassandra, Services (done), User-Eevans
Eevans created T190869: restbase2007 crash.
Tue, Mar 27, 7:30 PM · Cassandra, Services (done), User-Eevans
Eevans added a comment to T189822: Replace 5 Samsung SSD 850 devices w/ 4 1.6T Intel or HP SSDs.
Tue, Mar 27, 5:19 PM · ops-eqiad, Services (blocked), Operations, hardware-requests, Cassandra, User-Eevans
Eevans added a comment to T189822: Replace 5 Samsung SSD 850 devices w/ 4 1.6T Intel or HP SSDs.

I'm a bit confused about which system will get the replacement SSDs installed. I'm guessing it's restbase2007 or restbase2008, as those each have 5 Samsung 850 EVO SSDs.

I'll generate a sub-task with pricing, since it cannot be in a public task. On that sub-task, I'll request a quote from Dasher for the Intel SSDs, since the restbase200[78] systems are in warranty until 2019-04-22.

Tue, Mar 27, 4:58 PM · ops-eqiad, Services (blocked), Operations, hardware-requests, Cassandra, User-Eevans
Eevans added a comment to T186567: Deprecate cassandra-metrics-collector?.

I propose that we first start with the AQS cluster by installing the Prometheus exporter and cassandra-metrics-collector side-by-side, and see where we stand with the dashboards. If it turns out to be straightforward to make the dashboards work interchangeably, then we can consider either a) doing the same with Maps, or b) upgrading Maps to Cassandra 2.2.

AQS is now running the jmx agent and its metrics are available in the prometheus analytics instance (not the services one). @Eevans I'd love to keep one set of dashboards shared between restbase and aqs/maps to avoid duplication of effort; whenever you have time, let's chat about the differences between 2.2 and 3.x.

Tue, Mar 27, 4:12 PM · Cassandra, Services (next), User-Eevans

Mar 22 2018

Eevans added a comment to T189529: Test/upload new cassandra 2.2.6 package (wmf3).

Thanks! When I try to push the .changes I get that I am missing orig.tar.gz:

root@install1002:/srv/wikimedia# reprepro -C component/cassandra22 --ignore=wrongdistribution include jessie-wikimedia ~elukey/cassandra/cassandra_2.2.6-wmf3_amd64.changes
.changes put in a distribution not listed within it!
Ignoring as --ignore=wrongdistribution given.
Unable to find pool/component/cassandra22/c/cassandra/cassandra_2.2.6.orig.tar.gz needed by cassandra_2.2.6-wmf3.dsc!
Perhaps you forgot to give dpkg-buildpackage the -sa option,
 or you could try --ignore=missingfile to guess possible files to use.
Deleting files just added to the pool but not used.
(to avoid use --keepunusednewfiles next time)
There have been errors!

This is probably my ignorance about the Debian build process, but are we missing the file, or should I try the option to skip it?

Mar 22 2018, 2:58 PM · User-Elukey, Operations, Services (doing), Cassandra, User-Eevans
Eevans updated the task description for T189529: Test/upload new cassandra 2.2.6 package (wmf3).
Mar 22 2018, 2:56 PM · User-Elukey, Operations, Services (doing), Cassandra, User-Eevans
Eevans updated the task description for T186751: Restablish RESTBase dev environment with Cassandra 3.11.2.
Mar 22 2018, 2:48 PM · Services (doing), User-Eevans

Mar 21 2018

Eevans updated the task description for T189529: Test/upload new cassandra 2.2.6 package (wmf3).
Mar 21 2018, 11:58 PM · User-Elukey, Operations, Services (doing), Cassandra, User-Eevans
Eevans added a comment to T189529: Test/upload new cassandra 2.2.6 package (wmf3).

A new package that actually applies the patch within has been uploaded; sorry for the wasted cycles!

Mar 21 2018, 11:57 PM · User-Elukey, Operations, Services (doing), Cassandra, User-Eevans
Eevans updated the task description for T189529: Test/upload new cassandra 2.2.6 package (wmf3).
Mar 21 2018, 11:56 PM · User-Elukey, Operations, Services (doing), Cassandra, User-Eevans
Eevans updated the language for P6871 Masterwork From Distant Lands from autodetect to yaml.
Mar 21 2018, 3:22 PM
Eevans edited P6871 Masterwork From Distant Lands.
Mar 21 2018, 3:21 PM
Eevans updated the language for P6870 Masterwork From Distant Lands from autodetect to yaml.
Mar 21 2018, 3:03 PM
Eevans edited P6870 Masterwork From Distant Lands.
Mar 21 2018, 3:02 PM
Eevans updated the language for P6869 Masterwork From Distant Lands from autodetect to yaml.
Mar 21 2018, 2:16 PM
Eevans edited P6869 Masterwork From Distant Lands.
Mar 21 2018, 2:16 PM

Mar 20 2018

Eevans updated the task description for T186751: Restablish RESTBase dev environment with Cassandra 3.11.2.
Mar 20 2018, 6:34 PM · Services (doing), User-Eevans
Eevans edited P6865 Masterwork From Distant Lands.
Mar 20 2018, 5:14 PM
Eevans added a comment to T189889: Excessive number of idle Cassandra connections.

Without spelunking too deeply into the NodeJS driver code, I'd say it is to a) intelligently route requests to a coordinator that has the data (and in doing so, eliminate a hop of routing stretch), b) load-balance among the subset of nodes selected in (a), and c) route around failures in doing so (these are the common conventions, anyway). This would all require the full complement of nodes to do correctly.

Right, I meant I don't know why all of these connections are established from the get-go. A better approach IMHO would be to keep the number of connections minimal at start-up and then open new ones as needed. This would mean that the first queries to hit certain nodes would be slower, but I think that's a good balance. One could argue that in doing so all of the connections would be established eventually, but given the number of idle connections, I don't think this would happen in practice in our case.

Mar 20 2018, 4:09 PM · Services (next), RESTBase-Cassandra, RESTBase, Cassandra, User-Eevans
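
The lazy approach suggested in the second comment above (few connections at start-up, more opened on demand) might look like the following. This is a sketch only, written in Python rather than taken from the NodeJS driver; `LazyPool`, `connect`, and `warm` are hypothetical names, and a real driver would also need locking, health checks, and reconnection:

```python
class LazyPool:
    """Opens a connection to a host on first use, then reuses it."""

    def __init__(self, hosts, connect, warm=0):
        self._connect = connect  # hypothetical factory: host -> connection
        self._conns = {}
        self.hosts = list(hosts)
        # Optionally pre-open a small number of connections at start-up.
        for host in self.hosts[:warm]:
            self._conns[host] = connect(host)

    def get(self, host):
        if host not in self.hosts:
            raise KeyError(host)
        conn = self._conns.get(host)
        if conn is None:
            # The first query to this host pays the connection set-up cost.
            conn = self._conns[host] = self._connect(host)
        return conn

    def open_count(self):
        return len(self._conns)
```

Hosts that never receive a query never get a connection, which is the trade-off the comment describes: slower first queries in exchange for fewer idle connections.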

Mar 19 2018

elukey awarded T189529: Test/upload new cassandra 2.2.6 package (wmf3) a Love token.
Mar 19 2018, 4:03 PM · User-Elukey, Operations, Services (doing), Cassandra, User-Eevans
Eevans added a comment to T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2.

Time's up, 8u162-b12-1~bpo8+1 has been made available.

Mar 19 2018, 3:56 PM · User-Eevans, Services (next), Cassandra
Eevans added a comment to T189529: Test/upload new cassandra 2.2.6 package (wmf3).

Tried to (manually via dpkg -i) install cassandra 2.2.6-wmf3 on aqs1004:

elukey@aqs1004:~$ dpkg -l | grep cassandra
ii  cassandra                                2.2.6-wmf3                       all          distributed storage system for structured data
ii  cassandra-tools                          2.2.6-wmf3                       all          distributed storage system for structured data
ii  cassandra-tools-wmf                      1.0.1-1                          all          add-ons to make Wikimedia Cassandra operations easier

But the patch for /usr/sbin/cassandra does not seem to be applied; the following snippet is still uncommented:

# see CASSANDRA-7254
"$JAVA" -cp "$CLASSPATH" $JVM_OPTS 2>&1 | grep -q 'Error: Exception thrown by the agent : java.lang.NullPointerException'
if [ $? -ne "1" ]; then
    echo Unable to bind JMX, is Cassandra already running?
    exit 1;
fi
Mar 19 2018, 1:54 PM · User-Elukey, Operations, Services (doing), Cassandra, User-Eevans
Eevans added a comment to T189889: Excessive number of idle Cassandra connections.

Agreed that we have to upgrade the driver, but just hiding the metrics does not sound like a solution to me here. Given the number of connections kept open, I think we should explore two possibilities:

  • increase the heart-beat interval (perhaps something like 120s would be enough)
Mar 19 2018, 1:49 PM · Services (next), RESTBase-Cassandra, RESTBase, Cassandra, User-Eevans
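
To put a rough number on the heart-beat option above: each idle connection sends one heartbeat per interval, so the background request rate is the connection count divided by the interval. A sketch of that estimate; the connection count is a made-up figure and the 30s baseline is an assumed current setting, neither taken from the task:

```python
def heartbeats_per_second(idle_connections, interval_seconds):
    """Approximate heartbeat requests per second across idle connections."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return idle_connections / interval_seconds


IDLE_CONNECTIONS = 6000  # hypothetical count, for illustration only

before = heartbeats_per_second(IDLE_CONNECTIONS, 30)   # assumed current interval
after = heartbeats_per_second(IDLE_CONNECTIONS, 120)   # interval proposed above
```

Under that assumption, moving from 30s to 120s cuts the heartbeat rate by a factor of four, whatever the actual connection count is.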

Mar 16 2018

Eevans updated the task description for T189889: Excessive number of idle Cassandra connections.
Mar 16 2018, 4:40 PM · Services (next), RESTBase-Cassandra, RESTBase, Cassandra, User-Eevans
Eevans updated the task description for T189889: Excessive number of idle Cassandra connections.
Mar 16 2018, 4:40 PM · Services (next), RESTBase-Cassandra, RESTBase, Cassandra, User-Eevans
Eevans triaged T189889: Excessive number of idle Cassandra connections as Low priority.
Mar 16 2018, 4:39 PM · Services (next), RESTBase-Cassandra, RESTBase, Cassandra, User-Eevans
Eevans created T189889: Excessive number of idle Cassandra connections.
Mar 16 2018, 4:39 PM · Services (next), RESTBase-Cassandra, RESTBase, Cassandra, User-Eevans

Mar 15 2018

Eevans triaged T189822: Replace 5 Samsung SSD 850 devices w/ 4 1.6T Intel or HP SSDs as Normal priority.
Mar 15 2018, 9:52 PM · ops-eqiad, Services (blocked), Operations, hardware-requests, Cassandra, User-Eevans

Mar 14 2018

Eevans updated the task description for T186751: Restablish RESTBase dev environment with Cassandra 3.11.2.
Mar 14 2018, 3:46 PM · Services (doing), User-Eevans

Mar 13 2018

Eevans updated the task description for T186562: Reimage JBO-RAID0 configured RESTBase HP machines.
Mar 13 2018, 6:54 PM · User-Eevans, Services (doing), Cassandra

Mar 12 2018

Eevans triaged T189529: Test/upload new cassandra 2.2.6 package (wmf3) as Normal priority.
Mar 12 2018, 9:38 PM · User-Elukey, Operations, Services (doing), Cassandra, User-Eevans
Eevans updated the language for P6840 Masterwork From Distant Lands from autodetect to diff.
Mar 12 2018, 9:35 PM
Eevans edited P6840 Masterwork From Distant Lands.
Mar 12 2018, 9:34 PM

Mar 9 2018

Eevans updated the task description for T189057: Understand (and if possible, improve) performance of new storage strategy.
Mar 9 2018, 9:56 PM · Services (doing), RESTBase-Cassandra, Cassandra, User-Eevans

Mar 7 2018

Eevans added a comment to T189057: Understand (and if possible, improve) performance of new storage strategy.

Another example:

Mar 7 2018, 6:56 PM · Services (doing), RESTBase-Cassandra, Cassandra, User-Eevans
Eevans edited P6817 session ID: 9899ec10-2198-11e8-b3f3-416aaf7799e3.
Mar 7 2018, 6:37 PM
Eevans created P6817 session ID: 9899ec10-2198-11e8-b3f3-416aaf7799e3.
Mar 7 2018, 6:26 PM
Eevans added a comment to T189057: Understand (and if possible, improve) performance of new storage strategy.

Some preliminary information from query trace examination:

Mar 7 2018, 3:33 PM · Services (doing), RESTBase-Cassandra, Cassandra, User-Eevans
Eevans created P6812 da1f0550-214b-11e8-b786-237a199d4b45.
Mar 7 2018, 2:09 PM
Eevans updated the title for P6811 45e329e0-216c-11e8-b786-237a199d4b45 from fa6d0170-219d-11e8-a28a-1d583f325411 to 45e329e0-216c-11e8-b786-237a199d4b45.
Mar 7 2018, 2:08 PM
Eevans created P6811 45e329e0-216c-11e8-b786-237a199d4b45.
Mar 7 2018, 2:07 PM
Eevans created P6810 fa6d0170-219d-11e8-a28a-1d583f325411.
Mar 7 2018, 2:05 PM
Eevans updated the language for P6809 sessions from shell to autodetect.
Mar 7 2018, 2:04 PM
Eevans updated the language for P6809 sessions from autodetect to shell.
Mar 7 2018, 2:03 PM
Eevans created P6809 sessions.
Mar 7 2018, 2:02 PM

Mar 6 2018

Eevans added a subtask for T183745: FY17/18 Q3 Program 7 Services Goal: Full migration to Cassandra 3: T189057: Understand (and if possible, improve) performance of new storage strategy.
Mar 6 2018, 10:53 PM · User-Eevans, RESTBase-Cassandra, RESTBase, Cassandra, Services (doing), Goal
Eevans added a parent task for T189057: Understand (and if possible, improve) performance of new storage strategy: T183745: FY17/18 Q3 Program 7 Services Goal: Full migration to Cassandra 3.
Mar 6 2018, 10:53 PM · Services (doing), RESTBase-Cassandra, Cassandra, User-Eevans
Eevans added a parent task for T186562: Reimage JBO-RAID0 configured RESTBase HP machines: T189057: Understand (and if possible, improve) performance of new storage strategy.
Mar 6 2018, 10:53 PM · User-Eevans, Services (doing), Cassandra
Eevans added a subtask for T189057: Understand (and if possible, improve) performance of new storage strategy: T186562: Reimage JBO-RAID0 configured RESTBase HP machines.
Mar 6 2018, 10:53 PM · Services (doing), RESTBase-Cassandra, Cassandra, User-Eevans
Eevans added projects to T189057: Understand (and if possible, improve) performance of new storage strategy: Services, User-Eevans, Cassandra, RESTBase-Cassandra.
Mar 6 2018, 10:52 PM · Services (doing), RESTBase-Cassandra, Cassandra, User-Eevans
Eevans triaged T189057: Understand (and if possible, improve) performance of new storage strategy as Normal priority.
Mar 6 2018, 10:52 PM · Services (doing), RESTBase-Cassandra, Cassandra, User-Eevans
Eevans updated the task description for T189057: Understand (and if possible, improve) performance of new storage strategy.
Mar 6 2018, 10:51 PM · Services (doing), RESTBase-Cassandra, Cassandra, User-Eevans
Eevans created T189057: Understand (and if possible, improve) performance of new storage strategy.
Mar 6 2018, 10:49 PM · Services (doing), RESTBase-Cassandra, Cassandra, User-Eevans
Eevans closed T178177: Investigate aberrant Cassandra columnfamily read latency of restbase101{0,2,4} as Resolved.

At this point, I think we've established that more reasonable performance is possible by configuring a JBOD in HBA mode, instead of as a collection of single-disk RAID0s. Many of the existing HP nodes in the RESTBase cluster have already been configured this way, and all that remains is to re-image the 9 nodes stood up prior to this conclusion. That work is being tracked in T186562: Reimage JBO-RAID0 configured RESTBase HP machines, so I'll close this issue. If anyone objects (for example, they're convinced that more can/should be done), then feel free to re-open.

Mar 6 2018, 10:29 PM · User-Eevans, Services (doing), Cassandra
Eevans closed T178177: Investigate aberrant Cassandra columnfamily read latency of restbase101{0,2,4}, a subtask of T183745: FY17/18 Q3 Program 7 Services Goal: Full migration to Cassandra 3, as Resolved.
Mar 6 2018, 10:29 PM · User-Eevans, RESTBase-Cassandra, RESTBase, Cassandra, Services (doing), Goal
Eevans added a comment to T185494: Degraded RAID on restbase-dev1006.

I removed sdc1, sdc2, and sdc3 from md0, md1, and md2 respectively, and rebooted believing that might be the easiest way to correct the device ordering (the new drive showed as sde). Instead, the machine didn't come back up (and I don't have console access).

Mar 6 2018, 8:52 PM · User-Eevans, ops-eqiad, Operations
Eevans added a comment to T185494: Degraded RAID on restbase-dev1006.

I removed sdc1, sdc2, and sdc3 from md0, md1, and md2 respectively, and rebooted believing that might be the easiest way to correct the device ordering (the new drive showed as sde). Instead, the machine didn't come back up (and I don't have console access).

Mar 6 2018, 8:50 PM · User-Eevans, ops-eqiad, Operations

Mar 2 2018

Eevans added a project to T92471: enable authenticated access to Cassandra JMX: User-Eevans.
Mar 2 2018, 5:57 PM · User-Eevans, Services (next), Cassandra, Operations, Patch-For-Review
Eevans added a comment to T92471: enable authenticated access to Cassandra JMX.

@Eevans @fgiunchedi is there something left to be done here?

Mar 2 2018, 5:56 PM · User-Eevans, Services (next), Cassandra, Operations, Patch-For-Review

Feb 26 2018

Eevans added a project to T188295: Improve multi-content-bucket design: User-Eevans.
Feb 26 2018, 7:57 PM · User-Eevans, RESTBase, Services (designing)

Feb 22 2018

Eevans added a parent task for T184795: Add the prometheus jmx agent to AQS Cassandra: T186567: Deprecate cassandra-metrics-collector?.
Feb 22 2018, 9:13 PM · Patch-For-Review, Analytics-Kanban, User-Elukey
Eevans added a subtask for T186567: Deprecate cassandra-metrics-collector?: T184795: Add the prometheus jmx agent to AQS Cassandra.
Feb 22 2018, 9:13 PM · Cassandra, Services (next), User-Eevans