Eevans (Eric Evans)
Senior Software Engineer

Projects

User Details

User Since
Feb 27 2015, 10:47 PM (360 w, 5 d)
Availability
Available
IRC Nick
urandom
LDAP User
Eevans
MediaWiki User
Unknown

Recent Activity

Tue, Jan 25

Eevans updated subscribers of T299871: Degraded RAID on restbase2011.

Hi @Eevans - since the refresh for this host was just installed via T294377, are you ok if we ignore this alert and resolve the ticket? Thanks, Willy

Tue, Jan 25, 8:27 PM · SRE, ops-codfw

Mon, Jan 24

Eevans added a comment to T298516: Investigate high levels of garbage collection on new AQS nodes.

@BTullis: Regarding which node(s) to (re)pool (presumably tomorrow, Jan 25?)...

Mon, Jan 24, 7:43 PM · Analytics-Clusters, Platform Team Workboards (Platform Engineering Reliability), Data-Engineering-Kanban, Epic, Data-Engineering, Cassandra

Sat, Jan 22

Eevans added a comment to T299735: Pageviews integration testing.

See: Draft: Proof-of-concept Javascript integration tests

Sat, Jan 22, 12:36 AM · Data-Engineering, User-Eevans, Platform Engineering Roadmap

Fri, Jan 21

Eevans added a comment to T298516: Investigate high levels of garbage collection on new AQS nodes.

Scratch that previous theory...

Now that I look at it more closely, the heap exhaustion seems to correlate much more closely with when aqs1010 was pooled and receiving queries.
The strange thing is that only one host was pooled, but all hosts exhibited the behaviour.

[Attached image: image.png]

My theory is that it is this bug which is affecting us: Memory leak in CompressedChunkReader

Fri, Jan 21, 5:18 PM · Analytics-Clusters, Platform Team Workboards (Platform Engineering Reliability), Data-Engineering-Kanban, Epic, Data-Engineering, Cassandra
Eevans updated the task description for T299729: Implement one golang AQS microservice.
Fri, Jan 21, 4:56 PM · Data-Engineering-Kanban, Data-Engineering
Eevans created T299735: Pageviews integration testing.
Fri, Jan 21, 1:20 AM · Data-Engineering, User-Eevans, Platform Engineering Roadmap
Eevans created T299734: Implement top-per-country endpoint of the pageviews API.
Fri, Jan 21, 1:14 AM · Data-Engineering, User-Eevans, Platform Engineering Roadmap
Eevans created T299733: Implement top-by-country endpoint of the pageviews API.
Fri, Jan 21, 1:12 AM · Data-Engineering, User-Eevans, Platform Engineering Roadmap
Eevans created T299732: Implement top endpoint of the pageviews API.
Fri, Jan 21, 1:10 AM · Data-Engineering, User-Eevans, Platform Engineering Roadmap
Eevans edited parent tasks for T299731: Implement aggregate endpoint of the pageviews API, added: T288296: AQS 2.0: Implement pageviews endpoints; removed: T263489: AQS 2.0.
Fri, Jan 21, 1:09 AM · User-Eevans, Code-Health-Objective, Platform Engineering Roadmap, Platform Team Initiatives (API Gateway), Analytics, Epic
Eevans removed a subtask for T263489: AQS 2.0: T299731: Implement aggregate endpoint of the pageviews API.
Fri, Jan 21, 1:09 AM · User-Eevans, Code-Health-Objective, Platform Engineering Roadmap, Platform Team Initiatives (API Gateway), Analytics, Epic
Eevans added a subtask for T288296: AQS 2.0: Implement pageviews endpoints: T299731: Implement aggregate endpoint of the pageviews API.
Fri, Jan 21, 1:09 AM · Data-Engineering, User-Eevans, Platform Engineering Roadmap
Eevans created T299731: Implement aggregate endpoint of the pageviews API.
Fri, Jan 21, 1:07 AM · User-Eevans, Code-Health-Objective, Platform Engineering Roadmap, Platform Team Initiatives (API Gateway), Analytics, Epic

Wed, Jan 19

Eevans added a comment to T298516: Investigate high levels of garbage collection on new AQS nodes.

By way of an update:

Wed, Jan 19, 5:13 PM · Analytics-Clusters, Platform Team Workboards (Platform Engineering Reliability), Data-Engineering-Kanban, Epic, Data-Engineering, Cassandra

Fri, Jan 14

Eevans added a comment to T298805: Import Debian package of Cassandra 3.11.11 as 'dev' version.

I added component/cassandradev for buster and stretch. For the import, we can either pull them from the Apache Cassandra repository via Secure Apt (in which case we need to set up the PGP keys used to sign the repo), or, given that you're part of upstream, you could just send me the SHA-1 sums of these debs via mail and I'll double-check them before the import?

Fri, Jan 14, 4:41 PM · Data-Engineering, Platform Engineering, Generated Data Platform, SRE

Mon, Jan 10

Eevans added a comment to T298516: Investigate high levels of garbage collection on new AQS nodes.

I have created the heap dump file. I had to run jmap as the cassandra user, as even root couldn't connect to the running JVM.

root@aqs1014:~# sudo su - cassandra -s /bin/bash
cassandra@aqs1014:~$ jmap -dump:live,format=b,file=/srv/cassandra-b/tmp/aqs1014-b-dump202201071450.hprof 4468
Dumping heap to /srv/cassandra-b/tmp/aqs1014-b-dump202201071450.hprof ...
Heap dump file created

The file is 13 GB in size.

Mon, Jan 10, 8:38 PM · Analytics-Clusters, Platform Team Workboards (Platform Engineering Reliability), Data-Engineering-Kanban, Epic, Data-Engineering, Cassandra
Eevans added a comment to T298516: Investigate high levels of garbage collection on new AQS nodes.

[ ... ]
If anyone else has any ideas on how to get more useful information out of the dump file, that would be great.

Mon, Jan 10, 8:33 PM · Analytics-Clusters, Platform Team Workboards (Platform Engineering Reliability), Data-Engineering-Kanban, Epic, Data-Engineering, Cassandra

Fri, Jan 7

Eevans created T298805: Import Debian package of Cassandra 3.11.11 as 'dev' version.
Fri, Jan 7, 8:00 PM · Data-Engineering, Platform Engineering, Generated Data Platform, SRE
Eevans added a comment to T298516: Investigate high levels of garbage collection on new AQS nodes.

I have placed the resulting 2.6 GB bzip2 file at aqs1014.eqiad.wmnet:/home/btullis/aqs1014-b-dump202201071450.hprof.bz2 in case you'd like to take a copy off-host for analysis @Eevans.
I'm currently transferring it to my workstation for analysis. The original, uncompressed 13 GB file is also still in /srv/cassandra-b/tmp.

If this host goes into a period of sustained GC I will take another heap dump using a similar method.

Fri, Jan 7, 4:14 PM · Analytics-Clusters, Platform Team Workboards (Platform Engineering Reliability), Data-Engineering-Kanban, Epic, Data-Engineering, Cassandra

Thu, Jan 6

Eevans added a comment to T298516: Investigate high levels of garbage collection on new AQS nodes.

Could we give the upgrade a try to see if it resolves the memory leak, and only if it doesn't, proceed with the profiling?

Thu, Jan 6, 3:07 AM · Analytics-Clusters, Platform Team Workboards (Platform Engineering Reliability), Data-Engineering-Kanban, Epic, Data-Engineering, Cassandra

Wed, Jan 5

Eevans added a comment to T298516: Investigate high levels of garbage collection on new AQS nodes.

A heap dump is likely to be the best means of identifying what is holding on to all of this memory. For this to be most effective, we'll want the heap from a JVM that is prominently exhibiting the problem, which means letting at least one node run for a minimum of one week, preferably two.

Wed, Jan 5, 9:43 PM · Analytics-Clusters, Platform Team Workboards (Platform Engineering Reliability), Data-Engineering-Kanban, Epic, Data-Engineering, Cassandra
Eevans added a comment to T298516: Investigate high levels of garbage collection on new AQS nodes.

Looking at the last 30 days of heap usage for 1010-a...

Wed, Jan 5, 4:33 PM · Analytics-Clusters, Platform Team Workboards (Platform Engineering Reliability), Data-Engineering-Kanban, Epic, Data-Engineering, Cassandra

Dec 20 2021

Eevans added a comment to T293809: Define Capacity Management Process.

To summarize: We're projected to be at ~73% by the end of this fiscal year, and the ~15GB (~5GB * 3) of the Image Suggestions dataset is little more than a rounding error against this.

Dec 20 2021, 3:03 PM · Generated Data Platform
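The arithmetic behind that summary can be made explicit. A sketch, assuming the "* 3" is Cassandra's replication factor; only the ~5 GB dataset size and the factor of 3 come from the comment, and the cluster capacity and projected usage in `main` are hypothetical numbers chosen to land near ~73%.

```go
package main

import "fmt"

// replicatedGB returns the on-disk footprint of a dataset of rawGB stored
// with replication factor rf (compaction and other overhead ignored).
func replicatedGB(rawGB float64, rf int) float64 {
	return rawGB * float64(rf)
}

// utilization returns used/capacity as a percentage.
func utilization(usedGB, capacityGB float64) float64 {
	return usedGB / capacityGB * 100
}

func main() {
	imageSuggestions := replicatedGB(5, 3) // ~5 GB x RF 3, about 15 GB

	// Hypothetical cluster figures, for illustration only.
	const capacityGB = 100000.0
	projectedGB := 73000.0

	before := utilization(projectedGB, capacityGB)
	after := utilization(projectedGB+imageSuggestions, capacityGB)
	fmt.Printf("%.0f GB added: %.3f%% -> %.3f%%\n", imageSuggestions, before, after)
}
```

With any cluster in the tens-of-terabytes range, 15 GB moves utilization by hundredths of a percent, which is the "rounding error" point.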

Dec 17 2021

Eevans added a comment to T297944: Set up regular-repairs for AQS cassandra cluster tables.

Repair is such a... complex subject. So much so that I'm not sure how to do justice to all of the considerations in a phab comment. :/

Dec 17 2021, 7:33 PM · Generated Data Platform, Data-Engineering
Eevans added a comment to T281517: 📊[PLACEHOLDER] We should implement a data loader for Cassandra.

I assume we're looking at the latter, and will need to create database roles and corresponding credentials that match the job configuration passed to the loader, is this correct?

The loader takes a user and password as parameters, and this allows both of the approaches you describe above (single or multi-tenant). I think we should aim at having a multi-tenant configuration, possibly one per team that owns datasets? This would prevent errors like the ones you describe.
For the moment we use a single tenant (we were a single team loading data :) and I'm supportive of not doing multi-tenant for now, as we're talking about only one dataset from a different team. However, I would like us to ultimately implement a proper multi-tenant configuration, and this would also entail changes on the loading side to provide the correct credentials as secrets.

Dec 17 2021, 5:21 PM · Generated Data Platform, Image-Suggestion-API, Image-Suggestions
Eevans added a comment to T293809: Define Capacity Management Process.

Ok, now for the image suggestions use-case:

Dec 17 2021, 1:08 AM · Generated Data Platform

Dec 16 2021

Eevans added a comment to T281517: 📊[PLACEHOLDER] We should implement a data loader for Cassandra.

I'm wondering how to tie this into the notion of tenancy for backing stores (Cassandra, for the time being). For example: will we have a single tenant (read: a unique user w/ credentials et al.) for the data loading process, or will we have many (presumably, one for each dataset being loaded)? In a world where a platform permits arbitrary teams to own scheduled jobs that persist output to a backing store (in a more-or-less self-service fashion), we would want to ensure that an aberrant change or misconfiguration of one job cannot inadvertently step on the data of another (which separate credentials & permissions would provide).

Dec 16 2021, 8:29 PM · Generated Data Platform, Image-Suggestion-API, Image-Suggestions
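The per-dataset option above maps naturally onto Cassandra roles with keyspace-scoped grants. A minimal Go sketch that emits the CQL for one tenant, assuming one keyspace per dataset and a loader that only needs to write; the role, keyspace, and password values are hypothetical, and real passwords would come from whatever secret management is in use.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// validIdent restricts names to unquoted, lowercase CQL identifiers
// (Cassandra limits keyspace names to 48 characters).
var validIdent = regexp.MustCompile(`^[a-z][a-z0-9_]{0,47}$`)

// quoteLiteral renders s as a CQL string literal (single quotes doubled).
func quoteLiteral(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// tenantCQL returns statements giving a loader role write access to
// exactly one keyspace and nothing else.
func tenantCQL(role, keyspace, password string) ([]string, error) {
	for _, id := range []string{role, keyspace} {
		if !validIdent.MatchString(id) {
			return nil, fmt.Errorf("invalid identifier %q", id)
		}
	}
	return []string{
		fmt.Sprintf("CREATE ROLE IF NOT EXISTS %s WITH PASSWORD = %s AND LOGIN = true;", role, quoteLiteral(password)),
		// MODIFY covers INSERT, UPDATE, DELETE and TRUNCATE.
		fmt.Sprintf("GRANT MODIFY ON KEYSPACE %s TO %s;", keyspace, role),
	}, nil
}

func main() {
	// Hypothetical role, keyspace and password, for illustration only.
	stmts, err := tenantCQL("image_suggestions_loader", "image_suggestions", "s3cr3t")
	if err != nil {
		panic(err)
	}
	for _, s := range stmts {
		fmt.Println(s)
	}
}
```

Because a role granted MODIFY on one keyspace has no permissions on any other, a misconfigured job would fail with an authorization error instead of overwriting another team's data.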

Dec 7 2021

Eevans renamed T293809: Define Capacity Management Process from Define Data Management Process to Define Capacity Management Process.
Dec 7 2021, 2:57 PM · Generated Data Platform

Dec 1 2021

Eevans edited P17914 queries.sh.
Dec 1 2021, 1:22 AM
Eevans created P17914 queries.sh.
Dec 1 2021, 1:20 AM

Nov 18 2021

Eevans updated the language for P17775 schema.cql from autodetect to sql.
Nov 18 2021, 5:47 PM
Eevans created P17775 schema.cql.
Nov 18 2021, 5:47 PM
Eevans edited P17774 query.sh.
Nov 18 2021, 5:34 PM
Eevans created P17774 query.sh.
Nov 18 2021, 5:31 PM

Nov 17 2021

Eevans added a comment to T295897: Automated application of grants for Cassandra.

To be clear: will these be applied on every Puppet run, or only when the file has changed?

Nov 17 2021, 8:00 PM · Patch-For-Review, Platform Team Workboards (Platform Engineering Reliability), Generated Data Platform, Cassandra

Nov 15 2021

Eevans updated the task description for T263489: AQS 2.0.
Nov 15 2021, 8:06 PM · User-Eevans, Code-Health-Objective, Platform Engineering Roadmap, Platform Team Initiatives (API Gateway), Analytics, Epic
Eevans added a comment to T280042: New database request: image_matching.

@gmodena @Eevans What should happen to this task now? Should it close?

Nov 15 2021, 2:51 PM · Generated Data Platform
Eevans added a comment to T243544: Cassandra PHP language driver packaging (Debian).

@Eevans @hnowlan Should this go in Data Platform's backlog... ?

Nov 15 2021, 2:50 PM · Generated Data Platform, User-Eevans

Nov 10 2021

Eevans updated the task description for T293808: Design Image Recommendations Schema.
Nov 10 2021, 3:01 PM · Generated Data Platform

Nov 8 2021

Eevans updated the task description for T293808: Design Image Recommendations Schema.
Nov 8 2021, 7:41 PM · Generated Data Platform
Eevans edited P17599 schema.cql.
Nov 8 2021, 7:39 PM

Nov 3 2021

Eevans added a comment to T293809: Define Capacity Management Process.

Hi folks - Sorry for the late answer; I was off the past two days.
I did some work in sizing for AQS last year, you can read it here: https://wikitech.wikimedia.org/wiki/Analytics/Systems/AQS/Scaling/2020/Cluster_Expansion
I think that ~5 TB per year for AQS is a good, gentle overestimate.

Nov 3 2021, 9:01 PM · Generated Data Platform
Eevans added a comment to T293808: Design Image Recommendations Schema.

[ ... ]
For anyone who skipped to the bottom, the service can still support requesting suggestions by page title. But it will convert title to page_id outside the dataset, probably via the Action API. This carries with it a risk that pages may be renamed, so the page_title => page_id relationship may have changed after the dataset was generated. We'll document this consideration so callers are aware of this possibility.

Nov 3 2021, 6:05 PM · Generated Data Platform
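The title-to-page_id conversion mentioned above could use the Action API's `action=query` with `titles=`. Adding `redirects=1` makes a title that was moved (and left a redirect behind) resolve to the current page, which narrows the rename risk but does not eliminate it. A minimal Go sketch: it only builds the request URL and parses a response in the default (`formatversion=1`) JSON shape, where pages are keyed by ID and a nonexistent page carries a `missing` key. HTTP, batching, and full error handling are omitted, the helper names are hypothetical, and the page ID in `main` is invented.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// queryResponse models only the parts of an action=query response used here.
type queryResponse struct {
	Query struct {
		Pages map[string]struct {
			PageID  int64   `json:"pageid"`
			Title   string  `json:"title"`
			Missing *string `json:"missing"` // present only for nonexistent pages
		} `json:"pages"`
	} `json:"query"`
}

// titleQueryURL builds a lookup URL for one title against a wiki's api.php.
func titleQueryURL(apiBase, title string) string {
	v := url.Values{}
	v.Set("action", "query")
	v.Set("format", "json")
	v.Set("redirects", "1") // follow redirects left behind by page moves
	v.Set("titles", title)
	return apiBase + "?" + v.Encode()
}

// pageIDFromResponse extracts the single page ID from a response body.
func pageIDFromResponse(body []byte) (int64, error) {
	var r queryResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, err
	}
	if len(r.Query.Pages) != 1 {
		return 0, fmt.Errorf("expected exactly one page, got %d", len(r.Query.Pages))
	}
	for _, p := range r.Query.Pages {
		if p.Missing != nil {
			return 0, fmt.Errorf("page %q does not exist", p.Title)
		}
		return p.PageID, nil
	}
	return 0, nil // unreachable: the map has exactly one entry
}

func main() {
	fmt.Println(titleQueryURL("https://en.wikipedia.org/w/api.php", "Foo"))
	// Canned response body; the page ID is invented for illustration.
	id, err := pageIDFromResponse([]byte(`{"batchcomplete":"","query":{"pages":{"12345":{"pageid":12345,"ns":0,"title":"Foo"}}}}`))
	fmt.Println(id, err)
}
```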

Nov 2 2021

Eevans added a comment to T293808: Design Image Recommendations Schema.

[ ... ]
So, we have pages, almost any of which could have suggestions, even ones that already have images. Then we have (qualifying) pages that are unillustrated, for which IMA will attempt to generate suggestions. And finally, we have those for which it was actually able to do so. If we are saying that this dataset models unillustrated pages, and any corresponding image suggestions IMA was able to make, then we're OK-ish (the wisdom of distinguishing the latter by a non-nil suggestion notwithstanding). If, however, we later refine IMA, or add one or more additional suggestion algorithms, any of which is able to make suggestions for already-illustrated pages, then we'll have a data model that is unable to make that distinction. I'm willing to cross that bridge when we come to it if everyone else is; I just wanted to point it out.

To restate what you said (hopefully fairly), we have:

  1. pages, almost any of which could have suggestions, even ones that already have images
  2. qualifying unillustrated pages for which IMA will attempt to generate suggestions
  3. pages IMA was able to generate suggestions for

I'm not sure I understand what distinction you're making between #1 and #2, so let's dig into that. The service doesn't know about pages in general, in an all-pages-on-a-wiki sense. It only knows about pages that are in the dataset provided to it. And the service doesn't know or care if these pages are unillustrated or if they already have images. I think of them as "under-illustrated" pages.

I guess what I'm not following is what issue arises with the data model if we include pages that already have images. If the IMA decides that an existing page that already has one or more images needs more, what breaks?

Nov 2 2021, 12:01 AM · Generated Data Platform

Nov 1 2021

Eevans added a comment to T293808: Design Image Recommendations Schema.

[ ... ]

@Eevans, I share your discomfort with the current approach where some suggestions lack, well, suggestions. The empty image_id thing was always a bit hacky, and if we can do better as we move from an experimental prototype phase to something more resembling a real production service, I'm all in favor of it. But it does seem to me that "pages in need of an image" is a valuable and useful set of data regardless of whether we have Image Matching Algorithm suggestions for all those pages. And some clients have specifically requested pages from that broader set, so that they can use MediaSearch suggestions rather than Image Matching Algorithm suggestions.

Nov 1 2021, 9:57 PM · Generated Data Platform
Eevans added a comment to T293809: Define Capacity Management Process.

Starting with our extant use-case(s):

Nov 1 2021, 9:34 PM · Generated Data Platform
Eevans updated the task description for T293808: Design Image Recommendations Schema.
Nov 1 2021, 7:33 PM · Generated Data Platform
Eevans updated the task description for T293808: Design Image Recommendations Schema.
Nov 1 2021, 7:27 PM · Generated Data Platform
Eevans updated the task description for T293808: Design Image Recommendations Schema.
Nov 1 2021, 7:18 PM · Generated Data Platform
Eevans edited P17599 schema.cql.
Nov 1 2021, 7:12 PM
Eevans edited P17599 schema.cql.
Nov 1 2021, 7:10 PM
Eevans edited P17599 schema.cql.
Nov 1 2021, 7:06 PM
Eevans edited P17599 schema.cql.
Nov 1 2021, 7:00 PM
Eevans updated the task description for T293808: Design Image Recommendations Schema.
Nov 1 2021, 6:57 PM · Generated Data Platform
Eevans added a comment to T293808: Design Image Recommendations Schema.

@kostajh - what would happen if a page title changes between the image rec output and someone viewing the image rec then calling the API?

I should have clarified in my previous comment – we're only using page titles with the API for local development and beta wikis, where it's not (easily) possible to get the page ID in those wikis to match their production equivalents (e.g. page "Foo" on my local wiki has ID 1010 but on enwiki it is ID 9132808). So page title renames wouldn't really be a problem that anyone has to spend engineering effort on; if I want recommendations for page "Foo" in my local wiki and that's renamed to "FooBar" in production, then I'll just rename it in my local wiki.

Nov 1 2021, 6:47 PM · Generated Data Platform
Eevans added a comment to T294377: Q2:(Need By: TBD) rack/setup/install restbase202[456].codfw.wmnet.

@Eevans just for curiosity, any reason we have no restbase hosts in row A ?

Nov 1 2021, 6:19 PM · Platform Team Workboards (Platform Engineering Reliability), SRE, RESTBase, ops-codfw, DC-Ops

Oct 29 2021

Eevans updated subscribers of T293809: Define Capacity Management Process.

@JAllemandou do you have any prior art when it comes to AQS capacity planning, month-over-month growth, etc.?

Oct 29 2021, 5:42 PM · Generated Data Platform

Oct 25 2021

Eevans updated the task description for T293808: Design Image Recommendations Schema.
Oct 25 2021, 8:12 PM · Generated Data Platform
Eevans created P17599 schema.cql.
Oct 25 2021, 7:44 PM
Eevans added a comment to T293808: Design Image Recommendations Schema.

I would propose the following:

Oct 25 2021, 4:58 PM · Generated Data Platform
Eevans added a comment to T293808: Design Image Recommendations Schema.

[...]

  • Judging by the enwiki TSV file, only ~8.6% of the rows have an image, is this correct? If so, are we expecting to store entries without an image (read: without a valid recommendation)?

That's correct, and use-case specific. For the Structured Data PoC, the API team expected a dataset with

  1. a list of all unillustrated articles detected on a wiki.
  2. at most three candidate images that match an unillustrated article.

An empty image_ids denotes the case of "unillustrated article with no recommendations". This semantic was required (IIRC) to compare this dataset with Elasticsearch (MediaSearch) result sets. See https://phabricator.wikimedia.org/T274798.

Oct 25 2021, 4:44 PM · Generated Data Platform

Oct 21 2021

Eevans edited P17570 imagerec.cql.
Oct 21 2021, 8:22 PM
Eevans edited P17571 imagerec_data.cql.
Oct 21 2021, 8:19 PM
Eevans edited P17570 imagerec.cql.
Oct 21 2021, 8:16 PM
Eevans edited P17571 imagerec_data.cql.
Oct 21 2021, 8:14 PM
Eevans edited P17570 imagerec.cql.
Oct 21 2021, 8:14 PM
Eevans added a comment to T293808: Design Image Recommendations Schema.

Based on this Superset query, the dataset seems to consist of the following:

Oct 21 2021, 8:12 PM · Generated Data Platform
Eevans added a comment to T293808: Design Image Recommendations Schema.

[ ... ]

Second question, please see this thread here. I asked the same question and Gabriele confirmed the client team explicitly asked for these to be stored.

Oct 21 2021, 8:02 PM · Generated Data Platform
Eevans added a comment to T293808: Design Image Recommendations Schema.

Based on the TSV files in imagerec_prod.tar.bz2, the dataset seems to consist of the following:

Oct 21 2021, 7:45 PM · Generated Data Platform
Eevans edited P17571 imagerec_data.cql.
Oct 21 2021, 5:00 PM
Eevans edited P17570 imagerec.cql.
Oct 21 2021, 4:54 PM
Eevans edited P17570 imagerec.cql.
Oct 21 2021, 4:53 PM
Eevans edited P17570 imagerec.cql.
Oct 21 2021, 4:43 PM
Eevans edited P17570 imagerec.cql.
Oct 21 2021, 4:43 PM
Eevans edited P17571 imagerec_data.cql.
Oct 21 2021, 4:37 PM
Eevans edited P17570 imagerec.cql.
Oct 21 2021, 4:37 PM
Eevans edited P17571 imagerec_data.cql.
Oct 21 2021, 4:03 PM
Eevans edited P17571 imagerec_data.cql.
Oct 21 2021, 3:56 PM
Eevans created P17571 imagerec_data.cql.
Oct 21 2021, 3:50 PM
Eevans edited P17570 imagerec.cql.
Oct 21 2021, 3:44 PM
Eevans edited P17570 imagerec.cql.
Oct 21 2021, 3:42 PM
Eevans created P17570 imagerec.cql.
Oct 21 2021, 3:41 PM

Oct 18 2021

Eevans added a comment to T235299: Cassandra cluster management support for multi-tenancy.

There has been a bit of out-of-band discussion between Hugh and me, and we should probably attempt to summarize that here:

Oct 18 2021, 8:08 PM · Patch-For-Review, Generated Data Platform, Platform Team Workboards (Platform Engineering Reliability), Platform Engineering (Icebox), Cassandra, User-Eevans
Eevans updated the task description for T235299: Cassandra cluster management support for multi-tenancy.
Oct 18 2021, 7:28 PM · Patch-For-Review, Generated Data Platform, Platform Team Workboards (Platform Engineering Reliability), Platform Engineering (Icebox), Cassandra, User-Eevans

Oct 14 2021

Eevans awarded T291738: Degraded RAID on sessionstore1003 a Cookie token.
Oct 14 2021, 9:17 PM · Platform Engineering, SRE

Oct 13 2021

Eevans reopened T291738: Degraded RAID on sessionstore1003 as "Open".

The arrays are still degraded (the device state is removed); I think there is still more to be done.

Oct 13 2021, 9:05 PM · Platform Engineering, SRE
Eevans edited P17477 Overriding NotFound & MethodNotAllowed errors.
Oct 13 2021, 8:14 PM
Eevans created P17477 Overriding NotFound & MethodNotAllowed errors.
Oct 13 2021, 8:12 PM

Oct 12 2021

Eevans added a comment to T141541: Certs from cassandra-ca-manager should have the FQDN in cert's CN.

The attempted FQDN-use method appears to have failed - Cassandra claims there is an issue with the keystore format despite it being the same format/method as before:

INFO  [main] 2021-10-05 11:43:09,282 IndexSummaryManager.java:80 - Initializing index summary manager with a memory pool size of 614 MB and a resize interval of 60 minutes
ERROR [main] 2021-10-05 11:43:09,297 CassandraDaemon.java:749 - Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: Unable to create ssl socket
        at org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:701) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.net.MessagingService.listen(MessagingService.java:681) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.net.MessagingService.listen(MessagingService.java:665) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:796) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:683) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:632) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) [apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) [apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) [apache-cassandra-3.11.4.jar:3.11.4]
Caused by: java.io.IOException: Error creating the initializing the SSL Context
        at org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:697) ~[apache-cassandra-3.11.4.jar:3.11.4]
        ... 8 common frames omitted
Caused by: java.io.IOException: Invalid keystore format
        at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:666) ~[na:1.8.0_302]
        at sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:57) ~[na:1.8.0_302]
        at sun.security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:224) ~[na:1.8.0_302]
        at sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad(JavaKeyStore.java:71) ~[na:1.8.0_302]
        at java.security.KeyStore.load(KeyStore.java:1445) ~[na:1.8.0_302]
        at org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:179) ~[apache-cassandra-3.11.4.jar:3.11.4]
        ... 10 common frames omitted

Could there be an issue with a mismatch between hostname and FQDN? The FQDN certificates still exist on the puppet master for investigation, but keytool doesn't show many discrepancies between them despite the obvious change of CN. For now I am reverting.

Oct 12 2021, 4:47 PM · Platform Team Workboards (Platform Engineering Reliability), Platform Team Legacy (Later), Services (later), Cassandra
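In the JDK 8 `JavaKeyStore` seen in the stack trace above, "Invalid keystore format" is raised when the file's first eight bytes are not the JKS magic number (`0xFEEDFEED`) followed by version 1 or 2, so a header check can quickly tell a JKS file from, say, a PEM file or a DER/PKCS#12 bundle that ended up at the keystore path. A best-effort Go sketch under that assumption; the function names are hypothetical, and `keytool -list -v -keystore <file>` remains the authoritative check.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

const jksMagic = 0xFEEDFEED

// keystoreKind guesses a keystore's format from its leading bytes only;
// it does not validate the rest of the file.
func keystoreKind(header []byte) string {
	switch {
	case len(header) >= 8 && binary.BigEndian.Uint32(header) == jksMagic:
		v := binary.BigEndian.Uint32(header[4:8])
		if v == 1 || v == 2 {
			return "jks"
		}
		return fmt.Sprintf("jks (unsupported version %d)", v)
	case bytes.HasPrefix(header, []byte("-----BEGIN")):
		return "pem"
	case len(header) > 0 && header[0] == 0x30: // DER SEQUENCE tag
		return "der (possibly pkcs12)"
	default:
		return "unknown"
	}
}

// readHeader returns up to the first 16 bytes of the file at path.
func readHeader(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	buf := make([]byte, 16)
	n, err := io.ReadFull(f, buf)
	if err != nil && err != io.ErrUnexpectedEOF && err != io.EOF {
		return nil, err
	}
	return buf[:n], nil
}

func main() {
	// Usage: keystorekind <keystore file>...
	for _, path := range os.Args[1:] {
		h, err := readHeader(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s: %s\n", path, keystoreKind(h))
	}
}
```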

Oct 7 2021

Eevans created P17439 Command-Line Input.
Oct 7 2021, 7:29 PM
Eevans created P17438 Command-Line Input.
Oct 7 2021, 7:28 PM
Eevans committed rMSGO732b3426bf39: Merge Github project repo into gerrit:master (authored by Eevans).
Oct 7 2021, 4:25 PM
Eevans committed rMSGOa5a5fe3087e0: Relocate `go.mod`(s) to top-level directory (authored by Eevans).
Oct 7 2021, 4:25 PM
Eevans committed rMSGO887da072f743: middleware: New middleware for Prometheus metrics (authored by Eevans).
Oct 7 2021, 4:25 PM
Eevans committed rMSGO70117881c551: middleware: README.md formatting nit (authored by Eevans).
Oct 7 2021, 4:25 PM
Eevans committed rMSGO758b98647f4b: Remove service type (for now) (authored by Eevans).
Oct 7 2021, 4:25 PM
Eevans committed rMSGO09cf3d79acb3: NewLogger: Use a string argument for log level (authored by Eevans).
Oct 7 2021, 4:25 PM
Eevans committed rMSGOaeb3184bc514: logger: `Logger#Log` method (authored by Eevans).
Oct 7 2021, 4:25 PM
Eevans committed rMSGO60d06314e3a3: logger: README.md nit (authored by Eevans).
Oct 7 2021, 4:25 PM
Eevans committed rMSGO8c4bc10fe438: Simplified request-scoped logging (authored by Eevans).
Oct 7 2021, 4:25 PM