Mar 5 2019
For my part, I'm good with either approach; I really like the idea of not having an exceptional deployment for session storage. However, one of the objectives here was isolation, and while I think I'm satisfied we're still achieving that, the latter of the two proposed strategies would seem to trade away the total isolation we'd have if everything were on dedicated iron.
Feb 28 2019
There is some possibility that, instead of running the Kask service stand-alone on the session storage cluster, we will run it inside a security-fenced k8s environment.
Feb 11 2019
To summarize an IRC discussion w/ @Joe today:
Feb 6 2019
Kask now implements a logger that is a simple wrapper around syslog, w/ CEE-compatible messages. A question has come up, though, about which log levels/severities to implement, and when/how they should be used.
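For illustration, here is a minimal sketch of what such a wrapper could look like in Go. The names, JSON fields, and the particular severity set below are assumptions for the sake of the example, not Kask's actual code; which severities to expose is exactly the open question.

```go
package main

import (
	"encoding/json"
	"log"
	"log/syslog"
)

// Logger is a sketch of a thin syslog wrapper emitting CEE-compatible
// messages: an "@cee:" cookie followed by a JSON payload.
type Logger struct {
	writer  *syslog.Writer
	appname string
}

func NewLogger(appname string) (*Logger, error) {
	// Connects to the local syslog daemon (i.e. writes via /dev/log).
	w, err := syslog.New(syslog.LOG_INFO|syslog.LOG_LOCAL0, appname)
	if err != nil {
		return nil, err
	}
	return &Logger{writer: w, appname: appname}, nil
}

// cee renders one message as an @cee:-prefixed JSON string, so a
// downstream parser (e.g. rsyslog's mmjsonparse) can treat it as
// structured data rather than free text.
func (l *Logger) cee(severity, msg string) string {
	payload, _ := json.Marshal(map[string]string{
		"message":  msg,
		"severity": severity,
		"appname":  l.appname,
	})
	return "@cee: " + string(payload)
}

// One method per severity the service chooses to support.
func (l *Logger) Error(msg string)   { l.writer.Err(l.cee("ERROR", msg)) }
func (l *Logger) Warning(msg string) { l.writer.Warning(l.cee("WARNING", msg)) }
func (l *Logger) Info(msg string)    { l.writer.Info(l.cee("INFO", msg)) }
func (l *Logger) Debug(msg string)   { l.writer.Debug(l.cee("DEBUG", msg)) }

func main() {
	logger, err := NewLogger("kask")
	if err != nil {
		log.Fatal(err)
	}
	logger.Info("service started")
}
```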
Jan 15 2019
In T209110#4880850, @fgiunchedi wrote:My two cents WRT logging would be that messages should end up on syslog, either by way of syslog() (i.e. writing to /dev/log) or emitted to stdout/stderr, which then would be picked up by journald (or k8s/docker) and ultimately received by rsyslog and ingested into the logging pipeline.
For posterity's sake, here are the Prometheus metrics provided by default:
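As a hedged illustration (not the original list): assuming Kask exposes metrics via the standard client_golang promhttp handler, the out-of-the-box metrics come from the go_* runtime and process_* collectors that are pre-registered on the default registry. The port below is made up.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// promhttp.Handler() serves the default registry, which already
	// includes the Go runtime (go_*) and process (process_*) collectors,
	// so a service exports a useful baseline before registering any
	// metrics of its own.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```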
Jan 14 2019
Jan 10 2019
In T212418#4870100, @Cmjohnson wrote:@Eevans I am going to have to power it back on and let it go for a few days to see if the error returns; will that present an issue for you?
In T212418#4869959, @Cmjohnson wrote:I need to move DIMM around and do standard troubleshooting. Is this server able to be powered off and down in icinga?
Jan 8 2019
We're currently in the process of force-removing these instances. We'll need to coordinate when the host comes back up, as we'll have to re-bootstrap all 3 instances.
Jan 3 2019
Is there any status update or ETA on this?
Dec 20 2018
Not long ago, we were alarmed to see a very high rate of range-slice requests (a type of query our app does not perform). I wasn't able to find a ticket, but this turned out to be the driver issuing a SELECT * FROM ... on a small system table as a sort of heartbeat. The queries themselves were harmless, but their number was shocking until we realized that every Node worker had its own connection pool, and that ${workers} * ${hosts} was a very large number.
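A back-of-the-envelope sketch of that multiplication; all numbers below are hypothetical and only meant to show how quickly per-worker pools add up:

```go
package main

import "fmt"

func main() {
	// Illustrative figures only (not from the ticket): each Node worker
	// keeps its own pool with a connection per Cassandra host, and each
	// connection issues its own periodic heartbeat query.
	workers := 192         // Node workers across the app servers
	hosts := 6             // Cassandra hosts each pool connects to
	heartbeatsPerMin := 2  // heartbeat queries per connection per minute

	connections := workers * hosts
	fmt.Printf("%d connections, ~%d heartbeat queries/min\n",
		connections, connections*heartbeatsPerMin)
	// => 1152 connections, ~2304 heartbeat queries/min
}
```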
Dec 19 2018
In T212129#4834329, @mark wrote:I am getting the impression here that some things are being rushed and finalized without time for a proper discussion between people/teams about the different possible solutions and their impact, after this new discovery. Is that because goals are due to be posted now?
Dec 18 2018
In T212129#4832455, @Joe wrote:[ ... ]
This needs a thorough discussion ASAP.
Dec 13 2018
In T206010#4820737, @kchapman wrote:TechCom has approved this, noting performance discussions are outside of the scope of this RFC.
Dec 12 2018
This is done!
In T211721#4818580, @Joe wrote:I was asking because looking at what's currently stored in the "service", I see both mwsession objects (that are created I guess by the user session), and objects that have the form $wiki:echo:(alert|seen|message) which seem to be created by... Flow?
There is a huge number of such objects, which is a problem in itself: we have 2M echo objects compared to just 16k session objects.
In T211721#4816810, @Joe wrote:I don't think we need to overthink this, but knowing what kind of latency increase we can expect might drive us toward implementation technologies different from the ones we have currently picked.
So we need to understand two things:
- How many times per user request we access the session data. Right now we separate every tiny piece of it, so it might mean we make, say, 50 requests to the session redis per user request.
- How much time we spend performing those requests to redis collectively.
Any number we come up with as a Service Level Objective needs to start from those two pieces of data. AIUI, neither of those things is currently measured.
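For a sense of what producing those two numbers would involve, here is a minimal per-request instrumentation sketch in Go. The store interface, names, and the 50-call loop are all assumptions standing in for the real session backend:

```go
package main

import (
	"fmt"
	"time"
)

// SessionStoreStats accumulates, for a single user request, the two
// numbers asked for above: how many session-store calls were made and
// how long they took in total.
type SessionStoreStats struct {
	Calls   int
	Elapsed time.Duration
}

// Instrumented wraps any get-style call with counting and timing.
func (s *SessionStoreStats) Instrumented(get func(key string) (string, error), key string) (string, error) {
	start := time.Now()
	val, err := get(key)
	s.Elapsed += time.Since(start)
	s.Calls++
	return val, err
}

func main() {
	stats := &SessionStoreStats{}
	fakeGet := func(key string) (string, error) { // stand-in for a redis GET
		time.Sleep(time.Millisecond)
		return "value", nil
	}
	for i := 0; i < 50; i++ { // e.g. 50 session accesses in one request
		stats.Instrumented(fakeGet, fmt.Sprintf("session:%d", i))
	}
	fmt.Printf("calls=%d total=%v\n", stats.Calls, stats.Elapsed)
}
```

Aggregating these per-request counters into histograms would give exactly the baseline an SLO could be derived from.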