Mon, Feb 17
Thu, Feb 13
Wed, Feb 12
First off: I have prototype code that supports UDP Echo and SSE, but not Kafka. It isn't fully ready or tested yet. This has been developed over weekends, holidays, etc. as a fun project -- and I can't promise I'll find spare time to add more to it right now. Someone who can commit to it (staff or volunteer) should pick it up at some point and maybe also add Kafka in the process. We still have an open item and pending conversation on where ownership of the service itself lies.
The way this works now is that the entire MW fleet sends UDP packets to a specific IP (kraz) using the so-called "echo" protocol (= #channel<tab>message). We could theoretically switch this to a multicast address to allow multiple listeners (all connecting to separate IRC servers, perhaps one on each listener's localhost?), but no one has invested the time to do this and set up those multiple frontends.
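To make the "echo" framing described above concrete, here is a minimal sketch, not the production code. It assumes UTF-8 payloads and one message per datagram. The function names (`format_echo`, `parse_echo`, `send_echo`) and the channel name in the usage note are hypothetical, chosen for illustration only.

```python
import socket
from typing import Tuple


def format_echo(channel: str, message: str) -> bytes:
    """Encode one datagram as '#channel<TAB>message' (UTF-8 is an assumption)."""
    if "\t" in channel:
        raise ValueError("channel must not contain a tab")
    return f"{channel}\t{message}".encode("utf-8")


def parse_echo(datagram: bytes) -> Tuple[str, str]:
    """Split a datagram at the first tab into (channel, message)."""
    text = datagram.decode("utf-8", errors="replace")
    channel, sep, message = text.partition("\t")
    if not sep:
        raise ValueError("missing tab separator")
    # Dropping a trailing newline is a convenience of this sketch,
    # not something the protocol is known to require.
    return channel, message.rstrip("\r\n")


def send_echo(host: str, port: int, channel: str, message: str) -> None:
    """Send a single echo datagram over UDP to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_echo(channel, message), (host, port))
```

For example, `parse_echo(format_echo("#en.wikipedia", "hello"))` returns the channel and message as a pair. A multicast setup would change only the destination address handed to `send_echo`, with each listener joining the group on its side.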
Fri, Feb 7
Please file a procurement task for Willy/Rob to execute on :)
Thu, Jan 23
Correct. Also check the export templates (in the admin interface) for references to those fields.
(@Volans is not in Traffic), but regardless... judging from @BBlack's comments before the flurry of Gerrit commits, it seems like I misunderstood where this lies. This is not blocked on Traffic, but on DC Ops. Reassigning to @RobH and apologies for the added confusion!
Traffic team, ping? This task has been open since August last year and as I was just saying on IRC, cp1008 is a constant outlier in all of our reports, projections, planning etc. Its purchase date is Jan 27th, 2011, 9 years ago almost to the day :)
Wed, Jan 22
Hey - this was a Q2 task but it hasn't seen an update in a while. What's the status?
Jan 20 2020
@ayounsi, what's the status here?
Jan 17 2020
Could we import into Netbox now, and then change & document the setup at our convenience? It feels like documenting the existing situation and changing it are orthogonal to each other - any reason to block one on the other?
What is the status of this?
I've seen this issue before, and if I recall correctly, it was an issue with the Python 3.4 backport. I think the latest backport for 3.4.10-1~stretch1 should fix it.
Jan 16 2020
I think increasing the availability and resilience of this service is an excellent idea! However, adding more servers per site feels like a requirement, and a standard Pybal/IPVS setup sounds much more appropriate than anycast for this use case.
Jan 14 2020
Splitting the internal apt repository from the install roles/servers sounds good -- it's more of a historical artifact than anything else. You probably know this already but do note that the install server does not provide just TFTP, but also HTTP (and that is actually favored these days), so we would need to have a webserver running on the install servers.
Jan 13 2020
@Volans, out of curiosity, why was this required? Note that the concept of "rows" doesn't apply in this site, it's just two racks next to each other :)
This task is about preparing "Phame to support heavy traffic for a Tech Department blog", which is not the plan anymore. We should probably decline this task in favor of another more-generic task ("set up a tech department blog"). @Bmueller, @srodlund, thoughts?
Jan 10 2020
I've updated the aforementioned apt repository with 3.8.1-2~buster1 packages. Someone in SRE who's more familiar with how we do things these days (maybe @MoritzMuehlenhoff?) can update our reprepro to include that.
Dec 21 2019
- The canonical location is nowadays https://people.debian.org/~paravoid/python-all/ (which I maintain in my free time). We (Wikimedia) probably should set up a reprepro import for that.
- The above repository has 3.8.0 beta4 for buster, I'll need to update that for a more recent version (currently looks like 3.8.1). I can do so soon-ish.
- That said, I don't have any intentions to backport 3.8 to stretch.
The owner field will have to stay with us for a little while longer (until the end of Q4). The other two ("Support until" and "Support contract") can be dropped at our earliest convenience. Adjustments need to be made in at least the export templates and maybe even reports. @Volans and/or @crusnov, that's now over to you. (Hopefully the backups work in case we later realize it's a mistake)
Dec 16 2019
Thanks @ayounsi! Appreciate the follow up. What exactly did you ask them to do in this last communication?
Dec 13 2019
Note that R440s comprise 23.5% of the whole fleet, 84.1% of all servers purchased in the last 12 months, and 67.5% of all servers purchased in the last 24 months (I wish I had a graph!). Given these proportions, this may merely correlate with R440s rather than be specifically tied to them.
Thanks @Krinkle, very much appreciate all this! I have code from a couple of weeks ago that basically implements all this: consuming from SSE and formatting into IRC logging messages, but by using log_action_comment. It needs some more polishing and repository creation etc. I'll add you as code reviewer once I find some time to work on something better than Gist; hopefully during the end of year holidays.
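As a rough illustration of the approach described above (consuming SSE and turning each event into a single log line), here is a minimal sketch rather than the actual prototype. It assumes each SSE event carries one JSON object in its `data:` field. The helper names and the output line format are hypothetical. Apart from `log_action_comment`, the field names (`type`, `title`, `user`, `comment`) are assumptions about the event payload.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield the decoded JSON payload of each SSE event in a line stream."""
    buf: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, one leading space after the colon is dropped.
            if value.startswith(" "):
                value = value[1:]
            buf.append(value)
        elif line == "" and buf:
            # A blank line terminates the event; multi-line data joins with "\n".
            yield json.loads("\n".join(buf))
            buf = []
        # Other fields (event:, id:, retry:) and ":" comments are ignored here.


def format_irc_line(event: Dict) -> str:
    """Render one event as a plain-text line (illustrative format only)."""
    comment = event.get("log_action_comment") or event.get("comment", "")
    return "[[{title}]] {type} * {user} * {comment}".format(
        title=event.get("title", ""),
        type=event.get("type", ""),
        user=event.get("user", ""),
        comment=comment,
    )
```

The parser takes any iterable of text lines, so it can be fed from a file, a test fixture, or a streaming HTTP response body without changes.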
Dec 11 2019
Dec 6 2019
Dec 5 2019
Dec 3 2019
Dec 2 2019
Nov 28 2019
The nl-ams-as14907 anchor is now fully online and has ID #6671.
Nov 26 2019
I think conceptually this belongs together with EventStreams as a product offering and, by extension, with the same owners and maintainers. This is just another (non-HTTP) API for streaming events, like RCStream was, and its fate and evolution should be viewed together as a whole. For example, a valid product decision (now or in the future) may be "we'll sunset this by date X, and we recommend users to migrate to Y".
Nov 25 2019
Some of these were not done - I suspect partially because my ranges were misparsed as individual items (should have made that clearer, apologies!). The following are still missing asset tags:
I went digging in RT and fixed it for all of them except the old/unracked/offline sdtpa PDUs.
@mark swapped the optic with a new one and the link is now reenabled. This is being monitored for another 24-36h and will be resolved then.
The Anchor is now installed, connected to the SCS, and we see a getty on serial with the right hostname. It's also now responsive to IPv4 pings but not IPv6 (which matches our previous experience with regards to the initial install).
What's the status of this task?
Nov 24 2019
Nov 21 2019
/debian/ is for the official Debian mirror -- and note that we are part of the ftp.us.debian.org/http.us.debian.org rotation. So no, we should not pollute that namespace (and I don't think that would work anyway, ftpsync would just delete it all in the next sync).
I believe there has been progress here since the last update. @crusnov what's the latest?
I'll decline, on the basis that this will be converted to use SSO soon-ish, and there's no point in going over two migrations :)
We don't have specific criteria, it's on a case-by-case basis. This particular one sounds fine to me, let's do it! :)
Update: given the upcoming follow-up visit to esams next week, I requested a new image from RIPE. I got it today, and it can be found in the same place, as "anchor.nl-ams-as14907-v2.img".
Nov 20 2019
This is an excerpt of the backlog overnight:
01:17 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
02:31 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
02:43 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
03:00 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
03:23 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
04:14 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
04:48 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
05:16 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
06:19 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
06:42 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
07:33 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
09:56 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
11:11 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
Nov 19 2019
What's the latest here? Please keep the task updated :)
Nov 15 2019
Now that the PDU migration in eqiad has been completed, all that's left in this task is to record and document the models for:
- eqiad's row D (rows A/B as well as C are all documented now)
It looks like there are proposed patches for this, so perhaps we're not too far off? This ties to an exploration we're doing with a vendor so it's relatively time-sensitive. Thanks a lot!
Nov 5 2019
Thanks @herron! Should we resolve this?
Nov 4 2019
Nov 1 2019
Anycasting NTP sounds like a good idea in general, but a) it should be kept in a separate task and b) it doesn't sound like a priority IMHO at this time. Things work OK, and that sounds like a time investment that won't pay off right now.
Oct 31 2019
Indeed, and in fact the procurement task alone would be enough to identify the batch. Is that what you were looking for, @MoritzMuehlenhoff? How could we make this more visible?
@BBlack, what were your plans here? Can others in SRE help with some of that perhaps?
@RobH what's the status of this?