
faidon (Faidon Liambotis)
SRE

Projects (13)

User Details

User Since
Oct 7 2014, 10:21 AM (299 w, 6 d)
Availability
Available
IRC Nick
paravoid
LDAP User
Faidon Liambotis
MediaWiki User
Faidon Liambotis (WMF)

Recent Activity

Thu, Jul 2

faidon added a comment to T254332: Add more dimensions in the netflow/pmacct/Druid pipeline.

So, how do we make progress here? Any thoughts on who/how? :) Some of these features could make a tremendous difference to our network operations and future planning, so I'm super excited about seeing them come to fruition!

Thu, Jul 2, 5:23 PM · Analytics, Operations, netops
faidon updated the task description for T254332: Add more dimensions in the netflow/pmacct/Druid pipeline.
Thu, Jul 2, 5:22 PM · Analytics, Operations, netops

Wed, Jul 1

faidon added a comment to T252577: Maxmind data update issues for DNS (and others?).

I was bitten by this again today - ping!

Wed, Jul 1, 5:29 PM · Operations, Traffic

Fri, Jun 26

faidon triaged T256498: Return asw-c8-codfw to spares as Low priority.
Fri, Jun 26, 6:07 PM · ops-codfw, Operations

Thu, Jun 25

faidon added a comment to T254332: Add more dimensions in the netflow/pmacct/Druid pipeline.

To add to the above, I'm also wondering how difficult it would be to include AS *names*, e.g. from the MaxMind GeoIP ASN database. I think we've used that database before, maybe for pageview data? Could we perhaps use Druid lookups for this to avoid adding another (identical) dimension to the data set?

Thu, Jun 25, 12:09 AM · Analytics, Operations, netops
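
One way to read the Druid-lookups suggestion: store only the AS number as a dimension on each flow record and resolve the AS name at query time from a separate number-to-name mapping, so the name is never duplicated per row. The sketch below illustrates that idea in plain Python only; the record layout, the `dst_as` field and the `ASN_NAMES` table are hypothetical placeholders (using RFC 5398 documentation ASNs), not real pmacct, MaxMind or Druid data.

```python
from collections import Counter

# Hypothetical AS-number -> AS-name mapping. In practice something like this
# could be built from the MaxMind GeoIP ASN database and used as a lookup.
# 64496-64511 are documentation ASNs (RFC 5398), used here as placeholders.
ASN_NAMES = {
    64496: "Example Network A",
    64497: "Example Network B",
}

# Hypothetical flow records: only the AS number is stored as a dimension.
FLOWS = [
    {"dst_as": 64496, "bytes": 1500},
    {"dst_as": 64497, "bytes": 400},
    {"dst_as": 64496, "bytes": 600},
    {"dst_as": 64511, "bytes": 100},  # not present in the lookup table
]


def bytes_by_as_name(flows, names):
    """Aggregate bytes per destination AS, resolving names at query time."""
    totals = Counter()
    for flow in flows:
        totals[flow["dst_as"]] += flow["bytes"]
    # Resolve names after aggregating: one lookup per AS instead of a name
    # stored on every row. Unknown ASNs fall back to "AS<number>".
    return {names.get(asn, f"AS{asn}"): total for asn, total in totals.items()}


print(bytes_by_as_name(FLOWS, ASN_NAMES))
```

The design point is that the mapping can be refreshed independently of the stored flow data, which is what a query-time lookup buys over an extra dimension.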

Wed, Jun 24

faidon closed T219486: Send peering requests to AS with the worst TTFB as Resolved.

I took a look at that list above. It's really not very actionable: most of these are very large networks with restrictive settlement-free peering policies. For the few that remain, we have either already established peerings or sent peering requests that went unanswered, which mostly means that they are not actively peering or that we are too small for them to care about.

Wed, Jun 24, 11:30 PM · Traffic, Performance-Team, Operations
faidon updated subscribers of T254332: Add more dimensions in the netflow/pmacct/Druid pipeline.
Wed, Jun 24, 10:15 PM · Analytics, Operations, netops

Thu, Jun 18

faidon updated the task description for T245161: Track down and replace very old HW.
Thu, Jun 18, 10:31 AM · DC-Ops
faidon updated the task description for T245161: Track down and replace very old HW.
Thu, Jun 18, 10:25 AM · DC-Ops

Thu, Jun 11

faidon added a comment to T254818: Requesting access to PROD for lmata (SRE).

Approved.

Thu, Jun 11, 10:53 AM · Operations, SRE-Access-Requests

Jun 4 2020

faidon added a comment to T251536: Peer with SFMIX at ulsfo (May 2020).

This is now set up on SFMIX's end and is up:

On your side please plumb 206.197.187.82/24 and 2001:504:30::ba01:4907:1/64. The usual sane BGP peering rules apply: no broadcast traffic (DHCP, CDP, etc.); see https://sfmix.org/connect/guide.

We require at least one BGP session (to our looking glass), plus optional sessions to the route servers.
The looking glass is AS12276 at 206.197.187.1 and 2001:504:30::ba01:2276:1. You should announce all your routes to the looking glass, but expect no routes to be announced to you.

We'll push out configs to support these peers this evening.

Jun 4 2020, 7:53 AM · netops, Operations
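
Before plumbing the addresses, a quick sanity check is that the looking glass addresses SFMIX gave are on-link for the local addresses assigned to us. A minimal sketch using Python's standard `ipaddress` module and the addresses quoted in the message above; it checks addressing only and says nothing about the BGP configuration itself.

```python
import ipaddress

# Local addresses assigned by SFMIX, as quoted in the message above.
LOCAL_V4 = ipaddress.ip_interface("206.197.187.82/24")
LOCAL_V6 = ipaddress.ip_interface("2001:504:30::ba01:4907:1/64")

# SFMIX looking glass addresses (AS12276), as quoted above.
LOOKING_GLASS = [
    ipaddress.ip_address("206.197.187.1"),
    ipaddress.ip_address("2001:504:30::ba01:2276:1"),
]


def on_link(peer, interfaces):
    """Return True if the peer address lies inside one of the interface subnets."""
    return any(
        peer.version == iface.version and peer in iface.network
        for iface in interfaces
    )


for peer in LOOKING_GLASS:
    status = "on-link" if on_link(peer, [LOCAL_V4, LOCAL_V6]) else "NOT on-link"
    print(peer, status)
```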

Jun 3 2020

faidon created T254332: Add more dimensions in the netflow/pmacct/Druid pipeline.
Jun 3 2020, 9:40 AM · Analytics, Operations, netops

May 19 2020

faidon added a comment to T225121: (Need by: 2019-09-30) upgrade msw1-eqiad from EX4200 to EX4300.

Are there any updates to this task and any particular reasons it's been held up? While this was never super urgent, we're now at the ~one year mark since this was ordered and delivered to the data center. Plus I think because at the time the upgrade was imminent, we only bought support for the new switch and not the old, so we're operating with unsupported HW right now. It'd be great if this were to be completed soon. Thanks!

May 19 2020, 9:22 AM · netops, Operations, ops-eqiad

May 15 2020

faidon added a comment to T247881: Three ports on asw2-d-eqiad are not working as expected.

If three ports have permanently failed, I'm not sure how we could ever trust that switch again. Perhaps it's better to do a painful but planned replacement rather than have it fail at some inconvenient time and have to rush a replacement then?

May 15 2020, 12:16 PM · ops-eqiad, Operations, netops

May 12 2020

faidon added a comment to T252577: Maxmind data update issues for DNS (and others?).

I know that historically MaxMind has claimed they update the data roughly weekly, so maybe in this case it was a normal weekly update and we're just misaligned with their weeks? In any case, the current geoipupdate seems to be smart enough to checksum the existing databases and not re-download identical copies, so we could probably run it more often on the puppetmasters.

May 12 2020, 6:45 PM · Operations, Traffic
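
The checksum-and-skip behaviour described above, which is what makes running the updater more often cheap, can be sketched as follows. This is not geoipupdate's code: the `fetch_remote_md5` and `download` callables are hypothetical stand-ins for whatever the real client does over the network, and MD5 is an assumed choice of checksum.

```python
import hashlib
from pathlib import Path


def file_md5(path):
    """Return the MD5 hex digest of a local file, or None if it doesn't exist."""
    p = Path(path)
    if not p.exists():
        return None
    h = hashlib.md5()
    with p.open("rb") as f:
        # Read in chunks so large databases don't need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def update_if_changed(path, fetch_remote_md5, download):
    """Download only when the remote checksum differs from the local one.

    fetch_remote_md5() and download(path) are hypothetical callables standing
    in for the real client's network calls. Returns True if a download ran.
    """
    if file_md5(path) == fetch_remote_md5():
        return False  # identical database already present: skip the download
    download(path)
    return True
```

With this pattern an unchanged database costs one local hash plus one small remote query, so a daily schedule instead of a weekly one adds little load.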

May 8 2020

faidon added a subtask for T251536: Peer with SFMIX at ulsfo (May 2020): Unknown Object (Task).
May 8 2020, 12:10 PM · netops, Operations
faidon removed a subtask for T251536: Peer with SFMIX at ulsfo (May 2020): Unknown Object (Task).
May 8 2020, 12:10 PM · netops, Operations
faidon added a comment to T251536: Peer with SFMIX at ulsfo (May 2020).

LoA received and cross-connect task created.

May 8 2020, 12:10 PM · netops, Operations
faidon renamed T251536: Peer with SFMIX at ulsfo (May 2020) from Peer with SFMIX at ulsfo to Peer with SFMIX at ulsfo (May 2020).
May 8 2020, 12:09 PM · netops, Operations
faidon added a subtask for T251536: Peer with SFMIX at ulsfo (May 2020): Unknown Object (Task).
May 8 2020, 12:09 PM · netops, Operations

Apr 30 2020

faidon added a comment to T251536: Peer with SFMIX at ulsfo (May 2020).

I just submitted their form.

Apr 30 2020, 4:00 PM · netops, Operations
faidon triaged T251536: Peer with SFMIX at ulsfo (May 2020) as Medium priority.
Apr 30 2020, 3:42 PM · netops, Operations

Apr 27 2020

faidon added a comment to T200277: OSPF metrics.

Interesting idea! A couple of notes:

  • What do you mean by "virtual links" and Netbox not supporting them? Is that VLANs for our transports over the PtMP VPLS?
  • What do you envision the difference to be between "primary" and "preferred"? (I know you said TBD, but curious :)
  • It'd be interesting to see what this would look like before we start adding the fields. That may help us figure out what the right values for those fields should be. Would it make sense to list our links in a Phaste or spreadsheet or something and check whether the output makes sense?
Apr 27 2020, 11:21 AM · Operations, netops

Apr 14 2020

faidon closed T212878: Netbox racks consistency report as Declined.

I think the original intention of this will be addressed by periodic audits that we'll eventually do. I'll decline this for the reasons I mentioned above, but if anyone feels strongly about this, feel free to reopen :)

Apr 14 2020, 7:01 PM · netbox, Operations
faidon updated subscribers of T249916: access request on cumin[1-2]001 for John Clark.

So, breaking down the (very reasonable!) ask, I think there are a few different things at play here:

  • Access to iDRAC/iLO so that John can e.g. look at HW status and get reports that vendors ask for. This in turn requires:
    • Access to the password store. There is already a "dcops" group with the right access, so we can have John added there. Should be simple, as far as I can tell.
    • Access to the mgmt IP network remotely. Right now that's firewalled to the cumin hosts, access to which ties to a bigger project (see below). However, that's perhaps an unnecessary dependency and maybe we can easily work around that (e.g. with a separate bastion for mgmt?). @MoritzMuehlenhoff, @jbond any thoughts here?
  • Access to execute cumin cookbooks, like reimaging. That right now is tied to global root, which is a privilege that we can't easily grant. Fixing that limitation has been on our radar, including the PoC work that was part of our Q3 OKRs (T244840). It's definitely not there yet and it's going to take a few months to fully materialize, unfortunately.
Apr 14 2020, 6:04 PM · Operations, SRE-Access-Requests, DC-Ops
faidon renamed T250136: Homer: manage transit BGP sessions from Homr: manage transit BGP sessions to Homer: manage transit BGP sessions.
Apr 14 2020, 9:27 AM · netops, Operations

Apr 13 2020

faidon added a comment to T166368: Wipe of spare/replacement disks.

If I understand it correctly, this task is specifically about a box that was returned to the spare pool and then was reallocated for a new purpose but kept its old data. We should definitely wipe in those cases. I think that has been standard practice in the past, but perhaps not well-documented or applied uniformly? I'm not sure; something to dig into more for sure :)

Apr 13 2020, 5:16 PM · DC-Ops, Operations
faidon added a project to T250053: Netbox report accounting icinga alert: DC-Ops.
Apr 13 2020, 9:55 AM · ops-eqiad, DC-Ops, Operations
faidon added a project to T250054: Netbox report coherence_rack Icinga alert: DC-Ops.
Apr 13 2020, 9:55 AM · DC-Ops, ops-ulsfo, Operations, ops-eqiad

Apr 11 2020

faidon added a comment to T203003: Keyholder phab repo duplicate work.

The master branch of operations/software/keyholder is not ready for a release at this time, so please don't tag, package or deploy it in this state. There are a bunch of changes that have been pending in Gerrit for about a year, plus more that I've queued up locally (because it's hard to manage dozens of dependent git commits with Gerrit…). If y'all are willing to review these I can clean them up and prepare a release; if not, then I can pick this up and make some progress. Let me know!

Apr 11 2020, 7:10 AM · Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, Keyholder, Operations

Apr 8 2020

faidon renamed T249653: Netbox: restore two deleted entries from backups from restore two deleted entries to Netbox: restore two deleted entries from backups.
Apr 8 2020, 8:53 AM · netbox

Apr 3 2020

faidon added a comment to T235886: IRR updates needed.

We found that the prefixes 185.15.56.0/22 and 2a02:ec80::/29 are in use but not documented in the RIPE Database as assignments.

After discussing it with John, we think the deeper issue might be that they are "ALLOCATED PA" while they should be "ASSIGNED PI".

Apr 3 2020, 11:54 PM · Operations, netops

Apr 2 2020

faidon added a comment to T238305: Servers freezing across the caching cluster.

Ah! That's awesome to hear. May I suggest resolving this task (and the associated "upgrade firmware" one?) then, and reopening it if we see another one of these?

Apr 2 2020, 7:35 PM · Traffic, Operations

Apr 1 2020

faidon removed a subtask for T243167: Upgrade BIOS and IDRAC firmware on R440 cp systems: T244127: cp3057 crash (was: network down).
Apr 1 2020, 9:41 PM · DC-Ops, Traffic, Operations, ops-esams
faidon removed a parent task for T244127: cp3057 crash (was: network down): T243167: Upgrade BIOS and IDRAC firmware on R440 cp systems.
Apr 1 2020, 9:41 PM · ops-esams, Operations, Traffic
faidon added a subtask for T238305: Servers freezing across the caching cluster: T244127: cp3057 crash (was: network down).
Apr 1 2020, 9:40 PM · Traffic, Operations
faidon added a parent task for T244127: cp3057 crash (was: network down): T238305: Servers freezing across the caching cluster.
Apr 1 2020, 9:40 PM · ops-esams, Operations, Traffic
faidon added a subtask for T238305: Servers freezing across the caching cluster: T243167: Upgrade BIOS and IDRAC firmware on R440 cp systems.
Apr 1 2020, 9:40 PM · Traffic, Operations
faidon added a parent task for T243167: Upgrade BIOS and IDRAC firmware on R440 cp systems: T238305: Servers freezing across the caching cluster.
Apr 1 2020, 9:40 PM · DC-Ops, Traffic, Operations, ops-esams
faidon added a comment to T238305: Servers freezing across the caching cluster.

What's the latest here? I haven't heard about these crashes lately but it may just be that I missed it. Do we know more about this now?

Apr 1 2020, 9:40 PM · Traffic, Operations
faidon merged T241306: cp3051 crashed into T238305: Servers freezing across the caching cluster.
Apr 1 2020, 9:32 PM · Traffic, Operations
faidon merged task T241306: cp3051 crashed into T238305: Servers freezing across the caching cluster.
Apr 1 2020, 9:32 PM · Traffic, Operations
faidon merged T240425: cp3055 crashed into T238305: Servers freezing across the caching cluster.
Apr 1 2020, 9:32 PM · Traffic, Operations
faidon merged task T240425: cp3055 crashed into T238305: Servers freezing across the caching cluster.
Apr 1 2020, 9:32 PM · Traffic, Operations
faidon merged T244127: cp3057 crash (was: network down) into T238305: Servers freezing across the caching cluster.
Apr 1 2020, 9:32 PM · Traffic, Operations
faidon merged task T244127: cp3057 crash (was: network down) into T238305: Servers freezing across the caching cluster.
Apr 1 2020, 9:32 PM · ops-esams, Operations, Traffic

Mar 27 2020

faidon updated the task description for T245161: Track down and replace very old HW.
Mar 27 2020, 6:45 PM · DC-Ops
faidon reassigned T237466: Remove unused custom fields from Netbox from crusnov to wiki_willy.

@wiki_willy is finalizing the end of our leasing agreement. Once that's done, we'd be the "owner" of all of those assets, and thus we can remove the "owner" field from Netbox. Reassigning to Willy to let us know when that's done :)

Mar 27 2020, 4:04 PM · SRE-tools, DC-Ops, netbox

Mar 26 2020

Krinkle awarded T245161: Track down and replace very old HW a Burninate token.
Mar 26 2020, 9:23 PM · DC-Ops

Mar 19 2020

faidon added a comment to T213843: Juniper network device audit - all sites.

Ok! From https://wikitech.wikimedia.org/wiki/Server_Lifecycle#States I thought that if a device was not in netbox it was not in our possession anymore.

Mar 19 2020, 1:55 PM · DC-Ops, netops, Operations

Mar 18 2020

faidon reopened T245606: CloudVPS: enable BGP in the neutron transport network as "Open".

Reopening this per IRC, and given this is a prod/WMCS task affecting prod in major ways.

Mar 18 2020, 2:58 PM · netops, cloud-services-team (Kanban), Operations
faidon reopened T245606: CloudVPS: enable BGP in the neutron transport network, a subtask of T244727: CloudVPS: networking improvements, as Open.
Mar 18 2020, 2:58 PM · cloud-services-team (Kanban), Epic

Mar 17 2020

faidon added a comment to T245161: Track down and replace very old HW.

I just discovered that cloudmetrics1001 is old (2015) and needs replacement: https://netbox.wikimedia.org/dcim/devices/182/

Mar 17 2020, 2:20 PM · DC-Ops

Mar 15 2020

faidon added a comment to T247646: migrate racktables to a buster VM (was: decom racktables?).

Good question!

Mar 15 2020, 4:37 PM · Operations

Mar 12 2020

faidon added a comment to T247245: Test Performance of Marian NMT translation in stat cluster.

Oh, that sounds perfect, let's do that :) We should also try a build with the right make flags etc. (something like TARGET=SKYLAKEX, as the FAQ says). Thanks all!

Mar 12 2020, 11:40 AM · Language-Team (Language-2020-Focus-Sprint), ContentTranslation, Analytics

Mar 11 2020

faidon added a comment to T247245: Test Performance of Marian NMT translation in stat cluster.

OK, so to recap, I read two concerns:

Mar 11 2020, 7:06 PM · Language-Team (Language-2020-Focus-Sprint), ContentTranslation, Analytics

Mar 6 2020

faidon added a comment to T246564: Netbox has incorrect email address for GTT.

We have one global account, migrated from a previous system. I wasn't able to find how to create individual accounts, so that will do I guess :)

Mar 6 2020, 12:01 AM · netops, Operations

Mar 3 2020

RobH awarded T239244: Netbox report check for no position set in rack a Like token.
Mar 3 2020, 5:22 PM · netbox, Operations

Feb 20 2020

faidon added a comment to T244849: Add SSO support to netbox.

On a practical level, we already maintain a fork, so any needed changes can be integrated into it (though we should wait until after this week's upgrade).

Feb 20 2020, 5:47 PM · netbox, Operations
faidon added a comment to T245711: Add tenant for Cloud Services?.

WMCS hosts are in the production VLANs, managed by the production puppet etc. Practically speaking, we use tenants to exclude fr-tech/OIT/RIPE hosts from reports (that e.g. alert if an active host is not present in PuppetDB or vice-versa), and will likely also use it to exclude them from the in-progress IP assignment/bootstrapping work. If we were to assign a tenant to those hosts, we'd have to special-case it pretty much everywhere to treat it like the "production" tenant (which is now the "null" tenant).

Feb 20 2020, 5:37 PM · cloud-services-team (Kanban), netbox
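
The report logic described above (alert when an active host is missing from PuppetDB or vice versa, while skipping hosts that carry a tenant) could look roughly like the sketch below. It is a hypothetical illustration over plain Python data, not the actual Netbox report code; the host names and tenant values are made up, and `None` stands in for the "null" tenant used by production hosts.

```python
def puppetdb_mismatches(netbox_hosts, puppetdb_hosts):
    """Compare active production hosts in Netbox against PuppetDB.

    netbox_hosts: dict of hostname -> {"status": str, "tenant": str or None}
    puppetdb_hosts: set of hostnames known to PuppetDB
    Only hosts with tenant None (production) are expected in PuppetDB;
    tenanted hosts are ignored in both directions.
    """
    expected = {
        name
        for name, info in netbox_hosts.items()
        if info["status"] == "active" and info["tenant"] is None
    }
    tenanted = {name for name, info in netbox_hosts.items() if info["tenant"] is not None}
    missing_from_puppetdb = expected - puppetdb_hosts
    unexpected_in_puppetdb = puppetdb_hosts - expected - tenanted
    return sorted(missing_from_puppetdb), sorted(unexpected_in_puppetdb)


# Hypothetical example data.
NETBOX = {
    "host1001": {"status": "active", "tenant": None},
    "host1002": {"status": "active", "tenant": None},
    "frhost1001": {"status": "active", "tenant": "fr-tech"},
    "old1001": {"status": "decommissioning", "tenant": None},
}
print(puppetdb_mismatches(NETBOX, {"host1001", "old1001"}))
```

Giving WMCS hosts their own tenant would mean every check of this kind has to treat that tenant exactly like `None`, which is the special-casing the comment above argues against.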
faidon reassigned T214024: Two test hosts for SREs from faidon to RobH.

OK, it sounds like @akosiaris and @MoritzMuehlenhoff have coordinated with each other and they can share those two hosts as SRE test hosts.

Feb 20 2020, 5:00 PM · Operations, hardware-requests
faidon added a comment to T229710: read-only user netbox permissions regression.

I'm not sure that @faidon actually intended such narrow restrictions when we resolved T208267: Requesting access to netbox for bd808.

Feb 20 2020, 11:59 AM · netbox

Feb 17 2020

faidon added a comment to T245161: Track down and replace very old HW.

Regarding dbproxy1001, which is not on that list (but I guess it should be): it is still waiting for on-site steps: T244463: Decommission dbproxy1001.eqiad.wmnet
Current status of the other dbproxy hosts:
  • dbproxy1004 and dbproxy1009 are fully decommissioned: T228768
  • dbproxy1005 is fully decommissioned: T231967
  • dbproxy1006 is waiting for on-site steps: T233207

Feb 17 2020, 1:11 PM · DC-Ops

Feb 13 2020

faidon triaged T245161: Track down and replace very old HW as High priority.
Feb 13 2020, 3:36 PM · DC-Ops
faidon updated subscribers of T146455: Decommission labsdb1002.

@Jclark-ctr @wiki_willy what's the status here? It sounds like a decom that was only partial and that only needs a few more steps to finalize perhaps?

Feb 13 2020, 2:30 PM · hardware-requests, Patch-For-Review, ops-eqiad, Operations

Feb 12 2020

faidon added a comment to T234234: Port architecture of irc-recentchanges to Kafka.

First off: I have prototype code that supports UDP Echo and SSE, but not Kafka. It's not fully ready or tested yet. It has been developed over weekends, holidays, etc. as a fun project, and I can't promise I'll find spare time to add more to it right now. Someone who can commit to it (staff or volunteer) should pick it up at some point and maybe also add Kafka in the process. We still have an open item and a pending conversation about where ownership of the service itself lies.

Feb 12 2020, 12:39 AM · Patch-For-Review, User-Elukey, Analytics
faidon added a comment to T244719: Create a replacement for kraz.wikimedia.org.

The way this works now is that the entire MW fleet sends UDP packets to a specific IP (kraz) using the so-called "echo" protocol (= #channel<tab>message). We could theoretically switch this to a multicast address in order to allow multiple listeners (each connecting to a separate IRC server, perhaps running on that listener's localhost?), but no one has invested the time to do this and set up those multiple frontends.

Feb 12 2020, 12:34 AM · serviceops, Operations, vm-requests, User-Elukey, Analytics
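
The "echo" format mentioned above is just `#channel`, a tab, and the message, sent as a single UDP datagram. A minimal sender/receiver sketch over localhost, assuming UTF-8 payloads; the channel name and ephemeral port are illustrative, and neither the real MediaWiki sender nor the IRC bridge on kraz is shown.

```python
import socket


def format_echo(channel, message):
    """Build an echo-protocol payload: '#channel<TAB>message'."""
    if not channel.startswith("#"):
        channel = "#" + channel
    return f"{channel}\t{message}".encode("utf-8")


def parse_echo(payload):
    """Split a payload back into (channel, message) at the first tab."""
    channel, _, message = payload.decode("utf-8").partition("\t")
    return channel, message


def send_echo(host, port, channel, message):
    """Send one fire-and-forget UDP datagram, as each sender does."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_echo(channel, message), (host, port))


if __name__ == "__main__":
    # A listener on an ephemeral localhost port stands in for the IRC bridge.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.settimeout(2)
        port = listener.getsockname()[1]
        send_echo("127.0.0.1", port, "#en.wikipedia", "example edit summary")
        data, _ = listener.recvfrom(65535)
        print(parse_echo(data))
```

Because UDP is connectionless, the senders never know who is listening; that is what would make swapping the unicast destination for a multicast group largely transparent to the MW side.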

Feb 7 2020

faidon added a comment to T244497: cr3-knams:xe-0/1/3 down.

Please file a procurement task for Willy/Rob to execute on :)

Feb 7 2020, 1:42 PM · netops, Operations

Jan 23 2020

faidon added a comment to T237466: Remove unused custom fields from Netbox.

Correct. Also check the export templates (in the admin interface) for references to those fields.

Jan 23 2020, 5:42 PM · SRE-tools, DC-Ops, netbox
faidon assigned T229586: decommission cp1008, cp1071, cp1072, cp1073, cp1074, cp1099 to RobH.

(@Volans is not in Traffic), but regardless... judging from @BBlack's comments before the flurry of Gerrit commits, it seems like I misunderstood where this lies. This is not blocked on Traffic, but on DC Ops. Reassigning to @RobH, and apologies for the added confusion!

Jan 23 2020, 5:27 PM · ops-eqiad, decommission-hardware, Operations
faidon added a comment to T229586: decommission cp1008, cp1071, cp1072, cp1073, cp1074, cp1099.

Traffic team, ping? This task has been open since August last year and as I was just saying on IRC, cp1008 is a constant outlier in all of our reports, projections, planning etc. Its purchase date is Jan 27th, 2011, 9 years ago almost to the day :)

Jan 23 2020, 12:59 PM · ops-eqiad, decommission-hardware, Operations

Jan 22 2020

faidon added a comment to T242250: rack/setup/install ps[12]-60[34]-eqsin.

Hey - this was a Q2 task but it hasn't seen an update in a while. What's the status?

Jan 22 2020, 3:44 PM · Operations, ops-eqsin
faidon added a comment to T243288: Retire the Tor relay.

To your last point: the WMCS Terms of Use explicitly lists "network proxy" in the "prohibited activities" section, and even names Tor specifically as the first example of such an activity, so running a node in a Cloud VPS is not an option here. This policy has been in place since the inception of the Labs/WMCS ToS, and while I can't speak to the rationale behind it, I can say that prohibiting it remains a good idea today: running proxies, whether in WMCS or in the production realm, can be a messy business, and one that we don't have the capacity to support as an org.

Jan 22 2020, 10:35 AM · Tor, Operations

Jan 20 2020

faidon added a comment to T213843: Juniper network device audit - all sites.

@ayounsi, what's the status here?

Jan 20 2020, 10:39 AM · DC-Ops, netops, Operations

Jan 17 2020

faidon added a comment to T184066: rack/setup/install ps[12]-oe1[456]-esams.

Could we import into Netbox now, and then change & document the setup at our convenience? It feels like documenting the existing situation and changing it are orthogonal to each other - any reason to block one on the other?

Jan 17 2020, 5:09 PM · Operations, ops-esams
faidon added a comment to T184066: rack/setup/install ps[12]-oe1[456]-esams.

What is the status of this?

Jan 17 2020, 12:12 PM · Operations, ops-esams
faidon added a comment to T237466: Remove unused custom fields from Netbox.

Owner was due to the leasing situation (as it lists Farnam as the only option), so I defer to @faidon's call on when that can go.

Jan 17 2020, 9:48 AM · SRE-tools, DC-Ops, netbox
faidon added a comment to T243048: python3.4 broken on deployment-logstash2.

I've seen this issue before, and if I recall correctly, it was an issue with the Python 3.4 backport. I think the latest backport, 3.4.10-1~stretch1, should fix it.

Jan 17 2020, 9:46 AM · Operations, Beta-Cluster-Infrastructure

Jan 16 2020

faidon added a comment to T242715: Webproxies are a SPOF.

I think increasing the availability and resilience of this service is an excellent idea! However, adding more servers per site feels like a requirement, and a standard Pybal/IPVS setup sounds much more appropriate than anycast for this use case.

Jan 16 2020, 12:18 PM · Operations

Jan 14 2020

faidon added a comment to T242602: Sort out plan for install* servers in edge sites.

Splitting the internal apt repository from the install roles/servers sounds good; it's more of a historical artifact than anything else. You probably know this already, but do note that the install server provides not just TFTP but also HTTP (which is actually favored these days), so we would need a webserver running on the install servers.

Jan 14 2020, 5:48 AM · Patch-For-Review, Operations

Jan 13 2020

faidon added a comment to T242412: ulsfo doesn't have any rack group set in Netbox.

@Volans, out of curiosity, why was this required? Note that the concept of "rows" doesn't apply in this site, it's just two racks next to each other :)

Jan 13 2020, 8:54 AM · DC-Ops, netbox
faidon added a comment to T226044: Prepare Phame to support heavy traffic for a Tech Department blog.

This task is about preparing "Phame to support heavy traffic for a Tech Department blog", which is not the plan anymore. We should probably decline this task in favor of another more-generic task ("set up a tech department blog"). @Bmueller, @srodlund, thoughts?

Jan 13 2020, 8:09 AM · Release-Engineering-Team-TODO, Release-Engineering-Team (Development services), Operations, Traffic, Phabricator

Jan 10 2020

faidon added a project to T241195: Add python3.8 to buster-wikimedia pyall component: Operations.
Jan 10 2020, 5:41 PM · Operations, Continuous-Integration-Infrastructure
faidon updated subscribers of T241195: Add python3.8 to buster-wikimedia pyall component.

I've updated the aforementioned apt repository with 3.8.1-2~buster1 packages. Someone in SRE who's more familiar with how we do things these days (maybe @MoritzMuehlenhoff?) can update our reprepro to include that.

Jan 10 2020, 5:40 PM · Operations, Continuous-Integration-Infrastructure

Dec 21 2019

faidon added a comment to T241195: Add python3.8 to buster-wikimedia pyall component.

  • The canonical location nowadays is https://people.debian.org/~paravoid/python-all/ (which I maintain in my free time). We (Wikimedia) should probably set up a reprepro import for it.
  • The above repository has 3.8.0 beta4 for buster; I'll need to update it to a more recent version (currently 3.8.1, it looks like). I can do so soon-ish.
  • That said, I don't have any intention of backporting 3.8 to stretch.
Dec 21 2019, 1:30 AM · Operations, Continuous-Integration-Infrastructure
faidon updated subscribers of T237466: Remove unused custom fields from Netbox.

The owner field will have to stay with us for a little while longer (until the end of Q4). The other two ("Support until" and "Support contract") can be dropped at our earliest convenience. Adjustments need to be made in at least the export templates and maybe even reports. @Volans and/or @crusnov, that's now over to you. (Hopefully the backups work in case we later realize it's a mistake.)

Dec 21 2019, 12:18 AM · SRE-tools, DC-Ops, netbox
faidon triaged T241289: Netbox report check for inventory items purchase date/task as Medium priority.
Dec 21 2019, 12:16 AM · SRE-tools, netbox

Dec 16 2019

faidon added a comment to T213843: Juniper network device audit - all sites.

Thanks @ayounsi! Appreciate the follow up. What exactly did you ask them to do in this last communication?

Dec 16 2019, 11:27 AM · DC-Ops, netops, Operations

Dec 13 2019

faidon added a comment to T238305: Servers freezing across the caching cluster.

Note that R440s comprise 23.5% of the whole fleet, 84.1% of all servers purchased in the last 12 months, and 67.5% of all servers purchased in the last 24 months (I wish I had a graph!). Given how prevalent R440s are among our newer servers, these crashes may merely correlate with R440s rather than be specifically tied to them.

Dec 13 2019, 3:55 PM · Operations, Traffic
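
A worked example of the base-rate point above: if crashes hit recently purchased servers uniformly at random, each crash would land on an R440 with probability of about 0.841 (the 12-month share quoted above). The chance that every one of a handful of crashes lands on an R440 is then high even with no R440-specific cause. The crash count `n = 3` below is a hypothetical illustration, not a figure taken from this task.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


P_R440_RECENT = 0.841  # R440 share of servers purchased in the last 12 months
n = 3  # hypothetical number of crashes, all observed on R440s

p_all = prob_at_least(n, n, P_R440_RECENT)
print(f"P(all {n} crashes on R440s by chance alone) = {p_all:.3f}")
```

That works out to roughly 0.595 (0.841 cubed), so a few R440 crashes on their own are weak evidence of a model-specific problem; a fairer test would compare crash rates per server across models.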
faidon added a comment to T234234: Port architecture of irc-recentchanges to Kafka.

Thanks @Krinkle, I very much appreciate all this! I have code from a couple of weeks ago that basically implements all of it: consuming from SSE and formatting into IRC logging messages, but using log_action_comment. It needs some more polishing, a repository, etc. I'll add you as a code reviewer once I find some time to move it somewhere better than a Gist; hopefully during the end-of-year holidays.

Dec 13 2019, 1:31 AM · Patch-For-Review, User-Elukey, Analytics
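
The approach described above (consume recentchange events from the SSE stream and render each as an IRC line, using `log_action_comment` for log entries) might be sketched like this. The event fields are a simplified assumption about the recentchange schema, the output line format is illustrative only and omits any color codes the existing feed may use, and connecting to the live stream is left out.

```python
import json


def event_to_irc(event):
    """Render one recentchange event as a single IRC-style text line.

    Assumes the event dict has 'type', 'title', 'user' and 'comment', plus
    'log_action_comment' for log events. The format is illustrative only.
    """
    if event["type"] == "log":
        # Log entries carry a preformatted, human-readable action summary.
        summary = event.get("log_action_comment") or event.get("comment", "")
    else:
        summary = event.get("comment", "")
    return f"[[{event['title']}]] {event['type']} * {event['user']} * {summary}"


def iter_sse_events(lines):
    """Yield decoded JSON payloads from SSE 'data:' lines.

    Simplification: assumes each event's JSON fits on a single 'data:' line
    and ignores 'event:', 'id:' and comment lines.
    """
    for raw in lines:
        if raw.startswith("data:"):
            yield json.loads(raw[len("data:"):].strip())


# Usage with a canned stream fragment instead of a live connection.
STREAM = [
    "event: message",
    'data: {"type": "edit", "title": "Example", "user": "Alice", "comment": "typo"}',
    "",
]
for ev in iter_sse_events(STREAM):
    print(event_to_irc(ev))
```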

Dec 11 2019

faidon added a comment to T240181: Documentation improvements for Eventstreams.

BTW, I don't think the IRC recentchanges stuff needs to consider historical consumption; the current IRC service doesn't support that. I think we can always start consuming from the latest offset (-1).

Dec 11 2019, 11:09 AM · Analytics-Kanban, Event-Platform, User-Elukey, Analytics

Dec 6 2019

Dzahn awarded T185319: IRC RecentChanges feed: code stewardship request a Barnstar token.
Dec 6 2019, 10:04 PM · Tools, Operations, Analytics, Wikimedia-IRC-RC-Server, Code-Stewardship-Reviews

Dec 5 2019

faidon committed rOSNEe8e17119d50e: Add three new Sentry PDU expansion units (authored by faidon).
Add three new Sentry PDU expansion units
Dec 5 2019, 11:20 PM
faidon committed rOSNE72d431eb5841: Remove esams exclusion (authored by ayounsi).
Remove esams exclusion
Dec 5 2019, 11:20 PM
faidon committed rOSNE4e051033cb6e: accounting: map the Date field as well (authored by faidon).
accounting: map the Date field as well
Dec 5 2019, 11:20 PM
faidon committed rOSNE56ed882328e1: librenms: exclude another PDU secondary model (authored by faidon).
librenms: exclude another PDU secondary model
Dec 5 2019, 11:20 PM
faidon committed rOSNE992af999f1ef: coherence: optimize by reducing database queries (authored by faidon).
coherence: optimize by reducing database queries
Dec 5 2019, 11:20 PM
faidon committed rOSNE5a06bde0dd25: Add "accounting" report (authored by faidon).
Add "accounting" report
Dec 5 2019, 11:20 PM
faidon committed rOSNE1613f49eb192: Remove the oldhardware report (authored by faidon).
Remove the oldhardware report
Dec 5 2019, 11:20 PM
faidon committed rOSNE4f957f2de0da: Further fixes to the coherence report (authored by crusnov).
Further fixes to the coherence report
Dec 5 2019, 11:19 PM