faidon (Faidon Liambotis)
SRE

Projects (13)

User Details

User Since
Oct 7 2014, 10:21 AM (280 w, 1 d)
Availability
Available
IRC Nick
paravoid
LDAP User
Faidon Liambotis
MediaWiki User
Faidon Liambotis (WMF) [ Global Accounts ]

Recent Activity

Mon, Feb 17

faidon added a comment to T245161: Track down and replace very old HW.

Regarding dbproxy1001, which is not on that list (but I guess it should be), it is still waiting for on-site steps: T244463: Decommission dbproxy1001.eqiad.wmnet
Other dbproxy current status:
dbproxy1004 and dbproxy1009 are fully decommissioned: T228768
dbproxy1005 is fully decommissioned: T231967
dbproxy1006 is waiting for on-site steps: T233207

Mon, Feb 17, 1:11 PM · DC-Ops

Thu, Feb 13

faidon triaged T245161: Track down and replace very old HW as High priority.
Thu, Feb 13, 3:36 PM · DC-Ops
faidon updated subscribers of T146455: Decommission labsdb1002.

@Jclark-ctr @wiki_willy what's the status here? It sounds like a decom that was only partial and that only needs a few more steps to finalize perhaps?

Thu, Feb 13, 2:30 PM · hardware-requests, Patch-For-Review, ops-eqiad, Operations

Wed, Feb 12

faidon added a comment to T234234: Redesign architecture of irc-recentchanges on top of Kafka.

First off: I have prototype code that supports UDP Echo and SSE, but not Kafka. It's not fully ready or tested yet. This has been developed over weekends/holidays etc., as a fun project -- and I can't promise I'll find spare time to add more stuff to it right now. Someone who can commit to it -staff or volunteer- should pick it up at some point and maybe also add Kafka in the process. We still have an open item and pending conversation on where ownership for the service itself lies.

Wed, Feb 12, 12:39 AM · User-Elukey, Analytics
faidon added a comment to T244719: Create a replacement for kraz.wikimedia.org.

The way this works now is that the entire MW fleet sends UDP packets to a specific IP (kraz) using the so-called "echo" protocol (= #channel<tab>message). We could theoretically switch this to a multicast address in order to get the ability of having multiple listeners (all connecting to separate IRC servers, each on each listener's localhost perhaps?), but no one has invested the time to do this and set up those multiple frontends.
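As an illustration only, the "#channel<tab>message" framing described above could be sketched as follows. The function names, the UTF-8 encoding, and the tolerance for a trailing newline are assumptions made for this sketch, not details of the MediaWiki or kraz implementation.

```python
import socket
from typing import Tuple


def encode_echo(channel: str, message: str) -> bytes:
    """Build one datagram payload in the '#channel<TAB>message' form."""
    if not channel.startswith("#") or "\t" in channel:
        raise ValueError("channel must start with '#' and contain no tab")
    if "\n" in message or "\r" in message:
        raise ValueError("message must be a single line")
    return f"{channel}\t{message}".encode("utf-8")


def decode_echo(payload: bytes) -> Tuple[str, str]:
    """Split a received payload back into (channel, message)."""
    # Assumption: a trailing newline, if present, is not part of the message.
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n")
    channel, sep, message = text.partition("\t")
    if not sep or not channel.startswith("#"):
        raise ValueError("payload is not in '#channel<TAB>message' form")
    return channel, message


def send_echo(host: str, port: int, channel: str, message: str) -> None:
    """Send one message as a single UDP datagram (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_echo(channel, message), (host, port))
```

A listener would call `decode_echo()` on each received datagram and relay the message to the named channel. Nothing in this framing is specific to a unicast destination, which is why a multicast address is at least conceivable.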

Wed, Feb 12, 12:34 AM · serviceops, Operations, vm-requests, User-Elukey, Analytics

Fri, Feb 7

faidon added a comment to T244497: cr3-knams:xe-0/1/3 down.

Please file a procurement task for Willy/Rob to execute on :)

Fri, Feb 7, 1:42 PM · Operations, netops

Thu, Jan 23

faidon added a comment to T237466: Remove unused custom fields from Netbox.

Correct. Also check the export templates (in the admin interface) for references to those fields.

Thu, Jan 23, 5:42 PM · SRE-tools, DC-Ops, netbox
faidon assigned T229586: decommission cp1008, cp1071, cp1072, cp1073, cp1074, cp1099 to RobH.

(@Volans is not in Traffic), but regardless... judging from @BBlack comments before the flurry of Gerrit commits, it seems like I misunderstood where this lies. This is not blocked on Traffic, but with DC Ops. Reassigning to @RobH and apologies for the added confusion!

Thu, Jan 23, 5:27 PM · ops-eqiad, DC-Ops, Operations, decommission
faidon added a comment to T229586: decommission cp1008, cp1071, cp1072, cp1073, cp1074, cp1099.

Traffic team, ping? This task has been open since August last year and as I was just saying on IRC, cp1008 is a constant outlier in all of our reports, projections, planning etc. Its purchase date is Jan 27th, 2011, 9 years ago almost to the day :)

Thu, Jan 23, 12:59 PM · ops-eqiad, DC-Ops, Operations, decommission

Wed, Jan 22

faidon added a comment to T242250: rack/setup/install ps[12]-60[34]-eqsin.

Hey - this was a Q2 task but it hasn't seen an update in a while. What's the status?

Wed, Jan 22, 3:44 PM · Operations, ops-eqsin
faidon added a comment to T243288: Retire the Tor relay.

To your last point: the WMCS Terms of Use explicitly lists "network proxy" in the "prohibited activities" section -and even names Tor specifically as the first example of such an activity- so running a node in a Cloud VPS is not an option here. This policy has been there since the inception of the Labs/WMCS ToS, and while I can't speak to the rationale behind it, I can say that prohibiting it remains a good idea today: running proxies, whether in WMCS or in the production realm can be a messy business, and one that we don't have the capacity to support as an org.

Wed, Jan 22, 10:35 AM · Tor, Operations

Jan 20 2020

faidon added a comment to T213843: Juniper network device audit - all sites.

@ayounsi, what's the status here?

Jan 20 2020, 10:39 AM · DC-Ops, netops, Operations

Jan 17 2020

faidon added a comment to T184066: rack/setup/install ps[12]-oe1[456]-esams.

Could we import into Netbox now, and then change & document the setup at our convenience? It feels like documenting the existing situation and changing it are orthogonal to each other - any reason to block one on the other?

Jan 17 2020, 5:09 PM · Operations, ops-esams
faidon added a comment to T184066: rack/setup/install ps[12]-oe1[456]-esams.

What is the status of this?

Jan 17 2020, 12:12 PM · Operations, ops-esams
faidon added a comment to T237466: Remove unused custom fields from Netbox.

Owner was due to the leasing situation (as it lists Farnam as the only option), so I defer to @faidon's call on when that can go.

Jan 17 2020, 9:48 AM · SRE-tools, DC-Ops, netbox
faidon added a comment to T243048: python3.4 broken on deployment-logstash2.

I've seen this issue before, and if I recall correctly, it was an issue with the Python 3.4 backport. I think the latest backport for 3.4.10-1~stretch1 should fix it.

Jan 17 2020, 9:46 AM · Operations, Beta-Cluster-Infrastructure

Jan 16 2020

faidon added a comment to T242715: Webproxies are a SPOF.

I think increasing the availability and resilience of this service is an excellent idea! However, adding more servers per site feels like a requirement, and a standard Pybal/IPVS setup sounds much more appropriate than anycast for this use case.

Jan 16 2020, 12:18 PM · Operations

Jan 14 2020

faidon added a comment to T242602: Sort out plan for install* servers in edge sites.

Splitting the internal apt repository from the install roles/servers sounds good -- it's more of a historical artifact than anything else. You probably know this already but do note that the install server does not provide just TFTP, but also HTTP (and that is actually favored these days), so we would need to have a webserver running on the install servers.

Jan 14 2020, 5:48 AM · Patch-For-Review, Operations

Jan 13 2020

faidon added a comment to T242412: ulsfo doesn't have any rack group set in Netbox.

@Volans, out of curiosity, why was this required? Note that the concept of "rows" doesn't apply in this site, it's just two racks next to each other :)

Jan 13 2020, 8:54 AM · DC-Ops, netbox
faidon added a comment to T226044: Prepare Phame to support heavy traffic for a Tech Department blog.

This task is about preparing "Phame to support heavy traffic for a Tech Department blog", which is not the plan anymore. We should probably decline this task in favor of another more-generic task ("set up a tech department blog"). @Bmueller, @srodlund, thoughts?

Jan 13 2020, 8:09 AM · Release-Engineering-Team-TODO, Release-Engineering-Team (Development services), Operations, Traffic, Phabricator

Jan 10 2020

faidon added a project to T241195: Add python3.8 to buster-wikimedia pyall component: Operations.
Jan 10 2020, 5:41 PM · Operations, Continuous-Integration-Infrastructure
faidon updated subscribers of T241195: Add python3.8 to buster-wikimedia pyall component.

I've updated the aforementioned apt repository with 3.8.1-2~buster1 packages. Someone in SRE that's more familiar with how we do things these days (maybe @MoritzMuehlenhoff?) can update our reprepro to include that.

Jan 10 2020, 5:40 PM · Operations, Continuous-Integration-Infrastructure

Dec 21 2019

faidon added a comment to T241195: Add python3.8 to buster-wikimedia pyall component.
  • The canonical location is nowadays https://people.debian.org/~paravoid/python-all/ (which I maintain in my free time). We (Wikimedia) probably should set up a reprepro import for that.
  • The above repository has 3.8.0 beta4 for buster, I'll need to update that for a more recent version (currently looks like 3.8.1). I can do so soon-ish.
  • That said, I don't have any intentions to backport 3.8 to stretch.
Dec 21 2019, 1:30 AM · Operations, Continuous-Integration-Infrastructure
faidon updated subscribers of T237466: Remove unused custom fields from Netbox.

The owner field will have to stay with us for a little while longer (until the end of Q4). The other two ("Support until" and "Support contract") can be dropped at our earliest convenience. Adjustments need to be made in at least the export templates and maybe even reports. @Volans and/or @crusnov, that's now over to you. (Hopefully the backups work in case we later realize it's a mistake)

Dec 21 2019, 12:18 AM · SRE-tools, DC-Ops, netbox
faidon triaged T241289: Netbox report check for inventory items purchase date/task as Medium priority.
Dec 21 2019, 12:16 AM · SRE-tools, netbox

Dec 16 2019

faidon added a comment to T213843: Juniper network device audit - all sites.

Thanks @ayounsi! Appreciate the follow up. What exactly did you ask them to do in this last communication?

Dec 16 2019, 11:27 AM · DC-Ops, netops, Operations

Dec 13 2019

faidon added a comment to T238305: servers freeze across the caching cluster.

Note that R440s comprise 23.5% of the whole fleet, 84.1% of all servers purchased in the last 12 months, and 67.5% of all servers purchased in the last 24 months (I wish I had a graph!). Given this sample size, this may be just correlated to R440s and not specifically tied to them.
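To make the base-rate argument concrete, here is a small sketch that asks how surprising it would be to see a given number of R440s among the affected hosts if freezes hit hosts at random. The two shares come from the comment above. The host counts are placeholders invented for this example, not incident data, and which baseline applies depends on which population the affected hosts are drawn from.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Shares quoted in the comment above.
SHARE_WHOLE_FLEET = 0.235      # R440s as a share of the whole fleet
SHARE_LAST_12_MONTHS = 0.841   # R440s as a share of servers bought in the last 12 months

# Placeholder counts, NOT real data: 10 affected hosts, 6 of them R440s.
affected_hosts = 10
affected_r440s = 6

for label, share in [
    ("whole fleet", SHARE_WHOLE_FLEET),
    ("purchased in last 12 months", SHARE_LAST_12_MONTHS),
]:
    p = binomial_tail(affected_r440s, affected_hosts, share)
    print(f"baseline = {label}: P(at least {affected_r440s}/{affected_hosts} R440s) = {p:.4f}")
```

If the affected hosts are mostly recent purchases, a high R440 count is unremarkable against the second baseline, even though it would look striking against the first.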

Dec 13 2019, 3:55 PM · Traffic, Operations
faidon added a comment to T234234: Redesign architecture of irc-recentchanges on top of Kafka.

Thanks @Krinkle, very much appreciate all this! I have code from a couple of weeks ago that basically implements all this: consuming from SSE and formatting into IRC logging messages, but by using log_action_comment. It needs some more polishing and repository creation etc. I'll add you as code reviewer once I find some time to work on something better than Gist; hopefully during the end of year holidays.
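For readers unfamiliar with the approach, consuming Server-Sent Events and turning each event into an IRC message might look roughly like the sketch below. This is not the prototype mentioned above. The output format, the function names, and the JSON keys `title`, `user` and `comment` are assumptions for illustration; only `log_action_comment` is named in the comment.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Group raw SSE lines into events; a blank line ends an event."""
    fields: Dict[str, str] = {}
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                fields["data"] = "\n".join(data)
                yield fields
            fields, data = {}, []
            continue
        if line.startswith(":"):
            continue  # SSE comment line, often used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if name == "data":
            data.append(value)
        else:
            fields[name] = value


def format_privmsg(channel: str, change: dict) -> str:
    """Render one decoded event as a single IRC PRIVMSG line (hypothetical format)."""
    comment = change.get("log_action_comment") or change.get("comment", "")
    text = f'{change.get("title", "?")} by {change.get("user", "?")}: {comment}'
    text = text.replace("\r", " ").replace("\n", " ")  # IRC messages are one line
    return f"PRIVMSG {channel} :{text}"
```

In use, each event's `data` field would be passed through `json.loads()` and then to `format_privmsg()` together with the target channel.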

Dec 13 2019, 1:31 AM · User-Elukey, Analytics

Dec 11 2019

faidon added a comment to T240181: Documentation improvements for Eventstreams.

BTW, I don't think the IRC recentchanges stuff needs to consider historical consumption. The current IRC service doesn't support that now. I think we can always start consuming from latest offset (-1).

Dec 11 2019, 11:09 AM · Analytics-Kanban, Event-Platform, User-Elukey, Analytics

Dec 6 2019

Dzahn awarded T185319: IRC RecentChanges feed: code stewardship request a Barnstar token.
Dec 6 2019, 10:04 PM · Tools, Operations, Analytics, Wikimedia-IRC-RC-Server, Code-Stewardship-Reviews

Dec 5 2019

faidon committed rOSNEe8e17119d50e: Add three new Sentry PDU expansion units (authored by faidon).
Add three new Sentry PDU expansion units
Dec 5 2019, 11:20 PM
faidon committed rOSNE72d431eb5841: Remove esams exclusion (authored by ayounsi).
Remove esams exclusion
Dec 5 2019, 11:20 PM
faidon committed rOSNE4e051033cb6e: accounting: map the Date field as well (authored by faidon).
accounting: map the Date field as well
Dec 5 2019, 11:20 PM
faidon committed rOSNE56ed882328e1: librenms: exclude another PDU secondary model (authored by faidon).
librenms: exclude another PDU secondary model
Dec 5 2019, 11:20 PM
faidon committed rOSNE992af999f1ef: coherence: optimize by reducing database queries (authored by faidon).
coherence: optimize by reducing database queries
Dec 5 2019, 11:20 PM
faidon committed rOSNE5a06bde0dd25: Add "accounting" report (authored by faidon).
Add "accounting" report
Dec 5 2019, 11:20 PM
faidon committed rOSNE1613f49eb192: Remove the oldhardware report (authored by faidon).
Remove the oldhardware report
Dec 5 2019, 11:20 PM
faidon committed rOSNE4f957f2de0da: Further fixes to the coherence report (authored by crusnov).
Further fixes to the coherence report
Dec 5 2019, 11:19 PM
faidon updated the task description for T187456: Decommission labstore100[123] and their disk shelves.
Dec 5 2019, 6:13 PM · cloud-services-team (Hardware), decommission, Data-Services, Operations, DC-Ops, ops-eqiad

Dec 3 2019

faidon edited projects for T239675: Add 10G NICs to core site DNS servers (6 servers, 3 per site), added: hardware-requests; removed ops-codfw, ops-eqiad.
Dec 3 2019, 8:28 PM · hardware-requests, Traffic, Operations

Dec 2 2019

faidon added a project to T239597: Hardware asset tag Netbox/DNS mgmt inconsistencies: ops-eqiad.
Dec 2 2019, 3:56 PM · Operations, ops-eqiad, DC-Ops

Nov 28 2019

faidon closed T174637: Setup esams atlas anchor as Resolved.

All done!

Nov 28 2019, 1:50 PM · Operations, netops, ops-esams
faidon closed T174637: Setup esams atlas anchor, a subtask of T184061: SRE 2017-18 Q3 goal Cleanup esams and refresh servers and infrastructure (tracking), as Resolved.
Nov 28 2019, 1:49 PM · Operations, Epic, ops-esams
faidon updated the task description for T174637: Setup esams atlas anchor.
Nov 28 2019, 1:49 PM · Operations, netops, ops-esams
faidon added a comment to T174637: Setup esams atlas anchor.

The nl-ams-as14907 anchor is now fully online and has ID #6671.

Nov 28 2019, 1:38 PM · Operations, netops, ops-esams

Nov 26 2019

faidon added a comment to T234234: Redesign architecture of irc-recentchanges on top of Kafka.

I think conceptually this belongs together with EventStreams, as a product offering and, by extension, to the same owners and maintainers. This is just another (non-HTTP) API for streaming events, like RCStream was, and its fate and evolution should be viewed together as a whole. For example, a valid product decision -now or in the future- may be "we'll sunset this by date X, and we recommend users to migrate to Y".

Nov 26 2019, 10:29 PM · User-Elukey, Analytics
faidon triaged T239244: Netbox report check for no position set in rack as Medium priority.
Nov 26 2019, 3:25 PM · netbox, Operations
faidon reassigned T238652: Hardware request for Postgres database for censorship monitoring scripts from faidon to RobH.

Approved.

Nov 26 2019, 3:00 PM · Operations, hardware-requests
faidon added a comment to T234234: Redesign architecture of irc-recentchanges on top of Kafka.
  • Do we need to support full IRC spec? I see in Faidon's PoC he's attempting to at least respond to them all. AFAICT we don't actually support any commands other than joining channels. I guess we want to keep track of user nicks too? Do we need to? We could just keep the list of connected clients in channels and broadcast the incoming messages. Could we just ignore things we don't support? E.g. if not JOIN or whatever else we need: respond with some 'command unsupported' message.
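The "only handle what we need, reject the rest" idea could be sketched as below. This is an illustration, not the PoC's code. The server name and the set of supported commands are assumptions; numerics 421 (unknown command) and 461 (not enough parameters) are the standard IRC error replies.

```python
from typing import List, Set, Tuple

SERVER = "irc.example.org"  # hypothetical server name


def parse_irc_line(line: str) -> Tuple[str, List[str]]:
    """Split a client line into (COMMAND, params), ignoring any prefix."""
    line = line.rstrip("\r\n")
    if line.startswith(":"):
        _, _, line = line.partition(" ")  # drop the optional ':prefix'
    head, sep, trailing = line.partition(" :")
    parts = head.split()
    if not parts:
        return "", []
    command, params = parts[0].upper(), parts[1:]
    if sep:
        params.append(trailing)  # trailing parameter may contain spaces
    return command, params


def handle(line: str, nick: str, channels: Set[str]) -> List[str]:
    """Return the reply lines for one client line, updating joined channels."""
    command, params = parse_irc_line(line)
    if not command:
        return []
    if command == "PING":
        token = params[0] if params else SERVER
        return [f":{SERVER} PONG {SERVER} :{token}"]
    if command == "JOIN":
        if not params:
            return [f":{SERVER} 461 {nick} JOIN :Not enough parameters"]
        replies = []
        for chan in params[0].split(","):
            channels.add(chan)
            replies.append(f":{nick} JOIN {chan}")
        return replies
    # Everything else is explicitly rejected rather than silently ignored.
    return [f":{SERVER} 421 {nick} {command} :Unknown command"]
```

Broadcasting then only needs the per-client `channels` sets: an incoming message for a channel is written to every client whose set contains it.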
Nov 26 2019, 1:44 PM · User-Elukey, Analytics
faidon closed T236767: cr3-esams:et-1/0/0 flap as Resolved.

Looks good!

Nov 26 2019, 12:13 PM · Operations, ops-esams

Nov 25 2019

faidon updated the task description for T237014: Update spare QFX labels.
Nov 25 2019, 8:05 PM · Operations, ops-esams
faidon reopened T239110: Asset tag remaining cablemgmt in eqiad as "Open".

Some of these were not done - I suspect partially because my ranges were misparsed as individual items (should have made that clearer, apologies!). The following are still missing asset tags:

  • cablemgmt-eqiad-C4
  • cablemgmt-eqiad-C5
  • cablemgmt-eqiad-C6
  • cablemgmt-eqiad-D1
  • cablemgmt-eqiad-D2
  • cablemgmt-eqiad-D3
  • cablemgmt-eqiad-D4
  • cablemgmt-eqiad-D5
  • cablemgmt-eqiad-D6
  • cablemgmt-eqiad-D7
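One way to avoid this kind of misreading is to expand range notation mechanically before handing a list over. The sketch below assumes a bracket notation such as `cablemgmt-eqiad-D[1-7]`, similar to the character sets seen in task titles like `ganeti300[123]`. The notation used in the original request is not shown here, so the exact syntax is an assumption.

```python
import itertools
import re
from typing import List

_BRACKET = re.compile(r"\[([^\]]+)\]")


def _expand_set(spec: str) -> List[str]:
    """Expand bracket contents such as '123' or '1-7' into single characters."""
    out: List[str] = []
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            start, end = ord(spec[i]), ord(spec[i + 2])
            if start > end:
                raise ValueError(f"descending range in [{spec}]")
            out.extend(chr(c) for c in range(start, end + 1))
            i += 3
        else:
            out.append(spec[i])
            i += 1
    return out


def expand(pattern: str) -> List[str]:
    """Expand every [..] group in a name pattern into the full list of names."""
    # re.split with one capture group alternates literal text and bracket contents.
    pieces = _BRACKET.split(pattern)
    choices = [_expand_set(p) if i % 2 else [p] for i, p in enumerate(pieces)]
    return ["".join(combo) for combo in itertools.product(*choices)]


for name in expand("cablemgmt-eqiad-C[4-6]") + expand("cablemgmt-eqiad-D[1-7]"):
    print(name)
```

With these two patterns the loop prints the same ten names as the list above, one per line.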
Nov 25 2019, 8:02 PM · ops-eqiad, Operations, DC-Ops
faidon closed T227632: Document PDU models as Resolved.

I went digging in RT and fixed it for all of them except the old/unracked/offline sdtpa PDUs.

Nov 25 2019, 7:36 PM · netbox, Operations, ops-codfw, ops-eqiad
faidon renamed T237030: Setup new MX204 in knams from setup new MX204 in knams to Setup new MX204 in knams.
Nov 25 2019, 7:10 PM · netops, Operations, ops-esams
faidon updated subscribers of T236767: cr3-esams:et-1/0/0 flap.

@mark swapped the optic with a new one and the link is now reenabled. This is being monitored for another 24-36h and will be resolved then.

Nov 25 2019, 4:37 PM · Operations, ops-esams
faidon added a comment to T174637: Setup esams atlas anchor.

The Anchor is now installed, connected to the SCS, and we see a getty on serial with the right hostname. It's also now responsive to IPv4 pings but not IPv6 (which matches our previous experience with regards to the initial install).

Nov 25 2019, 4:13 PM · Operations, netops, ops-esams
faidon updated the task description for T174637: Setup esams atlas anchor.
Nov 25 2019, 4:11 PM · Operations, netops, ops-esams
faidon created T239110: Asset tag remaining cablemgmt in eqiad.
Nov 25 2019, 1:49 PM · ops-eqiad, Operations, DC-Ops
faidon added a comment to T237803: Netbox reports Icinga checks timeout.

What's the status of this task?

Nov 25 2019, 1:27 PM · Operations, SRE-tools, netbox
faidon triaged T239098: Duplicate cable label in cr1-eqiad/cr2-eqiad as High priority.
Nov 25 2019, 12:44 PM · DC-Ops, ops-eqiad, Operations

Nov 24 2019

Krinkle awarded T98006: Anycast AuthDNS an Orange Medal token.
Nov 24 2019, 3:32 AM · Performance-Team (Radar), Patch-For-Review, netops, Operations, Traffic

Nov 21 2019

faidon added a comment to T238820: CloudVPS: consider mirroring debian repos for openstack packages.

/debian/ is for the official Debian mirror -- and note that we are part of the ftp.us.debian.org/http.us.debian.org rotation. So no, we should not pollute that namespace (and I don't think that would work anyway, ftpsync would just delete it all in the next sync).

Nov 21 2019, 2:30 PM · cloud-services-team (Kanban), Cloud-Services
faidon added a comment to T223292: Netbox: generate CSV backups.

I believe there has been progress here since the last update. @crusnov what's the latest?

Nov 21 2019, 1:51 PM · netbox
faidon closed T210566: Netbox should use CN rather than UID for LDAP login username as Declined.

I'll decline, on the basis that this will be converted to use SSO soon-ish, and there's no point in going over two migrations :)

Nov 21 2019, 1:49 PM · netbox, Operations
faidon added a comment to T238820: CloudVPS: consider mirroring debian repos for openstack packages.

We don't have specific criteria, it's on a case-by-case basis. This particular one sounds fine to me, let's do it! :)

Nov 21 2019, 1:28 PM · cloud-services-team (Kanban), Cloud-Services
faidon added a comment to T174637: Setup esams atlas anchor.

Update: given the upcoming follow-up visit to esams next week, I requested a new image from RIPE. I got it today, and it can be found in the same place, as "anchor.nl-ams-as14907-v2.img".

Nov 21 2019, 12:57 PM · Operations, netops, ops-esams

Nov 20 2019

faidon added a member for acl*procurement-review: leila.
Nov 20 2019, 4:12 PM
faidon renamed T233448: Review prometheus ORES rules for completeness from Review promethius ORES rules for completeness to Review prometheus ORES rules for completeness.
Nov 20 2019, 3:48 PM · Patch-For-Review, ORES, Scoring-platform-team
faidon added a comment to T237803: Netbox reports Icinga checks timeout.

This is an excerpt of the backlog overnight:

01:17 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
02:31 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
02:43 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
03:00 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
03:23 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
04:14 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
04:48 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
05:16 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
06:19 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
06:42 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
07:33 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
09:56 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
11:11 < icinga-wm> PROBLEM - Netbox report coherence. on netbox1001 is CRITICAL: coherence.Coherence CRITICAL https://netbox.wikimedia.org/extras/reports/coherence.Coherence
Nov 20 2019, 11:28 AM · Operations, SRE-tools, netbox

Nov 19 2019

faidon added a comment to T224946: Netbox Alert Cleanups.

What's the latest here? Please keep the task updated :)

Nov 19 2019, 12:55 PM · Operations, observability, User-crusnov, netbox, SRE-tools

Nov 15 2019

faidon added a project to T227632: Document PDU models: netbox.

Now that the PDU migration in eqiad has been completed, all that's left in this task is to record and document the models for:

  • eqiad's row D (rows A/B as well as C are all documented now)
  • codfw
Nov 15 2019, 3:46 PM · netbox, Operations, ops-codfw, ops-eqiad
faidon raised the priority of T237492: Create a second text-lb IP address for test purposes from Medium to High.

It looks like there are proposed patches for this, so perhaps we're not too far off? This ties to an exploration we're doing with a vendor so it's relatively time-sensitive. Thanks a lot!

Nov 15 2019, 1:03 PM · Traffic, Operations

Nov 5 2019

faidon triaged T237466: Remove unused custom fields from Netbox as Medium priority.
Nov 5 2019, 9:20 PM · SRE-tools, DC-Ops, netbox
faidon added a comment to T233318: scs monitoring missing in Icinga.

Thanks @herron! Should we resolve this?

Nov 5 2019, 3:16 PM · Icinga, observability, Operations

Nov 4 2019

faidon changed the status of T204589: eqiad: (1) misc single cpu server allocation for performance browser testing from Open to Stalled.

Update per IRC conversation with @Gilles: this is still needed, but is stalled and currently blocked until the Performance-Team fills its vacancy.

Nov 4 2019, 10:23 AM · Performance-Team (Radar), Operations, hardware-requests

Nov 1 2019

faidon added a parent task for T235669: codfw: recycle Cisco old servers: T128821: reclaim and return all cisco servers.
Nov 1 2019, 2:59 PM · ops-codfw, Operations
faidon added a subtask for T128821: reclaim and return all cisco servers: T235669: codfw: recycle Cisco old servers.
Nov 1 2019, 2:59 PM · decommission, Goal, Operations
faidon renamed T235669: codfw: recycle Cisco old servers from Recycle Cisco old servers to codfw: recycle Cisco old servers.
Nov 1 2019, 2:58 PM · ops-codfw, Operations
faidon added a subtask for T235805: ESAMS Refresh/Rebuild (October 2019): T237030: Setup new MX204 in knams.
Nov 1 2019, 2:53 PM · Patch-For-Review, DC-Ops, Operations, ops-esams
faidon added a parent task for T237030: Setup new MX204 in knams: T235805: ESAMS Refresh/Rebuild (October 2019).
Nov 1 2019, 2:53 PM · netops, ops-esams, Operations
faidon closed T237011: Update DNS/NTP servers on the esams PDUs/SCS as Resolved.

Anycasting NTP sounds like a good idea in general, but a) it should be kept in a separate task, and b) it doesn't sound like a priority IMHO at this time. Things work OK, and that sounds like a time investment that won't pay off right now.

Nov 1 2019, 2:51 PM · DC-Ops, ops-esams, Operations
faidon reassigned T214024: Two test hosts for SREs from faidon to RobH.

Ping :)

Nov 1 2019, 2:37 PM · Operations, hardware-requests

Oct 31 2019

faidon added a comment to T233774: Netbox: tracking of hardware errors / grouping servers in order/batches.

Indeed, and in fact the procurement task alone would be enough to identify the batch. Is that what you were looking for @MoritzMuehlenhoff? How could we make this more visible?

Oct 31 2019, 5:03 PM · Operations, netbox
faidon moved T236767: cr3-esams:et-1/0/0 flap from Backlog to Next visit on the ops-esams board.
Oct 31 2019, 3:06 PM · ops-esams, Operations
faidon moved T237006: Relabel cables with duplicate IDs from Backlog to Next visit on the ops-esams board.
Oct 31 2019, 3:06 PM · ops-esams, Operations
faidon moved T237009: Add missing labels for equipment and cables from Backlog to Next visit on the ops-esams board.
Oct 31 2019, 3:06 PM · DC-Ops, Operations, ops-esams
faidon moved T237014: Update spare QFX labels from Backlog to Next visit on the ops-esams board.
Oct 31 2019, 3:06 PM · Operations, ops-esams
faidon updated subscribers of T236216: rack/setup/install ganeti300[123].

@BBlack, what were your plans here? Can others in SRE help with some of that perhaps?

Oct 31 2019, 2:04 PM · Operations, ops-esams
faidon renamed T237011: Update DNS/NTP servers on the esams PDUs/SCS from Update DNS/NTP for all non-network/server gear to Update DNS/NTP servers on the esams PDUs/SCS.
Oct 31 2019, 1:51 PM · DC-Ops, ops-esams, Operations
faidon triaged T237011: Update DNS/NTP servers on the esams PDUs/SCS as Medium priority.
Oct 31 2019, 1:45 PM · DC-Ops, ops-esams, Operations
faidon added a comment to T184066: rack/setup/install ps[12]-oe1[456]-esams.

@RobH what's the status of this?

Oct 31 2019, 1:41 PM · Operations, ops-esams
faidon triaged T237009: Add missing labels for equipment and cables as Medium priority.
Oct 31 2019, 1:36 PM · DC-Ops, Operations, ops-esams
faidon renamed T237007: Add a Netbox check for duplicate cable IDs from Add a Netbox coherence check for duplicate cable IDs to Add a Netbox check for duplicate cable IDs.
Oct 31 2019, 1:15 PM · DC-Ops, SRE-tools, netbox
faidon closed T184061: SRE 2017-18 Q3 goal Cleanup esams and refresh servers and infrastructure (tracking), a subtask of T235805: ESAMS Refresh/Rebuild (October 2019), as Resolved.
Oct 31 2019, 1:07 PM · Patch-For-Review, DC-Ops, Operations, ops-esams
faidon closed T184061: SRE 2017-18 Q3 goal Cleanup esams and refresh servers and infrastructure (tracking) as Resolved.
Oct 31 2019, 1:07 PM · Operations, Epic, ops-esams
faidon created T237007: Add a Netbox check for duplicate cable IDs.
Oct 31 2019, 1:05 PM · DC-Ops, SRE-tools, netbox
faidon triaged T237006: Relabel cables with duplicate IDs as Medium priority.
Oct 31 2019, 1:01 PM · ops-esams, Operations
faidon moved T185337: rack spare switches in c1-eqiad from Backlog to Watching on the netops board.
Oct 31 2019, 1:02 AM · Operations, netops, ops-eqiad
faidon moved T208734: Decommission asw-c-eqiad from Backlog to Watching on the netops board.
Oct 31 2019, 1:01 AM · decommission, Operations, ops-eqiad, netops
faidon moved T208788: Decommission asw-b-eqiad from Backlog to Watching on the netops board.
Oct 31 2019, 1:01 AM · decommission, Operations, ops-eqiad, netops