
crusnov (Cas Rusnov)
User

User Details

User Since
Oct 15 2018, 5:56 PM (36 w, 19 h)
Availability
Available
LDAP User
CRusnov
MediaWiki User
Unknown

Recent Activity

Yesterday

crusnov committed rOSNB975b7b33ad3f: Add a passthrough configuration system (authored by crusnov).
Add a passthrough configuration system
Mon, Jun 24, 11:41 PM
crusnov committed rOSNB3fe0ea706fc1: Add a passthrough configuration system (authored by crusnov).
Add a passthrough configuration system
Mon, Jun 24, 11:41 PM
crusnov committed rCUMINae6e9681a2af: backends: add Netbox backend (authored by crusnov).
backends: add Netbox backend
Mon, Jun 24, 6:10 PM
crusnov committed rCUMINbdbccba0f729: Add Cumin backend for accessing Netbox (authored by crusnov).
Add Cumin backend for accessing Netbox
Mon, Jun 24, 6:07 PM
crusnov committed rCUMINb4e30034ea67: Add Cumin backend for accessing Netbox (authored by crusnov).
Add Cumin backend for accessing Netbox
Mon, Jun 24, 4:17 PM
crusnov added a comment to T205900: Cumin: add backend for Netbox.

Just closing the loop here: the backend is up for review, but there is apparently a pylint bug preventing CI from passing (or there was last week).

Mon, Jun 24, 3:48 PM · Patch-For-Review, netbox, Operations, Operations-Software-Development
crusnov added a comment to T226331: Upgrade Netbox to 2.6.0.

Roger, it might be best to wait on the upgrade until the split to Ganeti is done (maybe early next week) as using Redis was part of the spec for that.

Mon, Jun 24, 3:21 PM · netbox

Tue, Jun 18

crusnov created P8628 tables to dump from Netbox.
Tue, Jun 18, 6:10 PM

Mon, Jun 17

crusnov moved T205900: Cumin: add backend for Netbox from Up next to In Code Review on the Operations-Software-Development board.
Mon, Jun 17, 11:13 PM · Patch-For-Review, netbox, Operations, Operations-Software-Development
crusnov moved T217072: Spicerack module for Netbox from Pending to Complete on the User-crusnov board.
Mon, Jun 17, 11:13 PM · netbox, Patch-For-Review, User-crusnov, Operations-Software-Development
crusnov moved T224946: Netbox Alert Cleanups from Backlog to In Progress on the User-crusnov board.
Mon, Jun 17, 11:13 PM · User-crusnov, netbox, Operations-Software-Development

Wed, Jun 12

crusnov committed rCUMIN891554eae47a: Add Cumin backend for accessing Netbox (authored by crusnov).
Add Cumin backend for accessing Netbox
Wed, Jun 12, 8:46 AM

Tue, Jun 11

crusnov committed rCUMIN7f1369cdef37: Add Cumin backend for accessing Netbox (authored by crusnov).
Add Cumin backend for accessing Netbox
Tue, Jun 11, 1:28 PM
crusnov committed rCUMINe17b61990855: Add Cumin backend for accessing Netbox (authored by crusnov).
Add Cumin backend for accessing Netbox
Tue, Jun 11, 10:35 AM

Fri, Jun 7

crusnov added a comment to T209182: Setup Swift Storage for Netbox image (was: netbox won't allow me to upload photos of the rack).

Okay, so this works, mostly, in labs when manually configured to operate against the deployment-prep Swift cluster. Netbox lets me upload images and shows them associated with the object in question - except that viewing fails because they are served from a URL in the Swift cluster that is unavailable. We'll look at this part more next week no doubt.

Fri, Jun 7, 4:02 PM · Patch-For-Review, netbox, Operations
crusnov renamed T209182: Setup Swift Storage for Netbox image (was: netbox won't allow me to upload photos of the rack) from netbox won't allow me to upload photos of the rack to Setup Swift Storage for Netbox image (was: netbox won't allow me to upload photos of the rack).
Fri, Jun 7, 2:46 PM · Patch-For-Review, netbox, Operations

Thu, Jun 6

crusnov committed rCUMIN15e8d9ecfc6c: Add Cumin backend for accessing Netbox (authored by crusnov).
Add Cumin backend for accessing Netbox
Thu, Jun 6, 7:16 PM

Wed, Jun 5

crusnov created T225166: Gerrit crashed due to out of Heap.
Wed, Jun 5, 10:22 PM · Gerrit

Mon, Jun 3

crusnov added a comment to T221507: Netbox report to validate network equipment data.

I agree that from the perspective of more closely modelling the devices between the various tools that the domain name for the VC name thing is necessary. I'm not completely clear on how that would make the matching better? Currently the by-serial matching seems to be working correctly, the complexities are mostly in lining up vendor and model information at this point, unless I'm mistaken - and this appears to be approachable either by matching things more loosely or creating a map between what's in LibreNMS and what's in Netbox. Separately, there are only a few inventory items which don't appear to line up, but I believe it's because they are builtin so they are left out of the librenms query.

Mon, Jun 3, 11:14 PM · Patch-For-Review, netbox, User-crusnov, Operations-Software-Development, Operations, netops
crusnov updated the task description for T224946: Netbox Alert Cleanups.
Mon, Jun 3, 10:52 PM · User-crusnov, netbox, Operations-Software-Development
crusnov triaged T224946: Netbox Alert Cleanups as Normal priority.
Mon, Jun 3, 10:45 PM · User-crusnov, netbox, Operations-Software-Development
crusnov created T224946: Netbox Alert Cleanups.
Mon, Jun 3, 10:45 PM · User-crusnov, netbox, Operations-Software-Development
crusnov added a comment to T221113: Netbox Reports: Create an icinga check for alerting on a set of Netbox reports.

Alerts are alerting and in production.

Mon, Jun 3, 10:36 PM · netbox, Operations-Software-Development
crusnov closed T221113: Netbox Reports: Create an icinga check for alerting on a set of Netbox reports as Resolved.
Mon, Jun 3, 10:36 PM · netbox, Operations-Software-Development
crusnov updated the task description for T221113: Netbox Reports: Create an icinga check for alerting on a set of Netbox reports.
Mon, Jun 3, 10:35 PM · netbox, Operations-Software-Development

Wed, May 29

crusnov moved T223450: Triage and resolve all outstanding Netbox report errors from Backlog to In Progress on the Operations-Software-Development board.
Wed, May 29, 5:58 PM · ops-codfw, ops-eqiad, Operations, Operations-Software-Development, netbox, DC-Ops
crusnov moved T216469: Netbox: cable termination names report from Up next to In Code Review on the Operations-Software-Development board.
Wed, May 29, 5:57 PM · Patch-For-Review, netbox, Operations-Software-Development, Operations
crusnov added a comment to T216469: Netbox: cable termination names report.

After a discussion with Faidon, I think the general consensus is that DRAC (and ILO) should be an acceptable termination name for management interfaces, in addition to, going forward, the normal default being mgmt\d? (enumerated in the case of there being multiple interfaces).

Wed, May 29, 5:54 PM · Patch-For-Review, netbox, Operations-Software-Development, Operations
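
A minimal sketch of the naming rule described in the comment above, assuming the check is a case-sensitive whole-name match. The function name and the exact pattern are illustrative and are not taken from the actual report code.

```python
import re

# Assumed rule: "mgmt" optionally followed by a single digit, or the
# legacy names DRAC / ILO. The real report may be stricter or looser.
MGMT_NAME_RE = re.compile(r"mgmt\d?|DRAC|ILO")


def is_acceptable_mgmt_name(name: str) -> bool:
    """Return True if a termination name is acceptable for a management interface."""
    # fullmatch() requires the whole name to match, not just a prefix.
    return MGMT_NAME_RE.fullmatch(name) is not None
```

With this pattern, names such as mgmt, mgmt1, DRAC and ILO pass, while eth0 or mgmt12 do not.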
crusnov added a comment to T216469: Netbox: cable termination names report.

Sample output:

Wed, May 29, 3:56 PM · Patch-For-Review, netbox, Operations-Software-Development, Operations

Tue, May 28

crusnov added a comment to T221507: Netbox report to validate network equipment data.

Hello, here is the sample output. There are several inconsistencies that I can see the fix for and that I'd already attempted to mitigate (apparently unsuccessfully), such as devices like Netbox devtype=Juniper EX4600-40F, LibreNMS devtype=Juniper Networks, Inc. ex4600-40f Ethernet Switch, kernel JUNOS 14.1X53-D45.3, Build date: 2017-07-28 01:39:39 UTC Copyright (c) 1996-2017 Juniper Networks, Inc. or Juniper EX4600, where the information is there but is just not lined up the same way. Other things seem less obvious, like duplicated serial numbers and similar.

Tue, May 28, 4:45 PM · Patch-For-Review, netbox, User-crusnov, Operations-Software-Development, Operations, netops
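
One way to line up device types like the ones quoted above is to match loosely on the model token. This is only a sketch under assumptions: it treats the last word of the Netbox device type as the model and looks for it among the words of the LibreNMS description. Both function names are hypothetical and this is not the logic of the actual report.

```python
import re


def normalize_token(text: str) -> str:
    """Lowercase a token and drop everything except letters, digits and dashes."""
    return re.sub(r"[^a-z0-9-]", "", text.lower())


def models_loosely_match(netbox_devtype: str, librenms_descr: str) -> bool:
    """Return True if the Netbox model appears as a word in the LibreNMS description."""
    parts = netbox_devtype.split()
    if not parts:
        return False
    # Assumption: Netbox device types look like "<vendor> <model>".
    model = normalize_token(parts[-1])
    tokens = {normalize_token(tok) for tok in librenms_descr.split()}
    return model in tokens
```

The alternative mentioned elsewhere in the thread, an explicit map between LibreNMS and Netbox names, would be more predictable but needs maintenance whenever a new model appears.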

May 22 2019

crusnov added a comment to T223292: Netbox: generate CSV backups.

Just to follow up on this. I did spend some time trying to figure out how to initiate a template-based export from hitting a URL. It seems as though there's no API-way, and hitting the URL endpoint doesn't work with a token authentication as far as I can tell.

May 22 2019, 11:01 PM · Patch-For-Review, netbox
crusnov added a comment to T222922: wmf7622 wont powercycle (cannot be allocated from spares).

I'm definitely in favor of allowing a failed state to basically come from any other state.

May 22 2019, 10:12 PM · Operations, ops-eqiad

May 21 2019

Krenair awarded T224057: Request increased quota for Automation Framework Cloud VPS project a Like token.
May 21 2019, 7:20 PM · Cloud-VPS (Quota-requests)
crusnov created T224057: Request increased quota for Automation Framework Cloud VPS project.
May 21 2019, 6:15 PM · Cloud-VPS (Quota-requests)
crusnov closed T220422: Netbox Reports: General Cleanup and Improvement as Resolved.

Merged the change and deployed which uses admin_state instead.

May 21 2019, 4:06 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, Operations-Software-Development
crusnov updated the task description for T220422: Netbox Reports: General Cleanup and Improvement.
May 21 2019, 4:05 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, Operations-Software-Development

May 14 2019

crusnov added a comment to T221507: Netbox report to validate network equipment data.

It was pointed out to me that the vendor name in entPhysical is there, so we could hypothetically check that (for inventory items only) - the devices table remains complex.

May 14 2019, 10:24 PM · Patch-For-Review, netbox, User-crusnov, Operations-Software-Development, Operations, netops
crusnov added a comment to T221507: Netbox report to validate network equipment data.

Change 510256 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/software/netbox-reports@master] Add LibreNMS parity check report.

https://gerrit.wikimedia.org/r/510256

May 14 2019, 10:15 PM · Patch-For-Review, netbox, User-crusnov, Operations-Software-Development, Operations, netops

May 13 2019

crusnov added a comment to T221507: Netbox report to validate network equipment data.

After digging and discussing I believe the way forward since the mapping is slightly ... weird between LibreNMS and Netbox:

May 13 2019, 10:18 PM · Patch-For-Review, netbox, User-crusnov, Operations-Software-Development, Operations, netops
crusnov added a comment to T222922: wmf7622 wont powercycle (cannot be allocated from spares).

Hello, process question about this. The current flowchart for states doesn't allow Spare->Failed to happen, so there are some implicit assumptions about that inside of, for example, the PuppetDB netbox report (a Failed state is expected to be in Puppet since it implicitly comes from a production state). Is it the preference that boxes like this go through a Failed state (and thus never appear in Puppet)? Thanks.

May 13 2019, 6:10 PM · Operations, ops-eqiad
crusnov added a comment to T220422: Netbox Reports: General Cleanup and Improvement.

Just a note, admin_down does not seem to indicate anything particular about the machines that is useful to denote in Netbox as far as I can tell? It seems to reflect the *desired* state. To clarify is there any situation where it would not match the op_state within a short period of time? AFAICT it is used to tell ganeti to down or up the machine but I may be incorrect here. I have implemented mirroring the op_state but if we truly do need an extra field for admin_state that'd be useful to know.

You are correct that admin_state is the desired state, while oper_state is the operating one; basically, if your desired state is "this VM should be up" and for some reason (e.g. the host had an unscheduled reboot, or QEMU crashed etc.) the VM is down, then admin != oper, and a cronjob that runs the ganeti "watcher" will execute and fix things up (= start the VM). Until that happens, gnt-instance list will list the VM with a status of "ERROR_down", meaning "it's down, but it shouldn't be".

For the purposes of Netbox, I think what we need is the admin state, not the operating one, i.e. what we've configured Ganeti to do, rather than what has actually happened due to an error. The equivalent for a physical host is that we expect a Status: Offline host to be powered down, and a Status: Active host to be powered up, but we don't really track whether we've shut down or powercycled a host manually.

Does that make sense and do you agree?

May 13 2019, 1:35 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, Operations-Software-Development
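
The distinction drawn above between the desired and the operating state can be sketched as follows. The value "up" for admin_state, the boolean oper_up, and the Netbox status labels "Active" and "Offline" are assumptions made for illustration; the function names are hypothetical and do not come from the sync script.

```python
def netbox_status_for_vm(admin_state: str) -> str:
    """Map the configured (desired) Ganeti state to a Netbox status label."""
    # Assumption: anything other than "up" is treated as Offline.
    return "Active" if admin_state == "up" else "Offline"


def watcher_should_start(admin_state: str, oper_up: bool) -> bool:
    """True when the VM is meant to be up but is currently down."""
    # This is the "it's down, but it shouldn't be" case from the comment above.
    return admin_state == "up" and not oper_up
```

Only the first function feeds Netbox in this sketch; a VM that crashed still maps to "Active" because its configured state has not changed.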

May 10 2019

crusnov added a comment to T222931: Netbox Reports Ideas and Requests.

An idea that came up in discussing DNS automation with @ayounsi is to verify interface names match, and/or automate updating interface names from PuppetDB into Netbox.

May 10 2019, 12:34 AM · netbox, User-crusnov, Operations-Software-Development
crusnov triaged T222931: Netbox Reports Ideas and Requests as Normal priority.
May 10 2019, 12:33 AM · netbox, User-crusnov, Operations-Software-Development
crusnov moved T221113: Netbox Reports: Create an icinga check for alerting on a set of Netbox reports from In Code Review to Pending release/deployment on the Operations-Software-Development board.
May 10 2019, 12:32 AM · netbox, Operations-Software-Development
crusnov moved T221113: Netbox Reports: Create an icinga check for alerting on a set of Netbox reports from Backlog to In Code Review on the Operations-Software-Development board.
May 10 2019, 12:32 AM · netbox, Operations-Software-Development

May 8 2019

crusnov renamed T222837: Discussion about synchronizing Ganeti VM network interfaces to Netbox from Discussion about synchronizing Ganeti devices to Netbox to Discussion about synchronizing Ganeti VM network interfaces to Netbox.
May 8 2019, 10:06 PM · Operations-Software-Development
crusnov triaged T222837: Discussion about synchronizing Ganeti VM network interfaces to Netbox as Normal priority.
May 8 2019, 7:04 PM · Operations-Software-Development

May 7 2019

crusnov added a comment to T220422: Netbox Reports: General Cleanup and Improvement.

Just a note, admin_down does not seem to indicate anything particular about the machines that is useful to denote in Netbox as far as I can tell? It seems to reflect the *desired* state. To clarify is there any situation where it would not match the op_state within a short period of time? AFAICT it is used to tell ganeti to down or up the machine but I may be incorrect here. I have implemented mirroring the op_state but if we truly do need an extra field for admin_state that'd be useful to know.

May 7 2019, 6:20 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, Operations-Software-Development
crusnov moved T220422: Netbox Reports: General Cleanup and Improvement from Pending to Complete on the User-crusnov board.
May 7 2019, 6:17 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, Operations-Software-Development
crusnov updated the task description for T220422: Netbox Reports: General Cleanup and Improvement.
May 7 2019, 5:49 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, Operations-Software-Development
crusnov added a comment to T212697: uwsgi's logsocket_plugin.so causes segfaults during log rotation.

The patch seems sane and simple. I concur with this plan fwiw.

May 7 2019, 4:12 PM · Patch-For-Review, Operations
crusnov added a comment to T184086: Add prometheus exporter to Gerrit.

@crusnov we could use your help, yup. We need to create a prometheusBearerToken [plugin.javamelody.prometheusBearerToken] https://gerrit.googlesource.com/plugins/javamelody/+/refs/heads/stable-2.15/src/main/resources/Documentation/config.md . We then get Prometheus to query gerrit.wikimedia.org/r/monitoring?format=prometheus using the token.

May 7 2019, 2:44 PM · Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, Patch-For-Review, Gerrit, Operations
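
As a rough illustration of the query described above, the sketch below builds (but does not send) a request carrying the token. It assumes HTTPS and the standard "Authorization: Bearer" header, which is what Prometheus sends for a configured bearer token; whether the plugin expects exactly this is not confirmed here, and the function name is hypothetical.

```python
from urllib.request import Request

# URL from the comment above; the https:// scheme is an assumption.
MONITORING_URL = "https://gerrit.wikimedia.org/r/monitoring?format=prometheus"


def build_scrape_request(token: str) -> Request:
    """Build a request for the metrics endpoint with the bearer token attached."""
    return Request(MONITORING_URL, headers={"Authorization": f"Bearer {token}"})
```

Passing the returned object to urllib.request.urlopen would perform the actual scrape.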

May 6 2019

crusnov added a comment to T184086: Add prometheus exporter to Gerrit.

Just to +1 the idea of shipping javamelody to prometheus. Let me know if I can help at all.

May 6 2019, 9:02 PM · Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, Patch-For-Review, Gerrit, Operations
crusnov created T222629: Netbox: Set up deploy groups for scap to ensure primary is deployed before secondary.
May 6 2019, 3:48 PM · User-crusnov, netbox, Operations-Software-Development

May 5 2019

crusnov added a comment to T212697: uwsgi's logsocket_plugin.so causes segfaults during log rotation.

I agree with this approach, and it's what I was pursuing some months ago. Some time back I merged support for uwsgi::app to set its LimitCORE for this very purpose[1]. Putting this into production should be trivial.

May 5 2019, 7:13 PM · Patch-For-Review, Operations

May 3 2019

crusnov moved T221113: Netbox Reports: Create an icinga check for alerting on a set of Netbox reports from Pending to Complete on the User-crusnov board.
May 3 2019, 9:32 PM · netbox, Operations-Software-Development

May 2 2019

crusnov closed T222351: Netbox: upgrade to v2.5.12 as Resolved.
May 2 2019, 9:02 PM · Patch-For-Review, User-crusnov, netbox
crusnov moved T222351: Netbox: upgrade to v2.5.12 from Pending to Complete on the User-crusnov board.
May 2 2019, 9:01 PM · Patch-For-Review, User-crusnov, netbox
crusnov moved T222351: Netbox: upgrade to v2.5.12 from Backlog to Pending on the User-crusnov board.
May 2 2019, 2:50 PM · Patch-For-Review, User-crusnov, netbox
crusnov added a project to T222351: Netbox: upgrade to v2.5.12: User-crusnov.
May 2 2019, 2:38 PM · Patch-For-Review, User-crusnov, netbox

May 1 2019

crusnov moved T218440: Cumin: allow running as non-root from Backlog to In Code Review on the Operations-Software-Development board.
May 1 2019, 8:16 PM · Patch-For-Review, Operations-Software-Development
crusnov moved T219908: Build an API for generating boot options for iPXE from Netbox et al. based on Serial Number from Up next to Backlog on the Operations-Software-Development board.
May 1 2019, 6:48 PM · User-crusnov, Operations-Software-Development
crusnov moved T215378: Figure out how to make Netbox Reports actionable / alertable from In Progress to In Code Review on the Operations-Software-Development board.
May 1 2019, 6:48 PM · netbox, Patch-For-Review, Operations-Software-Development
crusnov moved T220422: Netbox Reports: General Cleanup and Improvement from In Progress to In Code Review on the Operations-Software-Development board.
May 1 2019, 6:48 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, Operations-Software-Development
crusnov moved T221113: Netbox Reports: Create an icinga check for alerting on a set of Netbox reports from Backlog to Pending on the User-crusnov board.
May 1 2019, 6:48 PM · netbox, Operations-Software-Development
crusnov moved T218956: Should we deploy sshguard on external IP addresses? from Ready to Backlog on the User-crusnov board.
May 1 2019, 6:48 PM · Patch-For-Review, User-crusnov, Security-Team
crusnov moved T203963: Convert makevm to spicerack cookbook from Pending to Complete on the User-crusnov board.
May 1 2019, 6:47 PM · serviceops-radar, Patch-For-Review, User-crusnov, Operations-Software-Development, User-jijiki, User-Joe, Operations
crusnov moved T218709: Add Spicerack module for Ganeti from Pending to Complete on the User-crusnov board.
May 1 2019, 6:47 PM · Operations-Software-Development, User-crusnov
crusnov created T222294: refinery-sqoop-mediawiki-production on an-coord1001 needed restart.
May 1 2019, 6:21 PM · Analytics-Kanban, Analytics-Cluster

Apr 29 2019

crusnov added a comment to T221529: Frequent puppet failures .

Have not had time to look at this in depth yet; however, I did just notice an issue while applying a refactor change[1].

While applying the change set I got the following error on a few hosts (mw1286 and cp2006 as examples).

Apr 29 15:10:31 mw1286 puppet-agent[17486]: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Duplicate declaration: Class[Standard] is already declared; cannot re declare at /etc/puppet/modules/profile/manifests/standard.pp:5 at /etc/puppet/modules/profile/manifests/standard.pp:5:5 on node mw1286.eqiad.wmnet

The error happened as puppet-merge was rolling out changes. I have not looked at how puppet-merge works, but this looks like it is caused by a non-atomic update, i.e. if puppet-merge is updating the code repository at the same time as a puppet compile is being performed, then one could have a situation where the catalog is compiled using some files from the production branch pre-merge and some post-merge.

[1]https://gerrit.wikimedia.org/r/c/operations/puppet/+/506990

Apr 29 2019, 5:03 PM · Puppet, puppet-compiler, Operations

Apr 26 2019

crusnov moved T221507: Netbox report to validate network equipment data from Backlog to In Progress on the User-crusnov board.
Apr 26 2019, 6:11 PM · Patch-For-Review, netbox, User-crusnov, Operations-Software-Development, Operations, netops
crusnov moved T218956: Should we deploy sshguard on external IP addresses? from In Progress to Ready on the User-crusnov board.
Apr 26 2019, 6:11 PM · Patch-For-Review, User-crusnov, Security-Team
crusnov moved T221507: Netbox report to validate network equipment data from Backlog to In Progress on the Operations-Software-Development board.
Apr 26 2019, 6:11 PM · Patch-For-Review, netbox, User-crusnov, Operations-Software-Development, Operations, netops
crusnov claimed T221507: Netbox report to validate network equipment data.
Apr 26 2019, 6:11 PM · Patch-For-Review, netbox, User-crusnov, Operations-Software-Development, Operations, netops
crusnov added a comment to T220422: Netbox Reports: General Cleanup and Improvement.

Coherence report quality pass is deployed.

Apr 26 2019, 3:49 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, Operations-Software-Development
crusnov triaged T220422: Netbox Reports: General Cleanup and Improvement as Normal priority.
Apr 26 2019, 3:48 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, Operations-Software-Development
crusnov updated the task description for T220422: Netbox Reports: General Cleanup and Improvement.
Apr 26 2019, 3:48 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, Operations-Software-Development

Apr 24 2019

crusnov updated the task description for T213114: Q3 2018/19 Goal: TEC6: Build automated workflows for server provisioning (Tracking Task).
Apr 24 2019, 8:07 PM · User-crusnov, Operations-Software-Development
crusnov added a comment to T221529: Frequent puppet failures .

I forget where but in digging about this it seems that Puppet will return 503 if it is too busy, there are numerous reports of this (to be clear I don't know if it's puppet itself or an intermediary that returns 503, but the result from the client's perspective is this).

Apr 24 2019, 3:38 PM · Puppet, puppet-compiler, Operations

Apr 23 2019

crusnov added a comment to T217074: Reduce Manual Steps in Provisioning by 4.

Getting the netbox module in the cookbooks will save steps on decoms and probably reimages and installs (which share many procedures); the caveat is that in decoms it will have to prompt as to the state to transition into (decom or spare).

Apr 23 2019, 10:49 PM · User-crusnov, Operations-Software-Development
crusnov added a comment to T217074: Reduce Manual Steps in Provisioning by 4.

After several conversations with robh, I think we can start looking at the low hanging fruit. For the record all of these processes are mediated by a dynamic, ever changing checklist.

Apr 23 2019, 10:48 PM · User-crusnov, Operations-Software-Development
crusnov moved T217074: Reduce Manual Steps in Provisioning by 4 from Ready to In Progress on the User-crusnov board.
Apr 23 2019, 10:21 PM · User-crusnov, Operations-Software-Development
crusnov moved T217072: Spicerack module for Netbox from In Progress to Pending on the User-crusnov board.
Apr 23 2019, 10:21 PM · netbox, Patch-For-Review, User-crusnov, Operations-Software-Development
crusnov moved T220422: Netbox Reports: General Cleanup and Improvement from In Progress to Pending on the User-crusnov board.
Apr 23 2019, 10:21 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, Operations-Software-Development
crusnov moved T218709: Add Spicerack module for Ganeti from In Progress to Pending on the User-crusnov board.
Apr 23 2019, 10:21 PM · Operations-Software-Development, User-crusnov
crusnov added a comment to T220422: Netbox Reports: General Cleanup and Improvement.
Apr 23 2019, 4:55 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, Operations-Software-Development
crusnov moved T215229: Keep Ganeti VMs synchronized in Netbox from In Code Review to Pending release/deployment on the Operations-Software-Development board.
Apr 23 2019, 3:17 PM · Patch-For-Review, User-crusnov, Operations-Software-Development
crusnov moved T203963: Convert makevm to spicerack cookbook from In Code Review to Pending release/deployment on the Operations-Software-Development board.
Apr 23 2019, 3:17 PM · serviceops-radar, Patch-For-Review, User-crusnov, Operations-Software-Development, User-jijiki, User-Joe, Operations
crusnov moved T218709: Add Spicerack module for Ganeti from In Code Review to Pending release/deployment on the Operations-Software-Development board.
Apr 23 2019, 3:17 PM · Operations-Software-Development, User-crusnov
crusnov moved T217072: Spicerack module for Netbox from In Code Review to Pending release/deployment on the Operations-Software-Development board.
Apr 23 2019, 3:17 PM · netbox, Patch-For-Review, User-crusnov, Operations-Software-Development
crusnov closed T215229: Keep Ganeti VMs synchronized in Netbox as Resolved.
Apr 23 2019, 2:50 AM · Patch-For-Review, User-crusnov, Operations-Software-Development
crusnov closed T215229: Keep Ganeti VMs synchronized in Netbox, a subtask of T213114: Q3 2018/19 Goal: TEC6: Build automated workflows for server provisioning (Tracking Task), as Resolved.
Apr 23 2019, 2:50 AM · User-crusnov, Operations-Software-Development

Apr 17 2019

crusnov added a comment to T220422: Netbox Reports: General Cleanup and Improvement.

exclude esams from console report

Apr 17 2019, 5:27 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, Operations-Software-Development
crusnov added a comment to T220422: Netbox Reports: General Cleanup and Improvement.

robh requests that the status show up in test_netbox_in_puppetdb

Apr 17 2019, 3:53 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, Operations-Software-Development
crusnov moved T219908: Build an API for generating boot options for iPXE from Netbox et al. based on Serial Number from In Progress to Backlog on the User-crusnov board.
Apr 17 2019, 3:50 PM · User-crusnov, Operations-Software-Development

Apr 16 2019

crusnov created T221113: Netbox Reports: Create an icinga check for alerting on a set of Netbox reports.
Apr 16 2019, 5:08 PM · netbox, Operations-Software-Development

Apr 12 2019

crusnov added a comment to T220422: Netbox Reports: General Cleanup and Improvement.

Thanks for refiguring the checklist :)

Apr 12 2019, 10:19 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, Operations-Software-Development
crusnov added a comment to T220422: Netbox Reports: General Cleanup and Improvement.

I forgot another one, the opposite of this:

We need a new method, to check for devices with Status: Offline, that have row/rack assigned. I'm sure there are plenty of those now.

i.e. alert on any devices with status not in (offline, planned) but with no row/rack assigned :)

Apr 12 2019, 8:42 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, Operations-Software-Development
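
The two checks requested above can be sketched over plain dictionaries. The field names ("name", "status", "rack") and the lowercase status values are assumptions for illustration; the real report works against Netbox models rather than dicts.

```python
def offline_with_rack(devices):
    """Names of devices that are Offline but still have a rack assigned."""
    return [d["name"] for d in devices if d["status"] == "offline" and d.get("rack")]


def racked_status_without_rack(devices):
    """Names of devices whose status implies a rack, but none is assigned."""
    return [
        d["name"]
        for d in devices
        if d["status"] not in ("offline", "planned") and not d.get("rack")
    ]
```

Each function returns the list of offending device names, which a report could then log as failures.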

Apr 11 2019

crusnov added a comment to T220422: Netbox Reports: General Cleanup and Improvement.

OK, so, after the efforts in the past few days, we're in a much better shape! The PuppetDB report seems to be (almost?) entirely indicative of real issues and is actionable now - I will involve DC Ops to start fixing the cases that are known to be real errors, and we'll see if there are any false positives (I know of at least one, that is tough to handle!).

The Coherence report in its current state is not super useful. Issues to fix:

  • Blacklist sites esams and knams for now; these are known to be wildly inconsistent (or should I say incoherent? :) and they are unfortunately not actionable right now. We can re-enable once we do a big cleanup project there, hopefully in a few months.
  • Blacklist certain roles, which are known to miss data and we accept that's OK. I can think of at least: Cable management, Storage bin, Optical device. Possibly others.
Apr 11 2019, 9:47 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, Operations-Software-Development
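
A minimal sketch of the two exclusions listed above, assuming devices are plain dictionaries with "site" and "role" keys. The constant and function names are hypothetical, and the role list only contains the roles named in the comment.

```python
# Sites and roles taken from the comment above; data layout is assumed.
EXCLUDED_SITES = {"esams", "knams"}
EXCLUDED_ROLES = {"Cable management", "Storage bin", "Optical device"}


def devices_to_check(devices):
    """Drop devices in excluded sites or roles before running coherence checks."""
    return [
        d
        for d in devices
        if d["site"] not in EXCLUDED_SITES and d["role"] not in EXCLUDED_ROLES
    ]
```

Keeping the exclusions in two sets makes it simple to re-enable a site later by removing one entry.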

Apr 10 2019

crusnov moved T219908: Build an API for generating boot options for iPXE from Netbox et al. based on Serial Number from Backlog to In Progress on the User-crusnov board.
Apr 10 2019, 3:31 PM · User-crusnov, Operations-Software-Development