
Yesterday

crusnov committed rOBPYc1081a054896: Add git build-dep. Add vcs links (authored by crusnov).
Add git build-dep. Add vcs links
Mon, Jul 22, 5:44 PM
crusnov committed rOBPYe0ff52682030: Bump dh compat to 9 for stretch. (authored by crusnov).
Bump dh compat to 9 for stretch.
Mon, Jul 22, 5:44 PM
crusnov committed rOBPYdd9ce69f107b: debian/rules: skip autotest (authored by crusnov).
debian/rules: skip autotest
Mon, Jul 22, 5:44 PM
crusnov committed rOBPYd38c669f01ac: update gpb.conf (authored by crusnov).
update gpb.conf
Mon, Jul 22, 5:44 PM
crusnov committed rOBPYe7b140748b03: update gpb.conf (authored by crusnov).
update gpb.conf
Mon, Jul 22, 5:44 PM
crusnov committed rOBPYf9b2fca9b10b: initial debian stuff (authored by crusnov).
initial debian stuff
Mon, Jul 22, 5:44 PM
crusnov created P8780 Repropo errors.
Mon, Jul 22, 5:40 PM · SRE-tools
crusnov created T228670: Import management interfaces into Netbox from DNS.
Mon, Jul 22, 2:50 PM · User-crusnov, Goal, SRE-tools
crusnov moved T228387: Bare metal cloud: management interfaces from Backlog to In Progress on the User-crusnov board.
Mon, Jul 22, 2:33 PM · User-crusnov, Goal, SRE-tools
crusnov added a project to T228387: Bare metal cloud: management interfaces: User-crusnov.
Mon, Jul 22, 2:33 PM · User-crusnov, Goal, SRE-tools

Mon, Jul 8

MoritzMuehlenhoff awarded T203963: Convert makevm to spicerack cookbook a Like token.
Mon, Jul 8, 9:56 AM · serviceops-radar, Patch-For-Review, User-crusnov, SRE-tools, User-jijiki, User-Joe, Operations

Mon, Jul 1

crusnov added a comment to T203963: Convert makevm to spicerack cookbook.

Interesting, it sure does take a while for the disk to build, and the tool will wait.

Mon, Jul 1, 6:08 PM · serviceops-radar, Patch-For-Review, User-crusnov, SRE-tools, User-jijiki, User-Joe, Operations
crusnov renamed T212783: cumin: Make ouput path sane and flexible (was: allow to suppress output and progress bars) from cumin: allow to suppress output and progress bars to cumin: Make ouput path sane and flexible (was: allow to suppress output and progress bars).
Mon, Jul 1, 3:48 PM · SRE-tools

Wed, Jun 26

crusnov added a comment to T164587: cumin could use randomization/splay options.

After looking into this a bit, the details of how this would be done are a bit involved. Internally cumin uses a NodeSet from clustershell, which acts like a set(), so the order is 'unspecified' (semi-random). If we want it to be more random, I think we'd have to convert it into a list and shuffle it before batching. The same is true if we want to apply sorting. I am told this is a relatively unimportant change, but it doesn't seem super complicated to implement if there is demand or if it would reduce toil.
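
A minimal sketch of the "convert to a list, then shuffle or sort before batching" idea. It uses plain Python containers rather than ClusterShell's NodeSet, and the function name and parameters are hypothetical, not part of cumin:

```python
import random
from typing import Iterable, Iterator, List, Optional


def batched_hosts(hosts: Iterable[str], batch_size: int,
                  shuffle: bool = True,
                  seed: Optional[int] = None) -> Iterator[List[str]]:
    """Yield hosts in batches after giving them an explicit order.

    A set-like container has no defined order, so it is turned into a
    sorted list first and optionally shuffled before slicing into batches.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(hosts)  # deterministic starting point
    if shuffle:
        # A seed is only useful for reproducible runs (assumption: optional).
        random.Random(seed).shuffle(ordered)
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]
```

With shuffle=False the same helper gives the sorted behaviour, so both orderings share one code path.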

Wed, Jun 26, 4:05 PM · Operations, SRE-tools

Tue, Jun 25

crusnov added a comment to T164587: cumin could use randomization/splay options.

@BBlack Thanks for opening this feature request, because right now it's totally implementation dependent and actually I realized this is neither clear nor explained in the docs / readme.
The TL;DR is that right now it depends if batches (-b) are used or not.

  • With batches: the order is somehow randomized due to access to a python dictionary (see the Python2 implementation note), see the table at the bottom.
  • Without batches: the selection is passed as is to ClusterShell and the execution is pretty much ordered. The "pretty much" is because ClusterShell in turn uses the fanout limit (the maximum number of children to fork at any given time), which right now is left at its default value of 64; going over that limit might alter the order a bit. With over ~100 hosts I've seen the first 2 in the order actually being picked up at the end, while all the others were executed in order.

I'm leaning toward forcing randomness in all cases and adding an --ordered (or similar) option to force the execution in order (although I need to check how to do that in the case without batches).
Regarding the NNNN specific implementation, given the generic nature of Cumin, I'd rather not add it into the tool itself, but maybe consider allowing custom filters to be specified, where we could have a custom implementation for the sorted and shuffle algorithms.
Thoughts?

Tue, Jun 25, 10:24 PM · Operations, SRE-tools
crusnov closed T216469: Netbox: cable termination names report as Resolved.
Tue, Jun 25, 4:04 PM · Patch-For-Review, netbox, SRE-tools, Operations
crusnov committed rOSNB60c58bdc64d1: Add a passthrough configuration system (authored by crusnov).
Add a passthrough configuration system
Tue, Jun 25, 3:28 PM

Mon, Jun 24

crusnov committed rOSNB975b7b33ad3f: Add a passthrough configuration system (authored by crusnov).
Add a passthrough configuration system
Mon, Jun 24, 11:41 PM
crusnov committed rOSNB3fe0ea706fc1: Add a passthrough configuration system (authored by crusnov).
Add a passthrough configuration system
Mon, Jun 24, 11:41 PM
crusnov committed rCUMINae6e9681a2af: backends: add Netbox backend (authored by crusnov).
backends: add Netbox backend
Mon, Jun 24, 6:10 PM
crusnov committed rCUMINbdbccba0f729: Add Cumin backend for accessing Netbox (authored by crusnov).
Add Cumin backend for accessing Netbox
Mon, Jun 24, 6:07 PM
crusnov committed rCUMINb4e30034ea67: Add Cumin backend for accessing Netbox (authored by crusnov).
Add Cumin backend for accessing Netbox
Mon, Jun 24, 4:17 PM
crusnov added a comment to T205900: Cumin: add backend for Netbox.

Just closing the loop here, the backend is up for review, but there is apparently a pylint bug preventing CI from passing (or there was last week).

Mon, Jun 24, 3:48 PM · Patch-For-Review, netbox, Operations, SRE-tools
crusnov added a comment to T226331: Upgrade Netbox to 2.6.1.

Roger, it might be best to wait on the upgrade until the split to Ganeti is done (maybe early next week) as using Redis was part of the spec for that.

Mon, Jun 24, 3:21 PM · netbox

Jun 18 2019

crusnov created P8628 tables to dump from Netbox.
Jun 18 2019, 6:10 PM

Jun 17 2019

crusnov moved T205900: Cumin: add backend for Netbox from Up next to In Code Review on the SRE-tools board.
Jun 17 2019, 11:13 PM · Patch-For-Review, netbox, Operations, SRE-tools
crusnov moved T217072: Spicerack module for Netbox from Pending to Complete on the User-crusnov board.
Jun 17 2019, 11:13 PM · netbox, Patch-For-Review, User-crusnov, SRE-tools
crusnov moved T224946: Netbox Alert Cleanups from Backlog to In Progress on the User-crusnov board.
Jun 17 2019, 11:13 PM · User-crusnov, netbox, SRE-tools

Jun 12 2019

crusnov committed rCUMIN891554eae47a: Add Cumin backend for accessing Netbox (authored by crusnov).
Add Cumin backend for accessing Netbox
Jun 12 2019, 8:46 AM

Jun 11 2019

crusnov committed rCUMIN7f1369cdef37: Add Cumin backend for accessing Netbox (authored by crusnov).
Add Cumin backend for accessing Netbox
Jun 11 2019, 1:28 PM
crusnov committed rCUMINe17b61990855: Add Cumin backend for accessing Netbox (authored by crusnov).
Add Cumin backend for accessing Netbox
Jun 11 2019, 10:35 AM

Jun 7 2019

crusnov added a comment to T209182: Setup Swift Storage for Netbox image (was: netbox won't allow me to upload photos of the rack).

okay so this works, mostly, in labs when manually configured to operate against the deployment-prep Swift cluster. Netbox lets me upload images and shows them associated with the object in question - except that viewing fails because they are served from a URL in the swift cluster that is unavailable. We'll look at this part more next week no doubt.

Jun 7 2019, 4:02 PM · netbox, Operations
crusnov renamed T209182: Setup Swift Storage for Netbox image (was: netbox won't allow me to upload photos of the rack) from netbox won't allow me to upload photos of the rack to Setup Swift Storage for Netbox image (was: netbox won't allow me to upload photos of the rack).
Jun 7 2019, 2:46 PM · netbox, Operations

Jun 6 2019

crusnov committed rCUMIN15e8d9ecfc6c: Add Cumin backend for accessing Netbox (authored by crusnov).
Add Cumin backend for accessing Netbox
Jun 6 2019, 7:16 PM

Jun 5 2019

crusnov created T225166: Gerrit crashed due to out of Heap.
Jun 5 2019, 10:22 PM · Gerrit

Jun 3 2019

crusnov added a comment to T221507: Netbox report to validate network equipment data.

I agree that from the perspective of more closely modelling the devices between the various tools that the domain name for the VC name thing is necessary. I'm not completely clear on how that would make the matching better? Currently the by-serial matching seems to be working correctly, the complexities are mostly in lining up vendor and model information at this point, unless I'm mistaken - and this appears to be approachable either by matching things more loosely or creating a map between what's in LibreNMS and what's in Netbox. Separately, there are only a few inventory items which don't appear to line up, but I believe it's because they are builtin so they are left out of the librenms query.

Jun 3 2019, 11:14 PM · netbox, User-crusnov, SRE-tools, Operations, netops
crusnov updated the task description for T224946: Netbox Alert Cleanups.
Jun 3 2019, 10:52 PM · User-crusnov, netbox, SRE-tools
crusnov triaged T224946: Netbox Alert Cleanups as Normal priority.
Jun 3 2019, 10:45 PM · User-crusnov, netbox, SRE-tools
crusnov created T224946: Netbox Alert Cleanups.
Jun 3 2019, 10:45 PM · User-crusnov, netbox, SRE-tools
crusnov added a comment to T221113: Netbox Reports: Create an icinga check for alerting on a set of Netbox reports.

Alerts are alerting and in production.

Jun 3 2019, 10:36 PM · netbox, SRE-tools
crusnov closed T221113: Netbox Reports: Create an icinga check for alerting on a set of Netbox reports as Resolved.
Jun 3 2019, 10:36 PM · netbox, SRE-tools
crusnov updated the task description for T221113: Netbox Reports: Create an icinga check for alerting on a set of Netbox reports.
Jun 3 2019, 10:35 PM · netbox, SRE-tools

May 29 2019

crusnov moved T223450: Triage and resolve all outstanding Netbox report errors from Backlog to In Progress on the SRE-tools board.
May 29 2019, 5:58 PM · ops-codfw, ops-eqiad, Operations, SRE-tools, netbox, DC-Ops
crusnov moved T216469: Netbox: cable termination names report from Up next to In Code Review on the SRE-tools board.
May 29 2019, 5:57 PM · Patch-For-Review, netbox, SRE-tools, Operations
crusnov added a comment to T216469: Netbox: cable termination names report.

After a discussion with Faidon, I think the general consensus is that DRAC (and ILO) should be acceptable termination names for management interfaces, in addition to the normal default going forward, mgmt\d? (enumerated when there are multiple interfaces).
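
The naming rule above can be sketched as a small check. The function name is hypothetical, and treating the names as case-sensitive is an assumption, not something the report is known to do:

```python
import re

# Accepted names per the comment above: "mgmt" with an optional single
# digit, plus the legacy names DRAC and ILO (case-sensitive by assumption).
MGMT_NAME_RE = re.compile(r"mgmt\d?|DRAC|ILO")


def is_valid_mgmt_name(name: str) -> bool:
    """Return True if the whole interface name is an accepted mgmt name."""
    return MGMT_NAME_RE.fullmatch(name) is not None
```

Using fullmatch rather than search keeps names such as mgmt12 or mgmt-old from passing on a partial match.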

May 29 2019, 5:54 PM · Patch-For-Review, netbox, SRE-tools, Operations
crusnov added a comment to T216469: Netbox: cable termination names report.

Sample output:

May 29 2019, 3:56 PM · Patch-For-Review, netbox, SRE-tools, Operations

May 28 2019

crusnov added a comment to T221507: Netbox report to validate network equipment data.

Hello, here is the sample output. There are several inconsistencies whose fix I can see and had already attempted to mitigate (apparently not successfully), such as devices with Netbox devtype=Juniper EX4600-40F and LibreNMS devtype=Juniper Networks, Inc. ex4600-40f Ethernet Switch, kernel JUNOS 14.1X53-D45.3, Build date: 2017-07-28 01:39:39 UTC Copyright (c) 1996-2017 Juniper Networks, Inc. or Juniper EX4600, where the information is there but just not lined up the same way. Other things seem less obvious, like duplicated serial numbers and similar.
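
One way to "match things more loosely" is to compare case-insensitive tokens instead of whole strings. This is only an illustrative sketch with hypothetical function names, not the logic of the actual report:

```python
import re


def _tokens(text: str) -> set:
    """Lower-case the text and split it into alphanumeric/hyphen tokens."""
    return set(re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower()))


def loosely_matches(netbox_devtype: str, librenms_descr: str) -> bool:
    """True if every token of the Netbox string occurs in the LibreNMS one."""
    wanted = _tokens(netbox_devtype)
    return bool(wanted) and wanted <= _tokens(librenms_descr)
```

Under this rule the long JUNOS description matches "Juniper EX4600-40F", but a shortened LibreNMS value such as "Juniper EX4600" still does not, so an explicit map between the two naming schemes would remain necessary for those cases.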

May 28 2019, 4:45 PM · netbox, User-crusnov, SRE-tools, Operations, netops

May 22 2019

crusnov added a comment to T223292: Netbox: generate CSV backups.

Just to follow up on this. I did spend some time trying to figure out how to initiate a template-based export from hitting a URL. It seems as though there's no API-way, and hitting the URL endpoint doesn't work with a token authentication as far as I can tell.

May 22 2019, 11:01 PM · Patch-For-Review, netbox
crusnov added a comment to T222922: wmf7622 wont powercycle (cannot be allocated from spares).

I'm definitely in favor of allowing a failed state to basically come from any other state.
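
As a rough illustration of "Failed reachable from any other state", here is a sketch of a transition check. The state names and the transition table are hypothetical and do not reproduce the real lifecycle flowchart:

```python
from enum import Enum


class Status(Enum):
    SPARE = "spare"
    ACTIVE = "active"
    FAILED = "failed"
    DECOMMISSIONED = "decommissioned"


# Hypothetical flowchart: which targets each state may normally move to.
ALLOWED = {
    Status.SPARE: {Status.ACTIVE, Status.DECOMMISSIONED},
    Status.ACTIVE: {Status.SPARE, Status.DECOMMISSIONED},
    Status.FAILED: {Status.SPARE, Status.DECOMMISSIONED},
    Status.DECOMMISSIONED: set(),
}


def can_transition(current: Status, target: Status) -> bool:
    """Allow Failed from any other state, else consult the table."""
    if target is Status.FAILED and current is not Status.FAILED:
        return True
    return target in ALLOWED[current]
```

Handling Failed as a special case keeps the table itself unchanged, which makes the exception easy to spot and to remove.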

May 22 2019, 10:12 PM · Operations, ops-eqiad

May 21 2019

Krenair awarded T224057: Request increased quota for Automation Framework Cloud VPS project a Like token.
May 21 2019, 7:20 PM · Cloud-VPS (Quota-requests)
crusnov created T224057: Request increased quota for Automation Framework Cloud VPS project.
May 21 2019, 6:15 PM · Cloud-VPS (Quota-requests)
crusnov closed T220422: Netbox Reports: General Cleanup and Improvement as Resolved.

Merged and deployed the change, which uses admin_state instead.

May 21 2019, 4:06 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, SRE-tools
crusnov updated the task description for T220422: Netbox Reports: General Cleanup and Improvement.
May 21 2019, 4:05 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, SRE-tools

May 14 2019

crusnov added a comment to T221507: Netbox report to validate network equipment data.

It was pointed out to me that the vendor name in entPhysical is there, so we could hypothetically check that (for inventory items only) - the devices table remains complex.

May 14 2019, 10:24 PM · netbox, User-crusnov, SRE-tools, Operations, netops
crusnov added a comment to T221507: Netbox report to validate network equipment data.

Change 510256 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/software/netbox-reports@master] Add LibreNMS parity check report.
https://gerrit.wikimedia.org/r/510256

May 14 2019, 10:15 PM · netbox, User-crusnov, SRE-tools, Operations, netops

May 13 2019

crusnov added a comment to T221507: Netbox report to validate network equipment data.

After digging and discussing I believe the way forward since the mapping is slightly ... weird between LibreNMS and Netbox:

May 13 2019, 10:18 PM · netbox, User-crusnov, SRE-tools, Operations, netops
crusnov added a comment to T222922: wmf7622 wont powercycle (cannot be allocated from spares).

Hello, process question about this. The current flowchart for states doesn't allow Spare->Failed to happen, so there are some implicit assumptions about that inside of, for example, the PuppetDB netbox report (the Failed state is expected to be in Puppet, since it implicitly comes from a production state). Is it the preference that boxes like this go through a Failed state (and thus never appear in Puppet)? Thanks.

May 13 2019, 6:10 PM · Operations, ops-eqiad
crusnov added a comment to T220422: Netbox Reports: General Cleanup and Improvement.

Just a note, admin_down does not seem to indicate anything particular about the machines that is useful to denote in Netbox as far as I can tell? It seems to reflect the *desired* state. To clarify is there any situation where it would not match the op_state within a short period of time? AFAICT it is used to tell ganeti to down or up the machine but I may be incorrect here. I have implemented mirroring the op_state but if we truly do need an extra field for admin_state that'd be useful to know.

You are correct that admin_state is the desired state, while oper_state is the operating one; basically, if your desired state is "this VM should be up" and for some reason (e.g. the host had an unscheduled reboot, or QEMU crashed etc.) the VM is down, then admin != oper, and a cronjob that runs the ganeti "watcher" will execute and fix things up (= start the VM). Until that happens, gnt-instance list will list the VM with a status of "ERROR_down", meaning "it's down, but it shouldn't be".
For the purposes of Netbox, I think what we need is the admin state, not the operating one, i.e. what we've configured Ganeti to do, rather than what has actually happened due to an error. The equivalent for a physical host is that we expect a Status: Offline host to be powered down, and a Status: Active host to be powered up, but we don't really track whether we've shut down or powercycled a host manually.
Does that make sense and do you agree?
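
The distinction described above can be sketched as two small functions. This is a simplified model with hypothetical function names: only ERROR_down is taken from the comment, the other status strings are assumptions about what gnt-instance list shows, and the Netbox mapping simply follows the proposal to mirror the admin state:

```python
def instance_status(admin_up: bool, oper_up: bool) -> str:
    """Combine desired (admin) and actual (oper) state into one status."""
    if admin_up and oper_up:
        return "running"
    if admin_up and not oper_up:
        return "ERROR_down"   # it's down, but it shouldn't be
    if not admin_up and oper_up:
        return "ERROR_up"     # assumption: it's up, but it shouldn't be
    return "ADMIN_down"       # assumption: down on purpose


def netbox_status(admin_up: bool) -> str:
    """Mirror only the desired state into Netbox, ignoring transient errors."""
    return "Active" if admin_up else "Offline"
```

A VM whose host rebooted unexpectedly would show ERROR_down until the watcher restarts it, while its Netbox status would stay Active throughout.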

May 13 2019, 1:35 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, SRE-tools

May 10 2019

crusnov added a comment to T222931: Netbox Reports Ideas and Requests.

An idea that came up in discussing DNS automation with @ayounsi is to verify interface names match, and/or automate updating interface names from PuppetDB into Netbox.

May 10 2019, 12:34 AM · netbox, User-crusnov, SRE-tools
crusnov triaged T222931: Netbox Reports Ideas and Requests as Normal priority.
May 10 2019, 12:33 AM · netbox, User-crusnov, SRE-tools
crusnov moved T221113: Netbox Reports: Create an icinga check for alerting on a set of Netbox reports from In Code Review to Pending release/deployment on the SRE-tools board.
May 10 2019, 12:32 AM · netbox, SRE-tools
crusnov moved T221113: Netbox Reports: Create an icinga check for alerting on a set of Netbox reports from Backlog to In Code Review on the SRE-tools board.
May 10 2019, 12:32 AM · netbox, SRE-tools

May 8 2019

crusnov renamed T222837: Discussion about synchronizing Ganeti VM network interfaces to Netbox from Discussion about synchronizing Ganeti devices to Netbox to Discussion about synchronizing Ganeti VM network interfaces to Netbox.
May 8 2019, 10:06 PM · SRE-tools
crusnov triaged T222837: Discussion about synchronizing Ganeti VM network interfaces to Netbox as Normal priority.
May 8 2019, 7:04 PM · SRE-tools

May 7 2019

crusnov added a comment to T220422: Netbox Reports: General Cleanup and Improvement.

Just a note, admin_down does not seem to indicate anything particular about the machines that is useful to denote in Netbox as far as I can tell? It seems to reflect the *desired* state. To clarify is there any situation where it would not match the op_state within a short period of time? AFAICT it is used to tell ganeti to down or up the machine but I may be incorrect here. I have implemented mirroring the op_state but if we truly do need an extra field for admin_state that'd be useful to know.

May 7 2019, 6:20 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, SRE-tools
crusnov moved T220422: Netbox Reports: General Cleanup and Improvement from Pending to Complete on the User-crusnov board.
May 7 2019, 6:17 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, SRE-tools
crusnov updated the task description for T220422: Netbox Reports: General Cleanup and Improvement.
May 7 2019, 5:49 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, SRE-tools
crusnov added a comment to T212697: uwsgi's logsocket_plugin.so causes segfaults during log rotation.

The patch seems sane and simple. I concur with this plan fwiw.

May 7 2019, 4:12 PM · Patch-For-Review, Operations
crusnov added a comment to T184086: Add prometheus exporter to Gerrit.

@crusnov we could use your help, yup. We need to create a prometheusBearerToken [plugin.javamelody.prometheusBearerToken] https://gerrit.googlesource.com/plugins/javamelody/+/refs/heads/stable-2.15/src/main/resources/Documentation/config.md . Prometheus would then query gerrit.wikimedia.org/r/monitoring?format=prometheus using the token.
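
A sketch of what the scrape request might look like from a client's side. It assumes the token is sent as a standard "Authorization: Bearer" header and that the endpoint is served over HTTPS; the function name is hypothetical and the request is only built, not sent:

```python
import urllib.request

# Endpoint taken from the comment above; the https scheme is an assumption.
MONITORING_URL = "https://gerrit.wikimedia.org/r/monitoring?format=prometheus"


def build_scrape_request(token: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying the bearer token."""
    return urllib.request.Request(
        MONITORING_URL,
        headers={"Authorization": f"Bearer {token}"},
    )
```

In practice Prometheus itself would carry the token in its scrape configuration; the snippet is only meant for checking the endpoint by hand.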

May 7 2019, 2:44 PM · Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, Patch-For-Review, Gerrit, Operations

May 6 2019

crusnov added a comment to T184086: Add prometheus exporter to Gerrit.

Just to +1 the idea of shipping javamelody to prometheus. Let me know if I can help at all.

May 6 2019, 9:02 PM · Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, Patch-For-Review, Gerrit, Operations
crusnov created T222629: Netbox: Set up deploy groups for scap to ensure primary is deployed before secondary.
May 6 2019, 3:48 PM · User-crusnov, netbox, SRE-tools

May 5 2019

crusnov added a comment to T212697: uwsgi's logsocket_plugin.so causes segfaults during log rotation.

I agree with this approach, and it's what I was pursuing some months ago. I have merged some time back support for uwsgi::app to set its LimitCORE for this very purpose[1]. Putting this into production should be trivial.

May 5 2019, 7:13 PM · Patch-For-Review, Operations

May 3 2019

crusnov moved T221113: Netbox Reports: Create an icinga check for alerting on a set of Netbox reports from Pending to Complete on the User-crusnov board.
May 3 2019, 9:32 PM · netbox, SRE-tools

May 2 2019

crusnov closed T222351: Netbox: upgrade to v2.5.12 as Resolved.
May 2 2019, 9:02 PM · Patch-For-Review, User-crusnov, netbox
crusnov moved T222351: Netbox: upgrade to v2.5.12 from Pending to Complete on the User-crusnov board.
May 2 2019, 9:01 PM · Patch-For-Review, User-crusnov, netbox
crusnov moved T222351: Netbox: upgrade to v2.5.12 from Backlog to Pending on the User-crusnov board.
May 2 2019, 2:50 PM · Patch-For-Review, User-crusnov, netbox
crusnov added a project to T222351: Netbox: upgrade to v2.5.12: User-crusnov.
May 2 2019, 2:38 PM · Patch-For-Review, User-crusnov, netbox

May 1 2019

crusnov moved T218440: Cumin: allow running as non-root from Backlog to In Code Review on the SRE-tools board.
May 1 2019, 8:16 PM · Patch-For-Review, SRE-tools
crusnov moved T219908: Build an API for generating boot options for iPXE from Netbox et al. based on Serial Number from Up next to Backlog on the SRE-tools board.
May 1 2019, 6:48 PM · User-crusnov, SRE-tools
crusnov moved T215378: Figure out how to make Netbox Reports actionable / alertable from In Progress to In Code Review on the SRE-tools board.
May 1 2019, 6:48 PM · netbox, Patch-For-Review, SRE-tools
crusnov moved T220422: Netbox Reports: General Cleanup and Improvement from In Progress to In Code Review on the SRE-tools board.
May 1 2019, 6:48 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, SRE-tools
crusnov moved T221113: Netbox Reports: Create an icinga check for alerting on a set of Netbox reports from Backlog to Pending on the User-crusnov board.
May 1 2019, 6:48 PM · netbox, SRE-tools
crusnov moved T218956: Should we deploy sshguard on external IP addresses? from Ready to Backlog on the User-crusnov board.
May 1 2019, 6:48 PM · User-crusnov, Security-Team
crusnov moved T203963: Convert makevm to spicerack cookbook from Pending to Complete on the User-crusnov board.
May 1 2019, 6:47 PM · serviceops-radar, Patch-For-Review, User-crusnov, SRE-tools, User-jijiki, User-Joe, Operations
crusnov moved T218709: Add Spicerack module for Ganeti from Pending to Complete on the User-crusnov board.
May 1 2019, 6:47 PM · SRE-tools, User-crusnov
crusnov created T222294: refinery-sqoop-mediawiki-production on an-coord1001 needed restart.
May 1 2019, 6:21 PM · Analytics-Kanban, Analytics-Cluster

Apr 29 2019

crusnov added a comment to T221529: Frequent puppet failures .

Have not had time to look at this in depth yet; however, I did just notice an issue while applying a refactor change[1].
While applying the change set I got the following error on a few hosts (mw1286 and cp2006, as examples).

Apr 29 15:10:31 mw1286 puppet-agent[17486]: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Duplicate declaration: Class[Standard] is already declared; cannot re declare at /etc/puppet/modules/profile/manifests/standard.pp:5 at /etc/puppet/modules/profile/manifests/standard.pp:5:5 on node mw1286.eqiad.wmnet

The error happened as puppet-merge was rolling out changes. I have not looked at how puppet-merge works, but this looks like it is caused by a non-atomic update, i.e. if puppet-merge is updating the code repository at the same time as a puppet compile is being performed, then the catalog could be compiled using some files from the production branch pre-merge and some post-merge.
[1]https://gerrit.wikimedia.org/r/c/operations/puppet/+/506990
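
For illustration, one common way to make such an update atomic is to prepare the new tree elsewhere and switch a symlink in a single rename. This is a generic sketch with hypothetical names, not a description of how puppet-merge works:

```python
import os


def atomic_switch(link_path: str, new_target: str) -> None:
    """Repoint the symlink at link_path to new_target in one rename step.

    Readers resolving link_path see either the old tree or the new one,
    never a mix, because rename replaces the link atomically on POSIX.
    """
    tmp_link = f"{link_path}.tmp.{os.getpid()}"  # hypothetical temp name
    os.symlink(new_target, tmp_link)
    try:
        os.replace(tmp_link, link_path)
    except OSError:
        os.unlink(tmp_link)  # clean up the temporary link on failure
        raise
```

A compile that has already opened files under the old tree keeps reading consistent pre-merge content, as long as the old tree is not deleted underneath it.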

Apr 29 2019, 5:03 PM · Patch-For-Review, Puppet, puppet-compiler, Operations

Apr 26 2019

crusnov moved T221507: Netbox report to validate network equipment data from Backlog to In Progress on the User-crusnov board.
Apr 26 2019, 6:11 PM · netbox, User-crusnov, SRE-tools, Operations, netops
crusnov moved T218956: Should we deploy sshguard on external IP addresses? from In Progress to Ready on the User-crusnov board.
Apr 26 2019, 6:11 PM · User-crusnov, Security-Team
crusnov moved T221507: Netbox report to validate network equipment data from Backlog to In Progress on the SRE-tools board.
Apr 26 2019, 6:11 PM · netbox, User-crusnov, SRE-tools, Operations, netops
crusnov claimed T221507: Netbox report to validate network equipment data.
Apr 26 2019, 6:11 PM · netbox, User-crusnov, SRE-tools, Operations, netops
crusnov added a comment to T220422: Netbox Reports: General Cleanup and Improvement.

Coherence report quality pass is deployed.

Apr 26 2019, 3:49 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, SRE-tools
crusnov triaged T220422: Netbox Reports: General Cleanup and Improvement as Normal priority.
Apr 26 2019, 3:48 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, SRE-tools
crusnov updated the task description for T220422: Netbox Reports: General Cleanup and Improvement.
Apr 26 2019, 3:48 PM · netbox, Patch-For-Review, User-crusnov, DC-Ops, SRE-tools

Apr 24 2019

crusnov updated the task description for T213114: Q3 2018/19 Goal: TEC6: Build automated workflows for server provisioning (Tracking Task).
Apr 24 2019, 8:07 PM · User-crusnov, SRE-tools
crusnov added a comment to T221529: Frequent puppet failures .

I forget where, but in digging into this it seems that Puppet will return 503 if it is too busy; there are numerous reports of this (to be clear, I don't know whether it's Puppet itself or an intermediary that returns 503, but this is the result from the client's perspective).

Apr 24 2019, 3:38 PM · Patch-For-Review, Puppet, puppet-compiler, Operations

Apr 23 2019

crusnov added a comment to T217074: Reduce Manual Steps in Provisioning by 4.

getting the netbox module in the cookbooks will save steps on decoms and probably reimages and installs (which share many procedures); the caveat is that in decoms it will have to prompt as to the state to transition into (decom or spare).

Apr 23 2019, 10:49 PM · User-crusnov, SRE-tools
crusnov added a comment to T217074: Reduce Manual Steps in Provisioning by 4.

After several conversations with robh, I think we can start looking at the low hanging fruit. For the record all of these processes are mediated by a dynamic, ever changing checklist.

Apr 23 2019, 10:48 PM · User-crusnov, SRE-tools
crusnov moved T217074: Reduce Manual Steps in Provisioning by 4 from Ready to In Progress on the User-crusnov board.
Apr 23 2019, 10:21 PM · User-crusnov, SRE-tools
crusnov moved T217072: Spicerack module for Netbox from In Progress to Pending on the User-crusnov board.
Apr 23 2019, 10:21 PM · netbox, Patch-For-Review, User-crusnov, SRE-tools