
crusnov (Cas Rusnov)
User

User Details

User Since
Oct 15 2018, 5:56 PM (52 w, 17 h)
Availability
Available
LDAP User
CRusnov
MediaWiki User
Unknown

Recent Activity

Tue, Oct 8

crusnov created T234997: Make Netbox Active/Active.
Tue, Oct 8, 8:11 PM · Patch-For-Review, Traffic, Operations

Mon, Oct 7

crusnov added a comment to T233183: Automate generation of Management DNS records from Netbox.

After discussing this a bit and thinking about it quite a lot, I'm highly in favor of a machine git repo for the generated side. This has the nice side benefit of letting us easily expose it to the network via https on the netbox servers (and thus both publicly and to the dns servers).

Mon, Oct 7, 8:12 PM · Traffic, Operations, Patch-For-Review, User-crusnov, Goal, SRE-tools

Thu, Oct 3

crusnov added a comment to T233183: Automate generation of Management DNS records from Netbox.

Thanks for the extensive feedback & validation suggestions! I'll see what I can come up with.

Thu, Oct 3, 3:15 PM · Traffic, Operations, Patch-For-Review, User-crusnov, Goal, SRE-tools

Wed, Oct 2

crusnov claimed T229397: Puppet: get row/rack info from Netbox.
Wed, Oct 2, 5:32 PM · Patch-For-Review, Puppet, Operations
crusnov added a comment to T234452: Puppet breakage in automation-feedback VMs.

Thanks for the heads up, we'll loop around to fix these up.

Wed, Oct 2, 4:16 PM · Operations
crusnov added a comment to T230449: Automate selection of IP address for interface.

This script has been released and appears to work correctly!

Has this been tested in the cloud Netbox test instance first?
Where can I see some example of generated IPs?

Wed, Oct 2, 3:49 PM · User-crusnov, Goal, SRE-tools
crusnov updated the task description for T228387: Bare metal cloud: management interfaces.
Wed, Oct 2, 4:04 AM · Patch-For-Review, User-crusnov, Goal, SRE-tools
crusnov closed T230449: Automate selection of IP address for interface, a subtask of T228387: Bare metal cloud: management interfaces, as Resolved.
Wed, Oct 2, 4:04 AM · Patch-For-Review, User-crusnov, Goal, SRE-tools
crusnov closed T230449: Automate selection of IP address for interface as Resolved.

This script has been released and appears to work correctly!

Wed, Oct 2, 4:04 AM · User-crusnov, Goal, SRE-tools

Tue, Sep 24

crusnov added a subtask for T232767: Netbox API Occasionally 500s and Netbox2001 dumpcsv fails: T233728: Netbox: netbox_dump_run service failed.
Tue, Sep 24, 11:49 PM · SRE-tools
crusnov added a parent task for T233728: Netbox: netbox_dump_run service failed: T232767: Netbox API Occasionally 500s and Netbox2001 dumpcsv fails.
Tue, Sep 24, 11:49 PM · netbox
crusnov added a comment to T233728: Netbox: netbox_dump_run service failed.

This is the "switching to http" problem discussed in T232767, I believe. I haven't taken the time to debug it more fully, but I suspect it's something in Netbox's configuration or possibly a bug in Django REST framework.

Tue, Sep 24, 3:07 PM · netbox
crusnov added a comment to T232767: Netbox API Occasionally 500s and Netbox2001 dumpcsv fails.

In debugging alerts on Netbox, I noticed that, unrelated to the CSV dumper, the ganeti sync sometimes returns a 500 error. This is caused by this error:

Tue, Sep 24, 3:42 AM · SRE-tools

Mon, Sep 23

crusnov added a comment to T218956: Should we deploy sshguard on external IP addresses?.

I think the general consensus I've heard is that external load balancers don't or can't have firewall rules, but perhaps we should consider it on a case-by-case basis for other external services.

Mon, Sep 23, 11:28 PM · User-crusnov

Wed, Sep 18

crusnov added a comment to T232767: Netbox API Occasionally 500s and Netbox2001 dumpcsv fails.

I spent some time debugging the problem with csv dumps from netbox2001. The basic gist is that when dumping a larger table, its pagination routine fails because the second page it tries to retrieve from the API is returning an http instead of an https url as the "next page" URL (and when netbox2001 tries to access an http url, it times out eventually because :80 is blocked). I suspect a bug in Netbox itself. I traced the execution in pdb, and it showed that this value is coming from the remote end.
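As a rough illustration of the failure described above, and only as a client-side workaround sketch rather than a fix for the underlying bug, a paginating client could rewrite each "next page" URL to https before following it. The `get_page` callable and the example hostname are hypothetical; the `results` and `next` keys are assumed to follow the usual paginated response shape.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url):
    """Return url with its scheme replaced by https; other parts are unchanged."""
    parts = urlsplit(url)
    return urlunsplit(("https",) + tuple(parts[1:]))


def fetch_all(first_url, get_page):
    """Collect results from every page, rewriting each "next" link to https.

    get_page is a hypothetical callable that takes a URL and returns the
    decoded response body as a dict with "results" and "next" keys.
    """
    results = []
    url = first_url
    while url:
        page = get_page(url)
        results.extend(page["results"])
        next_url = page.get("next")
        # The remote end may hand back an http:// link; never follow it as-is.
        url = force_https(next_url) if next_url else None
    return results
```

This only hides the symptom on the client; the server would still be generating http links.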

Wed, Sep 18, 4:31 AM · SRE-tools
crusnov added a comment to T232767: Netbox API Occasionally 500s and Netbox2001 dumpcsv fails.

I spent some time debugging the problem with csv dumps from netbox2001. The basic gist is that when dumping a larger table, its pagination routine fails because the second page it tries to retrieve from the API is returning an http instead of an https url as the "next page" URL (and when netbox2001 tries to access an http url, it times out eventually because :80 is blocked). I suspect a bug in Netbox itself. I traced the execution in pdb, and it showed that this value is coming from the remote end.

Wed, Sep 18, 4:30 AM · SRE-tools
crusnov added a comment to T232767: Netbox API Occasionally 500s and Netbox2001 dumpcsv fails.

After increasing the CPU count to 4 on both front-ends, the number of 500 errors is much lower.

Wed, Sep 18, 4:28 AM · SRE-tools
crusnov renamed T232767: Netbox API Occasionally 500s and Netbox2001 dumpcsv fails from Netbox API Occasionally 500s to Netbox API Occasionally 500s and Netbox2001 dumpcsv fails.
Wed, Sep 18, 4:28 AM · SRE-tools
crusnov closed T224517: netbox / netmon1002: netbox report related service units failed as Resolved.

netmon1002 is cleaned up now and should no longer be alerting on this basis.

Wed, Sep 18, 4:26 AM · observability, netbox, Operations
crusnov created T233183: Automate generation of Management DNS records from Netbox.
Wed, Sep 18, 3:49 AM · Traffic, Operations, Patch-For-Review, User-crusnov, Goal, SRE-tools
crusnov closed T218709: Add Spicerack module for Ganeti as Resolved.
Wed, Sep 18, 3:47 AM · SRE-tools, User-crusnov

Tue, Sep 17

crusnov added a comment to T223291: Netbox: move it to dedicated Ganeti VMs.

This has been completed modulo some growing pains.

Tue, Sep 17, 5:42 PM · netbox
crusnov closed T223291: Netbox: move it to dedicated Ganeti VMs as Resolved.
Tue, Sep 17, 5:42 PM · netbox

Sep 12 2019

crusnov created T232767: Netbox API Occasionally 500s and Netbox2001 dumpcsv fails.
Sep 12 2019, 6:16 PM · SRE-tools

Sep 11 2019

crusnov closed T231502: More special cases for Netbox LibreNMS report as Resolved.
Sep 11 2019, 3:23 AM · netbox
crusnov added a comment to T230449: Automate selection of IP address for interface.

Shifting gears on this project to use the custom scripts interface added in 2.6.3. I shall make an 'add management interface' script that automatically assigns an IP address.
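The core of automatic assignment can be sketched independently of Netbox. The following is a minimal illustration, assuming the prefix and the already-used addresses are available as plain strings; it does not use the Netbox custom scripts interface, and `first_free_address` is a hypothetical helper name.

```python
import ipaddress


def first_free_address(prefix, used):
    """Return the first usable host address in prefix that is not in used.

    prefix is a CIDR string; used is an iterable of address strings.
    Returns None when every host address in the prefix is taken.
    """
    network = ipaddress.ip_network(prefix)
    taken = {ipaddress.ip_address(address) for address in used}
    for host in network.hosts():
        if host not in taken:
            return host
    return None
```

For example, with `10.0.0.1` and `10.0.0.2` already in use in `10.0.0.0/29`, the helper returns `10.0.0.3`.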

Sep 11 2019, 3:23 AM · User-crusnov, Goal, SRE-tools
crusnov added a comment to T223291: Netbox: move it to dedicated Ganeti VMs.

FWIW netbox.wikimedia.org points at netbox1001.wikimedia.org now. I am working on fixing some minor remaining issues with reports and making backups correct (the database is currently backed up correctly, but netbox proper needs its dumps backed up).

Sep 11 2019, 3:21 AM · netbox

Sep 9 2019

crusnov added a comment to T223291: Netbox: move it to dedicated Ganeti VMs.

Also adding @akosiaris for input as he initially wrote these classes.

The initial intention was for $includes to be an array indeed. So I favor b) as well.

@crusnov : Can you please fix this today, either by (partly) reverting https://gerrit.wikimedia.org/r/514395 or by adapting the type hints to use an array? This has prevented puppet runs on puppetdb2001 for ~ three days now and is blocking the setup of the new Buster-based puppetdb instances.

Ah my mistake, apologies. As we discussed on IRC, I shall remove the type hint from the offending part and open a ticket to address it later.

Sep 9 2019, 4:35 PM · netbox
crusnov created T232358: postgres::slave module type for includes parameter in inconsistent..
Sep 9 2019, 4:19 PM · Puppet, Operations, netbox
crusnov added a comment to T223291: Netbox: move it to dedicated Ganeti VMs.

Also adding @akosiaris for input as he initially wrote these classes.

The initial intention was for $includes to be an array indeed. So I favor b) as well.

@crusnov : Can you please fix this today, either by (partly) reverting https://gerrit.wikimedia.org/r/514395 or by adapting the type hints to use an array? This has prevented puppet runs on puppetdb2001 for ~ three days now and is blocking the setup of the new Buster-based puppetdb instances.

Sep 9 2019, 4:14 PM · netbox

Sep 4 2019

crusnov committed rOBPY79ab2f724da4: Fix patches (authored by crusnov).
Fix patches
Sep 4 2019, 6:22 PM
crusnov committed rOBPY98cd21f0b7fd: fix setup (authored by crusnov).
fix setup
Sep 4 2019, 6:22 PM
crusnov committed rOBPY72680b58ffa2: add final newline (authored by crusnov).
add final newline
Sep 4 2019, 6:22 PM
crusnov committed rOBPY63e81a3c13c5: Update changelog (authored by crusnov).
Update changelog
Sep 4 2019, 6:22 PM
crusnov committed rOBPYb86c1a8d0d0a: Patch requirements in debian branch (authored by crusnov).
Patch requirements in debian branch
Sep 4 2019, 6:22 PM

Sep 3 2019

crusnov added a comment to T231068: Spicerack: improve support for Ganeti VMs.

Okay, I've implemented changes to the netbox and ganeti modules as linked above, which should allow all of the operations requested. I have not implemented writing status to Ganeti VMs since this information is updated automatically, but it should be relatively straightforward to implement if desired.

Sep 3 2019, 4:47 AM · SRE-tools
crusnov added a comment to T223291: Netbox: move it to dedicated Ganeti VMs.

@Volans Ah hah thanks for this. I was given to believe the 'default' would include the ferm config and didn't even think of looking.

Sep 3 2019, 2:38 AM · netbox

Aug 30 2019

crusnov updated the task description for T224946: Netbox Alert Cleanups.
Aug 30 2019, 3:15 PM · observability, User-crusnov, netbox, SRE-tools
crusnov updated the task description for T224946: Netbox Alert Cleanups.
Aug 30 2019, 3:10 PM · observability, User-crusnov, netbox, SRE-tools

Aug 29 2019

crusnov added a comment to T231068: Spicerack: improve support for Ganeti VMs.

Finding which cluster an instance belongs to, or whether an FQDN is a Ganeti instance at all, could be done as easily as trying to look it up in every configured cluster and checking whether there's information. We could provide utility functions to perform those actions trivially.
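Such utility functions might look roughly like the sketch below. It assumes a hypothetical `clusters` mapping from cluster name to a lookup callable that returns instance information or None; none of these names come from Spicerack or Ganeti.

```python
def find_cluster(fqdn, clusters):
    """Return the name of the first cluster that knows fqdn, or None.

    clusters is a hypothetical mapping of cluster name to a callable that
    returns instance information, or None when the instance is unknown.
    """
    for name, lookup in clusters.items():
        if lookup(fqdn) is not None:
            return name
    return None


def is_ganeti_instance(fqdn, clusters):
    """Return True if any configured cluster has information about fqdn."""
    return find_cluster(fqdn, clusters) is not None
```

The cost is one lookup per configured cluster in the worst case, which is acceptable when the number of clusters is small.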

Aug 29 2019, 4:07 AM · SRE-tools
crusnov created T231512: Netbox: Add CSV dump rotation.
Aug 29 2019, 3:58 AM · SRE-tools

Aug 27 2019

crusnov added a comment to T230449: Automate selection of IP address for interface.

Work is progressing; I've taken the step of setting up vagrant so I can comfortably hack on Netbox without breaking anybody.

Aug 27 2019, 4:18 PM · User-crusnov, Goal, SRE-tools
crusnov added a comment to T209182: Setup Swift Storage for Netbox image (was: netbox won't allow me to upload photos of the rack).

I have confirmed content-type is set correctly; however, Swift sets content-disposition to attachment, which causes the browser to download the file. Incoming patch to strip this at the apache level.

Aug 27 2019, 4:46 AM · Patch-For-Review, netbox, Operations

Aug 22 2019

crusnov closed T230964: Netbox LibreNMS report fails as Resolved.

Fix deployed with https://gerrit.wikimedia.org/r/531763

Aug 22 2019, 11:09 PM · netbox, Operations
crusnov claimed T230964: Netbox LibreNMS report fails.
Aug 22 2019, 3:30 PM · netbox, Operations

Aug 19 2019

crusnov created T230725: Make contact group for Netbox report alerts.
Aug 19 2019, 3:01 PM · observability, Operations
crusnov closed T221507: Netbox report to validate network equipment data as Resolved.
Aug 19 2019, 2:57 PM · netbox, User-crusnov, SRE-tools, Operations, netops

Aug 14 2019

crusnov updated the task description for T217072: Spicerack module for Netbox.
Aug 14 2019, 4:47 PM · netbox, Patch-For-Review, User-crusnov, SRE-tools
crusnov added a comment to T217072: Spicerack module for Netbox.

At least the first one i

Aug 14 2019, 4:47 PM · netbox, Patch-For-Review, User-crusnov, SRE-tools

Aug 13 2019

crusnov added a comment to T230449: Automate selection of IP address for interface.

There is an undocumented API which creates a new IP address for a given prefix:

Aug 13 2019, 11:18 PM · User-crusnov, Goal, SRE-tools
crusnov committed rLPRIb3c1c8624751: netbox: Add fake secrets for reorg (authored by crusnov).
netbox: Add fake secrets for reorg
Aug 13 2019, 9:59 PM
crusnov added a comment to T230449: Automate selection of IP address for interface.

I have spent time looking at adding the API required for this functionality. I believe I have figured out how to do it and will produce a patch shortly.

Aug 13 2019, 9:15 PM · User-crusnov, Goal, SRE-tools
crusnov created T230449: Automate selection of IP address for interface.
Aug 13 2019, 9:15 PM · User-crusnov, Goal, SRE-tools
crusnov updated the task description for T228387: Bare metal cloud: management interfaces.
Aug 13 2019, 9:11 PM · Patch-For-Review, User-crusnov, Goal, SRE-tools
crusnov closed T223292: Netbox: generate CSV backups as Resolved.

This has been fully deployed now and tested. It is automated.

Aug 13 2019, 6:43 PM · netbox
crusnov closed T228670: Import management interfaces into Netbox from DNS, a subtask of T228387: Bare metal cloud: management interfaces, as Resolved.
Aug 13 2019, 6:43 PM · Patch-For-Review, User-crusnov, Goal, SRE-tools
crusnov closed T228670: Import management interfaces into Netbox from DNS as Resolved.
Aug 13 2019, 6:43 PM · Patch-For-Review, User-crusnov, Goal, SRE-tools
crusnov added a comment to T228670: Import management interfaces into Netbox from DNS.

FWIW there was no mgmt DNS information for some hosts:

Aug 13 2019, 6:43 PM · Patch-For-Review, User-crusnov, Goal, SRE-tools
crusnov updated the task description for T228670: Import management interfaces into Netbox from DNS.
Aug 13 2019, 6:37 PM · Patch-For-Review, User-crusnov, Goal, SRE-tools
crusnov updated the task description for T228670: Import management interfaces into Netbox from DNS.
Aug 13 2019, 6:37 PM · Patch-For-Review, User-crusnov, Goal, SRE-tools
crusnov added a comment to T228670: Import management interfaces into Netbox from DNS.

Script has completed running. Several edge cases worked out with Arzhel (frack, etc). MGMT interfaces should be largely correct now.

Aug 13 2019, 6:36 PM · Patch-For-Review, User-crusnov, Goal, SRE-tools
crusnov moved T228670: Import management interfaces into Netbox from DNS from In Progress to Pending on the User-crusnov board.
Aug 13 2019, 5:22 PM · Patch-For-Review, User-crusnov, Goal, SRE-tools
crusnov committed rOSNB78bed5d03506: switch swagger to nonpublic mode (authored by crusnov).
switch swagger to nonpublic mode
Aug 13 2019, 3:40 PM
crusnov committed rLPRIb1029da4af40: make netbox tokens available to hosts that need em (authored by crusnov).
make netbox tokens available to hosts that need em
Aug 13 2019, 3:17 PM

Aug 6 2019

crusnov closed T209182: Setup Swift Storage for Netbox image (was: netbox won't allow me to upload photos of the rack) as Resolved.

Okay, after some finagling, uploading (and downloading) images should work. A particularity of swift storage is that images download instead of displaying in the browser, but they work!

Aug 6 2019, 6:09 PM · Patch-For-Review, netbox, Operations
crusnov committed rLPRI6dcd0ece5ca3: netbox: add dummy swift url key (authored by crusnov).
netbox: add dummy swift url key
Aug 6 2019, 3:33 AM

Aug 1 2019

crusnov moved T222629: Netbox: Set up deploy groups for scap to ensure primary is deployed before secondary from Backlog to Complete on the User-crusnov board.
Aug 1 2019, 10:25 PM · User-crusnov, netbox, SRE-tools

Jul 31 2019

crusnov added a comment to T226331: Upgrade Netbox to 2.6.1.

What was the issue?

  • There were some root-owned files in the tree because of testing that happened
  • There were missing deps for Swift in the build package
Jul 31 2019, 3:21 PM · Patch-For-Review, netbox
crusnov added a comment to T226331: Upgrade Netbox to 2.6.1.

What was the issue?

Jul 31 2019, 3:20 PM · Patch-For-Review, netbox

Jul 30 2019

crusnov added a comment to T209182: Setup Swift Storage for Netbox image (was: netbox won't allow me to upload photos of the rack).

Netbox has been deployed with the change that should enable this. We're testing.

Jul 30 2019, 10:52 PM · Patch-For-Review, netbox, Operations
crusnov closed T226331: Upgrade Netbox to 2.6.1 as Resolved.
Jul 30 2019, 10:52 PM · Patch-For-Review, netbox
crusnov added a comment to T226331: Upgrade Netbox to 2.6.1.

Obviously some finagling happened, but in the end the upgrade is good.

Jul 30 2019, 10:51 PM · Patch-For-Review, netbox
crusnov moved T228670: Import management interfaces into Netbox from DNS from Backlog to In Progress on the SRE-tools board.
Jul 30 2019, 10:51 PM · Patch-For-Review, User-crusnov, Goal, SRE-tools
crusnov moved T222837: Discussion about synchronizing Ganeti VM network interfaces to Netbox from In Progress to Pending release/deployment on the SRE-tools board.
Jul 30 2019, 10:50 PM · SRE-tools
crusnov moved T221507: Netbox report to validate network equipment data from In Progress to Pending release/deployment on the SRE-tools board.
Jul 30 2019, 10:50 PM · netbox, User-crusnov, SRE-tools, Operations, netops
crusnov moved T228670: Import management interfaces into Netbox from DNS from Backlog to In Progress on the User-crusnov board.
Jul 30 2019, 10:50 PM · Patch-For-Review, User-crusnov, Goal, SRE-tools
crusnov moved T218956: Should we deploy sshguard on external IP addresses? from Backlog to Complete on the User-crusnov board.
Jul 30 2019, 10:50 PM · User-crusnov
crusnov moved T224946: Netbox Alert Cleanups from In Progress to Pending on the User-crusnov board.
Jul 30 2019, 10:50 PM · observability, User-crusnov, netbox, SRE-tools
crusnov moved T221507: Netbox report to validate network equipment data from In Progress to Complete on the User-crusnov board.
Jul 30 2019, 10:50 PM · netbox, User-crusnov, SRE-tools, Operations, netops
crusnov committed rOSNB4b8dc43ebe70: Fix imports for settings mod. (authored by crusnov).
Fix imports for settings mod.
Jul 30 2019, 10:42 PM

Jul 29 2019

crusnov committed rLPRI9d0685255bf8: netbox: Add dummy redis passwords (authored by crusnov).
netbox: Add dummy redis passwords
Jul 29 2019, 10:50 PM

Jul 26 2019

crusnov triaged T228670: Import management interfaces into Netbox from DNS as Normal priority.
Jul 26 2019, 3:02 PM · Patch-For-Review, User-crusnov, Goal, SRE-tools
crusnov updated the task description for T228670: Import management interfaces into Netbox from DNS.
Jul 26 2019, 3:02 PM · Patch-For-Review, User-crusnov, Goal, SRE-tools

Jul 25 2019

crusnov committed rOSNBb3c505ea3515: Merge branch 'master' of https://github.com/digitalocean/netbox (authored by crusnov).
Merge branch 'master' of https://github.com/digitalocean/netbox
Jul 25 2019, 6:28 PM

Jul 24 2019

crusnov added a comment to T223291: Netbox: move it to dedicated Ganeti VMs.

To update this ticket with current situation.

Jul 24 2019, 3:19 PM · netbox

Jul 22 2019

crusnov committed rOBPYc1081a054896: Add git build-dep. Add vcs links (authored by crusnov).
Add git build-dep. Add vcs links
Jul 22 2019, 5:44 PM
crusnov committed rOBPYe0ff52682030: Bump dh compat to 9 for stretch. (authored by crusnov).
Bump dh compat to 9 for stretch.
Jul 22 2019, 5:44 PM
crusnov committed rOBPYdd9ce69f107b: debian/rules: skip autotest (authored by crusnov).
debian/rules: skip autotest
Jul 22 2019, 5:44 PM
crusnov committed rOBPYd38c669f01ac: update gpb.conf (authored by crusnov).
update gpb.conf
Jul 22 2019, 5:44 PM
crusnov committed rOBPYe7b140748b03: update gpb.conf (authored by crusnov).
update gpb.conf
Jul 22 2019, 5:44 PM
crusnov committed rOBPYf9b2fca9b10b: initial debian stuff (authored by crusnov).
initial debian stuff
Jul 22 2019, 5:44 PM
crusnov created P8780 Repropo errors.
Jul 22 2019, 5:40 PM · SRE-tools
crusnov created T228670: Import management interfaces into Netbox from DNS.
Jul 22 2019, 2:50 PM · Patch-For-Review, User-crusnov, Goal, SRE-tools
crusnov moved T228387: Bare metal cloud: management interfaces from Backlog to In Progress on the User-crusnov board.
Jul 22 2019, 2:33 PM · Patch-For-Review, User-crusnov, Goal, SRE-tools
crusnov added a project to T228387: Bare metal cloud: management interfaces: User-crusnov.
Jul 22 2019, 2:33 PM · Patch-For-Review, User-crusnov, Goal, SRE-tools

Jul 8 2019

MoritzMuehlenhoff awarded T203963: Convert makevm to spicerack cookbook a Like token.
Jul 8 2019, 9:56 AM · serviceops-radar, Patch-For-Review, User-crusnov, SRE-tools, User-jijiki, User-Joe, Operations

Jul 1 2019

crusnov added a comment to T203963: Convert makevm to spicerack cookbook.

Interesting, it sure does take a while for the disk to build, and the tool will wait.

Jul 1 2019, 6:08 PM · serviceops-radar, Patch-For-Review, User-crusnov, SRE-tools, User-jijiki, User-Joe, Operations
crusnov renamed T212783: cumin: Make ouput path sane and flexible (was: allow to suppress output and progress bars) from cumin: allow to suppress output and progress bars to cumin: Make ouput path sane and flexible (was: allow to suppress output and progress bars).
Jul 1 2019, 3:48 PM · SRE-tools

Jun 26 2019

crusnov added a comment to T164587: cumin could use randomization/splay options.

After looking into this a bit, the details of how this would be done are a bit involved: internally cumin uses a NodeSet from clustershell, which acts like a set(), so the order is 'unspecified' (semi-random). If we want it to be more random, I think we'd have to convert it into a list and randomize it before batching. The same is true if we want to apply sorting. I am told this is a relatively unimportant change, but it doesn't seem super complicated to implement if there is demand or if it would reduce toil.
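The "convert to a list and randomize before batching" idea can be sketched with the standard library alone. This is an illustration under assumptions, not cumin's code: hosts are plain strings in a set rather than a ClusterShell NodeSet, and `shuffled_batches` is a hypothetical helper.

```python
import random


def shuffled_batches(hosts, batch_size, rng=None):
    """Return hosts as a list of batches in explicitly randomized order.

    hosts is any iterable of host names (a plain set stands in for the
    NodeSet here). rng is an optional random.Random, useful for
    reproducible runs.
    """
    rng = rng or random.Random()
    # Sort first so the result depends only on the rng, not on set ordering.
    ordered = sorted(hosts)
    rng.shuffle(ordered)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
```

Replacing the shuffle with the sorted list alone would give the ordered variant, so both behaviours go through the same conversion step.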

Jun 26 2019, 4:05 PM · Operations, SRE-tools

Jun 25 2019

crusnov added a comment to T164587: cumin could use randomization/splay options.

@BBlack Thanks for opening this feature request, because right now it's totally implementation dependent and actually I realized this is neither clear nor explained in the docs / readme.
The TL;DR is that right now it depends if batches (-b) are used or not.

  • With batches: the order is somehow randomized due to access to a python dictionary (see the Python2 implementation note), see the table at the bottom.
  • Without batches: the selection is passed as is to ClusterShell and the execution is pretty much ordered. The "pretty much" is because ClusterShell in turn uses the fanout limit (the maximum number of children to fork at any given time), which right now is left at its default value of 64, and going over that might alter the order a bit. Over ~100 hosts I've seen the first 2 in the order actually being picked up at the end, while all the others were executed in order.

I'm leaning to force the randomness on all cases and add a --ordered (or similar) option to force the execution in order (although I need to check how to do that in the case without batches).
Regarding the NNNN specific implementation, given the generic nature of Cumin, I'd rather not add it into the tool itself but maybe consider the possibility to allow to specify custom filters where we could have a custom implementation for the sorted and shuffle algorithms.
Thoughts?

Jun 25 2019, 10:24 PM · Operations, SRE-tools