08:28 chasemp: [16:25] persists through a restart on -02
08:28 yuvipanda: [16:25] chasemp: nah, it's not the packages - it's probably the puppet patch I made that took away the unit files from puppet and put them in the package
08:28 yuvipanda: [16:26] chasemp: hand hacking that out worked. I'll make a patch
08:28 chasemp: [16:26] ok
08:28 chasemp: [16:26] yeah I wondered, I was asking that with 'yuvipanda: were params set in the service unit files?'
08:28 chasemp: [16:26] thanks
08:28 yuvipanda: [16:27] chasemp: np. I am just going to remove that 127 line, since it's a shitty default
08:28 yuvipanda: [16:28] chasemp: thanks for the quick catch
08:28 chasemp: [16:28] yuvipanda: yup
08:28 chasemp: [16:28] yuvipanda: so this will get set by puppet from now on?
08:28 yuvipanda: [16:28] chasemp: nope, I'm just fixing the deb and rolling out a new deb
08:28 chasemp: [16:29] ah
08:28 chasemp: [16:29] fixing the deb to point to?
08:28 yuvipanda: [16:29] chasemp: since that default will never actually be true in 99% of setups
08:28 chasemp: [16:29] I guess I'm wondering if the deb will then be tools specific
08:28 yuvipanda: [16:29] (I mentioned this in the original packaging patch but forgot to see if it was fixed)
08:28 yuvipanda: [16:29] chasemp: nope, it's just reading from /etc/kubernetes/kubeconfig
08:28 yuvipanda: [16:29] chasemp: but the 127.0.0.1 overrode it
08:28 chasemp: [16:29] right, ok I understand now
08:28 yuvipanda: [16:29] chasemp: and /etc/kubernetes/kubeconfig path is set in /etc/defaults/kubernetes, which is set in puppet
08:28 chasemp: [16:30] nods
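The wiring described above (systemd unit → /etc/default file → kubeconfig path) can be sketched roughly like this; the variable name and exact contents are assumptions for illustration, not taken from the actual deb:

```shell
# /etc/default/kubernetes -- hypothetical contents, managed by puppet.
# The packaged systemd unit is assumed to load this file via EnvironmentFile=
# and pass $DAEMON_ARGS to the daemon, so the apiserver address comes from
# /etc/kubernetes/kubeconfig instead of a hardcoded 127.0.0.1 default.
DAEMON_ARGS="--kubeconfig=/etc/kubernetes/kubeconfig"
```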
08:28 yuvipanda: [16:30] I kicked off a build
08:28 chasemp: [16:30] yuvipanda: post new package let's spin up a new tool and restart a few etc and do a bit of poking?
08:28 yuvipanda: [16:30] chasemp: at least I found it myself instead of leaving a ticking time bomb in my wake :)
08:28 chasemp: [16:30] yes
08:28 yuvipanda: [16:30] chasemp: yeah, that's how I found this one too (I was restarting reasonator)
08:28 yuvipanda: [16:31] I usually check that doing a webservice shell works & a restart works
08:28 yuvipanda: [16:35] and I've successfully zeroed out the disk of my old laptop
08:28 andrewbogott: [16:42] yuvipanda: if you haven't crossed the horizon yet… can you tell me about instance_info_dumper.pp ?
08:28 andrewbogott: [16:42] Is its output consumed by anything?
08:28 yuvipanda: [16:42] andrewbogott: nope, you can destroy it
08:28 yuvipanda: [16:42] well, it's useful in that the output can be used to find which instances have which roles
08:28 yuvipanda: [16:42] which I don't think we've anything else for
08:28 andrewbogott: [16:42] Ah, it includes roles? That does seem useful.
08:28 yuvipanda: [16:42] so ideally that code should be moved into watroles
08:28 andrewbogott: [16:42] ok, I'll nurse it along for now.
08:28 andrewbogott: [16:42] Thanks
08:28 yuvipanda: [16:43] andrewbogott: the current code isn't actually outputting the JSON file to anywhere useful
08:28 andrewbogott: [16:43] Yeah, I noticed :)
08:28 yuvipanda: [16:43] so it's one of those 'if we have a roles to instances mapping in a forest and nobody knows of it, do we really have a roles to instances mapping?' things
08:28 andrewbogott: [16:43] But now I know where it is, so it will be useful to me!
08:28 yuvipanda: [16:44] andrewbogott: :D
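The kind of lookup instance_info_dumper's output enables can be sketched like this. The real JSON layout is not confirmed anywhere in this log, so the file format below is a pure assumption (one object per instance, with its puppet roles):

```shell
# Hypothetical sample of the dumper's output; the actual schema is unknown.
cat > /tmp/instance_info.json <<'EOF'
{"instance": "tools-exec-01", "roles": ["role::toollabs::node::compute"]}
{"instance": "tools-web-01", "roles": ["role::toollabs::node::web"]}
EOF

# Which instances carry a given role?
grep 'role::toollabs::node::compute' /tmp/instance_info.json |
  grep -o '"instance": "[^"]*"'
```

This is the roles-to-instances mapping being discussed: useless while the JSON lands nowhere, useful the moment someone knows where it is.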
08:28 andrewbogott: [16:50] is out for now.
08:28 andrewbogott: [16:51] Catch you later, Yuvi!
08:28 yuvipanda: [16:51] andrewbogott: bye!
08:28 chasemp: [16:56] yuvipanda: what's the lead time on new packages?
08:28 chasemp: [17:04] madhuvishy: are you about?
08:28 chasemp: [17:11] I'm stepping away for a bit for some personal business. yuvipanda: madhuvishy is going to walk through things with you and hang out for a bit to verify the new packages resolve the issue
08:28 yuvipanda: [17:13] chasemp: have a good day and happy anniversary :)
08:28 yuvipanda: [17:13] chasemp: cool, got it.
08:28 yuvipanda: [17:14] madhuvishy: the debs are almost done, maybe 2 more mins?
08:28 madhuvishy: [17:15] yuvipanda: yup okay
08:28 yuvipanda: [17:22] madhuvishy: ok, build completed. am scping it to the aptly host now (tools-services-01)
08:28 Reedy: [17:30] yuvipanda: btw, no one would stop you hanging around in #mediawiki_security
08:28 Reedy: [17:30] But if you left just for some separation, that's fine :)
08:28 yuvipanda: [17:30] Reedy: :D Yeah, am just leaving for some more separation for a bit :)
08:28 bd808: [17:31] we will have to stalk him on slack ;)
08:28 yuvipanda: [17:31] hehe
08:28 yuvipanda: [17:32] bd808: Reedy although, tbh, leaving _security was the first 'oh shit this really is happening' moment :(
08:28 Reedy: [17:33] On and up
08:28 bd808: [17:33] new things are fun. you'll be too busy to miss us
08:28 yuvipanda: [17:36] bd808: I merged your novaobserver.yaml in containers patch btw
08:28 bd808: [17:37] you wrote it :) but thanks
08:28 bd808: [17:37] it will let me clean up some stuff in openstack-browser
08:28 bd808: [17:37] and maybe somebody else will figure out a cool thing to build with it
08:28 bd808: [17:37] will need to document how to use it
08:28 yuvipanda: [17:43] bd808: yeah!
08:28 yuvipanda: [17:43] I'm gonna leave this channel too :(
08:28 yuvipanda: [17:44] bd808: Krenair valhallasw madhuvishy chasemp andrewbogott it was great working with y'all :) I'll come back in May, so don't entirely forget me :) And do still page me for PAWS and Quarry. Thanks!
08:28 yuvipanda: [17:44] <3
08:28 bd808: [17:44] see you soon yuvipanda
08:28 yuvipanda: [17:44] :) good luck everyone!
08:29 Created at: Feb 24, 2016, 8:31 AM
08:36 andrewbogott: chasemp: so, nothing actually wrong that you can see?
08:37 chasemp: andrewbogott: sge_qmaster on tools-grid-master kept spiking at 100% cpu and my guess is single threaded?
08:37 chasemp: I'm not sure why
08:37 chasemp: I can't remember ever seeing it high usage
08:38 andrewbogott: huh
08:38 andrewbogott: If someone has a script that's creating a million jobs, maybe?
08:38 chasemp: my first thought too, but I haven't dug anything up yet
08:38 chasemp: sucks
08:39 chasemp: andrewbogott: sorry for texting, I got nervous when gridmaster started throwing dns failures
08:39 andrewbogott: no worries, I was awake
08:41 andrewbogott: I'm looking at sge_qmaster in 'top' but of course I don't know what normal behavior looks like
08:41 chasemp: andrewbogott: nslcd and nscd on tools-grid-master were really pegged there for a while
08:41 chasemp: I restarted both and on 1420 too
08:42 chasemp: andrewbogott: that box is single core tools-grid-master
08:42 chasemp: so when it's pegging at 100% it's not kidding
08:42 chasemp: idk why...
08:43 chasemp: any ideas?
08:44 andrewbogott: I've never seen them be busy. I guess if there was a brief dns outage they might have been retrying frantically
08:44 andrewbogott: I'll look on the dns server and see if there's anything in the log
08:51 andrewbogott: …I don't see anything interesting
08:51 chasemp: it's not constant, but the cpu still hikes up pretty often
08:54 chasemp: andrewbogott: I removed any nfs throttling from just the master
08:54 chasemp: I'm tempted to restart the master service
08:54 chasemp: I don't see or don't recognize anyone doing anything nuts?
08:54 chasemp: but it's very busy
08:54 andrewbogott: restarting it seems ok
08:55 chasemp: I wonder if there isn't something about fewer nodes with more load on each that causes more work, rather than more nodes that are under-resourced...
08:55 andrewbogott: We don't know for sure that this isn't normal, do we?
08:56 chasemp: andrewbogott: well, there's a definite iowait spike to associate with it
09:06 chasemp: andrewbogott: maybe you could think about how we can bump up cpu on tools-grid-master?
09:07 andrewbogott: We could build a new, larger master. As I recall, though, failing over the master is super ugly
09:07 chasemp: I mean, hitting the cap on a single-core cpu isn't exactly shocking, even though we don't normally do it historically
09:07 andrewbogott: And actually it's totally calm now, since your restarts.
09:08 chasemp: andrewbogott: I was thinking more shut down existing, make a backup, bump up flavor https://docs.openstack.org/user-guide/cli-change-the-size-of-your-server.html ?
09:08 andrewbogott: hm, nope, I spoke too soon, there it goes :)
09:08 andrewbogott: I've never had an instance survive a resize like that.
09:09 chasemp: yeah
09:09 andrewbogott: the whole feature is nonsense as far as I know
09:10 andrewbogott: man, there are really a LOT of exec nodes already! 24.
09:10 chasemp: yeah, are we short on webgrid or something?
09:10 chasemp: that seems like parity to me or I can't recall
09:11 chasemp: something just wacky happened to the master around that time
09:23 andrewbogott: that's a very modest increase :)
09:24 chasemp: 20% increase in write andrewbogott
09:24 chasemp: across all
09:27 chasemp: andrewbogott: so you're going to add 5 execs as you can, and I'm upping the threshold across the board, leaving the threshold off only on the master
09:27 andrewbogott: yep
09:27 chasemp: it seems pretty clear iowait on execs since early jan is up
09:27 chasemp: so that is a reasonable place to try to adjust, thanks for making those
09:28 chasemp: I'm scared of bumping up write too much as one tool could hammer hard
09:28 chasemp: we'll see
09:28 chasemp: madhuvishy: whenever you wake up, check out SAL for tools; we are trying to sort out an issue that seems like io/iowait with small adjustments while Andrew builds a few execs
09:28 chasemp: to spread the load
09:32 chasemp: andrewbogott: it's rolling out now fyi
09:35 chasemp: andrewbogott: it looks like https://phabricator.wikimedia.org/P5182 when it rolls out
09:35 chasemp: doing puppet on a 5 node fanout from clush now for execs
09:35 chasemp: I have to step away andrewbogott for a second, I do have a brunch with my sisters, home on spring break, in a bit. I'll bring my laptop
09:36 chasemp: going to watch this rollout
09:36 andrewbogott: ok
09:42 andrewbogott: I'm stepping away for a bit — going to take forever for this initial puppet run to finish on the new nodes
09:45 chasemp: ok
09:47 madhuvishy: hello
09:48 chasemp: mornin
09:48 madhuvishy: chasemp: tools.iabot is running like 62 jobs across execs
09:48 madhuvishy: they are all in running
09:48 chasemp: madhuvishy: ah I didn't catch that
09:48 madhuvishy: not wait or anything
09:48 chasemp: what the heck is that?
09:49 madhuvishy: internet archiver bot
09:49 chasemp: maybe it's periodic too
09:49 chasemp: because it seems to come and go
09:49 madhuvishy: something involving cyberpower
09:49 chasemp: can you attempt to ping him in -labs and ask him to throttle that back?
09:57 madhuvishy: chasemp: hmmm i don't think jsub prevents you from starting 2 jobs with same name
09:57 madhuvishy: jstart does i think
09:57 chasemp: right I was thinking of jstart
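The guard being described (jstart refuses a second job with the same name; jsub does not) can be sketched as a small wrapper. Note that `qstat` and `jsub` are stubbed here with shell functions so the logic runs anywhere; this is not the real jstart implementation:

```shell
# Stubs standing in for the real grid tools, for illustration only.
qstat() { printf 'job1\niabot-worker\n'; }   # pretend these job names exist
jsub()  { echo "submitted: $*"; }            # pretend submission

# jstart-style guard: only submit if no job with this name is running.
start_unique() {
  name=$1; shift
  if qstat | grep -qx "$name"; then
    echo "job '$name' already running, not resubmitting"
  else
    jsub -N "$name" "$@"
  fi
}

start_unique iabot-worker ./worker.sh   # duplicate name: refused
start_unique newtool-job ./run.sh       # new name: submitted
```

Without a guard like this, a cron firing every minute faster than its jobs finish piles up dozens of concurrent copies, which is exactly the tools.iabot situation here.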
09:57 chasemp: madhuvishy: I'm trying to keep an 11:00 lunch, do you have a minute to make a task for this, assign it to cyberbot, and let's just stop these since it's clearly in violation? or?
09:57 chasemp: overrunning the grid w/ overlapping jobs every...minute
09:58 chasemp: is clearly a big problem
09:58 chasemp: why oh why is this not capped at something sane (concurrent jobs per tool)
09:58 madhuvishy: yeah!
09:58 chasemp: madhuvishy: for context we are hurting io wise /already/ so this is just icing on the cake
09:58 madhuvishy: without any decent intervals on the crons
09:59 chasemp: yeah
10:00 madhuvishy: chasemp: ya i'll make a task and then comment out the crons
10:00 madhuvishy: maybe kill the existing jobs
10:00 chasemp: madhuvishy: I would
10:00 chasemp: or even can
10:00 chasemp: how about I do that and you make the task :)
10:00 madhuvishy: chasemp: okay :)
10:03 chasemp: madhuvishy: should I leave 3 workers going?
10:04 madhuvishy: chasemp: sure why not
10:04 madhuvishy: will save from some wrath :)
10:04 chasemp: fyi
10:04 chasemp: ####### Commented out by a Tool admin -- this is overwhelming the grid 2017-4-1
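That fyi line is the header dropped into the tool's crontab. In context the edit would look roughly like this; the job line itself is invented for illustration, since the real crontab entries are not shown in this log:

```shell
# tools.iabot crontab after the intervention (the jsub line is hypothetical)
####### Commented out by a Tool admin -- this is overwhelming the grid 2017-4-1
#* * * * * jsub -N iabot-worker php worker.php
```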