Fri, Feb 16
Previously this ran daily. Since there were issues with it, I temporarily reduced it to weekly. Then "Reedy fixed it(tm)" happened, and I just now reverted it to how it was before. So it's back to daily.
fwiw: 55 people just got a notification and clicked on it just to be informed that this task isn't Epic anymore but now just "other completed". worth it?
Thu, Feb 15
@Vgutierrez re: Icinga command permissions: this should all be done. The ultimate test is to try a "schedule downtime", "disable/enable notifications" or "send acknowledgement" for something from the Icinga web UI. One caveat: auth_ldap will let you log in with or without capitalization, but to get the permissions above your login has to match the "cn" from LDAP, so capitalized. With the uncapitalized version you would still be logged in but would not have the permissions.
But we also want to replace the apache module with the httpd module (which doesn't have monitoring.pp anymore because we didn't want the diamond collector). So it should be added there as well.
I fixed the server status page. it's available again now.
Tue, Feb 13
before (there is "Subject: " in each message and the same content is repeated twice. the actual status as in "PROBLEM" or "ACK" is not shown at all. which means sending an ACK shows as another CRIT)
was already done in T187035#3966316
@RobH could you do one more Racktables user? thanks!
please take a look at
subscribed you to both ops mailing lists (others like wikitech-l are optional and self-service)
added to "WMF-NDA" in Phabricator
I didn't even mean to imply you have to rush it, just that it's not worth uploading a puppet change for like one more (weekly) restart that we can do manually.
I suppose we don't bother with the cron to restart Apache anymore and first wait for the version upgrade.
Mon, Feb 12
T182832#3965043 now about upgrading to 7.1. experimenting early wouldn't have been bad after all.
This issue is coming back on T182832
17:30 < Hauskatze> I wonder if for the next round of https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/mediawiki/manifests/maintenance/purge_abusefilter.pp;cacc0b5224994e40df6c23bc7b8781305061ed57$4 we could actually store the log ?
17:31 < Hauskatze> re-running after fixing it, not running since 2016
added to WMF-NDA-Requests group on Phabricator
Fri, Feb 9
I agree we should _not_ use a second domain, yea. Also the short version with just "g" or something similar seems nice.
Thu, Feb 8
@jmatazzoni If it's a group (list) of people but the word "lists" isn't actually part of the address (email@example.com as opposed to firstname.lastname@example.org), then that means it's handled by OIT. You can reach them via email at email@example.com for any questions. If it is a real "lists.wikimedia.org" list, then you are in the right place in Phabricator. In this case it was the former though.
Icinga for meitnerium looks fine. no disk space warnings.
Wed, Feb 7
Looks like we are all done. If there are any issues or things missing, please just reopen it.
Confirmed, I can see the new signature now.
Tue, Feb 6
I can't see your signature yet. I tried my default keyserver (hkps://hkps.pool.sks-keyservers.net) and pgp.mit.edu. Depending on which keyserver you used it might just take a while until they have synced.
The subtask was automatically created by Herald (for shell access requests) but it was closed as Invalid now.
I think the answer is "no separate domain"?
cool, this ticket seems resolved
@Prtksxna I asked around a bit and yea, there is already precedent for this. Others are doing it that way. So as long as I can let puppet clone from Gerrit and you can handle how it gets synced into Gerrit with people setting up Gerrit projects, that would work, yea.
@chasemp Adding the key to pwstore requires that it has at least 2 signatures on it. Since you already confirmed the key during the hangout, could you add one of those? Like sign the key and then upload the signed version to a keyserver? Then it should show signatures with gpg --list-sigs 2051251AF5172F75. I could then do the second one.
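For reference, the signing round trip could look roughly like the sketch below. It is not the official pwstore procedure: the `run` helper and the `DRY_RUN` switch are made up for illustration (with `DRY_RUN=1`, the default here, commands are only printed), and it assumes a keyserver is already configured in your gpg setup.

```shell
#!/bin/sh
# Sketch only: sign someone's key and publish the signature.
# DRY_RUN and run() are hypothetical helpers, not part of gpg or pwstore.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = key ID to sign
sign_and_publish() {
    key="$1"
    run gpg --recv-keys "$key"     # fetch the current public key
    run gpg --fingerprint "$key"   # compare with the fingerprint confirmed in person
    run gpg --sign-key "$key"      # interactive: adds your signature
    run gpg --send-keys "$key"     # upload the signed key
    run gpg --list-sigs "$key"     # the new signature should be listed
}
```

Calling `sign_and_publish 2051251AF5172F75` prints the five commands; set `DRY_RUN=0` to actually run them.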
Ok, since it's already in the Done column and all subtasks are checked off, seems pretty resolved to me. Of course reopen if anyone feels differently.
Can also be closed as resolved now, right?
I re-enabled the git cloning. At first there were some conflicts when puppet tried to git pull. Deleting the entire /srv/org/wikimedia/research directory and running puppet fixed that and files were then current.
Mon, Feb 5
Yea we should disable that feature or people will start using it.
[bast1001:~] $ id bstorm
uid=18713(bstorm) gid=500(wikidev) groups=500(wikidev),50(staff),700(ops),600(all-users)
Fri, Feb 2
After talking with jcrespo on IRC:
gnt-instance shutdown <fqdn>
gnt-instance startup <fqdn>
Yes, i can't do them though because i don't have the access to disable switch ports.
@elukey so yea, now we'd have to restart the instance from ganeti; as the comment above says, rebooting from within the instance won't do it. You said above that archiva downtime needs some heads-up, so I didn't do that yet, but the new hardware is there if you wanna continue.
..
Fri Feb 2 00:23:13 2018 - INFO: - device disk/1: 99.30% done, 21s remaining (estimated)
Fri Feb 2 00:23:34 2018 - INFO: - device disk/1: 100.00% done, 1s remaining (estimated)
Fri Feb 2 00:23:35 2018 - INFO: - device disk/1: 100.00% done, 0s remaining (estimated)
Fri Feb 2 00:23:36 2018 - INFO: Instance meitnerium.wikimedia.org's disks are in sync
Modified instance meitnerium.wikimedia.org
 - disk/1 -> add:size=102400,mode=rw
Please don't forget that most parameters take effect only at the next (re)start of the instance initiated by ganeti; restarting from within the instance will not be enough.
[ganeti1004:~] $
@Tbayer purely from a ticket triaging perspective: since the ticket title is "vet reliability of the response_size field.." and you said "we can conclude with reasonable confidence that erroneous logging of the response_size field is not the cause", can we call it resolved?
Thu, Feb 1
@Cameron11598 Could you forward Brandon's answer above to the original reporter? Let us know if there is any feedback from him. As i see it now i think that reply is the best we can offer in this case and there isn't much else on this ticket that is really actionable on our end.
@Paladox can you add a specific host name where this happens?
Can't reproduce it right now. It works for me.
Yea, that makes sense. I also think it's the easiest way to create a new disk in ganeti and then mount it.
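A rough sketch of what "create a new disk in ganeti and then mount it" could look like. The `run` helper and `DRY_RUN` switch are made up for illustration (with `DRY_RUN=1`, the default here, commands are only printed), and the device name, filesystem and mount point passed to `format_and_mount` are assumptions that depend on the instance.

```shell
#!/bin/sh
# Sketch only: add a disk to a Ganeti instance, restart it via Ganeti, then mount it.
# DRY_RUN and run() are hypothetical helpers, not part of Ganeti.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run on the Ganeti master. $1 = instance FQDN, $2 = disk size (e.g. 100G).
add_disk_and_restart() {
    fqdn="$1"
    size="$2"
    run gnt-instance modify --disk "add:size=${size}" "$fqdn"
    # The new disk only shows up after a restart initiated by Ganeti;
    # rebooting from within the instance is not enough.
    run gnt-instance shutdown "$fqdn"
    run gnt-instance startup "$fqdn"
}

# Run inside the instance afterwards.
# $1 = block device (assumed, check with lsblk), $2 = mount point (assumed).
format_and_mount() {
    dev="$1"
    mnt="$2"
    run mkfs.ext4 "$dev"
    run mkdir -p "$mnt"
    run mount "$dev" "$mnt"
}
```

For example, `add_disk_and_restart meitnerium.wikimedia.org 100G` prints the three Ganeti commands; the shutdown/startup pair means downtime for the instance, so announce it first where that matters.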