For scripts that are just one line, I think it's reasonable to build the content directly in Puppet. This is a bit of a cheat, but for simple one-liners I think it's a fine compromise. I took a quick look through the files other than the one-liners: some of them don't actually need to be templates as they contain no ERB, and a few others look like they are already config files, so we could possibly just remove the shebang. However, some may be a bit awkward to fit into this model. Perhaps we should just whitelist them; it would at least mean we don't add more violations going forward.
Another option is to ban templated scripts outright and add a CI check that rejects any ERB file containing a shebang. This would mean updating the current templated scripts to read from a new templated config instead, i.e. moving all the dynamic bits into a template.
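To make the idea concrete, the CI check could be as simple as a shell function that refuses any ERB file whose first line is a shebang. This is only a sketch; the function name, directory layout, and whitelist handling are assumptions, not existing code:

```shell
#!/bin/sh
# Sketch of the proposed CI gate: report any *.erb file that is really a
# templated script, i.e. whose first line is a shebang. The flat directory
# layout and the function name are illustrative assumptions.
check_erb_shebangs() {
    dir="$1"
    status=0
    for f in "$dir"/*.erb; do
        [ -e "$f" ] || continue
        # Only the first line matters for a shebang.
        if head -n 1 "$f" | grep -q '^#!'; then
            echo "templated script found: $f" >&2
            status=1
        fi
    done
    return $status
}
```

A real version would recurse with `find` and carry a whitelist for the awkward cases mentioned above, so existing violations don't block CI while new ones are rejected.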
Originally I wanted to do this checking in the Rake checks, as that feels like the right place for it. However, I can't think of a reasonable way to compile the ERB files, since we don't know the values of variables that would ordinarily be bound by the parent Puppet manifest, and different variables/hosts/roles etc. could produce different outputs. As such I'm now leaning towards putting this in PCC; however, I wonder whether it should live in standard and the main PCC code, or whether we should just have some helper scripts in utils. Regardless, I'll probably work on the latter to see how it looks, but ideas are welcome.
I have made a first pass at the incident report. The main sections that could use some more input are
This is a bit better.
A quick glance suggests most are SC2086 and SC2006.
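For anyone skimming: SC2086 is an unquoted variable expansion (subject to word splitting and globbing) and SC2006 is legacy backtick command substitution. A minimal before/after illustration, not taken from the actual files:

```shell
# SC2086: unquoted expansions are word-split by the shell.
args="one two"
printf '%s\n' $args      # flagged: expands to two separate arguments
printf '%s\n' "$args"    # fixed: stays a single argument

# SC2006: legacy `...` command substitution.
kernel=`uname -s`        # flagged: backticks nest badly
kernel=$(uname -s)       # fixed: modern $( ) form
```

Both classes are usually mechanical to fix, which bodes well for a bulk cleanup.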
This is a great idea. I think we may be able to do it in a rake task by adding something to task gen. The biggest issue with adding checks like this is fixing all of the current violations; the wmf_style checks do avoid this by only alerting on new errors, but I have not looked into how. We should also add shellcheck for sh files in general.
Wed, Jun 3
We could resolve this by having mcrouter talk directly to memcache in the other DC; however, this requires 1.5.13, which is not currently in buster.
1.6.6 is available in testing and seems to build correctly on buster.
I have configured memcache and mcrouter for CAS; however, there is currently an error. If CAS talks directly to memcache then all works fine, but when CAS talks to mcrouter we get an amplification storm before a timeout is sent to CAS. After speaking with @elukey we think the following may be happening
Is this a duplicate of https://phabricator.wikimedia.org/T251340 ?
@CDanis you still have the example text in the "Where did we get lucky?" and "Links to relevant documentation" sections. We also still need tasks for
Reviewed; this looks good to publish. What is the procedure for that, @CDanis?
Tue, Jun 2
Mon, Jun 1
Resolving as per Moritz's comment.
Fri, May 29
When I checked, stat1008 already had an AAAA record. Not sure if someone fixed it or if there is some issue in the script?
Currently mongodb is not listening on IPv6; however, [[ https://phabricator.wikimedia.org/T180761 | mongodb is going away ]], so we should wait until that work has been completed.
Thu, May 28
Wed, May 27
I think I'm leaning towards a few stable anchors in geographic locations similar to our PoPs. Maybe also a few root servers, even though they're less apples-to-apples.
mcrouter 0.4.1 is now available in buster-wikimedia
Tue, May 26
I had a look at this on the CAS side, and I think it would be doable to add some level of 2FA with an account in LDAP. However, most scripted users would only need to log in to one service, not all services, so I'm not sure it makes sense to give scripted accounts a full SSO account, as we would end up complicating things by applying additional ACLs. Again, we can restrict which services a specific account has access to on the CAS side; however, it seems to me that we are just adding complexity, both in CAS and in the monitoring scripts, which would need some type of session management.
Currently blocked waiting on mcrouter for buster
A new skin is now in place
This has been discussed and documented. Please reopen if further investigation is required.
This is now in place, resolving
This is now in place, resolving
I asked @Zbyszko to look at this and they were unable to get to the root cause. However, as we now use an external Tomcat instance, this issue no longer affects us, so I'm resolving.
Mon, May 25
Thanks for raising this; I have now added the unstable src repos to the build hosts.
Thu, May 21
Have added yamllint checking to the private repo
Which measurements do you plan to scrap?
- all the measurements the anchors are performing outbound?
- the anchoring measurements directed at the anchor?
I noticed that the ns0.openstack.codfw1dev.wikimediacloud.org. servers are configured as name servers for the 57.15.185.in-addr.arpa. zone; however, you actually need to configure the 0/18.104.22.168.in-addr.arpa. zone.
Tue, May 19
That's exactly what I have used; docker_entry.sh is just a cleaned-up version of the FB packaging scripts.
Correct, the 0.41 version builds correctly in the docker process.