
Labs security rules changed on integration labs project around Friday Feb 6th 23:30 UTC
Closed, Resolved (Public)

Description

Starting Friday Feb 6th 23:30 UTC, puppet runs on instances of the integration labs project started failing. The instances' puppet agents point to integration-puppetmaster.eqiad.wmflabs but time out when connecting to it.

@coren found out that adding a security rule to allow the puppet TCP port (8140) fixed the issue for the integration project. The deployment-project uses a local puppet master but works just fine without any additional security rule.
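A quick way to check whether the puppet port is reachable from another instance is a plain TCP connect with a timeout. The following is a minimal sketch, not part of the original report: the helper name `can_connect` and the 5-second default timeout are assumptions chosen for illustration; only the port number (8140) and the hostname in the usage comment come from this task.

```python
import socket

PUPPET_PORT = 8140  # puppet master TCP port named in the task description


def can_connect(host: str, port: int = PUPPET_PORT, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it only tests TCP reachability and
    says nothing about whether a puppet run would actually succeed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (packets silently dropped), refused connections,
        # and name resolution failures.
        return False


# Example usage from an instance in the project (hostname from the task):
# can_connect("integration-puppetmaster.eqiad.wmflabs")
```

A timeout (rather than an immediate refusal) would be consistent with packets being dropped by a security rule, which matches the symptom described above, but the check alone cannot tell which layer is doing the dropping.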

Original ticket: T88960

Something seems to have changed on Friday. Maybe the security rules for the integration project were corrupted for some reason and are now missing a sane default. I always thought the security rules applied to the project and not to communication between instances of the same project.

Other people reported strange behavior as well, though I don't have a log.

Event Timeline

hashar raised the priority of this task from to Needs Triage.
hashar updated the task description.
hashar added a project: Cloud-VPS.
hashar added subscribers: hashar, coren, Andrew.

I messed with the security rules on Friday because someone on IRC (timo, I think?) was trying to ssh between instances and failing. This turned out to be because of ferm rules on the concerned instances. I then reverted the security rules to their original state, I thought? But maybe I missed one.

Anyway -- this is almost certainly Andrew's fault, although I'm still not clear on why you need a special rule for the local puppetmaster.

Andrew found out that the integration labs project is missing the security rule that allows communication between instances. That rule is known as the 'Source group' option, and most other labs projects seem to have it by default.

[Screenshot: contintcloud_labs_project_security_rules]

Andrew claimed this task.

Yeah, I deleted the 'source group' rule because I suspected it of interfering with inter-instance ssh. As it turned out, the source-group rule was /allowing/ inter-instance ssh, but a firewall running on instances was subsequently blocking...

So, anyway, I've replaced the source-group rule, and I declare firewalls-on-instances to not be my problem :)
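The two filtering layers discussed in this thread (the project's security rules and a firewall running on each instance) can be sketched as a toy model. This is only an illustration under stated assumptions: the class and function names are hypothetical, the group name "default" and the 1-65535 port range are made up for the example, and none of this reflects the actual labs or ferm configuration.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class GroupRule:
    """Hypothetical security rule: allow a port range from members of a source group."""
    from_port: int
    to_port: int
    source_group: str


def project_allows(rules: Iterable[GroupRule], source_group: str, port: int) -> bool:
    """True if any security rule admits `port` from `source_group`."""
    return any(
        r.source_group == source_group and r.from_port <= port <= r.to_port
        for r in rules
    )


def connection_succeeds(rules: Iterable[GroupRule], source_group: str,
                        port: int, instance_open_ports: Set[int]) -> bool:
    """Traffic must pass BOTH the security rules and the instance's own firewall."""
    return project_allows(rules, source_group, port) and port in instance_open_ports


# Assumed shape of a 'source group' rule: all ports, from the same group.
SOURCE_GROUP_RULE = GroupRule(1, 65535, "default")

# Rule present, but the instance firewall does not open ssh (22): ssh fails.
ssh_blocked_by_instance = connection_succeeds([SOURCE_GROUP_RULE], "default", 22, {8140})

# Rule deleted: puppet (8140) fails even though the instance would accept it.
puppet_without_rule = connection_succeeds([], "default", 8140, {22, 8140})

# Rule restored: puppet works again.
puppet_with_rule = connection_succeeds([SOURCE_GROUP_RULE], "default", 8140, {22, 8140})
```

In this model, deleting the source-group rule cannot fix a block imposed by the instance firewall, and it breaks every other inter-instance connection, which is the sequence of events described in the comments above.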