cloudcumin not able to communicate with openstack.eqiad1.wikimediacloud.org:25000 anymore
Closed, Resolved, Public

Description

At some point cloudcumin stopped being able to communicate with the OpenStack API. I discovered this while trying to run cumin:

root@cloudcumin1001:~# cumin 'O{project:toolsbeta}'
Caught ConnectTimeout exception: Request to https://openstack.eqiad1.wikimediacloud.org:25000/v3/auth/tokens timed out
root@cloudcumin1001:~#
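A `ConnectTimeout` means the TCP connection was never established, which is what a firewall silently dropping packets tends to look like (a closed port would normally produce a refusal instead). As a minimal sketch, assuming only the Python standard library, the helper below tells those cases apart; `probe_tcp` is a hypothetical name for illustration and is not part of cumin.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt as open, timeout, refused, or error.

    Hypothetical helper for illustration only; not part of cumin.
    """
    try:
        # create_connection resolves the name and tries to connect.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all: typical of packets dropped by a firewall.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return f"error: {exc}"


# Example call, run from the affected host:
# probe_tcp("openstack.eqiad1.wikimediacloud.org", 25000)
```

A result of `timeout` from the affected host would be consistent with a firewall rule dropping the traffic, while `refused` would point at the service itself.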

Event Timeline

@taavi mentioned that https://gerrit.wikimedia.org/r/c/operations/homer/public/+/970275 might have broken this communication, which seems likely. Short of reverting that change, what's the right approach here to make sure cloudcumin can talk to the OpenStack API? cc @ayounsi @cmooney

Since that firewall change is "correct" in terms of the administrative policy we want to enforce, and the cloudcumin hosts live in Ganeti where we don't have the cloud-private networks available, the best fix is probably to get Cumin to talk to the OpenStack API via the general prod outbound HTTP proxies. (The other option I can think of is the cloudlb hosts also making the API available on a wikiprod-realm service address, but that seems like the exact opposite of the direction we've been going in recently.)

I agree that cloudcumin talking via the prod HTTP proxy like any other client is the right fix here. @Volans what do you think of the above idea, namely getting the cumin O backend to talk to the OpenStack API through the prod proxies? From a quick look at wmcs-cookbooks, spicerack shouldn't be affected, in the sense that OpenStack interaction happens through the CLI anyway and thus works.

Conceptually that could work for me, but I fear that we might need to patch cumin for that. Given that keystoneauth1 uses Python's requests library AFAIK, we could just set the environment variables for the proxy in cumin's config, but then the proxy would be used for all requests-based backends, like PuppetDB.
So if we want to add proxy support only for the OpenStack backend, we might need to patch that backend specifically. For additional context, cumin now has a large refactor merged in master that will be released soon, but given its size that release can't be rushed.
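To illustrate the difference between a process-wide proxy and a per-backend one, here is a minimal sketch of how a per-backend proxy setting could be resolved. Everything in it is an assumption for illustration: the `proxy` config key, the `proxies_for_backend` helper, and the `proxy.example.org` URL are hypothetical and are not existing cumin options.

```python
from typing import Dict, Mapping

# Hypothetical config layout: each backend may carry its own "proxy" key.
# The proxy URL is a placeholder, not a real proxy host.
EXAMPLE_CONFIG: Dict[str, Dict[str, str]] = {
    "openstack": {"proxy": "http://proxy.example.org:8080"},
    "puppetdb": {},  # no proxy configured: connect directly
}


def proxies_for_backend(
    config: Mapping[str, Mapping[str, str]], backend: str
) -> Dict[str, str]:
    """Return a requests-style proxies mapping for a single backend.

    An empty dict means "no proxy", so other backends are unaffected,
    unlike HTTP_PROXY/HTTPS_PROXY environment variables, which apply
    to every requests-based client in the process.
    """
    proxy = config.get(backend, {}).get("proxy")
    if not proxy:
        return {}
    # Use the same proxy for both plain HTTP and HTTPS targets.
    return {"http": proxy, "https": proxy}
```

The returned mapping has the shape that the `proxies` attribute of a `requests.Session` expects. Since keystoneauth1's `Session` accepts a pre-built requests session through its `session` argument, that is one place where a patched OpenStack backend could apply it, assuming the backend builds its own session.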

Change #1253574 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/homer/public@master] cr-cloud: allow cumin/cloudcumin traffic

https://gerrit.wikimedia.org/r/1253574

We discussed this in the team meeting today: to restore functionality I have put https://gerrit.wikimedia.org/r/c/operations/homer/public/+/1253574 up for review. I'll follow up with a specific spicerack task for the OpenStack backend to be able to use an HTTP proxy.

Thanks, patch LGTM! And indeed, using the proxies seems best here.

Change #1253574 merged by Filippo Giunchedi:

[operations/homer/public@master] cr-cloud: allow cumin/cloudcumin traffic

https://gerrit.wikimedia.org/r/1253574

FWIW I agree it'd be better if the web proxy could be used here, as conceptually this is "private WMF host needs access to external internet IP".

But this is ok. Probably a better fix is for whatever work cumin does when talking to OpenStack to be executed from a "cumin" host that lives in cloud-land itself. But I'm probably opening a massive bag of worms by even mentioning that :P

fgiunchedi claimed this task.

This is fixed! Thank you all for your help. I will follow up with another task to get HTTP proxy support for the spicerack OpenStack backend (and other related cans of worms!)

Change #1254211 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/homer/public@master] cr-cloud-vrf: Narrowly scope (cloud)cumin firewall exemption

https://gerrit.wikimedia.org/r/1254211

Change #1254211 merged by jenkins-bot:

[operations/homer/public@master] cr-cloud-vrf: Narrowly scope (cloud)cumin firewall exemption

https://gerrit.wikimedia.org/r/1254211

Change #1266963 had a related patch set uploaded (by Volans; author: Volans):

[operations/homer/public@master] Revert "cr-cloud: allow cumin/cloudcumin traffic"

https://gerrit.wikimedia.org/r/1266963

Change #1266963 merged by jenkins-bot:

[operations/homer/public@master] Revert "cr-cloud: allow cumin/cloudcumin traffic"

https://gerrit.wikimedia.org/r/1266963