
Move gallium to an internal host?
Closed, Declined · Public

Description

During an audit of HTTPS-related things (cf. T132521#2202245), it was noted that gallium.wikimedia.org appears to host only two HTTP sites (doc.wikimedia.org and integration.wikimedia.org), both of which are currently reverse-proxied through the cache_misc cluster. If gallium has no other reason to be on a public subnet, we should move it to an internal-subnet host to reduce its exposure to the wild Internet.


Event Timeline

BBlack created this task. Apr 20 2016, 1:59 PM
Restricted Application added a project: Operations. Apr 20 2016, 1:59 PM
Restricted Application added subscribers: TerraCodes, Aklapper.
Dzahn added a subscriber: hashar. Apr 20 2016, 3:05 PM

It's also running Jenkins. Added Hashar to answer whether it needs the public IP. Also see T95757.

gallium was set up in 2011 and is still on Ubuntu Precise. It received a public IP to serve the Jenkins web interface. Over time, all the HTTP entry points have been moved behind the misc Varnish cluster.

Besides doc/integration.wikimedia.org, the server hosts Jenkins, the Zuul scheduler and the Zuul merger. There are network flows to and from labnodepool1001 and scandium in the labs support network, as well as flows to and from labs instances.

We have a tracking task to get rid of gallium: T95757. There are not many cycles to work on it, but a first step is to update the CI architecture documentation, especially to keep track of all the network flows: T102137. The outdated documentation is at https://www.mediawiki.org/wiki/Continuous_integration/Architecture/Isolation

Maybe we can assign a private IP to gallium, then migrate the network flows and update the firewall rules. Once everything is migrated, we can phase out the public IP and rename the host to gallium.eqiad.wmnet.

We have created a subproject in Phabricator: https://phabricator.wikimedia.org/project/view/1966/

The first step is for the Release-Engineering-Team to agree on an architecture via T133300 and propose it to Operations for validation.

fgiunchedi changed the task status from Open to Stalled. Apr 28 2016, 9:55 AM
fgiunchedi triaged this task as Normal priority.
hashar added a comment. Jun 2 2016, 7:47 AM

I have drawn a summary of the web services that end up on gallium. One is on doc.wikimedia.org; the other three are on integration.wikimedia.org. They all pass through the misc Varnish cluster, and the path-based routing is done by Apache on gallium via mod_proxy.

None of that needs a public IP for sure.


As I mentioned earlier on this task, the Gearman daemon is reached by hosts in the labs support network: labnodepool1001.eqiad.wmnet and scandium.eqiad.wmnet. I don't think we can make them reach a private IP in prod, so we need to split gallium's services across different hosts/networks. This is going to be discussed on T133300.

Since gallium lost a disk today, we had contint1001.eqiad.wmnet allocated (Debian Jessie, private IP). Switching services to it is tracked in T137358.

Change 293284 had a related patch set uploaded (by Hashar):
cache_misc: change doc/integration.wm.o backend

https://gerrit.wikimedia.org/r/293284

Change 293284 abandoned by Hashar:
cache_misc: change doc/integration.wm.o backend

Reason:
I prepared this patch in case we had to switch the CI infrastructure to contint if gallium proved to be lost.

That is no longer urgent, and we are considering a better long-term plan via T133300.

https://gerrit.wikimedia.org/r/293284

hashar added a comment. (Edited) Jul 11 2016, 1:19 PM

integration.wikimedia.org (with Zuul and Jenkins) is going to migrate to scandium.eqiad.wmnet.

doc.wikimedia.org is looking for a new home, potentially via T137890 or yet another task.

Dzahn added a comment. Jul 11 2016, 9:51 PM

> doc.wikimedia.org is looking for a new home.

Ganeti VM?

The new home for doc.wikimedia.org is tracked via T137890.

hashar closed this task as Declined. Sep 8 2016, 9:11 AM

Based on T140257#2595926 and the follow-up response from Ops, we are keeping the status quo of using a public IP.