gallium is in bad shape and we have contint1001 (Jessie) available to migrate services to. Once firewall ports are open (T137323), we would want to sync Jenkins data and change all the puppet bits referencing the gallium IP address.
Description
Details
Status | Assigned | Task
---|---|---
Declined | None | T133150 Move gallium to an internal host?
Declined | None | T137358 Migrate CI services from gallium to contint1001
Resolved | hashar | T137323 Firewall rules for labs support host to communicate with contint1001.wikimedia.org (new gallium)
Resolved | hashar | T137279 Port Zuul package 2.1.0-95-g66c8e52 from Precise to Jessie
Declined | None | T137293 Update all references to gallium and change it to contint1001 in integration/*
Resolved | None | T137265 / on gallium is read only, breaking jenkins
Resolved | hashar | T137418 Remove zuul-merger from gallium
Event Timeline
Change 293283 had a related patch set uploaded (by Hashar):
contint: cleanup gallium / use contint1001
Change 293284 had a related patch set uploaded (by Hashar):
cache_misc: change doc/integration.wm.o backend
https://gerrit.wikimedia.org/r/#/c/293283/ against puppet.git is a beast: it basically changes all occurrences of the gallium IP address or FQDN in puppet.
I am torn between two options:
- split it in more manageable chunks and switch service after service (safe, more preparation work)
- stop CI again, merge it in one go and catch up with issues (evil)
Change 293300 had a related patch set uploaded (by Paladox):
gallium is replaced by contint1001.eqiad.wmnet
Following T137323: Firewall rules for labs support host to communicate with contint1001.wikimedia.org (new gallium), @mark stated that there should be no traffic between the private network and labs instances, which makes sense. Moreover, gallium being in production is legacy.
So either:
A) we move contint1001 to the labs support host next to scandium/labnodepool. It will then be able to communicate with labs instances.
B) we can reuse scandium which is currently solely hosting zuul-merger
C) we migrate the whole CI infra to labs
From the discussion we had, contint1001 was set up in an emergency, since gallium could have been unrecoverable. It turns out contint1001 cannot reach labs instances, by design, so there is not much to do with it at this point.
Depending on the outcome of T133300, we might want to decommission it.
Change 293284 abandoned by Hashar:
cache_misc: change doc/integration.wm.o backend
Reason:
I prepared this patch in case we had to switch the CI infra to contint if gallium proved to be lost.
That is no longer urgent, and we are considering a better long-term plan via T133300.
Change 293283 abandoned by Hashar:
contint: cleanup gallium / use contint1001
Reason:
This was done in a rush last week to switch to contint1001. It turns out the machine is in a private LAN, which would not let us set up the service.
More discussion is happening on T133300, which would eventually lead to a similar change but split into smaller chunks.