
setup/install WMF7426 as phab1003.eqiad.wmnet
Closed, Resolved (Public)

Description

This task will track the reimage of WMF7426 as phab1003.eqiad.wmnet.

phab1003:

  • - hostname labels updated/applied to physical system to be done via sub-task T221392
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan - internal subnet)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare most likely)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox once it's actively doing work
  • - apply phab role in site.pp and add as phabricator server in Hiera
  • - schedule switch-over maintenance window with @mmodell
  • - do the switch
  • - reminder: update SPF records in DNS

Event Timeline

RobH triaged this task as Normal priority. Apr 18 2019, 4:13 PM
RobH created this task.
RobH renamed this task from setup/install WMF7426 as phab1002.wikimedia.org to setup/install WMF7426 as phab1003.wikimedia.org. Apr 18 2019, 4:25 PM
RobH updated the task description.

Did you mean phab1003.eqiad.wmnet? Existing phab* hosts are internal

RobH updated the task description. Apr 18 2019, 4:36 PM
RobH renamed this task from setup/install WMF7426 as phab1003.wikimedia.org to setup/install WMF7426 as phab1003.eqiad.wmnet. Apr 18 2019, 4:42 PM
RobH updated the task description.
RobH removed a project: ops-eqiad.
RobH updated the task description.

Change 504922 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] setting phab1003 mgmt dns entry

https://gerrit.wikimedia.org/r/504922

Change 504922 merged by RobH:
[operations/dns@master] setting phab1003 mgmt dns entry

https://gerrit.wikimedia.org/r/504922

RobH updated the task description.
RobH reassigned this task from RobH to Dzahn. Apr 18 2019, 4:54 PM

Please note this is now ready for installation, and I'm assigning to @Dzahn per our IRC conversation. Please ensure the system state in netbox is changed to 'staged' once the OS is installed and changed to 'active' once it is doing actual work.

Change 504951 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] install_server: remove phab1002, add phab1003

https://gerrit.wikimedia.org/r/504951

Change 504951 merged by Dzahn:
[operations/puppet@production] install_server: remove phab1002, add phab1003

https://gerrit.wikimedia.org/r/504951

Change 504964 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mariadb: replace phab1002 grant comments with phab1003

https://gerrit.wikimedia.org/r/504964

Dzahn added a comment. Apr 18 2019, 8:01 PM

@RobH Ideally I would like to use the same IP I had used for phab1002, so I am waiting for that decom task to get past the "remove production dns" step.

Dzahn updated the task description. Apr 18 2019, 8:09 PM
Dzahn updated the task description. Apr 18 2019, 11:55 PM

Change 505040 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] netboot: add phab1003 to partman

https://gerrit.wikimedia.org/r/505040

Change 505040 merged by Dzahn:
[operations/puppet@production] netboot: add phab1003 to partman

https://gerrit.wikimedia.org/r/505040

Change 505051 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: add phab1003

https://gerrit.wikimedia.org/r/505051

Change 505051 merged by Dzahn:
[operations/puppet@production] site: add phab1003

https://gerrit.wikimedia.org/r/505051

Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts:

['phab1003.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201904190041_dzahn_82147.log.

Completed auto-reimage of hosts:

['phab1003.eqiad.wmnet']

Of which those FAILED:

['phab1003.eqiad.wmnet']

Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts:

phab1003.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201904191755_dzahn_40551_phab1003_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['phab1003.eqiad.wmnet']

and were ALL successful.

Dzahn updated the task description. Apr 19 2019, 7:33 PM
Dzahn updated the task description.

Mentioned in SAL (#wikimedia-releng) [2019-04-19T19:56:47Z] <mutante> phab1003 - editing /srv/deployment/phabricator/deployment-cache/.config manually to replace tin.eqiad.wmnet with deploy1001.eqiad.wmnet to fix git cloning issue on first puppet run on new host where somehow tin.eqiad still shows up. fixes puppet run on T221389
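
The manual fix described in the SAL entry above amounts to a hostname substitution in the deployment cache config. A minimal sketch of that substitution follows. The function name `replace_deploy_host` and the sample config line are hypothetical; the real format of `/srv/deployment/phabricator/deployment-cache/.config` is not shown in this task, so the sketch treats the file as plain text and only reports how many occurrences it changed.

```python
# Hostnames taken from the SAL entry above.
OLD_HOST = "tin.eqiad.wmnet"
NEW_HOST = "deploy1001.eqiad.wmnet"


def replace_deploy_host(text, old=OLD_HOST, new=NEW_HOST):
    """Return (updated_text, number_of_replacements) for a config file body."""
    count = text.count(old)
    return text.replace(old, new), count


# Hypothetical sample content; the actual key names in .config are an assumption.
sample = "git_server: tin.eqiad.wmnet\ngit_repo: phabricator/deployment\n"

updated, count = replace_deploy_host(sample)
print(f"replaced {count} occurrence(s)")
print(updated, end="")
```

Reporting the count makes it easy to notice when the old deploy host no longer appears at all, in which case the file should be left untouched.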

Dzahn updated the task description. Apr 19 2019, 8:02 PM

note: phab1001 has more IPs on the interface than phab1003, adding the additional ones doesn't look puppetized !!

cc; @20after4 we need to remember this for migration and figure out which to add

[phab1001:~] $ ip a s | grep 2620
    inet6 2620:0:861:ed1a::3:16/128 scope global 
    inet6 2620:0:861:102:10:64:16:100/128 scope global deprecated 
    inet6 2620:0:861:103:10:64:32:186/128 scope global deprecated 
    inet6 2620:0:861:102:10:64:16:8/64 scope global 


[phab1003:~] $ ip a s | grep 2620
    inet6 2620:0:861:ed1a::3:16/128 scope global 
    inet6 2620:0:861:107:10:64:48:21/64 scope global 


[phab2001:~] $ ip a s  | grep 2620
    inet6 2620:0:860:ed1a::3:fa/128 scope global 
    inet6 2620:0:860:103:10:192:32:149/128 scope global deprecated 
    inet6 2620:0:860:103:10:192:32:147/64 scope global
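
The comparison above can also be done mechanically. The following is a rough sketch, not an existing tool: it parses the pasted `ip a s | grep 2620` output for phab1001 and phab1003 and lists the extra /128 addresses that exist only on the old host. The helper names `parse_inet6` and `extra_addresses` are made up for illustration, and treating only /128 entries as "additional" addresses is an assumption.

```python
import re

# Matches lines such as:
#   inet6 2620:0:861:102:10:64:16:100/128 scope global deprecated
INET6 = re.compile(
    r"inet6\s+([0-9a-fA-F:]+)/(\d+)\s+scope\s+global(\s+deprecated)?"
)

PHAB1001 = """\
    inet6 2620:0:861:ed1a::3:16/128 scope global
    inet6 2620:0:861:102:10:64:16:100/128 scope global deprecated
    inet6 2620:0:861:103:10:64:32:186/128 scope global deprecated
    inet6 2620:0:861:102:10:64:16:8/64 scope global
"""

PHAB1003 = """\
    inet6 2620:0:861:ed1a::3:16/128 scope global
    inet6 2620:0:861:107:10:64:48:21/64 scope global
"""


def parse_inet6(output):
    """Map each global inet6 address to (prefix_length, is_deprecated)."""
    found = {}
    for line in output.splitlines():
        match = INET6.search(line)
        if match:
            found[match.group(1)] = (int(match.group(2)), match.group(3) is not None)
    return found


def extra_addresses(old_output, new_output):
    """List /128 addresses on the old host that the new host lacks."""
    old, new = parse_inet6(old_output), parse_inet6(new_output)
    return sorted(
        (addr, deprecated)
        for addr, (prefix, deprecated) in old.items()
        if prefix == 128 and addr not in new
    )


missing = extra_addresses(PHAB1001, PHAB1003)
for addr, deprecated in missing:
    print(addr, "(deprecated)" if deprecated else "")
```

On the output pasted above, the only /128 addresses that phab1003 lacks are the two that phab1001 reports as deprecated, which may help when deciding which ones to add.
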

greg updated the task description. Apr 19 2019, 10:32 PM
greg added a subscriber: greg.

note: phab1001 has more IPs on the interface than phab1003, adding the additional ones doesn't look puppetized !!
cc; @20after4 we need to remember this for migration and figure out which to add

@mmodell :)

greg removed a subscriber: greg. Apr 19 2019, 10:33 PM

Change 505332 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] update SPF records from phab1001 to phab1003 IP

https://gerrit.wikimedia.org/r/505332

fwiw my irc client highlights both names but I rarely log in to the @20after4 phab account.

Change 504964 abandoned by Dzahn:
mariadb: replace phab1002 grant comments with phab1003

Reason:
duplicate of https://gerrit.wikimedia.org/r/c/operations/puppet/+/496120

https://gerrit.wikimedia.org/r/504964

Change 512077 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] phabricator: rsync /srv/repos from 1001 to 1003

https://gerrit.wikimedia.org/r/512077

Change 512077 merged by Dzahn:
[operations/puppet@production] phabricator: rsync /srv/repos from 1001 to 1003

https://gerrit.wikimedia.org/r/512077

Dzahn closed this task as Resolved. May 23 2019, 2:37 AM

<pre>
2019-05-23

02:35 mutante: phabricator - going read-write again
02:24 twentyafterfour: manually started aphlict on phab1003
02:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab1003-vcs.eqiad.wmnet
02:04 mutante: puppetmaster1001 - sudo -i conftool-merge
01:52 twentyafterfour: phabricator is now served by phab1003 though still in read-only mode for a bit longer
01:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab1003-vcs.eqiad.wmnet
01:49 mutante: puppetmaster1001 - conftool-merge
01:37 mutante: depooled phab1001-vcs from git-ssh via conftool
01:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab1001-vcs.eqiad.wmnet
01:33 mutante: run puppet on mx1001/mx2001 - switch mail route for phab to phab1003
01:30 mutante: switched from phab1001 to phab1003 - applied on cp1008 varnish canary first
01:28 twentyafterfour: stopping phd on phab1001
01:18 mutante: phabricator going readonly momentarily
01:09 twentyafterfour: extended phab downtime in icinga, actual downtime hasn't started yet, prep work taking longer than expected
00:45 mutante: phab1003 - rsyncing /srv/repos from phab1001

</pre>
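
The log above is listed newest first. As a small worked example, the sketch below sorts a few of those entries chronologically and computes how long Phabricator was read-only during the switch. The helper names `parse_sal` and `read_only_minutes` are hypothetical, and matching on the words "readonly" and "read-write" is an assumption that happens to fit these entries.

```python
from datetime import datetime

# A subset of the SAL entries quoted above (times as logged, same day).
SAL = """\
02:35 mutante: phabricator - going read-write again
01:52 twentyafterfour: phabricator is now served by phab1003 though still in read-only mode for a bit longer
01:30 mutante: switched from phab1001 to phab1003 - applied on cp1008 varnish canary first
01:18 mutante: phabricator going readonly momentarily
"""


def parse_sal(log, day="2019-05-23"):
    """Return (timestamp, author, message) tuples in chronological order."""
    entries = []
    for line in log.splitlines():
        if not line.strip():
            continue
        hhmm, rest = line.split(" ", 1)
        author, message = rest.split(": ", 1)
        stamp = datetime.strptime(f"{day} {hhmm}", "%Y-%m-%d %H:%M")
        entries.append((stamp, author, message))
    return sorted(entries)


def read_only_minutes(entries):
    """Minutes between the 'readonly' entry and the 'read-write' entry."""
    start = next(t for t, _, msg in entries if "readonly" in msg)
    end = next(t for t, _, msg in entries if "read-write" in msg)
    return int((end - start).total_seconds() // 60)


entries = parse_sal(SAL)
minutes = read_only_minutes(entries)
print(f"read-only window: {minutes} minutes")
```

With the entries above, the window runs from 01:18 to 02:35, which is 77 minutes.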

Dzahn updated the task description. May 23 2019, 4:55 AM

Change 512088 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] update SPF records for Phabricator to phab1003 IP

https://gerrit.wikimedia.org/r/512088

Change 512088 merged by Dzahn:
[operations/dns@master] update SPF records for Phabricator to phab1003 IP

https://gerrit.wikimedia.org/r/512088

Dzahn updated the task description. May 23 2019, 4:59 AM

Mentioned in SAL (#wikimedia-operations) [2019-05-30T00:17:37Z] <mutante> rsyncing /srv/repos again. pulling on phab2001 from phab1003 (T221389)

Mentioned in SAL (#wikimedia-operations) [2019-05-30T00:24:56Z] <mutante> re-enabling puppet on phab1001 now that it does not have the phab role anymore (T221389)

Change 505332 abandoned by Dzahn:
update SPF records from phab1001 to phab1003 IP

Reason:
duplicate, already done

https://gerrit.wikimedia.org/r/505332