
setup/install phab1002(WMF4727)
Closed, Resolved, Public

Description

This task will track the setup of spare pool system phab1002 (WMF4727). This is a temporary allocation: the existing phab1001 will be reimaged, and service will then fail back to it.

phab1002(WMF4727):

  • apply new hostname label (arguably not needed, since it will return to spares in less than a month)
  • bios/drac/serial setup/testing
  • mgmt DNS entries added for hostname (confirm asset tag entries exist)
  • network port setup (description, enable, internal VLAN)
  • production DNS entries added (all added: phab1002, phab1002-vcs, phab1002-aphlict, IPv6 records)
  • operations/puppet update (install_server at minimum, other files if possible) (install_server done, added role spare, mapped IPv6)
  • OS installation (done)
  • puppet accept/initial run
  • apply phabricator puppet role and confirm it works

Event Timeline


Change 436406 merged by RobH:
[operations/dns@master] phab1002 mgmt dns

https://gerrit.wikimedia.org/r/436406

Change 435211 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] assign wmf4727 as phab1002

https://gerrit.wikimedia.org/r/435211

Change 435211 abandoned by Dzahn:
assign wmf4727 as phab1002

https://gerrit.wikimedia.org/r/435211

Change 436631 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] assign 10.64.16.18 to phab1002.eqiad.wmnet

https://gerrit.wikimedia.org/r/436631

Change 436631 merged by Dzahn:
[operations/dns@master] assign 10.64.16.18 to phab1002.eqiad.wmnet

https://gerrit.wikimedia.org/r/436631

Change 436678 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] install_server: add phab1001 to DHCP,netboot

https://gerrit.wikimedia.org/r/436678

Change 436678 merged by Dzahn:
[operations/puppet@production] install_server: add phab1002 to DHCP,netboot

https://gerrit.wikimedia.org/r/436678

Mentioned in SAL (#wikimedia-operations) [2018-05-31T21:00:55Z] <mutante> dzahn@neodymium:~$ sudo wmf-auto-reimage-host --new phab1002.eqiad.wmnet (T196019)

Change 436680 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] install_server: remove bast-test from DHCP

https://gerrit.wikimedia.org/r/436680

Change 436680 merged by Dzahn:
[operations/puppet@production] install_server: remove bast-test from DHCP

https://gerrit.wikimedia.org/r/436680

Change 436681 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] remove bast-test.eqiad.wmnet

https://gerrit.wikimedia.org/r/436681

Change 436681 merged by Dzahn:
[operations/dns@master] remove bast-test.eqiad.wmnet

https://gerrit.wikimedia.org/r/436681

Change 436685 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] fix IP address for phab1002, was wrong row

https://gerrit.wikimedia.org/r/436685

Change 436685 merged by Dzahn:
[operations/dns@master] fix IP address for phab1002, was wrong row

https://gerrit.wikimedia.org/r/436685

after wmf-auto-reimage-host said:

23:14:19 | phab1002.eqiad.wmnet | Still waiting for reboot after 45.0 minutes

i went to mgmt console to check myself and i saw:

    Install the GRUB boot loader on a hard disk
    Installation step failed
    An installation step failed. You can try to run the failing item
    again from the menu, or skip it and choose something else. The
    failing step is: Install the GRUB boot loader on a hard disk

in the installer i selected the very last step to install grub manually. next was:

    [!!] Install the GRUB boot loader on a hard disk
    Unable to install GRUB in /dev/sda
    Executing 'grub-install /dev/sda' failed.
    This is a fatal error.
    <Go Back>    <Continue>

sounds like it expected an SSD.

See T189804 and T190093. This is the same spare machine that had this issue before, when I got it for a bastion host replacement. I reopened one of those.

Change 436702 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] install_server: use raid1-gpt partman recipe for phab1002

https://gerrit.wikimedia.org/r/436702

Change 436702 merged by Dzahn:
[operations/puppet@production] install_server: use raid1-gpt partman recipe for phab1002

https://gerrit.wikimedia.org/r/436702

OS has been installed (with wmf-auto-reimage-host) after using raid1-gpt partman.

Debian GNU/Linux 9 phab1002 ttyS1

phab1002 login:

Change 436708 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: add phab1002 with spare role and mapped IPv6

https://gerrit.wikimedia.org/r/436708

Change 436708 merged by Dzahn:
[operations/puppet@production] site: add phab1002 with spare role and mapped IPv6

https://gerrit.wikimedia.org/r/436708

Change 436709 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] add IPv6 records for phab1002

https://gerrit.wikimedia.org/r/436709

Change 436709 merged by Dzahn:
[operations/dns@master] add IPv6 records for phab1002

https://gerrit.wikimedia.org/r/436709
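The "mapped IPv6" mentioned in these changes is, as far as I understand the WMF convention (an assumption, not confirmed by this task), an IPv6 address derived from the host's IPv4 address by reusing its four decimal octets verbatim as the last four groups of the subnet's /64. Below is a minimal Python sketch under that assumption. It also computes the reverse (PTR) name that an "add IPv6 records" DNS change needs. The function name `mapped_ipv6` is hypothetical. The prefix 2001:db8:0:1::/64 is a documentation placeholder, not the real eqiad subnet. 10.64.16.18 is the address from change 436631, which 436685 later corrected to a different row, so it is used only as an illustration.

```python
import ipaddress


def mapped_ipv6(ipv4: str, prefix: str) -> ipaddress.IPv6Address:
    """Sketch of the assumed "mapped IPv6" convention (hypothetical helper).

    The decimal octets of the IPv4 address are reused verbatim as the
    last four IPv6 groups, e.g. 10.64.16.18 -> <prefix>:10:64:16:18.
    """
    v4 = ipaddress.IPv4Address(ipv4)     # validates the IPv4 input
    net = ipaddress.IPv6Network(prefix)  # strict: host bits must be zero
    if net.prefixlen != 64:
        raise ValueError("expected a /64 subnet prefix")
    # First four groups (the 64-bit subnet part), in exploded form.
    head = net.network_address.exploded.split(":")[:4]
    # Decimal octets are written as hex groups as-is ("16" means 0x16).
    tail = str(v4).split(".")
    return ipaddress.IPv6Address(":".join(head + tail))


if __name__ == "__main__":
    # Hypothetical values: documentation prefix, illustrative IPv4 address.
    addr = mapped_ipv6("10.64.16.18", "2001:db8:0:1::/64")
    print(addr)                  # value for the AAAA record
    print(addr.reverse_pointer)  # owner name for the matching PTR record
```

Writing the octets unchanged keeps the IPv4 address readable inside the IPv6 one. The trade-off is that the groups are not the numeric IPv4 value.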

Change 436710 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] add aphlict and vcs service IPs for phab1002

https://gerrit.wikimedia.org/r/436710

Change 436710 merged by Dzahn:
[operations/dns@master] add aphlict and vcs service IPs for phab1002

https://gerrit.wikimedia.org/r/436710

Dzahn updated the task description.

Change 437300 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] phabricator: add role to node phab1002

https://gerrit.wikimedia.org/r/437300

Change 437558 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] dumps: add phab1002 as second phab server

https://gerrit.wikimedia.org/r/437558

profile::dumps::distribution::datasets::fetcher is what you want to fix. I need to go through and see which rsync confs I can kill. Later.

Change 437613 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mariadb: add phab1002 to phabricator grants

https://gerrit.wikimedia.org/r/437613

Change 437615 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] hiera/phabricator: add phab1002 as phab server

https://gerrit.wikimedia.org/r/437615

Change 437620 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] switch phabricator from phab1001 to phab1002

https://gerrit.wikimedia.org/r/437620

Change 437625 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mtail: replace phab1001 with phab1002?

https://gerrit.wikimedia.org/r/437625

@Dzahn I see phab1002 is installed and in Icinga. Does the bios/drac/serial still need setup/testing?

Cmjohnson moved this task from Up next to Blocked on the ops-eqiad board.

@Cmjohnson I'm not sure. Rob created the task with the checkboxes, I think from a template.

I used wmf-auto-reimage, so it was able to use the mgmt interface to install. I can also confirm I get a console. But I'm not sure if anything else in the BIOS needs to be checked.

I don't have a specific problem with this instance but if this is a checkbox item that is supposed to happen on each re-assignment (like to set it back to the right BIOS setting defaults) then feel free to reboot that any time. The system isn't serving anything yet.

Change 438235 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] phabricator: set service IPs for phab1002 in Hiera

https://gerrit.wikimedia.org/r/438235

Change 438235 merged by Dzahn:
[operations/puppet@production] phabricator: set service IPs for phab1002 in Hiera

https://gerrit.wikimedia.org/r/438235

Change 437300 merged by Dzahn:
[operations/puppet@production] phabricator: add role to node phab1002

https://gerrit.wikimedia.org/r/437300

Change 437615 merged by Dzahn:
[operations/puppet@production] hiera/phabricator: add phab1002 as phab server

https://gerrit.wikimedia.org/r/437615

role has been applied

these things are done:

https://gerrit.wikimedia.org/r/#/q/topic:phab1002+(status:merged)

but these are still needed:

https://gerrit.wikimedia.org/r/#/q/topic:phab1002+(status:open)

next we need to get the DB grants merged

Change 437613 merged by Marostegui:
[operations/puppet@production] mariadb: add phab1002 to phabricator grants

https://gerrit.wikimedia.org/r/437613

Change 437625 abandoned by Dzahn:
mtail: replace phab1001 with phab1002?

Reason:
host names for tests don't need to reflect actual machine names

https://gerrit.wikimedia.org/r/437625

Mentioned in SAL (#wikimedia-operations) [2018-06-11T10:52:48Z] <mutante> phab1002 - editing cached scap config /srv/deployment/phabricator/deployment-cache/.config to replace tin.eqiad with deploy1001.eqiad deployment server, run puppet. other options: run scap with --refresh-config, delete cached .config file (T196019) (T175288)
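The SAL entry above describes a one-off manual fix. The scap config cached on phab1002 still named tin.eqiad as the deployment server, so that string was replaced with deploy1001.eqiad before running puppet. Below is a minimal Python sketch of that edit. It assumes the cached .config is a text file in which the hostname appears verbatim; its actual format is not shown in this task. The helper name `replace_deploy_host` and the demo file contents are hypothetical. Per the same entry, the alternatives were running scap with --refresh-config or deleting the cached file.

```python
import tempfile
from pathlib import Path


def replace_deploy_host(config_path: Path, old: str, new: str) -> int:
    """Replace every occurrence of `old` with `new` in a text config file.

    Hypothetical helper: assumes the hostname appears verbatim in the file.
    Returns the number of replacements made; the file is only rewritten
    when at least one occurrence was found.
    """
    text = config_path.read_text()
    count = text.count(old)
    if count:
        config_path.write_text(text.replace(old, new))
    return count


if __name__ == "__main__":
    # Demo on a throwaway file rather than the real
    # /srv/deployment/phabricator/deployment-cache/.config
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / ".config"
        cfg.write_text("server: tin.eqiad.wmnet\n")  # made-up contents
        n = replace_deploy_host(cfg, "tin.eqiad", "deploy1001.eqiad")
        print(n, cfg.read_text(), end="")
```

Returning the count makes it easy to notice when the old hostname was not present at all. That would suggest the cache had already been refreshed.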

Dzahn removed Dzahn as the assignee of this task. Jun 21 2018, 12:09 PM

for current status please see T190568#4305040

Vvjjkkii renamed this task from setup/install phab1002(WMF4727) to cybaaaaaaa. Jul 1 2018, 1:06 AM
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed subscribers: gerritbot, Aklapper.
ArielGlenn renamed this task from cybaaaaaaa to setup/install phab1002(WMF4727). Jul 1 2018, 6:59 AM
ArielGlenn lowered the priority of this task from High to Medium.
ArielGlenn updated the task description.
ArielGlenn added subscribers: gerritbot, Aklapper.

there has been a meeting (https://wikitech.wikimedia.org/wiki/Phabricator/Meeting_Notes/2019-01-23)

and further progress happened at T190568#4907166, which resolved one of the follow-ups in it and unblocked testing of the stretch server phab1002 installation, which also stops using mod_php, uses PHP 7, and so on.

T195623 is the original hardware request for this ticket and i reopened it because a blocker has been identified, which is "replacement server has 32GB but production server has 64GB RAM and we need 64GB".

So that's kind of stalled on this now.

I can't speak for the " - bios/drac/serial setup/testing" checkbox but the "add to operations puppet" and "make phabricator role work on stretch" part has been done. updated boxes accordingly.

@RobH See above, FYI. I am not sure if I should have reopened the original hardware request as I did on T195623#4903960, or if this ticket is the most appropriate place because it is still open and about implementing the service on the box. Either way, the issue is that we would need a box with 64GB instead of 32GB RAM, whichever way is easier. Let me know and I'll use the right ticket. Thanks!

re-closing. replaced by new request in T215335. will be reverted in T215332

Change 437558 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] dumps: switch phab1001->phab1003 as phab dumps source

https://gerrit.wikimedia.org/r/437558

Change 437558 abandoned by Dzahn:
dumps: switch phab1001->phab1003 as phab dumps source

Reason:
merging into https://gerrit.wikimedia.org/r/c/operations/puppet/+/437620

https://gerrit.wikimedia.org/r/437558

Change 511929 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] phabricator: activate read-only mode for maintenance

https://gerrit.wikimedia.org/r/511929

Change 511929 merged by Dzahn:
[operations/puppet@production] phabricator: activate read-only mode for maintenance

https://gerrit.wikimedia.org/r/511929