
setup/install phab1002(WMF4727)
Closed, ResolvedPublic

Description

This task will track the setup of spare pool system phab1002(WMF4727). This is a temp allocation, with the existing phab1001 being re-imaged and then failed back to.

phab1002(WMF4727):

  • - apply new hostname label (arguable if this is needed since it'll return to spares in less than a month.)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for hostname (confirm asset tag entries exist)
  • - network port setup (description, enable, internal vlan)
  • - production dns entries added (all added: phab1002, phab1002-vcs, phab1002-aphlict, IPv6 records)
  • - operations/puppet update (install_server at minimum, other files if possible) (install_server done, added role spare, mapped IPv6)
  • - OS installation (done)
  • - puppet accept/initial run
  • - apply phabricator puppet role and confirm it works

Event Timeline


Change 436406 merged by RobH:
[operations/dns@master] phab1002 mgmt dns

https://gerrit.wikimedia.org/r/436406

Dzahn claimed this task. May 30 2018, 10:07 PM
RobH updated the task description.

Change 435211 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] assign wmf4727 as phab1002

https://gerrit.wikimedia.org/r/435211

Change 435211 abandoned by Dzahn:
assign wmf4727 as phab1002

https://gerrit.wikimedia.org/r/435211

Change 436631 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] assign 10.64.16.18 to phab1002.eqiad.wmnet

https://gerrit.wikimedia.org/r/436631

Change 436631 merged by Dzahn:
[operations/dns@master] assign 10.64.16.18 to phab1002.eqiad.wmnet

https://gerrit.wikimedia.org/r/436631

Dzahn updated the task description. May 31 2018, 8:26 PM

Change 436678 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] install_server: add phab1002 to DHCP,netboot

https://gerrit.wikimedia.org/r/436678

Change 436678 merged by Dzahn:
[operations/puppet@production] install_server: add phab1002 to DHCP,netboot

https://gerrit.wikimedia.org/r/436678

Mentioned in SAL (#wikimedia-operations) [2018-05-31T21:00:55Z] <mutante> dzahn@neodymium:~$ sudo wmf-auto-reimage-host --new phab1002.eqiad.wmnet (T196019)

Change 436680 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] install_server: remove bast-test from DHCP

https://gerrit.wikimedia.org/r/436680

Change 436680 merged by Dzahn:
[operations/puppet@production] install_server: remove bast-test from DHCP

https://gerrit.wikimedia.org/r/436680

Change 436681 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] remove bast-test.eqiad.wmnet

https://gerrit.wikimedia.org/r/436681

Change 436681 merged by Dzahn:
[operations/dns@master] remove bast-test.eqiad.wmnet

https://gerrit.wikimedia.org/r/436681

Change 436685 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] fix IP address for phab1002, was wrong row

https://gerrit.wikimedia.org/r/436685

Change 436685 merged by Dzahn:
[operations/dns@master] fix IP address for phab1002, was wrong row

https://gerrit.wikimedia.org/r/436685

after wmf-auto-reimage-host said:

23:14:19 | phab1002.eqiad.wmnet | Still waiting for reboot after 45.0 minutes

i went to mgmt console to check myself and i saw:

Install the GRUB boot loader on a hard disk ├────────┐      
     │                                                                   │      
     │                     Installation step failed                      │      
     │ An installation step failed. You can try to run the failing item  │      
     │ again from the menu, or skip it and choose something else. The    │      
     │ failing step is: Install the GRUB boot loader on a hard disk

in the installer i selected the very last step to install grub manually. next was:

[!!] Install the GRUB boot loader on a hard disk ├┐             
             │                                                    │             
  ┌──────────│         Unable to install GRUB in /dev/sda         │ ────────┐   
  │          │ Executing 'grub-install /dev/sda' failed.          │         │   
  │          │                                                    │         │   
  │          │ This is a fatal error.                             │         │   
  │ Running "│                                                    │         │   
  │          │     <Go Back>                       <Continue>     │         │   
  └──────────│                                                    │

sounds like it expected a ssd.

See T189804 and T190093. This is the same spare machine that had this same issue before when i got it for bastion host replacement in the past. I reopened one of those.

Change 436702 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] install_server: use raid1-gpt partman recipe for phab1002

https://gerrit.wikimedia.org/r/436702

Change 436702 merged by Dzahn:
[operations/puppet@production] install_server: use raid1-gpt partman recipe for phab1002

https://gerrit.wikimedia.org/r/436702

Dzahn added a comment. Jun 1 2018, 12:35 AM

OS has been installed (with wmf-auto-reimage-host) after using raid1-gpt partman.

Debian GNU/Linux 9 phab1002 ttyS1

phab1002 login:

Dzahn updated the task description. Jun 1 2018, 12:36 AM

Change 436708 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: add phab1002 with spare role and mapped IPv6

https://gerrit.wikimedia.org/r/436708

Change 436708 merged by Dzahn:
[operations/puppet@production] site: add phab1002 with spare role and mapped IPv6

https://gerrit.wikimedia.org/r/436708

Change 436709 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] add IPv6 records for phab1002

https://gerrit.wikimedia.org/r/436709

Change 436709 merged by Dzahn:
[operations/dns@master] add IPv6 records for phab1002

https://gerrit.wikimedia.org/r/436709
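
The "mapped IPv6" wording in the patches above is not explained in this task. A minimal sketch, assuming it refers to the convention where the four decimal octets of the host's IPv4 address are reused verbatim as the last four groups of an address inside a /64 prefix. The function name `mapped_ipv6` and the documentation prefix `2001:db8:1:2::/64` are hypothetical, and 10.64.16.18 is only the address first assigned earlier in this task (it was later corrected for the right row).

```python
import ipaddress


def mapped_ipv6(prefix: str, ipv4: str) -> ipaddress.IPv6Address:
    """Embed the decimal octets of an IPv4 address into the low 64 bits of a /64.

    Assumption: this mirrors what "mapped IPv6" means in the patches above;
    the task itself does not spell the scheme out.
    """
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("expected a /64 prefix")
    # Validate the IPv4 address, then split it into its decimal octets.
    octets = str(ipaddress.IPv4Address(ipv4)).split(".")
    # Each decimal octet is reused as-is as one hex group: 10.64.16.18 -> :10:64:16:18
    suffix = int("".join(octet.zfill(4) for octet in octets), 16)
    return network.network_address + suffix


if __name__ == "__main__":
    # Hypothetical prefix from the IPv6 documentation range, not a real subnet.
    print(mapped_ipv6("2001:db8:1:2::/64", "10.64.16.18"))
```

The appeal of a scheme like this is that the IPv6 address can be derived from the IPv4 one without a separate allocation step, so the DNS records and the host configuration stay in sync by construction.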

Change 436710 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] add aphlict and vcs service IPs for phab1002

https://gerrit.wikimedia.org/r/436710

Change 436710 merged by Dzahn:
[operations/dns@master] add aphlict and vcs service IPs for phab1002

https://gerrit.wikimedia.org/r/436710

Dzahn updated the task description. Jun 1 2018, 1:25 AM
Dzahn updated the task description.
Dzahn updated the task description. Jun 1 2018, 1:28 AM

Change 437300 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] phabricator: add role to node phab1002

https://gerrit.wikimedia.org/r/437300

Cmjohnson moved this task from Backlog to Up next on the ops-eqiad board. Jun 5 2018, 6:16 PM

Change 437558 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] dumps: add phab1002 as second phab server

https://gerrit.wikimedia.org/r/437558

profile::dumps::distribution::datasets::fetcher is what you want to fix. I need to go through and see which rsync confs I can kill. Later.

Change 437613 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mariadb: add phab1002 to phabricator grants

https://gerrit.wikimedia.org/r/437613

Change 437615 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] hiera/phabricator: add phab1002 as phab server

https://gerrit.wikimedia.org/r/437615

Change 437620 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] switch phabricator from phab1001 to phab1002

https://gerrit.wikimedia.org/r/437620

Change 437625 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mtail: replace phab1001 with phab1002?

https://gerrit.wikimedia.org/r/437625

@Dzahn I see phab1002 is installed and in icinga. Does the bios/drac/serial still need setup/testing?

Cmjohnson updated the task description. Jun 6 2018, 3:22 PM
Cmjohnson moved this task from Up next to Blocked on the ops-eqiad board.
Dzahn added a comment. Jun 7 2018, 12:35 PM

@Cmjohnson i'm not sure, Rob created the task with the check boxes. I think from a template.

Dzahn added a comment. Jun 7 2018, 12:39 PM

I used wmf-auto-reimage, so that was able to use the mgmt interface to install. I can also confirm i get a console. But i'm not sure if anything else in BIOS needs to be checked.

Dzahn added a comment. Jun 7 2018, 12:42 PM

I don't have a specific problem with this instance, but if this is a checkbox item that is supposed to happen on each re-assignment (like setting it back to the right BIOS defaults) then feel free to reboot it any time. The system isn't serving anything yet.

Change 438235 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] phabricator: set service IPs for phab1002 in Hiera

https://gerrit.wikimedia.org/r/438235

Change 438235 merged by Dzahn:
[operations/puppet@production] phabricator: set service IPs for phab1002 in Hiera

https://gerrit.wikimedia.org/r/438235

Change 437300 merged by Dzahn:
[operations/puppet@production] phabricator: add role to node phab1002

https://gerrit.wikimedia.org/r/437300

Change 437615 merged by Dzahn:
[operations/puppet@production] hiera/phabricator: add phab1002 as phab server

https://gerrit.wikimedia.org/r/437615

Dzahn added a comment. Jun 8 2018, 12:45 PM

role has been applied

these things are done:

https://gerrit.wikimedia.org/r/#/q/topic:phab1002+(status:merged)

but these are still needed:

https://gerrit.wikimedia.org/r/#/q/topic:phab1002+(status:open)

next we need to get the DB grants merged

Change 437613 merged by Marostegui:
[operations/puppet@production] mariadb: add phab1002 to phabricator grants

https://gerrit.wikimedia.org/r/437613

Change 437625 abandoned by Dzahn:
mtail: replace phab1001 with phab1002?

Reason:
host names for tests don't need to reflect actual machine names

https://gerrit.wikimedia.org/r/437625

Mentioned in SAL (#wikimedia-operations) [2018-06-11T10:52:48Z] <mutante> phab1002 - editing cached scap config /srv/deployment/phabricator/deployment-cache/.config to replace tin.eqiad with deploy1001.eqiad deployment server, run puppet. other options: run scap with --refresh-config, delete cached .config file (T196019) (T175288)
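
The manual edit logged above amounts to a string substitution in the cached config. A rough sketch of that step, not what was actually run: the path and the two host names come from the SAL entry, while the function names and the change-detection logic are illustrative assumptions.

```python
from pathlib import Path

# Path and host names are taken from the SAL entry; the rest is illustrative.
CACHED_CONFIG = Path("/srv/deployment/phabricator/deployment-cache/.config")
OLD_HOST = "tin.eqiad"
NEW_HOST = "deploy1001.eqiad"


def replace_deploy_host(text: str, old: str = OLD_HOST, new: str = NEW_HOST) -> str:
    """Return the config text with every occurrence of the old host replaced."""
    return text.replace(old, new)


def rewrite_cached_config(path: Path = CACHED_CONFIG) -> bool:
    """Rewrite the cached config in place; return True only if it changed."""
    original = path.read_text()
    updated = replace_deploy_host(original)
    if updated == original:
        return False
    path.write_text(updated)
    return True
```

The other two options named in the log entry (refreshing the config through scap, or deleting the cached file) avoid hand-editing altogether, which is probably preferable when more than one host is affected.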

Dzahn removed Dzahn as the assignee of this task. Jun 21 2018, 12:09 PM

for current status please see T190568#4305040

Vvjjkkii renamed this task from setup/install phab1002(WMF4727) to cybaaaaaaa. Jul 1 2018, 1:06 AM
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed subscribers: gerritbot, Aklapper.
ArielGlenn renamed this task from cybaaaaaaa to setup/install phab1002(WMF4727). Jul 1 2018, 6:59 AM
ArielGlenn lowered the priority of this task from High to Medium.
ArielGlenn updated the task description.
ArielGlenn added subscribers: gerritbot, Aklapper.
Dzahn claimed this task. Jul 24 2018, 5:46 PM
Dzahn added a comment. Jan 24 2019, 9:03 PM

there has been a meeting (https://wikitech.wikimedia.org/wiki/Phabricator/Meeting_Notes/2019-01-23)

and further progress happened at T190568#4907166, which resolved one of the follow-ups in it and unblocked testing the stretch server phab1002 installation, which also stops using mod_php, has PHP 7 and so on.

T195623 is the original hardware request for this ticket and i reopened it because a blocker has been identified, which is "replacement server has 32GB but production server has 64GB RAM and we need 64GB".
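
A blocker like this could be caught before a host is handed over. A minimal sketch, assuming a Linux host where `/proc/meminfo` is readable: the 64GB requirement is from the comment above, while the function names and the 5% tolerance (the kernel reports somewhat less than the installed RAM) are assumptions made for illustration.

```python
import re

REQUIRED_GIB = 64  # from the comment above: production has 64GB and needs it


def mem_total_gib(meminfo_text: str) -> float:
    """Parse the MemTotal line of /proc/meminfo-style text and return GiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB$", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no MemTotal line found")
    return int(match.group(1)) / (1024 * 1024)


def has_enough_ram(meminfo_text: str,
                   required_gib: int = REQUIRED_GIB,
                   tolerance: float = 0.05) -> bool:
    """True if MemTotal is within `tolerance` of the required amount or above it."""
    return mem_total_gib(meminfo_text) >= required_gib * (1 - tolerance)


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        print(has_enough_ram(handle.read()))
```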

So that's kind of stalled on this now.

Dzahn updated the task description. Jan 24 2019, 9:04 PM

I can't speak for the " - bios/drac/serial setup/testing" checkbox but the "add to operations puppet" and "make phabricator role work on stretch" part has been done. updated boxes accordingly.

Dzahn added a comment. Jan 24 2019, 9:07 PM

@RobH See above, fyi. I am not sure if i should have reopened the original hardware request as i did on T195623#4903960, or if this ticket here is the most appropriate place because it is still open and about implementing the service on it. But the issue is we would need a box with 64GB instead of 32GB RAM, whichever way is easier. Let me know and i'll use the right ticket. Thanks!

Cmjohnson updated the task description. Jan 30 2019, 10:13 PM
Dzahn closed this task as Resolved. Mar 12 2019, 12:18 PM

re-closing. replaced by new request in T215335. will be reverted in T215332

Change 437558 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] dumps: switch phab1001->phab1003 as phab dumps source

https://gerrit.wikimedia.org/r/437558

Change 437558 abandoned by Dzahn:
dumps: switch phab1001->phab1003 as phab dumps source

Reason:
merging into https://gerrit.wikimedia.org/r/c/operations/puppet/+/437620

https://gerrit.wikimedia.org/r/437558

Change 511929 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] phabricator: activate read-only mode for maintenance

https://gerrit.wikimedia.org/r/511929

Change 511929 merged by Dzahn:
[operations/puppet@production] phabricator: activate read-only mode for maintenance

https://gerrit.wikimedia.org/r/511929