Procure and setup rbf2001-2002
Closed, Resolved (Public)

Description

This is the second redis cluster; it should match the similar cluster in eqiad. See https://racktables.wikimedia.org/index.php?page=object&tab=default&object_id=1026

Event Timeline

Joe raised the priority of this task from to Needs Triage.
Joe updated the task description.
Joe added a project: ops-codfw.
Joe added subscribers: Aklapper, Joe.
Joe set Security to None.

So these are dual cpu (4 core) systems with 32GB of memory. My spare systems in codfw are slightly better, but will work.

Allocating:

Dell PowerEdge R420, Dual Intel Xeon E5-2440, 32 GB Memory, (2) 500GB Disks

rbf2001
wmf5849
a5-codfw

rbf2002
wmf5833
b5-codfw

RobH triaged this task as Medium priority. Jan 15 2015, 4:52 PM
RobH changed the edit policy from "All Users" to "WMF-NDA (Project)".
RobH mentioned this in Unknown Object (Diffusion Commit). Jan 15 2015, 5:46 PM

rbf2001 is installed and ready for service implementation

rbf2002 is having install issues detecting disks, and I need to further troubleshoot the installation.

Daniel's setup of rbf2001 for service implementation is tracked in the linked task: https://phabricator.wikimedia.org/T86898

rbf2002 is now online with OS, awaiting puppet signing

dzahn@iron:~$ ssh root@rbf2001.mgmt
root@rbf2001.mgmt's password: 

dzahn@iron:~$ ssh root@rbf2002.mgmt
ssh: Could not resolve hostname rbf2002.mgmt: Name or service not known

^ hmm? odd? at a quick glance i see it in DNS zones though.

i can SSH to the host, but something is wrong about the mgmt entry in DNS it seems

i tried to install Debian on rbf2001 and the installer claims:

┌────────────┤ [!!] Download debconf preconfiguration file ├────────────┐
│                                                                       │
│                         Malformed IP address                          │
│ The IP address you provided is malformed. It should be in the form    │
│ x.x.x.x where each 'x' is no larger than 255 (an IPv4 address), or a  │
│ sequence of blocks of hexadecimal digits separated by colons (an IPv6 │
│ address).
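
For reference, the format rule quoted by the installer can be checked outside the installer. The following is a minimal sketch using Python's standard `ipaddress` module; the helper name `is_well_formed_ip` is made up for illustration, and this is not the check the Debian installer itself runs. The ticket does not show which value the installer actually rejected, so the sample inputs are only examples.

```python
import ipaddress


def is_well_formed_ip(text: str) -> bool:
    """Return True if text parses as a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text.strip())
        return True
    except ValueError:
        # Raised for hostnames, empty strings, out-of-range octets, etc.
        return False


# Literal addresses pass; a hostname or an empty value does not.
for candidate in ["10.192.0.33", "::1", "rbf2001.codfw.wmnet", ""]:
    print(repr(candidate), is_well_formed_ip(candidate))
```

A hostname or empty string in a field that expects a literal address would fail this kind of check, which is one possible (unconfirmed) way to end up with the message above.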

the change was just adding the options for jessie:

https://gerrit.wikimedia.org/r/#/c/188932/5/modules/install-server/files/dhcpd/linux-host-entries.ttyS1-115200

> dzahn@iron:~$ ssh root@rbf2002.mgmt
> ssh: Could not resolve hostname rbf2002.mgmt: Name or service not known

see https://gerrit.wikimedia.org/r/#/c/188669/1/templates/10.in-addr.arpa
and https://gerrit.wikimedia.org/r/#/c/188669/1/templates/wmnet

> i tried to install Debian on rbf2001 and the installer claims:
>
> │ The IP address you provided is malformed.

https://gerrit.wikimedia.org/r/#/c/188092/1/templates/wmnet

?

The DNS issues for rbf2001 and rbf2002 mgmt have been fixed.

However, rbf2002.mgmt is on 10.193.2.118, and it's not responsive to ping or ssh (via either FQDN or direct IP). I've reopened the blocking ticket (T88380) for repair of the rbf2002 mgmt interface settings and connection.

RobH subscribed.

Well, we know that the install worked in Ubuntu before (since I had installed ubuntu on rbf2001). I'm not sure what issue would arise for its production DNS, as it all appears correct.

That being said, I did clear out all negatively cached entries on the recursors, perhaps try again?

> The DNS issues for rbf2001 and rbf2002 mgmt have been fixed.

I don't see the change on iron yet. I'll check again later.

dzahn@iron:~$ host rbf2001.mgmt
rbf2001.mgmt.codfw.wmnet has address 10.193.2.116
rbf2001.mgmt.codfw.wmnet has address 10.193.2.118

dzahn@iron:~$ host rbf2002.mgmt
Host rbf2002.mgmt not found: 3(NXDOMAIN)
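
The `host` output above shows rbf2001.mgmt resolving to two addresses, one of which (10.193.2.118) is the address given earlier in this task for rbf2002.mgmt. A small sketch for spotting that kind of duplicate in pasted `host` output follows; it only parses text in the "NAME has address ADDR" form shown here, and the function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches lines such as "name.example has address 10.0.0.1".
HOST_LINE = re.compile(r"^(\S+) has address (\S+)$")


def addresses_by_name(host_output: str) -> Dict[str, List[str]]:
    """Collect every address reported for each name, in order of appearance."""
    records = defaultdict(list)
    for line in host_output.splitlines():
        match = HOST_LINE.match(line.strip())
        if match:
            name, address = match.groups()
            records[name].append(address)
    return dict(records)


def names_with_multiple_addresses(host_output: str) -> Dict[str, List[str]]:
    """Return only the names that have more than one address record."""
    return {
        name: addrs
        for name, addrs in addresses_by_name(host_output).items()
        if len(addrs) > 1
    }


sample = """\
rbf2001.mgmt.codfw.wmnet has address 10.193.2.116
rbf2001.mgmt.codfw.wmnet has address 10.193.2.118
"""
print(names_with_multiple_addresses(sample))
```

Lines that do not match the pattern, such as the NXDOMAIN response for rbf2002.mgmt, are simply ignored by this sketch.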

> Well, we know that the install worked in Ubuntu before (since I had installed ubuntu on rbf2001). I'm not sure what issue would arise for its production DNS, as it all appears correct.
>
> That being said, I did clear out all negatively cached entries on the recursors, perhaps try again?

Thanks, i tried again and it did change, but to this:

     ┌──────────┤ [!!] Download debconf preconfiguration file ├──────────┐
     │                                                                   │
     │                  Failed to run preseeded command                  │
     │ Execution of preseeded command "wget -O /tmp/early_command        │
     │ http://apt.wikimedia.org/autoinstall/scripts/early_command && sh  │
     │ /tmp/early_command" failed with exit code 10.                     │
     │                                                                   │
     │     <Go Back>                                      <Continue>     │
     │                                                                   │
     └───────────────────────────────────────────────────────────────────┘

let's figure this out after the weekend

Dzahn removed Dzahn as the assignee of this task. Feb 7 2015, 12:41 AM

yea, no. i'm getting the "Malformed IP address" thing again...

faidon raised the priority of this task from Medium to High. Feb 11 2015, 9:49 AM
Dzahn closed this task as Declined. Edited Mar 12 2015, 12:02 AM
Dzahn claimed this task.

The issue is still unchanged. I attempted another reinstall of rbf2001 and:

│                         Malformed IP address                          │
│ The IP address you provided is malformed. It should be in the form    │
│ x.x.x.x where each 'x' is no larger than 255 (an IPv4 address), or a  │
│ sequence of blocks of hexadecimal digits separated by colons (an IPv6 │
│ address). Please try again.                                           │

after the installer detects link on eth2 and eth3

BusyBox v1.22.1 (Debian 1:1.22.0-15) built-in shell (ash)
Enter 'help' for a list of built-in commands.

~ # ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc mq qlen 1000
    link/ether 90:b1:1c:2d:85:70 brd ff:ff:ff:ff:ff:ff
    inet 10.192.0.33/22 scope global eth3
       valid_lft forever preferred_lft forever
3: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc mq qlen 1000
    link/ether 90:b1:1c:2d:85:71 brd ff:ff:ff:ff:ff:ff

host rbf2001 {
    hardware ethernet 90:B1:1C:2D:85:70;
    fixed-address rbf2001.codfw.wmnet;

2 is eth3 with scope global eth3
3 is eth2 without IP ?
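
To cross-check which interface the DHCP entry points at, the MAC in the `hardware ethernet` line can be compared against the `link/ether` lines from `ip a s`. The sketch below does that on pasted text; the function names are hypothetical, it assumes the output layout shown above, and it compares MACs case-insensitively because the two sources use different letter case.

```python
import re
from typing import Dict, Optional


def macs_by_interface(ip_output: str) -> Dict[str, str]:
    """Map interface name to lower-cased MAC from `ip a s` style output."""
    macs = {}
    current = None
    for line in ip_output.splitlines():
        # Interface header, e.g. "2: eth3: <BROADCAST,MULTICAST> ..."
        header = re.match(r"^\d+:\s+(\S+?):", line)
        if header:
            current = header.group(1)
            continue
        # Indented MAC line, e.g. "    link/ether 90:b1:... brd ff:ff:..."
        link = re.match(r"^\s+link/ether\s+([0-9A-Fa-f:]{17})", line)
        if link and current:
            macs[current] = link.group(1).lower()
    return macs


def interface_for_mac(ip_output: str, dhcp_mac: str) -> Optional[str]:
    """Return the interface whose MAC equals the DHCP entry's MAC, if any."""
    wanted = dhcp_mac.strip().rstrip(";").lower()
    for name, mac in macs_by_interface(ip_output).items():
        if mac == wanted:
            return name
    return None


sample = """\
2: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc mq qlen 1000
    link/ether 90:b1:1c:2d:85:70 brd ff:ff:ff:ff:ff:ff
    inet 10.192.0.33/22 scope global eth3
3: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc mq qlen 1000
    link/ether 90:b1:1c:2d:85:71 brd ff:ff:ff:ff:ff:ff
"""
print(interface_for_mac(sample, "90:B1:1C:2D:85:70;"))
```

With the values pasted in this comment, the DHCP entry's MAC (90:B1:1C:2D:85:70) matches eth3, the interface that holds 10.192.0.33.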

Change 196138 had a related patch set uploaded (by Dzahn):
rbf2001: use eth2 MAC for DHCP

https://gerrit.wikimedia.org/r/196138

i tried to use eth2 and got: "Network autoconfiguration failed. Your network is probably not using the DHCP protocol."

Change 196624 had a related patch set uploaded (by Dzahn):
let rbf200x hosts be Ubuntu for now

https://gerrit.wikimedia.org/r/196624

Change 196624 merged by Dzahn:
let rbf200x hosts be Ubuntu for now

https://gerrit.wikimedia.org/r/196624

reinstalled rbf2001 with trusty, re-enabled in icinga:

https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=rbf2001&style=detail&nostatusheader

rbf2001 and rbf2002 are both up and running, but with trusty for now, while we can still investigate the problem with jessie on the related rdf2xxx hosts and their ticket.