
rack/setup new host graphite2002
Closed, Resolved · Public

Description

This task is to track the racking and deployment of the new graphite2002 system. This system was ordered on the spare pool order T130743, allocating one of the six systems from that order. The original hardware request is on T126253.

  • - receive in system normally on T130743 (includes asset tagging, racktables addition)
  • - rack in either c5 or d5 (as those are misc services racks, and rows a & b are quite full in comparison); the server is racked in c5
  • - add asset tag and hostname for mgmt dns
  • - add internal ip production dns entries
  • - update this task with the port# of the system (can ping @RobH to update switch stack)
  • - update install_server with mac info and partitioning (use raided setup with /srv in ext4)
  • - install os (Trusty, as graphite2001 is Trusty.)
  • - sign puppet/salt
  • - service implementation (hand off to @fgiunchedi as initial requestor on T126253)

Event Timeline

Restricted Application added a subscriber: Aklapper.
RobH added a subtask: Unknown Object (Task). Mar 25 2016, 5:27 PM

mgmt 10.193.2.251
port info row 5 rack C5 ge-5/0/6

merged DNS changes

https://gerrit.wikimedia.org/r/280807
https://gerrit.wikimedia.org/r/280808/

[radon:~] $ host graphite2002.codfw.wmnet
graphite2002.codfw.wmnet has address 10.192.32.140

[radon:~] $ host graphite2002.mgmt.codfw.wmnet
graphite2002.mgmt.codfw.wmnet has address 10.193.2.251

Change 280809 had a related patch set uploaded (by Dzahn):
install-server: Add graphite2002 MAC address

https://gerrit.wikimedia.org/r/280809

Change 280809 merged by Dzahn:
install-server: Add graphite2002 MAC address

https://gerrit.wikimedia.org/r/280809

Change 280862 had a related patch set uploaded (by Filippo Giunchedi):
install_server: add graphite2002 partman

https://gerrit.wikimedia.org/r/280862

Change 280862 merged by Filippo Giunchedi:
install_server: add graphite2002 partman

https://gerrit.wikimedia.org/r/280862

fgiunchedi added a subscriber: Papaul.

I've updated the switch port. The host is missing 4x SSDs afaics:

~ # cat /proc/partitions 
major minor  #blocks  name

   8        0  976762584 sda
   8        1   48827392 sda1
   8        2  927933440 sda2
   8       16  976762584 sdb
   8       17   48827392 sdb1
   8       18  927933440 sdb2
~ # cat /sys/block/sda/device/model
ST91000640NS    
~ # cat /sys/block/sdb/device/model
ST91000640NS    
~ #
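
The check above can also be scripted. Below is a minimal sketch in Python, assuming the `/proc/partitions` layout shown in the paste (1 KiB blocks) and the usual `sd*` convention of major number 8 with 16 minors per disk; the function name `whole_disks` and the embedded `SAMPLE` text are illustrative, not part of any existing tooling.

```python
# Sketch: list whole sd* disks and their sizes from /proc/partitions text.
# SAMPLE is copied from the paste above; whole_disks is a hypothetical helper.
SAMPLE = """\
major minor  #blocks  name

   8        0  976762584 sda
   8        1   48827392 sda1
   8        2  927933440 sda2
   8       16  976762584 sdb
   8       17   48827392 sdb1
   8       18  927933440 sdb2
"""


def whole_disks(partitions_text):
    """Return {device name: size in GiB} for whole sd* disks only."""
    disks = {}
    for line in partitions_text.splitlines():
        fields = line.split()
        # Skip the header and blank lines: data rows have 4 fields, first numeric.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        major, minor, blocks, name = fields
        # Assumption: major 8 is the sd driver and each disk owns 16 minors,
        # so a minor divisible by 16 is the whole disk, not a partition.
        if int(major) == 8 and int(minor) % 16 == 0:
            # #blocks is in 1 KiB units; convert to GiB.
            disks[name] = int(blocks) * 1024 / 2**30
    return disks


if __name__ == "__main__":
    for name, gib in sorted(whole_disks(SAMPLE).items()):
        print(f"{name}: {gib:.1f} GiB")
```

On a live host the same function could be fed the contents of `/proc/partitions`; whether a disk is an SSD can then be read from `/sys/block/<name>/queue/rotational` (0 for non-rotational). Only two disks show up here, which matches the observation that the 4x SSDs are missing.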

to clarify, @RobH we're still missing the 4x SSDs for this order?

We didn't get a task in for this, so I'll create one shortly. Updating from IRC chat with Filippo:

from @godog's POV anything that gives >= 2TB usable is fine
we'll start at ~850GB used

small correction on the ~850GB figure I gave there:

here's the current usage

graphite1001:/var/lib/carbon/whisper$ du -hcs * | grep G | sort -rn
894G	total
562G	cassandra
151G	servers
38G	varnishkafka
31G	MediaWiki
28G	zuul
18G	frontend
12G	kafka
7.4G	varnish
6.9G	Hadoop
6.6G	restbase
6.6G	jobrunner
6.4G	reqstats
4.8G	swift
4.0G	webpagetest
3.0G	kartotherian
1.4G	tilerator
1.4G	statsd
1.3G	mw
1.1G	restbase-test

mediawiki at the moment has xhprof disabled; it used to use ~260GB.

So if we move cassandra to the new machine we'll even things out; in that case we'll start at 562GB used. Cassandra takes ~12GB/instance, and with 3x instances per machine that's 54 instances * 12GB = 648GB used. So anything > 2TB gives < 50% utilized.
What will be left on graphite1001 is the current 894GB total minus 562GB cassandra, plus mediawiki at ~300GB, or ~633GB used i.e. 65%
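
For reference, the sizing arithmetic can be re-run as a short Python sketch. All figures are copied from this comment; the constant names are made up for illustration, ">= 2TB usable" is taken as 2000 GB, and the graphite1001 line is one reading of the arithmetic (current total, minus cassandra, plus mediawiki at ~300GB) that lands on roughly the ~633GB quoted.

```python
# Sketch of the capacity estimate; figures are in GB and come from the comment.
CURRENT_TOTAL_GB = 894        # du total on graphite1001
CASSANDRA_NOW_GB = 562        # current cassandra usage
GB_PER_INSTANCE = 12          # ~12GB per cassandra instance
CASSANDRA_INSTANCES = 54      # instance count used in the estimate
MEDIAWIKI_EXPECTED_GB = 300   # mediawiki estimate used in the comment
MIN_USABLE_GB = 2000          # assumption: ">= 2TB usable" read as 2000 GB


def utilization_pct(used_gb, capacity_gb):
    """Percentage of capacity in use."""
    return 100.0 * used_gb / capacity_gb


# Projected cassandra footprint on the new host.
cassandra_projected_gb = GB_PER_INSTANCE * CASSANDRA_INSTANCES

# What stays on graphite1001 after moving cassandra away (one reading of the
# comment: the 31G mediawiki uses today is not subtracted separately).
graphite1001_remaining_gb = (
    CURRENT_TOTAL_GB - CASSANDRA_NOW_GB + MEDIAWIKI_EXPECTED_GB
)

if __name__ == "__main__":
    print(f"cassandra projected: {cassandra_projected_gb} GB")
    print(
        "new host utilization at 2TB: "
        f"{utilization_pct(cassandra_projected_gb, MIN_USABLE_GB):.1f}%"
    )
    print(f"graphite1001 remaining: {graphite1001_remaining_gb} GB")
```

The 65% figure depends on graphite1001's usable capacity, which is not stated here, so the sketch does not try to reproduce it.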

RobH mentioned this in Unknown Object (Task). Apr 1 2016, 5:11 PM
RobH edited subtasks, added: Unknown Object (Task); removed: Unknown Object (Task).

I am getting the message below during installation. I chatted with Filippo on IRC and he will be taking over the installation.

No file system is specified for partition #1 of LVM VG
graphite2002-vg, LV carbon.

If you do not go back to the partitioning menu and assign a file
system to this partition, it won't be used at all.

Go back to the menu?

    <Go Back>    <Yes>    <No>

machine is in service, resolving

RobH closed subtask Unknown Object (Task) as Resolved. Oct 12 2016, 5:48 PM