Replacement of stat1002 and stat1003
Closed, Resolved | Public | 0 Story Points

Description

Boxes out of warranty

Details

Related Gerrit Patches:
[operations/puppet@production] Remove stat1003 traces for decom
[operations/puppet@production] stat1003: remove puppet configuration as part of decom
[operations/puppet@production] Remove stat1002 from puppet as part of decom process
[operations/puppet@production] statistics: re-add working_path variable
[operations/puppet@production] Remove stat1002 configuration as part of decom
[operations/puppet@production] Allow rsync to dataset1001 for pagecounts-ez
[operations/puppet@production] Sync published datasets more often, allow users to rsync
[operations/puppet@production] Install libcgi-pm-perl for wikistats 1.0 ezachte
[operations/puppet@production] Install virtualenv bin on stat boxes
[operations/puppet@production] Move geowiki from stat1003 to stat1006
[operations/puppet@production] statistics::packages: Add libssl-dev and comments
[operations/puppet@production] Move reportupdater jobs from stat1003 -> stat1006
[operations/puppet@production] Remove dependency on geowiki public data job
[analytics/geowiki@master] Remove checking of public web page data, since it no longer exists
[operations/puppet@production] Remove public data geowiki push
[analytics/geowiki@master] No longer push to public data repo, we don't use it
[operations/puppet@production] Set up rsync module for /home on stat boxes
[operations/puppet@production] Use https gerrit url to clone geowiki data-public on stat1006
[operations/puppet@production] Backup eventlogging log data from stat1005 srv-log-eventlogging
[operations/puppet@production] logrotate reportupdater logs as proper user/group
[operations/puppet@production] Rsync MW API logs to stat1005
[operations/puppet@production] Move refinery::job::data_check to stat1005 from stat1002
[operations/puppet@production] Run reportupdater::jobs::hadoop from stat1005 instead of stat1002
[operations/puppet@production] Use conditionals instead of new role files to deal with stat box migration
[operations/puppet@production] AH, yes, statistics-privatedata-users should be on cruncher, it is a superset of perms
[operations/puppet@production] Remove statistics-privatedata-users from stat1006.yaml (cruncher)
[operations/puppet@production] Add groups to stat1005
[operations/puppet@production] Apply role statistics::private_new to stat1005
[operations/puppet@production] Move more stuff into profile::statistics::private
[operations/puppet@production] Own mediawiki/core checkout as stats
[operations/puppet@production] MySQL client for stastistics::packages in stretch, /srv/mediawiki dir
[operations/puppet@production] Package fixes for stat boxes to stretch
[operations/puppet@production] geowiki already uses /srv on stat1003, so we can change the backup director path now
[operations/puppet@production] Remove references to geowiki::params
[operations/puppet@production] Apply cruncher_new role to stat1006
[operations/puppet@production] Add statistics private, cruncher and web profiles, with a little refactoring
[operations/puppet@production] Refactor geowiki a bit and make a geowiki profile
[operations/puppet@production] Create reportupdater::jobs profiles for stat boxes
[operations/puppet@production] Use stretch for stat1006
[operations/puppet@production] Install hunspell-en-us instead of myspell-en-us in Stretch
[operations/puppet@production] Install jupyter-notebook for stretch
[operations/puppet@production] Don't try to set up rsync server for hdfs archive on stat1005 yet
[operations/puppet@production] Add stat1005 in site.pp
[operations/puppet@production] Prep for stat100[56]
[operations/puppet@production] Use pulls rather than updates to pull cloudera jessie packages into stretch-wikimedia
[operations/puppet@production] Add cloudera-stretch to distributions-wikimedia for stretch-wikimedia
[operations/puppet@production] Import cloudera jessie packages into a stretch wikimedia thirdparty component

Related Objects

Status | Assigned | Task
Resolved | None
Declined | Ottomata
Resolved | Ottomata
Resolved | RobH
Resolved | Ottomata
Declined | None
Resolved | RobH
Resolved | Ottomata
Resolved | Ottomata
Resolved | Ottomata
Resolved | mpopov
Resolved | mforns
Resolved | Ottomata
Resolved | Ottomata
Resolved | Addshore
Resolved | GoranSMilovanovic
Resolved | elukey
Declined | Ottomata
Resolved | Ottomata
Resolved | Erik_Zachte
Resolved | Cmjohnson
Resolved | Cmjohnson

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 364814 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] AH, yes, statistics-privatedata-users should be on cruncher, it is a superset of perms

https://gerrit.wikimedia.org/r/364814

Change 364814 merged by Ottomata:
[operations/puppet@production] AH, yes, statistics-privatedata-users should be on cruncher, it is a superset of perms

https://gerrit.wikimedia.org/r/364814

Change 364817 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use conditionals instead of new role files to deal with stat box migration

https://gerrit.wikimedia.org/r/364817

Change 364817 merged by Ottomata:
[operations/puppet@production] Use conditionals instead of new role files to deal with stat box migration

https://gerrit.wikimedia.org/r/364817

Change 364823 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Run reportupdater::jobs::hadoop from stat1005 instead of stat1002

https://gerrit.wikimedia.org/r/364823

Change 364823 merged by Ottomata:
[operations/puppet@production] Run reportupdater::jobs::hadoop from stat1005 instead of stat1002

https://gerrit.wikimedia.org/r/364823

RobH removed a subscriber: RobH. (Jul 12 2017, 7:11 PM)

Change 364826 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Move refinery::job::data_check to stat1005 from stat1002

https://gerrit.wikimedia.org/r/364826

Change 364826 merged by Ottomata:
[operations/puppet@production] Move refinery::job::data_check to stat1005 from stat1002

https://gerrit.wikimedia.org/r/364826

Change 364829 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Rsync MW API logs to stat1005

https://gerrit.wikimedia.org/r/364829

Change 364829 merged by Ottomata:
[operations/puppet@production] Rsync MW API logs to stat1005

https://gerrit.wikimedia.org/r/364829

Change 365614 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] logrotate reportupdater logs as proper user/group

https://gerrit.wikimedia.org/r/365614

Change 365614 merged by Ottomata:
[operations/puppet@production] logrotate reportupdater logs as proper user/group

https://gerrit.wikimedia.org/r/365614

Change 365634 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Backup eventlogging log data from stat1005 srv-log-eventlogging

https://gerrit.wikimedia.org/r/365634

Change 365634 abandoned by Ottomata:
Backup eventlogging log data from stat1005 srv-log-eventlogging

Reason:
Ah, I forgot, we were going to stop backing up this data, since it exists on 3 different servers.

https://gerrit.wikimedia.org/r/365634

Change 365640 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Move geowiki from stat1003 to stat1006

https://gerrit.wikimedia.org/r/365640

Change 365640 merged by Ottomata:
[operations/puppet@production] Move geowiki from stat1003 to stat1006

https://gerrit.wikimedia.org/r/365640

Change 365666 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use https gerrit url to clone geowiki data-public on stat1006

https://gerrit.wikimedia.org/r/365666

Change 365666 merged by Ottomata:
[operations/puppet@production] Use https gerrit url to clone geowiki data-public on stat1006

https://gerrit.wikimedia.org/r/365666

Change 365668 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Set up rsync module for /home on stat boxes

https://gerrit.wikimedia.org/r/365668

Change 365668 merged by Ottomata:
[operations/puppet@production] Set up rsync module for /home on stat boxes

https://gerrit.wikimedia.org/r/365668

Change 365669 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/geowiki@master] No longer push to public data repo, we don't use it

https://gerrit.wikimedia.org/r/365669

Change 365670 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Remove public data geowiki push

https://gerrit.wikimedia.org/r/365670

Change 365669 merged by Ottomata:
[analytics/geowiki@master] No longer push to public data repo, we don't use it

https://gerrit.wikimedia.org/r/365669

Change 365670 merged by Ottomata:
[operations/puppet@production] Remove public data geowiki push

https://gerrit.wikimedia.org/r/365670

Change 365671 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/geowiki@master] Remove checking of public web page data, since it no longer exists

https://gerrit.wikimedia.org/r/365671

Change 365671 merged by Ottomata:
[analytics/geowiki@master] Remove checking of public web page data, since it no longer exists

https://gerrit.wikimedia.org/r/365671

Change 365672 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Remove dependency on geowiki public data job

https://gerrit.wikimedia.org/r/365672

Change 365672 merged by Ottomata:
[operations/puppet@production] Remove dependency on geowiki public data job

https://gerrit.wikimedia.org/r/365672

Change 365684 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Move reportupdater jobs from stat1003 -> stat1006

https://gerrit.wikimedia.org/r/365684

Change 365684 merged by Ottomata:
[operations/puppet@production] Move reportupdater jobs from stat1003 -> stat1006

https://gerrit.wikimedia.org/r/365684

Dzahn added a subscriber: Dzahn. (Jul 17 2017, 9:36 PM)

The new server ran out of disk space.

13:29 < icinga-wm> PROBLEM - Disk space on stat1006 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=95%)

Mentioned in SAL (#wikimedia-operations) [2017-07-18T07:32:10Z] <elukey> moved /home to /srv/home on stat1006 to free disk space (created symling from /home -> /srv/home too) - T152712
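The relocation described in the SAL entry above (move a directory to a larger partition, then leave a symlink at the old location) can be sketched as follows. This is only an illustration of the general technique, not the commands that were actually run on stat1006; the function name `relocate_with_symlink` is hypothetical, and the demo uses a throwaway temporary directory instead of a real /home.

```python
import os
import shutil
import tempfile


def relocate_with_symlink(old_path: str, new_path: str) -> None:
    """Move old_path to new_path, then symlink old_path -> new_path.

    Hypothetical helper: a sketch of the "move and symlink" pattern only.
    """
    if os.path.islink(old_path):
        raise ValueError(f"{old_path} is already a symlink")
    if os.path.exists(new_path):
        # Refuse to continue: shutil.move would otherwise nest the
        # directory inside an existing destination directory.
        raise FileExistsError(f"{new_path} already exists")
    shutil.move(old_path, new_path)
    os.symlink(new_path, old_path)


if __name__ == "__main__":
    # Demonstrate on a temporary tree rather than on real system paths.
    with tempfile.TemporaryDirectory() as root:
        home = os.path.join(root, "home")
        srv_home = os.path.join(root, "srv", "home")
        os.makedirs(os.path.join(home, "alice"))
        os.makedirs(os.path.join(root, "srv"))
        relocate_with_symlink(home, srv_home)
        print(os.path.islink(home), os.listdir(home))
```

Existing paths such as /home/alice keep resolving after the move because they are followed through the symlink.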

Change 366107 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[operations/puppet@production] statistics::packages: Add libssl-dev and comments

https://gerrit.wikimedia.org/r/366107

Change 366107 merged by Ottomata:
[operations/puppet@production] statistics::packages: Add libssl-dev and comments

https://gerrit.wikimedia.org/r/366107

Dzahn removed a subscriber: Dzahn. (Jul 26 2017, 3:04 PM)
leila added a subscriber: leila. (Jul 27 2017, 9:59 PM)

@Ottomata I checked stat1002:/a/. Can you copy psinger's folder to stat1005? That's my only request. Thanks.

Done. /a/$USER directories from stat1002 are in /srv/stat1002-a/user_dirs_from_stat1002.

stat1002 has been powered off.

Hey, could you please install pip on stat1005?

Change 368461 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Install virtualenv bin on stat boxes

https://gerrit.wikimedia.org/r/368461

Change 368461 merged by Ottomata:
[operations/puppet@production] Install virtualenv bin on stat boxes

https://gerrit.wikimedia.org/r/368461

Change 368612 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Remove stat1002 configuration as part of decom

https://gerrit.wikimedia.org/r/368612

Change 368763 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Install libcgi-pm-perl for wikistats 1.0 ezachte

https://gerrit.wikimedia.org/r/368763

Change 368763 merged by Ottomata:
[operations/puppet@production] Install libcgi-pm-perl for wikistats 1.0 ezachte

https://gerrit.wikimedia.org/r/368763

Change 368794 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Sync published datasets more often, and allow users to rsync to speed up the process.

https://gerrit.wikimedia.org/r/368794

Change 368794 merged by Ottomata:
[operations/puppet@production] Sync published datasets more often, allow users to rsync

https://gerrit.wikimedia.org/r/368794

@Catrope just emailed:

I would love to migrate to stat1006 from stat1003, but stat1006 is unusably slow right now while stat1003 is snappy. Connecting to analytics-store from stat1006 times out sometimes, and even when it doesn't, simple DESCRIBE queries take 5-15 seconds. I just talked to Adam Wight on IRC and he's experiencing similar issues.

Really!? Is it the MySQL connection that is slow, and not actual usage of stat1006?

I can't seem to reproduce...

This seems to have been fixed now; it's not slow any more.

Change 370478 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Allow rsync to dataset1001 for pagecounts-ez

https://gerrit.wikimedia.org/r/370478

Change 370478 merged by Ottomata:
[operations/puppet@production] Allow rsync to dataset1001 for pagecounts-ez

https://gerrit.wikimedia.org/r/370478

Change 368612 merged by Elukey:
[operations/puppet@production] Remove stat1002 configuration as part of decom

https://gerrit.wikimedia.org/r/368612

Change 371487 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] statistics: re-add working_path variable

https://gerrit.wikimedia.org/r/371487

Change 371487 merged by Elukey:
[operations/puppet@production] statistics: re-add working_path variable

https://gerrit.wikimedia.org/r/371487

Change 371486 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Remove stat1002 from puppet as part of decom process

https://gerrit.wikimedia.org/r/371486

Change 371486 merged by Elukey:
[operations/puppet@production] Remove stat1002 from puppet as part of decom process

https://gerrit.wikimedia.org/r/371486

There is some cronspam from stat1006:

Cron Daemon root@stat1006.eqiad.wmnet via wikimedia.org

4:30 PM (1 hour ago)

to stats
Error: Value 54312 (2017-08-15) below lower absolute threshold 55000 for column 'Global North (5+)' of global_south (stride: 7)

I have been trying to fix it for the past two weeks; I'm probably going to move it to analytics-alerts@ to reduce the noise for ops :)

For the record, I created https://phabricator.wikimedia.org/T173486 and moved the cron alert to analytics-alerts@.
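For readers unfamiliar with this kind of cron error, the check behind it is presumably a lower-bound comparison on a monitored value. The sketch below is a guess at the general shape of such a check, not the actual geowiki monitoring code; the function name `check_lower_threshold` and its parameters are hypothetical, and the message omits the table name and stride that the real error includes.

```python
from typing import Optional


def check_lower_threshold(
    value: int, lower: int, column: str, date: str
) -> Optional[str]:
    """Return an error message if value is below lower, else None.

    Hypothetical helper: illustrates a lower absolute threshold check only.
    """
    if value < lower:
        return (
            f"Error: Value {value} ({date}) below lower absolute "
            f"threshold {lower} for column '{column}'"
        )
    return None


if __name__ == "__main__":
    # Values taken from the cron mail quoted in this task.
    message = check_lower_threshold(54312, 55000, "Global North (5+)", "2017-08-15")
    if message is not None:
        print(message)
```

A cron job that prints such a message on every run emails it to the configured recipient each time, which is why changing the recipient address reduces the noise without changing the check itself.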

Change 374332 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] stat1003: remove puppet configuration as part of decom

https://gerrit.wikimedia.org/r/374332

Today I rsynced /home from stat1003 -> stat1006 with rsync -av --update stat1003.eqiad.wmnet::home/ /home/. Only files that either did not exist on stat1006 or have a newer modification timestamp on stat1003 were copied over. The list of files that were copied is here: https://gist.github.com/ottomata/2743d43188d7d7446a133dac656b12bc
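The copy rule described above (copy a file only if it is missing on the destination or has a newer modification time on the source) can be sketched in Python. This is a simplified model of the behaviour as described in the comment, not a reimplementation of rsync: it ignores sizes, checksums, permissions, deletions and the network transport, and the function names `should_copy` and `sync_update` are hypothetical.

```python
import os
import shutil
from typing import List


def should_copy(src: str, dst: str) -> bool:
    """True if dst is missing or src has a strictly newer mtime than dst."""
    if not os.path.exists(dst):
        return True
    return os.path.getmtime(src) > os.path.getmtime(dst)


def sync_update(src_dir: str, dst_dir: str) -> List[str]:
    """Copy files from src_dir into dst_dir using the rule above.

    Returns the sorted relative paths of the files that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_dir):
        rel_dir = os.path.relpath(dirpath, src_dir)
        target_dir = os.path.normpath(os.path.join(dst_dir, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if should_copy(src, dst):
                # copy2 also copies the modification time, so a second
                # run finds equal mtimes and skips the file.
                shutil.copy2(src, dst)
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)
```

Under this rule, a file edited on the destination after the source copy was last modified is left alone, so work already done on stat1006 would not be overwritten by older copies from stat1003.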

Change 374332 merged by Elukey:
[operations/puppet@production] stat1003: remove puppet configuration as part of decom

https://gerrit.wikimedia.org/r/374332

elukey added a comment. (Sep 5 2017, 5:08 PM)

stat1003 is officially no longer an analytics host, and ssh keys have been removed accordingly. Everything (including your home dirs) should already be on stat1006.

Change 376248 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Remove stat1003 traces for decom

https://gerrit.wikimedia.org/r/376248

Change 376248 merged by Elukey:
[operations/puppet@production] Remove stat1003 traces for decom

https://gerrit.wikimedia.org/r/376248

Dzahn changed the status of subtask T173097: Decommission stat1002.eqiad.wmnet from Open to Stalled. (Sep 7 2017, 2:36 PM)
Nuria set the point value for this task to 0. (Sep 14 2017, 4:16 PM)

stat1002/3 have been cleaned out of Puppet/Salt; the last steps are for DC-ops in the related tasks.

elukey moved this task from In Progress to Done on the Analytics-Kanban board. (Sep 15 2017, 8:29 AM)
Nuria closed this task as Resolved. (Sep 19 2017, 5:52 PM)
Tbayer mentioned this in Unknown Object (Task). (Jun 25 2018, 9:34 PM)