Replacement of stat1002 and stat1003
Closed, Resolved | Public | 0 Estimated Story Points

Description

Boxes out of warranty

Details

Repo               Branch      Lines +/-
operations/puppet  production  +0 -2
operations/puppet  production  +19 -50
operations/puppet  production  +4 -66
operations/puppet  production  +2 -0
operations/puppet  production  +12 -35
operations/puppet  production  +1 -1
operations/puppet  production  +42 -8
operations/puppet  production  +1 -0
operations/puppet  production  +1 -0
operations/puppet  production  +5 -2
operations/puppet  production  +24 -25
operations/puppet  production  +4 -7
operations/puppet  production  +2 -4
analytics/geowiki  master      +2 -129
operations/puppet  production  +1 -10
analytics/geowiki  master      +1 -10
operations/puppet  production  +9 -0
operations/puppet  production  +1 -1
operations/puppet  production  +10 -6
operations/puppet  production  +4 -1
operations/puppet  production  +12 -7
operations/puppet  production  +3 -3
operations/puppet  production  +1 -1
operations/puppet  production  +15 -66
operations/puppet  production  +1 -0
operations/puppet  production  +0 -1
operations/puppet  production  +8 -2
operations/puppet  production  +54 -37
operations/puppet  production  +21 -24
operations/puppet  production  +2 -2
operations/puppet  production  +18 -3
operations/puppet  production  +103 -18
operations/puppet  production  +1 -2
operations/puppet  production  +3 -4
operations/puppet  production  +31 -1
operations/puppet  production  +176 -112
operations/puppet  production  +78 -66
operations/puppet  production  +133 -117
operations/puppet  production  +2 -0
operations/puppet  production  +8 -1
operations/puppet  production  +7 -1
operations/puppet  production  +1 -5
operations/puppet  production  +35 -0
operations/puppet  production  +19 -12
operations/puppet  production  +18 -15
operations/puppet  production  +2 -2
operations/puppet  production  +14 -0
Related Objects

Status    Assigned
Resolved  None
Declined  Ottomata
Resolved  Ottomata
Resolved  RobH
Resolved  Ottomata
Declined  None
Resolved  RobH
Resolved  Ottomata
Resolved  Ottomata
Resolved  Ottomata
Resolved  mpopov
Resolved  mforns
Resolved  Ottomata
Resolved  Ottomata
Resolved  Addshore
Resolved  GoranSMilovanovic
Resolved  elukey
Declined  Ottomata
Resolved  Ottomata
Resolved  Erik_Zachte
Resolved  Cmjohnson
Resolved  Cmjohnson

Event Timeline

Change 364814 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] AH, yes, statistics-privatedata-users should be on cruncher, it is a superset of perms

https://gerrit.wikimedia.org/r/364814

Change 364814 merged by Ottomata:
[operations/puppet@production] AH, yes, statistics-privatedata-users should be on cruncher, it is a superset of perms

https://gerrit.wikimedia.org/r/364814

Change 364817 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use conditionals instead of new role files to deal with stat box migration

https://gerrit.wikimedia.org/r/364817

Change 364817 merged by Ottomata:
[operations/puppet@production] Use conditionals instead of new role files to deal with stat box migration

https://gerrit.wikimedia.org/r/364817

Change 364823 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Run reportupdater::jobs::hadoop from stat1005 instead of stat1002

https://gerrit.wikimedia.org/r/364823

Change 364823 merged by Ottomata:
[operations/puppet@production] Run reportupdater::jobs::hadoop from stat1005 instead of stat1002

https://gerrit.wikimedia.org/r/364823

Change 364826 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Move refinery::job::data_check to stat1005 from stat1002

https://gerrit.wikimedia.org/r/364826

Change 364826 merged by Ottomata:
[operations/puppet@production] Move refinery::job::data_check to stat1005 from stat1002

https://gerrit.wikimedia.org/r/364826

Change 364829 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Rsync MW API logs to stat1005

https://gerrit.wikimedia.org/r/364829

Change 364829 merged by Ottomata:
[operations/puppet@production] Rsync MW API logs to stat1005

https://gerrit.wikimedia.org/r/364829

Change 365614 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] logrotate reportupdater logs as proper user/group

https://gerrit.wikimedia.org/r/365614

Change 365614 merged by Ottomata:
[operations/puppet@production] logrotate reportupdater logs as proper user/group

https://gerrit.wikimedia.org/r/365614

Change 365634 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Backup eventlogging log data from stat1005 srv-log-eventlogging

https://gerrit.wikimedia.org/r/365634

Change 365634 abandoned by Ottomata:
Backup eventlogging log data from stat1005 srv-log-eventlogging

Reason:
Ah, I forgot, we were going to stop backing up this data, since it exists on 3 different servers.

https://gerrit.wikimedia.org/r/365634

Change 365640 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Move geowiki from stat1003 to stat1006

https://gerrit.wikimedia.org/r/365640

Change 365640 merged by Ottomata:
[operations/puppet@production] Move geowiki from stat1003 to stat1006

https://gerrit.wikimedia.org/r/365640

Change 365666 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use https gerrit url to clone geowiki data-public on stat1006

https://gerrit.wikimedia.org/r/365666

Change 365666 merged by Ottomata:
[operations/puppet@production] Use https gerrit url to clone geowiki data-public on stat1006

https://gerrit.wikimedia.org/r/365666

Change 365668 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Set up rsync module for /home on stat boxes

https://gerrit.wikimedia.org/r/365668

Change 365668 merged by Ottomata:
[operations/puppet@production] Set up rsync module for /home on stat boxes

https://gerrit.wikimedia.org/r/365668

Change 365669 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/geowiki@master] No longer push to public data repo, we don't use it

https://gerrit.wikimedia.org/r/365669

Change 365670 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Remove public data geowiki push

https://gerrit.wikimedia.org/r/365670

Change 365669 merged by Ottomata:
[analytics/geowiki@master] No longer push to public data repo, we don't use it

https://gerrit.wikimedia.org/r/365669

Change 365670 merged by Ottomata:
[operations/puppet@production] Remove public data geowiki push

https://gerrit.wikimedia.org/r/365670

Change 365671 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/geowiki@master] Remove checking of public web page data, since it no longer exists

https://gerrit.wikimedia.org/r/365671

Change 365671 merged by Ottomata:
[analytics/geowiki@master] Remove checking of public web page data, since it no longer exists

https://gerrit.wikimedia.org/r/365671

Change 365672 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Remove dependency on geowiki public data job

https://gerrit.wikimedia.org/r/365672

Change 365672 merged by Ottomata:
[operations/puppet@production] Remove dependency on geowiki public data job

https://gerrit.wikimedia.org/r/365672

Change 365684 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Move reportupdater jobs from stat1003 -> stat1006

https://gerrit.wikimedia.org/r/365684

Change 365684 merged by Ottomata:
[operations/puppet@production] Move reportupdater jobs from stat1003 -> stat1006

https://gerrit.wikimedia.org/r/365684

The new server ran out of disk space.

13:29 < icinga-wm> PROBLEM - Disk space on stat1006 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=95%)

Mentioned in SAL (#wikimedia-operations) [2017-07-18T07:32:10Z] <elukey> moved /home to /srv/home on stat1006 to free disk space (created symling from /home -> /srv/home too) - T152712
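
The fix logged above (move /home to /srv/home, then leave a symlink at the old path) can be sketched in Python. This is only an illustration of the technique, not the commands that were actually run on stat1006; the function name relocate_with_symlink and the temporary demo paths are hypothetical.

```python
import os
import shutil
import tempfile


def relocate_with_symlink(src: str, dst: str) -> None:
    """Move the directory `src` to `dst`, then make `src` a symlink to `dst`."""
    if os.path.islink(src):
        raise ValueError(f"{src} is already a symlink")
    # shutil.move falls back to copy-and-delete when src and dst are on
    # different filesystems, which is the case that actually frees space.
    shutil.move(src, dst)
    os.symlink(dst, src, target_is_directory=True)


if __name__ == "__main__":
    # Demonstrate on a throwaway tree instead of the real /home and /srv/home.
    root = tempfile.mkdtemp()
    home = os.path.join(root, "home")
    srv_home = os.path.join(root, "srv", "home")
    os.makedirs(os.path.join(home, "alice"))
    os.makedirs(os.path.dirname(srv_home))
    relocate_with_symlink(home, srv_home)
    print(os.path.islink(home), os.listdir(home))
```

Existing paths under the old location keep resolving through the symlink, so users and cron jobs that refer to /home need no changes.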

Change 366107 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[operations/puppet@production] statistics::packages: Add libssl-dev and comments

https://gerrit.wikimedia.org/r/366107

Change 366107 merged by Ottomata:
[operations/puppet@production] statistics::packages: Add libssl-dev and comments

https://gerrit.wikimedia.org/r/366107

@Ottomata I checked stat1002:/a/. Can you copy psinger's folder to stat1005? That's my only request. Thanks.

Done. /a/$USER directories from stat1002 are in /srv/stat1002-a/user_dirs_from_stat1002.

stat1002 has been powered off.

Hey, could you please install pip on stat1005?

Change 368461 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Install virtualenv bin on stat boxes

https://gerrit.wikimedia.org/r/368461

Change 368461 merged by Ottomata:
[operations/puppet@production] Install virtualenv bin on stat boxes

https://gerrit.wikimedia.org/r/368461

Change 368612 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Remove stat1002 configuration as part of decom

https://gerrit.wikimedia.org/r/368612

Change 368763 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Install libcgi-pm-perl for wikistats 1.0 ezachte

https://gerrit.wikimedia.org/r/368763

Change 368763 merged by Ottomata:
[operations/puppet@production] Install libcgi-pm-perl for wikistats 1.0 ezachte

https://gerrit.wikimedia.org/r/368763

Change 368794 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Sync published datasets more often, and allow users to rsync to speed up the process.

https://gerrit.wikimedia.org/r/368794

Change 368794 merged by Ottomata:
[operations/puppet@production] Sync published datasets more often, allow users to rsync

https://gerrit.wikimedia.org/r/368794

@Catrope just emailed:

I would love to migrate to stat1006 from stat1003, but stat1006 is unusably slow right now while stat1003 is snappy. Connecting to analytics-store from stat1006 times out sometimes, and even when it doesn't, simple DESCRIBE queries take 5-15 seconds. I just talked to Adam Wight on IRC and he's experiencing similar issues.

Really!? So it is the MySQL connection that is slow, not stat1006 itself, right?

This seems to have been fixed now, it's not slow any more.

Change 370478 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Allow rsync to dataset1001 for pagecounts-ez

https://gerrit.wikimedia.org/r/370478

Change 370478 merged by Ottomata:
[operations/puppet@production] Allow rsync to dataset1001 for pagecounts-ez

https://gerrit.wikimedia.org/r/370478

Change 368612 merged by Elukey:
[operations/puppet@production] Remove stat1002 configuration as part of decom

https://gerrit.wikimedia.org/r/368612

Change 371487 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] statistics: re-add working_path variable

https://gerrit.wikimedia.org/r/371487

Change 371487 merged by Elukey:
[operations/puppet@production] statistics: re-add working_path variable

https://gerrit.wikimedia.org/r/371487

Change 371486 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Remove stat1002 from puppet as part of decom process

https://gerrit.wikimedia.org/r/371486

Change 371486 merged by Elukey:
[operations/puppet@production] Remove stat1002 from puppet as part of decom process

https://gerrit.wikimedia.org/r/371486

There is some cronspam from stat1006:

Cron Daemon root@stat1006.eqiad.wmnet via wikimedia.org

4:30 PM (1 hour ago)

to stats
Error: Value 54312 (2017-08-15) below lower absolute threshold 55000 for column 'Global North (5+)' of global_south (stride: 7)
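
The alert in the cron mail above is a lower-bound check on a reported value. A minimal sketch of that kind of check follows; it is not the code of the job that sent the mail, the function name check_lower_threshold is hypothetical, and the "stride" field of the original message is not modeled because the thread does not explain it.

```python
from typing import Optional


def check_lower_threshold(value: float, day: str, threshold: float,
                          column: str, dataset: str) -> Optional[str]:
    """Return an error message if `value` is below `threshold`, else None."""
    if value < threshold:
        return (f"Error: Value {value} ({day}) below lower absolute "
                f"threshold {threshold} for column '{column}' of {dataset}")
    return None


if __name__ == "__main__":
    # Figures taken from the cron mail quoted above.
    print(check_lower_threshold(54312, "2017-08-15", 55000,
                                "Global North (5+)", "global_south"))
```

Because cron mails whatever a job prints, a check like this generates one message per run for as long as the value stays under the threshold.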

I have been trying to fix it for the past two weeks; I will probably move it to analytics-alerts@ to reduce the noise for ops :)

For the record, I created https://phabricator.wikimedia.org/T173486 and moved the cron alert to analytics-alerts@.

Change 374332 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] stat1003: remove puppet configuration as part of decom

https://gerrit.wikimedia.org/r/374332

Today I rsynced /home from stat1003 -> stat1006 with rsync -av --update stat1003.eqiad.wmnet::home/ /home/. Only files that either did not exist on stat1006 or have a newer modification timestamp on stat1003 were copied over. The list of files that were copied is here: https://gist.github.com/ottomata/2743d43188d7d7446a133dac656b12bc
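
The per-file decision that --update adds to the rsync run above can be modeled roughly as follows. This is a simplified sketch for regular files only, not rsync's implementation; the function name would_copy is hypothetical, and the equal-timestamp branch follows the rsync man page's note that a destination file with the same modification time is still updated when the sizes differ.

```python
import os
import tempfile


def would_copy(src_path: str, dst_path: str) -> bool:
    """Simplified model of whether `rsync -a --update` transfers a regular file."""
    if not os.path.exists(dst_path):
        return True                       # missing on the receiver: copy
    src, dst = os.stat(src_path), os.stat(dst_path)
    if src.st_mtime > dst.st_mtime:
        return True                       # sender's copy is newer: copy
    if src.st_mtime == dst.st_mtime:
        return src.st_size != dst.st_size  # same time: copy only if sizes differ
    return False                          # receiver's copy is newer: skip


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    sender = os.path.join(workdir, "sender.txt")
    receiver = os.path.join(workdir, "receiver.txt")
    with open(sender, "w") as f:
        f.write("from stat1003")
    print(would_copy(sender, receiver))   # receiver file does not exist yet
```

The practical effect is that files users had already edited on stat1006 after the first sync were left alone, since their copies there were newer.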

Change 374332 merged by Elukey:
[operations/puppet@production] stat1003: remove puppet configuration as part of decom

https://gerrit.wikimedia.org/r/374332

stat1003 is officially no longer an analytics host, and its SSH keys have been removed accordingly. Everything (including your home dirs) should already be on stat1006.

Change 376248 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Remove stat1003 traces for decom

https://gerrit.wikimedia.org/r/376248

Change 376248 merged by Elukey:
[operations/puppet@production] Remove stat1003 traces for decom

https://gerrit.wikimedia.org/r/376248

Nuria set the point value for this task to 0. (Sep 14 2017, 4:16 PM)

Puppet and Salt have been cleaned of stat1002 and stat1003. The last steps are for DC-ops in the related tasks.

Tbayer mentioned this in Unknown Object (Task). (Jun 25 2018, 9:34 PM)