Hi @tramm Any update on this from your side?
@jijiki Yes, of course it can wait, I just remembered the holiday situation.
It looks like the cause wasn't an actual RAID failure but a networking or DNS failure:
History of this host:
almost duplicate of T221125
@wiki_willy You're welcome. I also just added you to root@ mail, mainly because then you receive noc@ mail which is an alias for it. Prepare for a _little_ more mail coming to your inbox now. ;)
your shell account has been created. You should be able to ssh to the following hosts:
gzipping all files in /var/log/zuul that were not already gzipped saved almost 10G. usage of / back to 79% from 95%
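For reference, a minimal sketch of that cleanup in Python. The function name `gzip_uncompressed` is hypothetical and this is not the command that was actually run on the host. It assumes none of the files are still being written to, so an active log would have to be excluded first.

```python
import gzip
import os
import shutil


def gzip_uncompressed(log_dir):
    """Compress every regular file in log_dir that does not already end in .gz.

    Returns the number of bytes saved (original size minus compressed size).
    Hypothetical helper for illustration only.
    """
    saved = 0
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        # Skip directories and anything that is already gzipped.
        if not os.path.isfile(path) or name.endswith(".gz"):
            continue
        target = path + ".gz"
        if os.path.exists(target):
            # Do not overwrite an existing archive with the same name.
            continue
        original_size = os.path.getsize(path)
        with open(path, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        saved += original_size - os.path.getsize(target)
        # Remove the original only after the archive has been written.
        os.remove(path)
    return saved
```

Calling `gzip_uncompressed("/var/log/zuul")` would be the equivalent of the step above; running it a second time is a no-op because only `.gz` files remain.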
Icinga alerting again:
cp3033 is shown as having CRIT redundancy
Tue, Apr 23
MariaDB [wikistats]> insert into wikisources (prefix,lang,loclang,loclanglink,method) values ('nap', 'Neapolitan', 'Nnapulitano', 'Lengua napulitana', '8');
Query OK, 1 row affected (0.00 sec)
MariaDB [wikistats]> insert into wikipedias (prefix,lang,loclang,method) values ('hyw', 'Western Armenian', 'Արեւմտահայերէն', '8');
Query OK, 1 row affected (0.00 sec)
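If these inserts ever get scripted, a parameterized version could look roughly like the sketch below. It uses Python's built-in `sqlite3` as a stand-in for MariaDB, and the helper `insert_wiki` and both allowlists are assumptions made for illustration. Placeholder style (`?` here) differs between database drivers.

```python
import sqlite3

# Assumption: only these tables and columns are ever written by the helper.
ALLOWED_TABLES = {"wikipedias", "wikisources"}
ALLOWED_COLUMNS = ("prefix", "lang", "loclang", "loclanglink", "method")


def insert_wiki(conn, table, **fields):
    """Insert one row into an allowlisted table using bound parameters.

    Returns the number of rows affected. Hypothetical helper.
    """
    if table not in ALLOWED_TABLES:
        raise ValueError("unexpected table: %r" % table)
    unknown = set(fields) - set(ALLOWED_COLUMNS)
    if unknown:
        raise ValueError("unexpected columns: %r" % sorted(unknown))
    columns = [c for c in ALLOWED_COLUMNS if c in fields]
    placeholders = ", ".join("?" for _ in columns)
    # Table and column names come from the allowlists, values are bound.
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (
        table, ", ".join(columns), placeholders)
    cur = conn.execute(sql, [fields[c] for c in columns])
    conn.commit()
    return cur.rowcount
```

For example, `insert_wiki(conn, "wikipedias", prefix="hyw", lang="Western Armenian", loclang="Արեւմտահայերէն", method="8")` mirrors the second statement above.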
There is still an open subtask at T220575
Note that planet does not use planet-venus anymore since T180498
Check and script etc have been removed from the repo and Icinga web UI.
Fri, Apr 19
You should have received the new password a week ago. If for some reason you did not, please simply reopen the ticket.
- resolved "Phabricator permissions to see NDA and Ops restricted tickets".
Hi @wiki_willy, to move forward with your production shell access please create an SSH key. See https://wikitech.wikimedia.org/wiki/Production_shell_access#Generating_your_SSH_key where you can also find a preview of what config you will need. Please make a new keypair used exclusively for WMF production access and paste the public part here on the ticket.
19:10 <+logmsgbot> !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw2150.codfw.wmnet,service=nginx,cluster=jobrunner
17:42 < mutante> !log mw2150,mw2244,mw2245: initial puppet run, added to mw roles
note: phab1001 has more IPs on the interface than phab1003, adding the additional ones doesn't look puppetized !!
created kind of a runbook at https://wikitech.wikimedia.org/wiki/Monitoring/Long_running_screens
Thu, Apr 18
confirmed membership of 'wdoran' in wmf group
[mwmaint1002:~] $ ldapsearch -LLLx member=uid=evanp,ou=people,dc=wikimedia,dc=org dn
dn: cn=wmf,ou=groups,dc=wikimedia,dc=org
ready to go, handing over to Cole as the weekly clinic-duty person. just needs the existing user to be added to the existing group, has approval
12:01 <+icinga-wm> RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 188.8.131.52, interfaces up: 55, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
12:02 <+icinga-wm> RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 184.108.40.206, interfaces up: 68, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
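To compare interface counts across recoveries like these, the counters can be pulled out of the IRC line with a small regex. This is only a sketch: `parse_interface_counts` is a made-up name, and it assumes the check output keeps the `name: number` layout seen in the lines above.

```python
import re

# Matches the counters as they appear in the check output, e.g. "up: 55".
COUNTER_RE = re.compile(r"(up|down|dormant|excluded|unused): (\d+)")


def parse_interface_counts(line):
    """Return a dict of interface counters found in one icinga-wm line."""
    return {name: int(value) for name, value in COUNTER_RE.findall(line)}
```

A line without any counters simply yields an empty dict, so callers can skip unrelated alerts.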
Techs have completed splicing and are hands off. It may be necessary to reset your services locally at your equipment. We will now proceed to form an official RFO which we will share at a later time.
@db2047:~# hpssacli ctrl all show config
Device not healthy -SMART-
@RobH Ideally i would like to use the same IP i had used for phab1002, so waiting for that decom task to be past "remove production dns".
ready for non-interrupt steps. if you get past removing the prod DNS entries i may use the same IP for phab1003. that will avoid having to ask DBAs for updating mysql grants.
mw2150 was not in this ticket until now, but it was in site.pp as another spare under the "Former imagescalers" section. added to ticket. checking if it has been reinstalled.
13:40 <+icinga-wm> RECOVERY - mediawiki-installation DSH group on mw2151 is OK: OK https://wikitech.wikimedia.org/wiki/Application_servers%23Apache_setup_checklist
13:49 < mutante> !log mw2151 - scap pull
Wed, Apr 17
"Field tech isolated the fault location and is en route to perform a survey of the damage."
mw2245 reallocated -> reserved for thumbor (T218323)
re: the next checkbox above, "Prioritize which "junk" domains should be in the primary (works for non-SNI) SAN list"