User Details
- User Since
- Dec 17 2014, 7:12 PM (530 w, 3 d)
- Roles
- Disabled, Bot
- LDAP User
- Unknown
- MediaWiki User
- Unknown
Mar 2 2019
Aug 5 2015
Mar 11 2015
Jan 12 2015
Dec 18 2014
workaround:
Host *.mgmt.frack.*
    User root
    PasswordAuthentication=yes
    ChallengeResponseAuthentication=no
    GSSAPIAuthentication=no
    HostbasedAuthentication=no
    PubkeyAuthentication=no
    RSAAuthentication=no
    Compression=no
    ForwardAgent=no
    ForwardX11=no
    KexAlgorithms=diffie-hellman-group1-sha1
    MACs=hmac-md5,hmac-sha1
    Ciphers=aes128-cbc,3des-cbc
    HostKeyAlgorithms=ssh-rsa,ssh-dss
    ProxyCommand /usr/bin/ssh -q -W %h:%p tellurium.wikimedia.org
Issue taken by jgreen
I stand corrected, H310 and config is all, Papaul did the right stuff.
On Tue Dec 09 17:54:59 2014, robh wrote:
Papaul,
The request was to add the 4 SSDs in addition to the existing two HDDs. Now
when I go into BIOS, I only see the 4 SSDs.
Please advise if you mistakenly removed the HDDs. They need to be in the two
lowest disk slots, with the SSDs in the remaining slots.
On Tue Dec 09 17:18:55 2014, ptshibamba wrote:
I got some other type of screws that fit at the hardware store. I was not sure
that they were going to fit into the brackets, so I just got 20. But it seems
like those screws work, so I will go and get 100, which is just $4, and keep
them on site in case I need them.
the 4 drives are in place
racktable has been updated
mgmt configuration updated as well and visible label
the server is plugged in to ge-5/0/22
I will keep this ticket open until I put in the reference of the screws used.
Rob Halsell
Operations Engineer
Wikimedia Foundation, Inc.
E-Mail: <rhalsell at wikimedia>
Status changed from 'new' to 'open' by robh
Dependency by ticket #9002 deleted by robh
Papaul,
The request was to add the 4 SSDs in addition to the existing two HDDs. Now
when I go into BIOS, I only see the 4 SSDs.
Please advise if you mistakenly removed the HDDs. They need to be in the two
lowest disk slots, with the SSDs in the remaining slots.
On Tue Dec 09 17:18:55 2014, ptshibamba wrote:
I got some other type of screws that fit at the hardware store. I was not sure
that they were going to fit into the brackets, so I just got 20. But it seems
like those screws work, so I will go and get 100, which is just $4, and keep
them on site in case I need them.
the 4 drives are in place
racktable has been updated
mgmt configuration updated as well and visible label
the server is plugged in to ge-5/0/22
I will keep this ticket open until I put in the reference of the screws used.
--
Rob Halsell
Operations Engineer
Wikimedia Foundation, Inc.
E-Mail: <rhalsell at wikimedia>
wiring and racktable complete
Reference by ticket #9002 added by robh
I got some other type of screws that fit at the hardware store. I was not sure
that they were going to fit into the brackets, so I just got 20. But it seems
like those screws work, so I will go and get 100, which is just $4, and keep
them on site in case I need them.
the 4 drives are in place
racktable has been updated
mgmt configuration updated as well and visible label
the server is plugged in to ge-5/0/22
I will keep this ticket open until I put in the reference of the screws used.
I have no screws on site that fit the SSD drives. I will run by Home Depot and
see if I can find some.
Dependency by ticket #9002 added by robh
Status changed from 'new' to 'stalled' by reedy
Actually, I think we might want http://packages.ubuntu.com/trusty/ttf-unifont
Asking for clarification
new -> stalled
Subject changed from 'relocated helium to row B and attach new disk shelf' to 'relocated helium to row A and attach new disk shelf' by cmjohnson
Status changed from 'new' to 'open' by cmjohnson
After talking with Rob we decided to put in A8 instead and match with codfw
--
Chris Johnson
Operations Engineer
Wikimedia Foundation, Inc
(415) 578-0844
<cmjohnson at wikimedia>
Issue taken by cmjohnson
Dependency on ticket #8986 added by robh
Dependency on ticket #8762 added by robh
Status changed from 'new' to 'resolved' by reedy
Looks to be done...
https://ca.wikimedia.org/w/index.php?title=Special:ListUsers&group=sysop
* Benoit Rochon (Talk | contribs) (bureaucrat, administrator, translation
administrator) (Created on 25 September 2014 at 11:52)
While trying to connect to the pfws yesterday, we were able to get RX but not
TX. After a couple of hours of troubleshooting I came up with a temporary
solution, which was to put one of the opengears from Tampa in rack 8 and use
a short cable to connect the opengear to the pfws. This was tested on pfw1 and
it seemed to work with no problem. This was just a temporary solution so that
Faidon could get some work done.
After leaving the site yesterday, this issue stayed on my mind because it did
not make sense.
After configuring the opengear this morning (scs-c8-codfw 10.193.0.20) with the
management username and password, Faidon was able to access pfw1. While I was
running some tests on pfw2, I found that the solution that works for pfw1 did
not work for pfw2. So my first approach, as any technician would do, was to
verify the physical layer. The cable was already verified, but what we had not
verified was the physical console port. So I took the card out and checked the
console port on the card; it seems that the last three pins on the console port
are loose.
I will also have to check pfw1 to see if we are having the same problem.
I have attached a picture of the card from pfw2.
Issue taken by ptshibamba
Status changed from 'resolved' to 'open' by Coren
Status changed from 'open' to 'stalled' by dzahn
requirement for this task is:
https://gerrit.wikimedia.org/r/#/c/177120/
a script to allow taking one of the servers at a time out of rotation to
reinstall
As a postscript, and after IRC conversation, the canonical names for the
shelves should be:
labstore-array0-codfw to labstore-array3-codfw
Status changed from 'new' to 'resolved' by ptshibamba
14:45 <cscott> i don't have a strong opinion on software raid. all of the data
on the machine is just cache, it can be easily reconstructed if there's a disk
failure.
14:49 <cscott> if i had a choice between software raid and a lot more disk
space, i'd like to try a lot more disk space. :)
---
14:42 <cscott> ok, deploy finished. i just increased the cache lifetime for ocg
from 2 days to 4, since we're currently using only 15% of our available cache
space on
/srv.
14:42 <cscott> so it might be good to wait at least 2 days to let that settle
down first
labstore2001 and labstore2002 are wired. Please see below for more info.
access port:
labstore2001 = ge-1/0/0
labstore2002 = ge-1/0/1
IDRAC information
labstore2001 = 10.193.1.248
labstore2002 = 10.193.1.249
If you have any questions please let me know. Thanks
Issue taken by dzahn
Wiring diagram
Status changed from 'new' to 'open' by RT_System
On Mon Nov 10 19:54:36 2014, dzahn wrote:
as mentioned in ops meeting today and requested by Giuseppe:
reinstall OCG servers with a new partitioning scheme, use LVM, leave some free
extents, one by one
My suggestions:
- make /, /var/log and /srv separate partitions, sized based on actual
disk usage (but /srv is going to be larger in the future, check with cscott)
- use LVM and leave at least 20% of the space unallocated
- maybe keep the software RAID beneath that - I don't really see a reason for
having it, but think it through.
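The sizing suggestions above can be sketched as a small calculation. This is only an illustration, not the partman recipe that was actually used: the function name `plan_volumes`, the 1.5x growth headroom, and the example sizes are all assumptions; only the mount points and the 20% unallocated target come from the ticket.

```python
def plan_volumes(vg_size_gb, usage_gb, headroom=1.5, reserve_fraction=0.20):
    """Size each logical volume from observed usage and keep part of the
    volume group unallocated.

    headroom is a hypothetical growth multiplier applied to current usage;
    reserve_fraction is the share of the volume group left unallocated.
    """
    allocatable = vg_size_gb * (1 - reserve_fraction)
    plan = {mount: used * headroom for mount, used in usage_gb.items()}
    total = sum(plan.values())
    if total > allocatable:
        raise ValueError(
            f"plan needs {total} GB but only {allocatable} GB may be allocated"
        )
    # Everything not handed to a logical volume stays as free extents.
    return plan, vg_size_gb - total


# Made-up numbers: a 500 GB volume group and current usage per mount point.
plan, free_gb = plan_volumes(500, {"/": 10, "/var/log": 20, "/srv": 150})
for mount, size in plan.items():
    print(f"{mount}: {size} GB")
print(f"unallocated: {free_gb} GB")
```

Leaving the extents unallocated, rather than oversizing /srv up front, keeps the option of growing whichever volume fills first.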
Papaul,
Please rack labstore2002 and the 2 associated disk shelves in B1. You will need
to relocate the server labeled labstore3 (feel free to put that in B8 for now).
Once you have that connected you will need to daisy chain labstore2002 and 2
shelves with labstore1001 and 2 shelves.
--
Chris Johnson
Operations Engineer
Wikimedia Foundation, Inc
(415) 578-0844
<cmjohnson at wikimedia>
Issue taken by ptshibamba
As a further note, labstore2001 and labstore2002 need to be racked together with
all four shelves, preferably with room for two more shelves contiguously
(short-term expansion plans).
Both servers need to be chained to all shelves, on the alternate datapaths.
Dependency by ticket #8830 added by Coren
And this: https://gerrit.wikimedia.org/r/#/c/177863/
Queue changed from procurement to codfw by Coren
Reference to ticket #8827 added by Coren
also see https://gerrit.wikimedia.org/r/#/c/172476/ now
Status changed from 'new' to 'open' by RT_System
Reference to ticket #8227 deleted by Coren
Bugzilla ticket 35611 added by dzahn
Reference to ticket #8227 added by Coren
also see https://gerrit.wikimedia.org/r/#/c/172313/1
Status changed from 'open' to 'resolved' by dzahn
key length is 4096 bits. passphrase is 20 chars a-zA-Z0-9
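For reference, a passphrase of that shape (20 characters drawn from a-zA-Z0-9) could be generated as in the sketch below. The ticket does not say how the real passphrase was produced, so this is an assumption for illustration; `make_passphrase` and `entropy_bits` are hypothetical helper names.

```python
import math
import secrets
import string

# a-zA-Z0-9: 62 possible symbols per character.
ALPHABET = string.ascii_letters + string.digits


def make_passphrase(length=20):
    """Draw each character independently from a cryptographic RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length=20, alphabet_size=len(ALPHABET)):
    """Entropy of a uniformly random string: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


print(make_passphrase())
print(f"about {entropy_bits():.0f} bits of entropy")
```

With 62 symbols and 20 characters this works out to roughly 119 bits, assuming every character is chosen uniformly at random.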
merged: https://gerrit.wikimedia.org/r/#/c/173103/
merged : https://gerrit.wikimedia.org/r/#/c/172919/
Subject changed from 'Provision more job runners' to 'Provision more job runners - stalled until migrated to HHVM' by glavagetto
On 11/14/2014 10:20 AM, Ariel T. Glenn via RT wrote:
It looks like blanks are not sanitized to underscores any more on the gmetad
side, so you now see the graphs here:
I can change the gmetric cron job on terbium if you like, to have underscores
in the name of the metric, or the link can be changed; which do you (collective
you) prefer?
I'm less worried about the link change. A bigger problem is that this
separates old (historical) data from new data for metrics with a space in
the metric name. I'm not sure how common those are. If they are common, then
moving back to underscore version across the board would preserve the
connection with the old data, but lose the last three weeks or so.
Gabriel
Issue taken by glavagetto
Status changed from 'new' to 'open' by RT_System
all done as requested. thanks for the detailed preparation.
private key added to private repo, public key added to public repo,
merged the change to provision private key on tin and watched it.
<root at tin:~/> ls
authorized_keys known_hosts mwdeploy_rsa
passphrase in /srv/passwords/mediawiki-deployment-key-passphrase on iron
Status changed from 'new' to 'open' by RT_System
Status changed from 'open' to 'stalled' by glavagetto
On Mon Nov 10 17:44:50 2014, gwicke wrote:
The ganglia monitoring of the global job queue length on terbium stopped
working about two weeks ago:
https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20eqiad&h=terbium.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1395860566&v=648583&m=Global_JobQueue_length&z=large
The job queue lengths as reported by mwscript seem to be within the normal
range though, and jobs are being processed.
It is possible that this is related to some ganglia-related changes on
October 24th:
https://wikitech.wikimedia.org/wiki/Server_Admin_Log#October_24
19:53 bblack: nickel's basically dead, uranium has been promoted to prod
ganglia a little early for now
17:07 cmjohnson: getting ready to replace a failed disk on ganglia
(server:nickel)...it will be offline for a few minutes
Gabriel
It looks like blanks are not sanitized to underscores any more on the gmetad
side, so you now see the graphs here:
https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20eqiad&h=terbium.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1395860566&v=648583&m=Global%20JobQueue%20length&z=large
I can change the gmetric cron job on terbium if you like, to have underscores
in the name of the metric, or the link can be changed; which do you (collective
you) prefer?
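The difference between the two links comes down to the `m=` query parameter. A minimal sketch of the two naming behaviours, assuming the metric is literally named "Global JobQueue length" (inferred from the URLs above); `metric_query_value` is a hypothetical helper, not part of ganglia:

```python
from urllib.parse import quote


def metric_query_value(name, sanitize_spaces):
    """Return the metric name as it would appear in the m= query parameter.

    sanitize_spaces=True mimics the old behaviour (blanks become underscores);
    False mimics the new one (blanks are kept and percent-encoded in the URL).
    """
    if sanitize_spaces:
        name = name.replace(" ", "_")
    return quote(name)


print(metric_query_value("Global JobQueue length", sanitize_spaces=True))
print(metric_query_value("Global JobQueue length", sanitize_spaces=False))
```

Either form identifies a graph, but the two names are treated as different metrics, which is why the stored history splits at the point where the sanitizing stopped.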
Issue taken by dzahn
Status changed from 'new' to 'open' by RT_System
On Fri Nov 14 19:40:49 2014, aschulz wrote:
It would be nice to have a few more job runners, given their amount of CPU
usage. See Gabriel's comment on https://gerrit.wikimedia.org/r/#/c/161473/ for
details.
Maybe 2-4 more runners would be enough (currently there are ~16). I suspect
that the HHVM migration will help us out here as well. I don't see a reason why
similar specs won't suffice.
I think we're going to migrate some jobrunners to HHVM this week, so I'd
postpone the problem of provisioning more JRs to after the migration is
underway and we have a good idea of how it will perform. New => stalled
Given to mark by cmjohnson
Assigning to Mark
--
Chris Johnson
Operations Engineer
Wikimedia Foundation, Inc
(415) 578-0844
<cmjohnson at wikimedia>
AdminCc jeremyb added by jeremyb
Dependency by ticket #8772 deleted by dzahn
Reference by ticket #8772 added by dzahn
worked around it by finding a reason to touch the zone file itself (vs. just
the langlist), sorted something alphabetically, and yea, it generates it just
fine then. so we have the new WP language "mai" active and that's what i wanted
for now.
so the bug is that we don't regenerate if only a .tmpl file in ./helpers/ is
touched but not an actual zone template and it's lower priority now
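One way to picture the bug: a staleness check that compares the generated zone only against its own template will miss edits to included helper files. The sketch below is a hypothetical mtime-based check, not the actual DNS generation code (which is not shown in this ticket); `needs_regeneration` and its parameters are assumed names for illustration.

```python
import os


def needs_regeneration(output_path, template_path, helper_paths):
    """Return True when the output is missing or older than any of its inputs.

    helper_paths stands for the .tmpl files under ./helpers/ that the zone
    template pulls in; leaving them out of the comparison reproduces the bug.
    """
    if not os.path.exists(output_path):
        return True
    output_mtime = os.path.getmtime(output_path)
    inputs = [template_path, *helper_paths]
    return any(os.path.getmtime(path) > output_mtime for path in inputs)
```

Calling it with an empty `helper_paths` list behaves like the buggy case described above: touching only the langlist leaves the zone looking up to date.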
Dependency by ticket #8772 added by dzahn
On Fri Oct 03 13:09:33 2014, <jared.zimmerman at wikimedia> wrote:
thanks all, I'll get a techsupport ticket started for the 4 you mentioned.
*Jared Zimmerman * \\ Director of User Experience \\ Wikimedia Foundation
M +1 415 609 4043 \\ @Jaredzimmerman http://loo.ms/g0
On Fri, Oct 3, 2014 at 1:04 PM, Daniel Zahn via RT <ops-requests@wikimedia.org> wrote:
On Fri Oct 03 19:36:17 2014, <jared.zimmerman at wikimedia> wrote:
How does this affect private lists such as staff, sf staff, emgt, the
private design list? surely we don't need community support for changing
these internal lists?
I did not claim we need community support for changing them, i merely answered
Jeremy's question about the last "renaming campaign" of (public) lists. Also,
some lists are mailman and some are Google. so in this example:
staff is run by Philippe/legal.
"sf staff": "wmfsf" is also a list controlled by techsupport. OIT adds people
here when they get hired.
emgt is a Google group, not a mailman list, that's outside our control in ops,
it's also techsupport.
design-team is Brandon Harris
It just needs coordination via the admins, we can't just go in with a root pass
and change them all, and for some of them it's also technically outside of our
control.
OIT should be able to do the changes needed for all of those internal lists but
if they need help let them know they can reach out to me.
<jared.zimmerman at wikimedia> wrote:
Requestor khorn added by jgreen
On Fri Oct 03 19:36:17 2014, <jared.zimmerman at wikimedia> wrote:
How does this affect private lists such as staff, sf staff, emgt, the
private design list? surely we don't need community support for changing
these internal lists?
I did not claim we need community support for changing them, i merely answered
Jeremy's question about the last "renaming campaign" of (public) lists. Also,
some lists are mailman and some are Google. so in this example:
staff is run by Philippe/legal.
"sf staff": "wmfsf" is also a list controlled by techsupport. OIT adds people
here when they get hired.
emgt is a Google group, not a mailman list, that's outside our control in ops,
it's also techsupport.
design-team is Brandon Harris
It just needs coordination via the admins, we can't just go in with a root pass
and change them all, and for some of them it's also technically outside of our
control.
Requestor jgreen deleted by jgreen
On Fri Oct 03 01:49:52 2014, jeremyb wrote:
do we have anything
written about how to choose a title (not name) for a new list?
https://meta.wikimedia.org/wiki/Mailing_lists/Standardization#Naming_scheme
https://meta.wikimedia.org/wiki/Mailing_lists/Administration#Duties_of_list_administrators
On Fri Oct 03 17:02:09 2014, <jared.zimmerman at wikimedia> wrote:
Whats a reasonable next step here
4 out of the 8 lists mentioned can be fixed at once by techsupport@ because
they either control the list or they are Google aliases/groups.
<jared.zimmerman at wikimedia> wrote:
Given to mark by ptshibamba