madhuvishy (Madhu)
User

User Details

User Since
Apr 13 2015, 10:09 PM (135 w, 5 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
MViswanathan (WMF)

Recent Activity

Fri, Nov 17

madhuvishy added a comment to T171508: Investigate and implement alternative for showmount based check at instance boot time.

Raw notes from Etherpad in rolling this all out:

Fri, Nov 17, 12:00 AM · cloud-services-team (Kanban), Patch-For-Review, Cloud-Services

Thu, Nov 16

madhuvishy added a comment to T171541: Setup periodic rsync jobs from dataset1001/dumpsdata1001|2 to labstore1006|7.

OK, I declare the patch ready to merge, as soon as the following happen on labstore1006:

  • a new directory /srv/dumps/xmldatadumps created with owner/group root and 755 perms
  • move all directories and files under /srv/dumps, to /srv/dumps/xmldatadumps

This is all done now :)
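
For reference, the two steps above can be sketched in Python. This is a minimal sketch, not the commands actually run on labstore1006: the function name `nest_under_subdir` is hypothetical, and the ownership step is left as a comment because it needs root.

```python
import shutil
from pathlib import Path


def nest_under_subdir(base: Path, subdir_name: str = "xmldatadumps",
                      mode: int = 0o755) -> Path:
    """Create base/subdir_name, then move every other entry of base into it.

    Hypothetical helper for illustration; on the real host, base would be
    /srv/dumps and the script would run as root.
    """
    target = base / subdir_name
    target.mkdir(exist_ok=True)
    # mkdir's mode argument is filtered by the umask, so set 755 explicitly.
    target.chmod(mode)
    # When running as root, ownership could be set with:
    #   shutil.chown(target, user="root", group="root")
    for entry in list(base.iterdir()):
        if entry == target:
            continue  # never move the new directory into itself
        shutil.move(str(entry), str(target / entry.name))
    return target
```

Calling `nest_under_subdir(Path("/srv/dumps"))` would leave `/srv/dumps` containing only `xmldatadumps`, with the previous contents one level down.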

Thu, Nov 16, 8:22 PM · Patch-For-Review, User-ArielGlenn, Data-Services, Datasets-General-or-Unknown
madhuvishy added a project to T180659: Investigate the use of the shared NFS mount from labstore1003 to dataset1001: Data-Services.
Thu, Nov 16, 6:31 AM · Patch-For-Review, Data-Services
madhuvishy created T180659: Investigate the use of the shared NFS mount from labstore1003 to dataset1001.
Thu, Nov 16, 6:31 AM · Patch-For-Review, Data-Services

Mon, Nov 13

madhuvishy closed T173647: Prepare and check storage layer for hif.wiktionary as Resolved.

Everything seems to be good now, I'm resolving this task. Thanks a ton @Marostegui!

Mon, Nov 13, 7:05 PM · cloud-services-team (Kanban), Cloud-Services, DBA
madhuvishy added a comment to T173647: Prepare and check storage layer for hif.wiktionary.

@jcrespo Understood, I wasn't aware of that. We are on the right track then :)

Mon, Nov 13, 7:00 PM · cloud-services-team (Kanban), Cloud-Services, DBA
madhuvishy added a comment to T173647: Prepare and check storage layer for hif.wiktionary.

@jcrespo That makes sense, I didn't know the private data was the reason we didn't do the wildcard grants. Let's leave it as is then. @aborrero may soon be working on better automating our flow for importing a new DB into the replicas and setting up access, and we can explore giving the grant at the per-view-database level every time we do that in an automated fashion.

Mon, Nov 13, 6:57 PM · cloud-services-team (Kanban), Cloud-Services, DBA

Fri, Nov 10

madhuvishy added a comment to T173647: Prepare and check storage layer for hif.wiktionary.

@aborrero and I caught up on this, and it looks like all the DNS records are created now:

Fri, Nov 10, 9:12 PM · cloud-services-team (Kanban), Cloud-Services, DBA

Thu, Nov 9

madhuvishy added a comment to T180179: Evaluate the possibility to add Juniper images to Openstack.

Noting here that proprietary software is not usually installed on WMCS environments per https://wikitech.wikimedia.org/wiki/Wikitech:Labs_Terms_of_use#What_uses_of_Labs_do_we_not_like.3F (Proprietary Software).

Thu, Nov 9, 11:58 PM · cloud-services-team (Kanban), Cloud-VPS, netops, Traffic, Operations
madhuvishy added a comment to T173647: Prepare and check storage layer for hif.wiktionary.

@Marostegui right, okay. Thanks! Do we have a ticket for this issue?

Thu, Nov 9, 6:55 PM · cloud-services-team (Kanban), Cloud-Services, DBA
madhuvishy added a comment to T173647: Prepare and check storage layer for hif.wiktionary.

@aborrero FYI, after Manuel's magic, I've run

Thu, Nov 9, 6:50 PM · cloud-services-team (Kanban), Cloud-Services, DBA
madhuvishy added a comment to T173647: Prepare and check storage layer for hif.wiktionary.

@Marostegui Worked now! What did you have to do?

Thu, Nov 9, 6:47 PM · cloud-services-team (Kanban), Cloud-Services, DBA
madhuvishy added a comment to T173647: Prepare and check storage layer for hif.wiktionary.

Also running directly on labsdb1011,

Thu, Nov 9, 6:00 PM · cloud-services-team (Kanban), Cloud-Services, DBA

Wed, Nov 8

madhuvishy added a comment to T171508: Investigate and implement alternative for showmount based check at instance boot time.

The pam_nologin behavior you're reporting sounds very odd indeed. If it's actually the case it will be CVE-worthy! It's an old, popular and well-audited piece of code though, so it'd be surprising to me if the root cause lies with pam_nologin and not somewhere in our configuration. It's not impossible of course, bugs and CVEs do happen :)

Have you encountered this behavior only during early/first boot, or is this reproducible after the first boot when e.g. creating that file? Is this perhaps a race that occurs while puppet is running and changing the system's configuration? Maybe something as innocent as sshd's UsePAM setting, or another PAM configuration, given that we're messing with it in the first puppet run to add LDAP auth?

When the config is account required nologin.so I've only been able to reproduce this behavior during the firstboot stage. I've tried applying auth required nologin.so post boot to see how the behavior changes, and been able to log in every time, despite that config existing.

Wed, Nov 8, 9:29 PM · cloud-services-team (Kanban), Patch-For-Review, Cloud-Services

Fri, Nov 3

madhuvishy closed T179153: pawikisource_p.page table not available as Resolved.

I've fixed the grants for pawikisource_p now.

Fri, Nov 3, 12:33 AM · Data-Services

Thu, Nov 2

madhuvishy added a comment to T168584: Labsdb* servers need to be rebooted.

fyi @Cmjohnson We are not doing the labsdb1003 reboot on Tuesday Nov 7, due to T179464.

Thu, Nov 2, 11:53 PM · Patch-For-Review, Scoring-platform-team (Current), DBA, cloud-services-team, Operations
madhuvishy awarded T179461: Use the term "developer account" for Wikimedia LDAP accounts a Love token.
Thu, Nov 2, 10:06 PM · Operations, Cloud-Services, Developer-Relations

Wed, Nov 1

madhuvishy added a comment to T179464: labsdb1001 crashed - storage issue.

It looks like it may be time to say goodbye to this server. I've spent some time today looking at the state of the storage configuration and the damage, and at whether anything at all might be possible to recover the disk.

Wed, Nov 1, 11:18 PM · Operations, cloud-services-team (Kanban)
madhuvishy edited P6241 Badblocks labsdb1001.
Wed, Nov 1, 9:39 PM
madhuvishy created P6241 Badblocks labsdb1001.
Wed, Nov 1, 8:11 PM
madhuvishy added a comment to T179464: labsdb1001 crashed - storage issue.

Disk setup for labsdb1001

Wed, Nov 1, 5:06 PM · Operations, cloud-services-team (Kanban)

Mon, Oct 30

madhuvishy added a comment to T168584: Labsdb* servers need to be rebooted.

The 1001 reboot is all done. Notes from my planning etherpad:

Mon, Oct 30, 5:24 PM · Patch-For-Review, Scoring-platform-team (Current), DBA, cloud-services-team, Operations
madhuvishy closed T178128: Access to raw database tables on labsdb* for wmcs-admin users as Resolved.
Mon, Oct 30, 1:27 PM · Patch-For-Review, cloud-services-team (Kanban), Ops-Access-Requests, Operations, DBA

Fri, Oct 27

madhuvishy added a comment to T178128: Access to raw database tables on labsdb* for wmcs-admin users.

I've now rolled this out to labsdb10[01|03|09|10|11]. @Marostegui Is there a file/config/logs somewhere you'd like me to persist these grants? Thanks for your help :)

Fri, Oct 27, 6:07 PM · Patch-For-Review, cloud-services-team (Kanban), Ops-Access-Requests, Operations, DBA
madhuvishy added a comment to T178128: Access to raw database tables on labsdb* for wmcs-admin users.

Cool, I've run

Fri, Oct 27, 5:38 PM · Patch-For-Review, cloud-services-team (Kanban), Ops-Access-Requests, Operations, DBA
madhuvishy added a comment to T178128: Access to raw database tables on labsdb* for wmcs-admin users.

@Marostegui Sounds good, thanks

Fri, Oct 27, 5:26 PM · Patch-For-Review, cloud-services-team (Kanban), Ops-Access-Requests, Operations, DBA
madhuvishy added a comment to T178128: Access to raw database tables on labsdb* for wmcs-admin users.

@Marostegui Yeah that sounds right to me! Cool if I run that across the wiki replicas?

Fri, Oct 27, 5:10 PM · Patch-For-Review, cloud-services-team (Kanban), Ops-Access-Requests, Operations, DBA

Thu, Oct 26

madhuvishy updated the task description for T178807: Onboard aborrero to WMF.
Thu, Oct 26, 9:19 PM · Patch-For-Review, cloud-services-team
madhuvishy added a comment to T168584: Labsdb* servers need to be rebooted.

Started a planning doc for the reboots here - https://etherpad.wikimedia.org/p/labsdb-reboots

Thu, Oct 26, 6:16 PM · Patch-For-Review, Scoring-platform-team (Current), DBA, cloud-services-team, Operations
madhuvishy added a comment to T179075: User s53550 unable to connect to tools-db with given credentials.

Fixed! @MusikAnimal can you verify that your credentials work now and close this? Thank you :)

Thu, Oct 26, 4:21 PM · cloud-services-team (Kanban), Data-Services

Wed, Oct 25

madhuvishy added a comment to T179024: nfsiostat collector appears to be broken.

+1 That sounds like the right thing to do

Wed, Oct 25, 9:47 PM · Patch-For-Review, cloud-services-team
madhuvishy updated subscribers of T178128: Access to raw database tables on labsdb* for wmcs-admin users.
Wed, Oct 25, 9:18 PM · Patch-For-Review, cloud-services-team (Kanban), Ops-Access-Requests, Operations, DBA
madhuvishy added a comment to T178128: Access to raw database tables on labsdb* for wmcs-admin users.

@jcrespo @bd808 I looked at the account setup we have now, and it looks like the labsdbadmin user is already set up with remote (specific IPs) permissions, but it only has Grant_priv and Create_user_priv, which we in turn use to create and grant accounts for Toolforge users/tool accounts.

Wed, Oct 25, 9:14 PM · Patch-For-Review, cloud-services-team (Kanban), Ops-Access-Requests, Operations, DBA
madhuvishy added a comment to T179024: nfsiostat collector appears to be broken.

nfsiostat.py has

Wed, Oct 25, 8:13 PM · Patch-For-Review, cloud-services-team
madhuvishy added a comment to T178920: tools-package-builder-01.tools.eqiad.wmflabs Puppet failing for pbuilder changes.

Awesome thanks @akosiaris!

Wed, Oct 25, 4:30 PM · cloud-services-team (Kanban), Toolforge

Tue, Oct 24

madhuvishy updated the task description for T142807: Migrate all users to new Wiki Replica cluster and decommission old hardware.
Tue, Oct 24, 8:17 PM · Goal, cloud-services-team (FY2017-18), Data-Services, DBA
madhuvishy added a comment to T168584: Labsdb* servers need to be rebooted.

I've updated the lists, and our wiki here - https://wikitech.wikimedia.org/wiki/Wiki_Replica_c1_and_c3_shutdown

Tue, Oct 24, 8:14 PM · Patch-For-Review, Scoring-platform-team (Current), DBA, cloud-services-team, Operations
madhuvishy added a comment to T168584: Labsdb* servers need to be rebooted.

Proposed timing for the 2 reboots:

Tue, Oct 24, 7:53 PM · Patch-For-Review, Scoring-platform-team (Current), DBA, cloud-services-team, Operations
madhuvishy reopened T168584: Labsdb* servers need to be rebooted as "Open".

Reopening since we are scheduling the labsdb1001 and 1003 reboots over the next couple weeks.

Tue, Oct 24, 7:04 PM · Patch-For-Review, Scoring-platform-team (Current), DBA, cloud-services-team, Operations
madhuvishy reopened T168584: Labsdb* servers need to be rebooted, a subtask of T168445: Reboots of cloud servers, as Open.
Tue, Oct 24, 7:04 PM · cloud-services-team, Operations
madhuvishy reopened T168584: Labsdb* servers need to be rebooted, a subtask of T142807: Migrate all users to new Wiki Replica cluster and decommission old hardware, as Open.
Tue, Oct 24, 7:04 PM · Goal, cloud-services-team (FY2017-18), Data-Services, DBA
madhuvishy added a comment to T178805: Increase Tools available quota.

+1

Tue, Oct 24, 6:30 PM · Cloud-VPS (Quota-requests)
madhuvishy added a comment to T178920: tools-package-builder-01.tools.eqiad.wmflabs Puppet failing for pbuilder changes.

@akosiaris Hello, we have a package builder node in tools that seems to be running into some trouble with the new buster stuff in puppet (logs in task description)

Tue, Oct 24, 6:07 PM · cloud-services-team (Kanban), Toolforge
madhuvishy added a comment to T171508: Investigate and implement alternative for showmount based check at instance boot time.

I've decided to look at alternatives because the /etc/nologin mechanism seems to be flaky.

Tue, Oct 24, 2:09 AM · cloud-services-team (Kanban), Patch-For-Review, Cloud-Services

Oct 16 2017

madhuvishy added a comment to T171508: Investigate and implement alternative for showmount based check at instance boot time.

I've decided to look at alternatives because the /etc/nologin mechanism seems to be flaky.

Oct 16 2017, 6:33 PM · cloud-services-team (Kanban), Patch-For-Review, Cloud-Services

Oct 11 2017

madhuvishy added a project to T177603: Proposal: Improvements for the Toolforge 'webservice' command: Cloud-Services.
Oct 11 2017, 8:52 PM · Cloud-Services, Outreachy (Round-15)
madhuvishy added a comment to T177603: Proposal: Improvements for the Toolforge 'webservice' command.

Hi @Sowjanyavemuri, please feel free to reach out if you need any help completing your proposal. We are available on #wikimedia-cloud or here on Phabricator for any questions, or if you'd like to start learning about our different systems. https://wikitech.wikimedia.org/wiki/Help:Cloud_Services_Introduction is a good place to start reading :)

Oct 11 2017, 8:51 PM · Cloud-Services, Outreachy (Round-15)

Oct 10 2017

madhuvishy renamed T148783: Make maintenance script for sending annual survey emails from Make maintance script for sending annual survey emails to Make maintenance script for sending annual survey emails.
Oct 10 2017, 3:49 PM · Patch-For-Review, Toolforge, MediaWiki-extensions-WikimediaMaintenance

Oct 2 2017

madhuvishy added a comment to T177103: Catchpoint tests failing under Toolforge availability product.

The webservice tests should be fixed too! I'll let @chasemp verify and resolve this.

Oct 2 2017, 10:49 PM · Patch-For-Review, Toolforge
madhuvishy added a comment to T177103: Catchpoint tests failing under Toolforge availability product.

Also fixed the labsdb1005 check with https://gerrit.wikimedia.org/r/381885

Oct 2 2017, 10:10 PM · Patch-For-Review, Toolforge
madhuvishy added a comment to T177103: Catchpoint tests failing under Toolforge availability product.

For the labsdb1001 & 1003 tests, the error was:

Oct 2 2017, 10:06 PM · Patch-For-Review, Toolforge

Oct 1 2017

madhuvishy closed T177164: puppet-phabricator and gerrit-test3 have gone down as Resolved.

These instances should be up now.

Oct 1 2017, 10:40 PM · Cloud-VPS

Sep 26 2017

madhuvishy closed T176597: Request creation of webperf VPS project as Resolved.

Project https://wikitech.wikimedia.org/wiki/Nova_Resource:Webperf created with User phedenskog as projectadmin

Sep 26 2017, 4:15 PM · Cloud-VPS (Project-requests)

Sep 25 2017

madhuvishy added a comment to T171508: Investigate and implement alternative for showmount based check at instance boot time.

Thanks @MoritzMuehlenhoff, I tried that and came up with these two plots, don't really see much of a difference as far as auth related services go.

Sep 25 2017, 11:15 PM · cloud-services-team (Kanban), Patch-For-Review, Cloud-Services
madhuvishy added a comment to T176645: lots of cloud-local puppetmasters broken.

I went through the list here - https://tools.wmflabs.org/openstack-browser/puppetclass/role::puppetmaster::standalone and upgraded apache and ran puppet.

Sep 25 2017, 4:58 PM · cloud-services-team (Kanban)
madhuvishy added a comment to T176645: lots of cloud-local puppetmasters broken.

@Andrew do we have a list of instances? They just need an apt-get install --upgrade apache2 and a restart of apache. There was an unattended apache upgrade that rolled out last week, and I fixed tools and labs-puppetmaster before it rolled out.

Sep 25 2017, 4:15 PM · cloud-services-team (Kanban)

Sep 20 2017

madhuvishy added a comment to T171508: Investigate and implement alternative for showmount based check at instance boot time.

I was able to change the firstboot script, which runs when a new instance is created and booted for the first time, to create an /etc/nologin file at the beginning of its run and delete the file at the end, after ensuring that NFS is mounted. I tested this by building test images for Trusty and Jessie.
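
The flow described here can be sketched in Python. This is a rough illustration under stated assumptions, not the actual firstboot script: the names `wait_for_mounts` and `firstboot`, the timeout values, and the choice to leave the nologin file in place when the mounts never appear are all hypothetical.

```python
import os
import time
from pathlib import Path


def wait_for_mounts(mount_points, is_mounted=os.path.ismount,
                    timeout=300.0, interval=5.0) -> bool:
    """Poll until every path in mount_points is mounted or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        if all(is_mounted(p) for p in mount_points):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def firstboot(nologin_path: Path, mount_points, run_setup, **wait_kwargs) -> bool:
    """Block logins, run setup, and unblock only once the mounts are present."""
    # Assumption: the file's presence is what blocks non-root logins.
    nologin_path.write_text("Instance is still being set up.\n")
    run_setup()
    mounted = wait_for_mounts(mount_points, **wait_kwargs)
    if mounted:
        nologin_path.unlink()
    # Assumed design choice: if the mounts never appear, the file stays,
    # so users are not let in to an instance without NFS.
    return mounted
```

On a real instance this would be called with something like `Path("/etc/nologin")` and the expected NFS mount points; taking the path as a parameter keeps the sketch testable without root.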

Sep 20 2017, 4:32 PM · cloud-services-team (Kanban), Patch-For-Review, Cloud-Services

Sep 19 2017

madhuvishy added a comment to T171539: Puppetize and setup initial lvms and directory structures for labstore1006|7.

Steps to set up lvms

Sep 19 2017, 8:49 PM · Patch-For-Review, Data-Services

Sep 18 2017

madhuvishy updated subscribers of T159617: Enable downloading notebooks as PDF.

@HaeB I've enabled PDF exports for SWAP now, but they may error out on Unicode characters that can't be parsed. I'm not sure why, and it seems to be a longer task, so I'm tabling it here for now.

Sep 18 2017, 8:03 PM · Patch-For-Review, PAWS

Sep 15 2017

madhuvishy updated the task description for T175768: Improvements for the Toolforge 'webservice' command.
Sep 15 2017, 5:33 PM · Outreachy (Round-15), Toolforge
madhuvishy committed R2073:3a597a95ffc5: Strip trailing fullstops in project proxy domain names (authored by madhuvishy).
Sep 15 2017, 5:23 PM
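
The commit itself is not shown in this feed, so the following is only a guess at the kind of normalization its title describes. The function name `normalize_proxy_domain` is hypothetical.

```python
def normalize_proxy_domain(domain: str) -> str:
    """Drop trailing dots so a fully qualified name such as
    'example.wmflabs.org.' compares equal to 'example.wmflabs.org'."""
    return domain.rstrip(".")
```

DNS APIs often return names in fully qualified form with a trailing dot, which is one plausible reason such a strip would be needed before comparing or displaying proxy domains.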

Sep 8 2017

madhuvishy closed T167984: rack/setup/install labstore100[67].wikimedia.org as Resolved.

@fgiunchedi Thank you! That seems to have fixed it. Resolving this task. Thanks everyone :)

Sep 8 2017, 5:51 PM · Cloud-Services, Operations

Sep 6 2017

madhuvishy added a comment to T167984: rack/setup/install labstore100[67].wikimedia.org.

@RobH @Cmjohnson I'm able to log in to both machines with their .wikimedia.org hostnames and run puppet fine.

Sep 6 2017, 5:32 PM · Cloud-Services, Operations

Sep 5 2017

madhuvishy added a comment to T167984: rack/setup/install labstore100[67].wikimedia.org.

All of these solutions so far require onsite.

Sep 5 2017, 4:32 PM · Cloud-Services, Operations

Sep 3 2017

madhuvishy added a comment to T174850: wikistream.wmflabs.org down - unable to ssh to ws-web.

Looking at auth.log on ws-web, I saw a bunch of:

Sep 3 2017, 6:01 PM · Cloud-VPS

Aug 31 2017

madhuvishy closed T135405: Replicate CentralNotice tables to Labs as Resolved.

I'm closing this as resolved since running the maintain-views script for the new views went fine. Please reopen if there are any issues. Thanks!

Aug 31 2017, 11:22 PM · cloud-services-team, Patch-For-Review, Data-Services, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, DBA
madhuvishy closed T135405: Replicate CentralNotice tables to Labs, a subtask of T150767: Wikireplica service for tools and labs - issues and missing available views (tracking), as Resolved.
Aug 31 2017, 11:22 PM · Data-Services, Tracking, DBA
madhuvishy closed T174740: Openstack-browser server pages fail to load information with 'Unknown server' error as Resolved.
Aug 31 2017, 11:15 PM · Toolforge
madhuvishy committed R2073:ae88ca4d5f54: Update puppetmaster url to latest (authored by madhuvishy).
Aug 31 2017, 11:08 PM
madhuvishy added a comment to T174740: Openstack-browser server pages fail to load information with 'Unknown server' error.

Looks like it is talking to the old puppetmaster url and failing

Aug 31 2017, 10:53 PM · Toolforge
madhuvishy created T174740: Openstack-browser server pages fail to load information with 'Unknown server' error.
Aug 31 2017, 10:52 PM · Toolforge
madhuvishy added a comment to T135405: Replicate CentralNotice tables to Labs.

@Marostegui @Reedy Merged and ran maintain-views in all the labs replicas (1001/3/9/10/11)

Aug 31 2017, 10:33 PM · cloud-services-team, Patch-For-Review, Data-Services, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, DBA
madhuvishy closed T174611: update *.wmflabs.org by 2017-10-16 as Resolved.

This is all done, new private key committed in ops/private. New certs are showing up okay!

Aug 31 2017, 9:19 PM · cloud-services-team (Kanban), Operations

Aug 30 2017

madhuvishy created T174590: Revert patch that adds a temporary exception to the block-for-export check for the testlabs project.
Aug 30 2017, 6:04 PM · cloud-services-team (Kanban), Cloud-Services
madhuvishy added a comment to T174468: VPS Project dumps is using 1.7T at /data/project on NFS.

Thanks @Hydriz @Nemo_bis! Do keep the ticket updated as the storage space gets freed up.

Aug 30 2017, 5:49 PM · Cloud-VPS
madhuvishy closed T171623: Split up labstore external shelf storage available in codfw between labstore2001 and 2 as Resolved.

Thank you so much, that all looks right. Closing this as resolved!

Aug 30 2017, 5:43 AM · ops-codfw, DC-Ops, Data-Services, Operations
madhuvishy closed T171623: Split up labstore external shelf storage available in codfw between labstore2001 and 2, a subtask of T126083: overhaul labstore setup [tracking], as Resolved.
Aug 30 2017, 5:43 AM · Data-Services, Tracking, Operations
madhuvishy created P5945 Error on building new Debian Jessie image.
Aug 30 2017, 5:26 AM

Aug 29 2017

madhuvishy added a comment to T174469: LDAP account that is not attached on wikitech has no means for password reset.

@Vacio You are in the right place! If you can hop on to the #wikimedia-cloud IRC channel sometime, we can help you figure this out more easily in real time :)

Aug 29 2017, 6:32 PM · Striker, wikitech.wikimedia.org
madhuvishy added a comment to T169849: Architecture and puppetize setup for dumpsdata boxes.

@ArielGlenn Sounds good, I would push towards a larger window of at least 2 hours - 45 minutes to an hour for 3 rsyncs + some cleanup seems like cutting it close.

Aug 29 2017, 6:11 PM · Patch-For-Review, Dumps-Generation, Operations
madhuvishy added a comment to T174469: LDAP account that is not attached on wikitech has no means for password reset.

@Vacio Could you please elaborate on what the problem is? Did you try signing up to wikitech and did you run into an error? If so what? You can create a wikitech account at https://wikitech.wikimedia.org/w/index.php?title=Special:CreateAccount

Aug 29 2017, 5:57 PM · Striker, wikitech.wikimedia.org
madhuvishy triaged T174467: VPS Project wikidumpparse is using 795G at /home on NFS as Normal priority.
Aug 29 2017, 5:48 PM · Cloud-VPS
madhuvishy triaged T174468: VPS Project dumps is using 1.7T at /data/project on NFS as Normal priority.
Aug 29 2017, 5:48 PM · Cloud-VPS
madhuvishy created T174468: VPS Project dumps is using 1.7T at /data/project on NFS.
Aug 29 2017, 5:47 PM · Cloud-VPS
madhuvishy added a project to T174467: VPS Project wikidumpparse is using 795G at /home on NFS: Cloud-VPS.
Aug 29 2017, 5:44 PM · Cloud-VPS
madhuvishy created T174467: VPS Project wikidumpparse is using 795G at /home on NFS.
Aug 29 2017, 5:43 PM · Cloud-VPS
madhuvishy closed T168584: Labsdb* servers need to be rebooted as Resolved.

1001/3 have not been rebooted because of the fear of catastrophic hardware failure and their impending decomm.

Aug 29 2017, 5:29 PM · Patch-For-Review, Scoring-platform-team (Current), DBA, cloud-services-team, Operations
madhuvishy closed T168584: Labsdb* servers need to be rebooted, a subtask of T168445: Reboots of cloud servers, as Resolved.
Aug 29 2017, 5:29 PM · cloud-services-team, Operations
madhuvishy added a comment to T168584: Labsdb* servers need to be rebooted.

@Marostegui We talked about this today in our meeting, and think that since we don't have significant user traffic moved over from 1001/3 to the new WikiReplica servers yet, we should hold off on rebooting these servers for longer, given that Moritz mentioned during our last discussion that we can afford to hold off, and the immediate attack vectors have already been plugged.

Aug 29 2017, 4:34 PM · Patch-For-Review, Scoring-platform-team (Current), DBA, cloud-services-team, Operations
madhuvishy added a comment to T171623: Split up labstore external shelf storage available in codfw between labstore2001 and 2.

@Papaul Yup that's perfect, thanks!

Aug 29 2017, 3:17 PM · ops-codfw, DC-Ops, Data-Services, Operations
madhuvishy added a comment to T171623: Split up labstore external shelf storage available in codfw between labstore2001 and 2.

@Papaul, Hardware RAID 10 on both labstore2001 and 2002, with 6 or 8 disks per logical/virtual RAID drive would be great (12 still feels like a really big disk).

Aug 29 2017, 5:33 AM · ops-codfw, DC-Ops, Data-Services, Operations

Aug 28 2017

madhuvishy added a comment to T167984: rack/setup/install labstore100[67].wikimedia.org.

@Cmjohnson Thank you!

Aug 28 2017, 9:12 PM · Cloud-Services, Operations
madhuvishy added a comment to T169849: Architecture and puppetize setup for dumpsdata boxes.

@ArielGlenn Thanks for the summary! Looks right - one note is that I would prefer that the dumpsdata host (primary or secondary) is the pristine source for both labstore1006 and 7, rather than the labstores trying to sync between each other.

Aug 28 2017, 6:26 PM · Patch-For-Review, Dumps-Generation, Operations
madhuvishy added a comment to T167984: rack/setup/install labstore100[67].wikimedia.org.

Update: We are still blocked on talking to HP Support about the disk shelves.

Aug 28 2017, 6:03 PM · Cloud-Services, Operations

Aug 25 2017

madhuvishy closed T151322: labstore systemd state Icinga alarms as Resolved.

2001 is done too.

Aug 25 2017, 5:35 AM · cloud-services-team (Kanban), Operations, Cloud-Services
madhuvishy added a comment to T171623: Split up labstore external shelf storage available in codfw between labstore2001 and 2.

@Papaul, thanks for splitting up the shelves! I've reimaged the servers, and that part looks right

Aug 25 2017, 5:25 AM · ops-codfw, DC-Ops, Data-Services, Operations

Aug 23 2017

madhuvishy added a comment to T167984: rack/setup/install labstore100[67].wikimedia.org.

Current status: We are not really sure why the disk shelves don't show up. As the next step, @Cmjohnson will try and call HP support and have them help troubleshoot, hopefully on Friday.

Aug 23 2017, 7:16 PM · Cloud-Services, Operations
madhuvishy added a comment to T167984: rack/setup/install labstore100[67].wikimedia.org.

The interface flapping issue was caused by a misconnected cable, which @Cmjohnson has fixed now. Both management interfaces are now accessible.

Aug 23 2017, 5:59 PM · Cloud-Services, Operations

Aug 21 2017

madhuvishy added a comment to T167984: rack/setup/install labstore100[67].wikimedia.org.

@Cmjohnson I also can't even seem to get into the management interface for labstore1006

Aug 21 2017, 8:45 PM · Cloud-Services, Operations
madhuvishy added a comment to T167984: rack/setup/install labstore100[67].wikimedia.org.

@Cmjohnson I tried getting into the management interface for 1007, and powercycled it, booted from network and was looking at console:

Aug 21 2017, 8:42 PM · Cloud-Services, Operations