
Labs team reliability goal for Q1 2015/16
Closed, Resolved, Public

Description

Tracking task for Labs team reliability goal for Q1 2015/16.

  1. Meet or exceed 99.5% uptime for each Labs infrastructure service
  2. Remove all Labs support host SPOFs, using redundancy or hot spares
  3. Finish NFS migration to RAID10 storage, and implement NFS sharding
  4. Audit Labs projects on NFS dependencies and support migration to alternatives where appropriate
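
For a sense of scale on goal 1, a 99.5% uptime target leaves each service only a small downtime budget. The sketch below works that budget out; the 92-day quarter length is an assumption (the task does not give exact dates), and `downtime_budget_hours` is a hypothetical helper name, not part of any Labs tooling.

```python
def downtime_budget_hours(uptime_target: float, period_days: int) -> float:
    """Hours of downtime allowed while still meeting uptime_target over the period."""
    if not 0.0 <= uptime_target <= 1.0:
        raise ValueError("uptime_target must be a fraction between 0 and 1")
    return (1.0 - uptime_target) * period_days * 24


# Assumption: a 92-day quarter; the task itself does not state the dates.
budget = downtime_budget_hours(0.995, 92)
print(f"{budget:.2f} hours")  # prints "11.04 hours"
```

Under that assumption, each service could be down for roughly 11 hours across the quarter and still meet the goal.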

Related Objects

Status     Assigned
Resolved   yuvipanda
Resolved   Andrew
Duplicate  coren
Resolved   jcrespo
Resolved   yuvipanda
Resolved   Andrew
Resolved   Andrew
Resolved   RobH
Resolved   jcrespo
Declined   None
Resolved   chasemp
Declined   None
Duplicate  None
Resolved   Andrew
Duplicate  None
Resolved   yuvipanda
Declined   None
Resolved   Krinkle
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   hashar
Resolved   hashar
Resolved   yuvipanda
Resolved   yuvipanda
Open       None
Declined   yuvipanda
Resolved   yuvipanda
Resolved   None
Resolved   yuvipanda
Resolved   Andrew
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   Andrew
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   Negative24
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   hashar
Resolved   hashar
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Declined   None
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   coren
Resolved   yuvipanda
Resolved   yuvipanda
Resolved   jkroll
Resolved   Krinkle
Declined   Andrew
Resolved   Bstorm
Resolved   Andrew
Resolved   Andrew
Resolved   Andrew
Resolved   Andrew
Resolved   Krenair
Resolved   Andrew
Resolved   Bstorm
Resolved   Andrew
Resolved   None
Resolved   Andrew
Resolved   jsn.sherman
Resolved   Andrew
Resolved   Andrew
Resolved   Andrew
Resolved   Andrew
Resolved   Smalyshev
Resolved   Andrew
Resolved   Andrew
Declined   None
Resolved   Andrew
Resolved   Andrew
Resolved   Andrew
Resolved   Andrew
Resolved   RobH
Resolved   Cmjohnson
Resolved   Cmjohnson
Resolved   Andrew
Resolved   Andrew
Resolved   Andrew
Resolved   Andrew
Resolved   Andrew
Resolved   coren
Invalid    None
Resolved   coren
Resolved   coren
Declined   coren
Resolved   None
Resolved   None
Resolved   coren
Resolved   coren
Declined   yuvipanda
Resolved   coren
Resolved   coren
Resolved   coren
Resolved   Cmjohnson
Resolved   chasemp
Resolved   chasemp
Resolved   coren
Resolved   coren
Resolved   mark
Resolved   Cmjohnson
Resolved   coren
Resolved   coren
Resolved   coren
Resolved   faidon
Declined   faidon
Resolved   coren
Resolved   Andrew
Resolved   yuvipanda
Resolved   Andrew
Resolved   Andrew
Resolved   Andrew
Declined   Andrew
Resolved   Andrew
Resolved   Andrew
Declined   yuvipanda
Resolved   Andrew
Resolved   Andrew
Resolved   None
Resolved   Andrew
Declined   coren
Resolved   jcrespo

Event Timeline

yuvipanda raised the priority of this task to Needs Triage.
yuvipanda updated the task description.
yuvipanda added a project: Cloud-Services.
yuvipanda subscribed.

Not sure about the NFS sharding one - @mark / @coren is that just the tools / others / maps being on different arrays? Or is there more to that? :)

@yuvipanda: It's keeping the filesystems reasonably small (and operations on them more parallelizable) by splitting along project lines, yes. So right now we've spun off tools and maps, with everything else together, but locating outliers and splitting them off on an ongoing basis is conceptually part of this.
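
The ongoing "locate outliers and split them" step could be sketched roughly as follows. Everything here is hypothetical: the project names other than tools and maps, the sizes, the threshold, and the `plan_shards` helper are illustrative assumptions, not the Labs team's actual tooling or data.

```python
def plan_shards(
    usage_gb: dict[str, float], dedicated: set[str], outlier_gb: float
) -> dict[str, list[str]]:
    """Give each dedicated or oversized project its own shard; pool the rest."""
    shards: dict[str, list[str]] = {"others": []}
    for project, size in sorted(usage_gb.items()):
        if project in dedicated or size >= outlier_gb:
            shards[project] = [project]  # split off onto its own filesystem
        else:
            shards["others"].append(project)  # stays on the shared filesystem
    return shards


# Made-up usage figures, for illustration only.
usage = {
    "tools": 4000.0,
    "maps": 2500.0,
    "project-a": 40.0,
    "project-b": 1800.0,
    "project-c": 15.0,
}
plan = plan_shards(usage, dedicated={"tools", "maps"}, outlier_gb=1000.0)
print(plan)
```

With these made-up numbers, tools and maps keep their own shards, project-b is flagged as an outlier and split off, and the two small projects stay pooled together.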

chasemp set Security to None.
yuvipanda claimed this task.

I guess?