**Goal:**
* Rack and provision new Dumps servers
* Migrate one client use case to them (NFS for Cloud VPS & Toolforge?)
**Initial reading**
* Planning task and brainstorm - https://phabricator.wikimedia.org/T118154#3133894
* Hardware specs for labstore1006/7 - https://phabricator.wikimedia.org/T161311
* Rack/setup/install task for labstore1006/7 - https://phabricator.wikimedia.org/T167984
**Current status summary** (some of this text is borrowed from the Phabricator tasks listed above)
* dataset1001 is the canonical dumps storage server and handles all the client use cases; the snapshot* hosts handle the actual dumps production.
* dataset1001 and the snapshot* hosts share data with each other through NFS mounts.
* There are four different client use cases:
** NFS mounts for Cloud VPS projects
*** dataset1001 mounts an NFS share from labstore1003 and copies the relevant files onto it for Cloud VPS consumption
** NFS mounts on stat boxes for wikistats data generation
*** Analytics mounts a share from dataset1001 directly for consumption
** Web service for public download access to dumps
*** Uses the copy on dataset1001 directly
** Rsync mirrors
*** Uses the copy on dataset1001 directly
* dataset1001 is currently in the public VLAN and uses about 50 TB of storage
* ms1001 is the backup server for dataset1001
* labstore1003 is a single point of failure (SPOF)
**Where we're going (focusing a bit more on the Cloud Services team's work for upcoming quarters)**
* Separate the dumps generation layer from the dumps serving layer
* Have one cluster in the internal VLAN (dumpsdata* and snapshot*) that handles all the generation, and one cluster in the public VLAN (labstore1006 and 1007) that handles all the client-serving use cases (Analytics, Cloud VPS, web, rsync mirrors)
* labstore1006 and 1007 each have 72 TB of storage available post RAID 10 (36 TB on internal drives and 36 TB on external shelves)
* labstore1006 and 1007 will get their data through a periodic rsync from the canonical dumps server (dataset/dumpsdata); see the rsync sketch after this list
* labstore1006 and 1007 will be set up completely independently, each with its own copy of the data from the pristine source, and the client use cases will be sharded between them
* We are hoping to put the Analytics NFS mount and Cloud VPS NFS mount use cases on one server, and the web and rsync mirror use cases on the other
* labstore1006 and 1007 should be able to fail over client services to each other (i.e. serve all client use cases from a single server during maintenance if necessary)
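A minimal sketch of the periodic rsync pull described above, assuming the serving hosts pull from the canonical dumps host on a timer; the source host, rsync module, target path, and bandwidth limit below are placeholders rather than the eventual puppetized values.

```python
#!/usr/bin/env python3
"""Periodic pull of dumps from the canonical server to a labstore host.

Sketch only: the source host/module, target path, and bandwidth limit are
assumptions, not the real puppetized values.
"""
import subprocess
import sys

SOURCE = "dumpsdata1001.eqiad.wmnet::dumps/"  # hypothetical rsync module on the canonical server
TARGET = "/srv/dumps/"                        # hypothetical local copy on labstore1006/7

RSYNC_CMD = [
    "/usr/bin/rsync",
    "--archive",        # preserve permissions, timestamps, symlinks
    "--delete-after",   # drop files removed upstream, but only after the transfer
    "--bwlimit=80000",  # KiB/s cap so the pull does not saturate the link
    SOURCE,
    TARGET,
]


def main() -> int:
    # Meant to be driven by a systemd timer or cron entry (e.g. hourly).
    return subprocess.run(RSYNC_CMD).returncode


if __name__ == "__main__":
    sys.exit(main())
```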
**Rough task breakdown for the Q1 goal**
[] Finish rack/setup/install for labstore1006/7 - T167984
[] Puppetize and set up the initial LVMs and directory structures (see the LVM sketch after this list)
[] Set up periodic rsync jobs from dataset1001/dumpsdata1001/2(?) to labstore1006 and 1007 (see the rsync sketch above)
[] Set up an NFS kernel server to serve dumps to Cloud VPS instances
** Investigate alternatives to the showmount check at instance boot time, so that we only need to open up TCP port 2049 from labstore1006/7 (see the reachability check sketch after this list)
** Figure out how NFS failover will work (cluster IP failover will not work in the absence of matching underlying DRBD volumes)
[] Test mounting the shares on Cloud VPS instances (see the mount smoke test sketch after this list)
[] Test mounting the shares on stat boxes
[] Investigate and set up the web service component that serves dumps to users
[] Investigate the rsync mirror setup
[] Migrate the dumps share from labstore1003 to labstore1006/7 (this step will need to be broken down further eventually)
[] Migrate the stat* mount from dataset1001 to labstore1006/7 (coordinate with Analytics)
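For the "Puppetize and set up the initial LVMs and directory structures" item, a rough sketch of the commands the puppetization would need to encode; the volume group, logical volume size, and mount point are assumptions, and the real values belong in puppet rather than an ad-hoc script.

```python
#!/usr/bin/env python3
"""Rough sketch of the initial LVM / directory setup on a labstore100x host.

The volume group name, logical volume size, and mount point are assumptions.
"""
import subprocess

VG = "srv"              # hypothetical volume group on top of the RAID 10 arrays
LV = "dumps"
SIZE = "60T"            # leave headroom out of the ~72 TB available post RAID 10
MOUNTPOINT = "/srv/dumps"


def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


run(["lvcreate", "-L", SIZE, "-n", LV, VG])
run(["mkfs.ext4", "-m", "0", f"/dev/{VG}/{LV}"])  # no reserved blocks needed for dumps data
run(["mkdir", "-p", MOUNTPOINT])
run(["mount", f"/dev/{VG}/{LV}", MOUNTPOINT])
# A matching fstab entry (or puppet mount resource) would make the mount persistent.
```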
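One possible alternative to the showmount check mentioned under the NFS kernel server item: a plain TCP reachability test against the NFS port at instance boot time, which would let us open only 2049/tcp from labstore1006/7. The hostname below is a placeholder.

```python
#!/usr/bin/env python3
"""Boot-time check sketch: is the NFS server reachable on TCP 2049?

Unlike showmount, this needs no rpcbind/mountd ports, so only 2049/tcp
would have to be reachable. The hostname is a placeholder.
"""
import socket
import sys

NFS_HOST = "labstore1006.wikimedia.org"  # hypothetical serving host
NFS_PORT = 2049
TIMEOUT = 5  # seconds


def nfs_reachable(host, port=NFS_PORT, timeout=TIMEOUT):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Exit 0 if the server answers on the NFS port, 1 otherwise, so this can
    # gate the mount unit or init script that follows it at boot.
    sys.exit(0 if nfs_reachable(NFS_HOST) else 1)
```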
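For the "test mounting the shares" items, a small smoke test that mounts the share read-only and checks that a directory listing comes back; the export path, mount point, and mount options are assumptions, and it needs root.

```python
#!/usr/bin/env python3
"""Smoke test sketch: mount the dumps share on an instance or stat box.

The export path, mount point, and options are assumptions; run as root.
"""
import os
import subprocess
import sys

EXPORT = "labstore1006.wikimedia.org:/dumps"  # hypothetical export
MOUNTPOINT = "/mnt/dumps-test"


def main() -> int:
    os.makedirs(MOUNTPOINT, exist_ok=True)
    # Read-only NFSv4 mount over TCP, roughly what a real client would use.
    subprocess.run(
        ["mount", "-t", "nfs4", "-o", "ro,proto=tcp,soft,timeo=300",
         EXPORT, MOUNTPOINT],
        check=True,
    )
    try:
        entries = os.listdir(MOUNTPOINT)
        print(f"mounted {EXPORT}: {len(entries)} top-level entries visible")
        return 0 if entries else 1
    finally:
        subprocess.run(["umount", MOUNTPOINT], check=False)


if __name__ == "__main__":
    sys.exit(main())
```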
**Open questions**
* How are the rsync mirrors set up?
* What are the QoS mechanisms for web and rsync mirrors?
* How are outages currently handled?