Implement storage policies for swift
Closed, Resolved · Public

Description

Newer versions of Swift include support for per-container storage policies (http://docs.openstack.org/developer/swift/overview_policies.html), which make it possible to provide different levels of durability/guarantees per container.

In our deployment this would come in handy, for example, to store fewer than three copies of objects in thumbnail containers, or to allocate "low latency" containers on SSDs instead of spinning disks.
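For context, Swift selects a container's policy through the `X-Storage-Policy` header on the container PUT, and the policy is fixed once the container exists. The sketch below only builds such a request and does not send it. The storage URL, container name, policy name and token are all hypothetical placeholders, not values from our clusters.

```python
import urllib.request


def container_put_request(storage_url, container, policy_name, token):
    """Build (but do not send) a container-creation request that selects a policy."""
    url = f"{storage_url.rstrip('/')}/{container}"
    headers = {
        "X-Auth-Token": token,
        # The policy is chosen when the container is created.
        "X-Storage-Policy": policy_name,
    }
    return urllib.request.Request(url, method="PUT", headers=headers)


req = container_put_request(
    "https://swift.example.org/v1/AUTH_demo",  # hypothetical storage URL
    "thumbs-example",                          # hypothetical container name
    "ssd-example",                             # hypothetical policy name
    "token-placeholder",                       # hypothetical auth token
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would actually create the container, so the sketch stops short of that.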

Broad steps to make storage policies a reality:

  • Upgrade all swift backends to jessie and a minimum (TBD) version of swift
  • Test in esams how adding storage policies would work and, e.g., which stats and tools it affects
  • Consider which storage policies make sense; I think there are at least two dimensions: replication guarantees (e.g. 1/2/3 copies) and latency "guarantees" (ssd/hdd)
  • Run swift-dispersion-populate for each policy to get runtime dispersion stats
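The last step above could be scripted roughly as follows. This is a sketch only: it assumes the dispersion tools accept a `--policy-name` option (worth checking against `--help` on the installed version), and the policy names are hypothetical. It prints the command lines rather than running them.

```python
import shlex


def dispersion_commands(policy_names):
    """Return one populate and one report command line per storage policy."""
    commands = []
    for name in policy_names:
        for tool in ("swift-dispersion-populate", "swift-dispersion-report"):
            # --policy-name is assumed here; verify it on your swift version.
            commands.append([tool, "--policy-name", name])
    return commands


# Hypothetical policy names, for illustration only.
commands = dispersion_commands(["standard", "lowlatency"])
for argv in commands:
    print(shlex.join(argv))
```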

SSD considerations

As of Nov 2016 we have a mixture of SSD sizes across the cluster. The SSDs are used for the OS (raid1, space varies 40-50G) and for swift containers and accounts (each SSD on its own, ~90G). The remaining space is allocated to a partition of ~40G (200G SSD) or ~140G (300G SSD), which we can reuse for objects instead.

Details

Related Gerrit Patches:
[operations/puppet@production] swift: delete swift-object-reconstructor unit
[operations/puppet@production] hieradata: enable swift storage policies in eqiad
[operations/puppet@production] hieradata: enable swift storage policies in codfw
[operations/puppet@production] swift: fix duplicate dispersion cron name
[operations/puppet@production] swift: introduce storage policies
[operations/puppet@production] Use fourth partition on ms-be SSD for swift data
[operations/puppet@production] swift: introduce container-reconciler
[operations/puppet@production] swift: make swift-dispersion-stats policy-aware

Event Timeline

Restricted Application removed a project: Patch-For-Review. · Nov 25 2016, 6:07 PM
fgiunchedi mentioned this in Unknown Object (Task). · Nov 29 2016, 8:11 PM

Regarding the minimum swift version: we're running 2.2, and 2.10 is on the cards (https://phabricator.wikimedia.org/T162609). Here are the relevant changelog entries between 2.2 and 2.10:

* Write requests to a replicated storage policy with an even number
  of replicas now have a quorum size of half the replica count
  instead of half-plus-one.

* `swift-recon` can now query hosts by storage policy.

* Storage policies now support having more than one name.

  This allows operators to fix a typo without breaking existing clients,
  or, alternatively, have "short names" for policies. This is implemented
  with the "aliases" config key in the storage policy config in
  swift.conf. The aliases value is a list of names that the storage
  policy may also be identified by. The storage policy "name" is used to
  report the policy to users (eg in container headers). The aliases have
  the same naming restrictions as the policy's primary name.

* Swift now emits StatsD metrics on a per-policy basis.

* Added storage policy support to dispersion tools.

* Improved storage policy support for quarantine stats in swift-recon.

* The proxy log line now includes the request's storage policy index.
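To make the first changelog entry concrete, here is a small sketch of the two quorum rules it describes. The function names are mine and this is not Swift's code, just the arithmetic as stated in the entry: half-plus-one before, and half the replica count for even replica counts after.

```python
def quorum_half_plus_one(replicas):
    """Older rule from the changelog entry: half the replicas plus one."""
    return replicas // 2 + 1


def quorum_half_for_even(replicas):
    """Newer rule: exactly half for even counts, unchanged for odd counts."""
    return (replicas + 1) // 2


for n in (2, 3, 4):
    print(n, quorum_half_plus_one(n), quorum_half_for_even(n))
```

With 3 replicas both rules need 2 successful writes, so the change only matters for policies with an even replica count.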
fgiunchedi moved this task from Backlog to Doing on the User-fgiunchedi board. · May 5 2017, 1:30 PM

Change 353878 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] swift: introduce storage policies

https://gerrit.wikimedia.org/r/353878

fgiunchedi renamed this task from Consider storage policies for swift to Implement storage policies for swift. · May 15 2017, 3:52 PM

I've cherry-picked https://gerrit.wikimedia.org/r/353878 in beta, tested it on the swift 2.10 cluster there, and added a corresponding object ring for the new policy. All seems to be working!
The new policy, initially for SSD with 3x replication, is called lowlatency. At the moment, on the two SSDs in ms-be machines we create four partitions: root (50G) / swap (1G) / container (90G) / not-allocated (the rest).
The fourth partition varies in size depending on overall SSD size, namely 310G / 135G / 38G. Counting from the output below how many machines have how much space available, in total that's 310 * 12 + 135 * 6 + 38 * 6 = 4758 GB raw, or roughly 1586 GB available after replication.

We probably don't want to allocate all of the space, for SSD provisioning reasons (to be confirmed), so taking out e.g. 10% leaves around 1420 GB available (after replication).
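The arithmetic above can be reproduced with a short sketch. The function and parameter names are mine. `partitions_per_host=1` matches the formula as written in the comment; the cumin output lists both sda4 and sdb4 on each host, so whether 2 is the right value is an open assumption that I have left as a parameter.

```python
def usable_capacity_gb(groups, replicas=3, reserve=0.10, partitions_per_host=1):
    """Return (raw, after_replication, after_reserve) in whole GB, rounded down."""
    raw = sum(size_gb * hosts * partitions_per_host for size_gb, hosts in groups)
    replicated = raw // replicas
    usable = int(replicated * (1 - reserve))
    return raw, replicated, usable


# (fourth-partition size in GB, number of hosts), as quoted in the comment.
groups = [(310, 12), (135, 6), (38, 6)]
print(usable_capacity_gb(groups))
```

The 10% reserve is the example figure from the comment, not a confirmed provisioning requirement.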

root@neodymium:~# cumin  'ms-be1*' 'grep 4$ /proc/partitions | sort'
39 hosts will be targeted:
ms-be[1001-1039].eqiad.wmnet
Confirm to continue [y/n]? y
===== NODE GROUP =====
(12) ms-be[1028-1039].eqiad.wmnet
----- OUTPUT of 'grep 4$ /proc/partitions | sort' -----
   8        4  311591936 sda4
   8       20  311591936 sdb4
===== NODE GROUP =====
(6) ms-be[1016-1021].eqiad.wmnet
----- OUTPUT of 'grep 4$ /proc/partitions | sort' -----
   8        4  135776512 sda4
   8       20  135776512 sdb4
===== NODE GROUP =====
(6) ms-be[1022-1027].eqiad.wmnet
----- OUTPUT of 'grep 4$ /proc/partitions | sort' -----
   8        4   38098944 sda4
   8       20   38098944 sdb4
================
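For reading the output above: the third column of `/proc/partitions` counts 1 KiB blocks, so the sizes need converting before they can be compared with the G figures quoted earlier. A minimal parsing sketch follows; the helper name is mine and the sample is the first node group from the output.

```python
def parse_partitions(text):
    """Map device name to size in bytes from /proc/partitions-style lines."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like: major minor #blocks name
        if len(fields) == 4 and fields[2].isdigit():
            sizes[fields[3]] = int(fields[2]) * 1024  # blocks are 1 KiB each
    return sizes


sample = """\
   8        4  311591936 sda4
   8       20  311591936 sdb4
"""
sizes = parse_partitions(sample)
for name, size in sizes.items():
    print(f"{name}: {size / 1e9:.1f} GB ({size / 2**30:.1f} GiB)")
```

The header row of the real file is skipped because its third field is not numeric.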

Change 356198 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] swift: introduce container-reconciler

https://gerrit.wikimedia.org/r/356198

fgiunchedi updated the task description. · Jun 2 2017, 10:21 AM

Change 356810 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] swift: make swift-dispersion-stats policy-aware

https://gerrit.wikimedia.org/r/356810

Change 356810 merged by Filippo Giunchedi:
[operations/puppet@production] swift: make swift-dispersion-stats policy-aware

https://gerrit.wikimedia.org/r/356810

Change 356198 merged by Filippo Giunchedi:
[operations/puppet@production] swift: introduce container-reconciler

https://gerrit.wikimedia.org/r/356198

fgiunchedi updated the task description. · Jun 29 2017, 11:51 AM

Mentioned in SAL (#wikimedia-operations) [2017-06-29T11:51:27Z] <godog> create xfs filesystems on fourth partition on ms-be machines - T151648

Change 362208 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] Use fourth partition on ms-be SSD for swift data

https://gerrit.wikimedia.org/r/362208

Change 362208 merged by Filippo Giunchedi:
[operations/puppet@production] Use fourth partition on ms-be SSD for swift data

https://gerrit.wikimedia.org/r/362208

Change 353878 merged by Filippo Giunchedi:
[operations/puppet@production] swift: introduce storage policies

https://gerrit.wikimedia.org/r/353878

Change 362949 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] hieradata: enable swift storage policies in codfw

https://gerrit.wikimedia.org/r/362949

Change 362950 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] swift: fix duplicate dispersion cron name

https://gerrit.wikimedia.org/r/362950

Change 362950 merged by Filippo Giunchedi:
[operations/puppet@production] swift: fix duplicate dispersion cron name

https://gerrit.wikimedia.org/r/362950

Change 362949 merged by Filippo Giunchedi:
[operations/puppet@production] hieradata: enable swift storage policies in codfw

https://gerrit.wikimedia.org/r/362949

Change 362958 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] hieradata: enable swift storage policies in eqiad

https://gerrit.wikimedia.org/r/362958

Change 362959 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] swift: delete swift-object-reconstructor unit

https://gerrit.wikimedia.org/r/362959

Change 362958 merged by Filippo Giunchedi:
[operations/puppet@production] hieradata: enable swift storage policies in eqiad

https://gerrit.wikimedia.org/r/362958

Change 362959 merged by Filippo Giunchedi:
[operations/puppet@production] swift: delete swift-object-reconstructor unit

https://gerrit.wikimedia.org/r/362959

fgiunchedi updated the task description. · Jul 3 2017, 10:42 AM
fgiunchedi closed this task as Resolved. · Jul 3 2017, 11:26 AM

This is completed: we have a lowlatency storage policy that stores objects with 3x replication on SSDs.
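For readers unfamiliar with how a policy like this is declared, here is a rough sketch that renders `[storage-policy:N]` sections in the style of swift.conf and derives the matching object ring file name. It is not the content of the merged puppet change: the policy index 1 and the name `standard` for policy 0 are assumptions for illustration. The replica count itself is a property of the ring rather than of swift.conf, so it does not appear here.

```python
import configparser
import io


def storage_policy_conf(policies):
    """Render [storage-policy:N] sections in swift.conf style."""
    conf = configparser.ConfigParser()
    for index, options in policies.items():
        conf[f"storage-policy:{index}"] = options
    out = io.StringIO()
    conf.write(out)
    return out.getvalue()


def object_ring_filename(index):
    """Policy 0 uses object.ring.gz; policy N uses object-N.ring.gz."""
    return "object.ring.gz" if index == 0 else f"object-{index}.ring.gz"


policies = {
    0: {"name": "standard", "default": "yes"},  # hypothetical name for policy 0
    1: {"name": "lowlatency", "policy_type": "replication"},  # index 1 is assumed
}
rendered = storage_policy_conf(policies)
print(rendered)
print(object_ring_filename(1))
```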

fgiunchedi mentioned this in Unknown Object (Task). · Jan 3 2018, 10:02 AM