Change Details

current situation:
* codfw: swift 1.13.1 (icehouse) on trusty
* eqiad: swift 1.13.1 (icehouse) on precise (save for 6 machines on trusty)
* esams: swift 1.13.1 (icehouse) on precise (cluster not used)

swift changelog: https://github.com/openstack/swift/blob/master/CHANGELOG

jessie ships with swift 2.2 (juno) though the latest upstream version is 2.5 (liberty)

```lines=10,name=changelog diff 2.2.0 -> 2.5
swift (2.5.0, OpenStack Liberty)

* Added the ability to specify ranges for Static Large Object (SLO) segments.
* Replicator configs now support an "rsync_module" value to allow for per-device rsync modules. This setting gives operators the ability to fine-tune replication traffic in a Swift cluster and isolate replication disk IO to a particular device. Please see the docs and sample config files for more information and examples.
* Significant work has gone in to testing, fixing, and validating Swift's erasure code support at different scales.
* Swift now emits StatsD metrics on a per-policy basis.
* Fixed an issue with Keystone integration where a COPY request to a service account may have succeeded even if a service token was not included in the request.
* Ring validation now warns if a placement partition gets assigned to the same device multiple times. This happens when devices in the ring are unbalanced (e.g. two servers where one server has significantly more available capacity).
* Various other minor bug fixes and improvements.

swift (2.4.0)

* Dependency changes
  - Added six requirement. This is part of an ongoing effort to add support for Python 3.
  - Dropped support for Python 2.6.
* Config changes
  - Recent versions of Python restrict the number of headers allowed in a request to 100. This number may be too low for custom middleware. The new "extra_header_count" config value in swift.conf can be used to increase the number of headers allowed.
  - Renamed "run_pause" setting to "interval" (current configs with run_pause still work).
    Future versions of Swift may remove the run_pause setting.
* Versioned writes middleware
  The versioned writes feature has been refactored and reimplemented as middleware. You should explicitly add the versioned_writes middleware to your proxy pipeline, but do not remove or disable the existing container server config setting ("allow_versions"), if it is currently enabled. The existing container server config setting enables existing containers to continue being versioned. Please see http://swift.openstack.org/middleware.html#how-to-enable-object-versioning-in-a-swift-cluster for further upgrade notes.
* Allow 1+ object-servers-per-disk deployment
  Enabled by a new > 0 integer config value, "servers_per_port" in the [DEFAULT] config section for object-server and/or replication server configs. The setting's integer value determines how many different object-server workers handle requests for any single unique local port in the ring. In this mode, the parent swift-object-server process continues to run as the original user (i.e. root if low-port binding is required), binds to all ports as defined in the ring, and forks off the specified number of workers per listen socket. The child, per-port servers drop privileges and behave pretty much how object-server workers always have, except that because the ring has unique ports per disk, the object-servers will only be handling requests for a single disk. The parent process detects dead servers and restarts them (with the correct listen socket), starts missing servers when an updated ring file is found with a device on the server with a new port, and kills extraneous servers when their port is found to no longer be in the ring. The ring files are stat'ed at most every "ring_check_interval" seconds, as configured in the object-server config (same default of 15s).
  In testing, this deployment configuration (with a value of 3) lowers request latency, improves requests per second, and isolates slow disk IO as compared to the existing "workers" setting. To use this, each device must be added to the ring using a different port.
* Do container listing updates in another (green)thread
  The object server has learned the "container_update_timeout" setting (with a default of 1 second). This value is the number of seconds that the object server will wait for the container server to update the listing before returning the status of the object PUT operation. Previously, the object server would wait up to 3 seconds for the container server response. The new behavior dramatically lowers object PUT latency when container servers in the cluster are busy (e.g. when the container is very large). Setting the value too low may result in a client PUT'ing an object and not being able to immediately find it in listings. Setting it too high will increase latency for clients when container servers are busy.
* TempURL fixes (closes CVE-2015-5223)
  Do not allow PUT tempurls to create pointers to other data. Specifically, disallow the creation of DLO object manifests via a PUT tempurl. This prevents discoverability attacks which can use any PUT tempurl to probe for private data by creating a DLO object manifest and then using the PUT tempurl to head the object.
* Ring changes
  - Partition placement no longer uses the port number to place partitions. This improves dispersion in small clusters running one object server per drive, and it does not affect dispersion in clusters running one object server per server.
  - Added ring-builder-analyzer tool to more easily test and analyze a series of ring management operations.
  - Stop moving partitions unnecessarily when overload is on.
* Significant improvements and bug fixes have been made to erasure code support. This feature is suitable for beta testing, but it is not yet ready for broad production usage.
* Bulk upload now treats user xattrs on files in the given archive as object metadata on the resulting created objects.
* Emit warning log in object replicator if "handoffs_first" or "handoff_delete" is set.
* Enable object replicator's failure count in swift-recon.
* Added storage policy support to dispersion tools.
* Support keystone v3 domains in swift-dispersion.
* Added domain_remap information to the /info endpoint.
* Added support for a "default_reseller_prefix" in domain_remap middleware config.
* Allow SLO PUTs to forgo per-segment integrity checks. Previously, each segment referenced in the manifest also needed the correct etag and bytes setting. These fields now allow the "null" value to skip those particular checks on the given segment.
* Allow rsync to use compression via a "rsync_compress" config. If set to true, compression is only enabled for an rsync to a device in a different region. In some cases, this can speed up cross-region replication data transfer.
* Added time synchronization check in swift-recon (the --time option).
* The account reaper now runs faster on large accounts.
* Various other minor bug fixes and improvements.

swift (2.3.0, OpenStack Kilo)

* Erasure Code support (beta)
  Swift now supports an erasure-code (EC) storage policy type. This allows deployers to achieve very high durability with less raw capacity as used in replicated storage. However, EC requires more CPU and network resources, so it is not good for every use case. EC is great for storing large, infrequently accessed data in a single region.
  Swift's implementation of erasure codes is meant to be transparent to end users. There is no API difference between replicated storage and EC storage.
  To support erasure codes, Swift now depends on PyECLib and liberasurecode. liberasurecode is a pluggable library that allows for the actual EC algorithm to be implemented in a library of your choosing.
  As a beta release, EC support is nearly fully feature complete, but it is lacking support for some features (like multi-range reads) and has not had a full performance characterization. This feature relies on ssync for durability. Deployers are urged to do extensive testing and not deploy production data using an erasure code storage policy.
  Full docs are at http://swift.openstack.org/overview_erasure_code.html
* Add support for container TempURL Keys.
* Make more memcache options configurable. connection_timeout, pool_timeout, tries, and io_timeout are all now configurable.
* Swift now supports composite tokens. This allows another service to act on behalf of a user, but only with that user's consent. See http://swift.openstack.org/overview_auth.html for more details.
* Multi-region replication was improved. When replicating data to a different region, only one replica will be pushed per replication cycle. This gives the remote region a chance to replicate the data locally instead of pushing more data over the inter-region network.
* Internal requests from the ratelimit middleware now properly log a swift_source. See http://swift.openstack.org/logs.html for details.
* Improved storage policy support for quarantine stats in swift-recon.
* The proxy log line now includes the request's storage policy index.
* Ring checker has been added to swift-recon to validate if rings are built correctly. As part of this feature, storage servers have learned the OPTIONS verb.
* Add support of x-remove- headers for container-sync.
* Rings now support hostnames instead of just IP addresses.
* Swift now enforces that the API version on a request is valid. Valid versions are configured via the valid_api_versions setting in swift.conf
* Various other minor bug fixes and improvements.

swift (2.2.2)

* Data placement changes
  This release has several major changes to data placement in Swift in order to better handle different deployment patterns.
  First, with an unbalance-able ring, less partitions will move if the movement doesn't result in any better dispersion across failure domains. Also, empty (partition weight of zero) devices will no longer keep partitions after rebalancing when there is an unbalance-able ring.
  Second, the notion of "overload" has been added to Swift's rings. This allows devices to take some extra partitions (more than would normally be allowed by the device weight) so that smaller and unbalanced clusters will have less data movement between servers, zones, or regions if there is a failure in the cluster.
  Finally, rings have a new metric called "dispersion". This is the percentage of partitions in the ring that have too many replicas in a particular failure domain. For example, if you have three servers in a cluster but two replicas for a partition get placed onto the same server, that partition will count towards the dispersion metric. A lower value is better, and the value can be used to find the proper value for "overload".
  The overload and dispersion metrics have been exposed in the swift-ring-build CLI tools.
  See http://docs.openstack.org/developer/swift/overview_ring.html for more info on how data placement works now.
* Improve replication of large out-of-sync, out-of-date containers.
* Added console logging to swift-drive-audit with a new log_to_console config option (default False).
* Optimize replication when a device and/or partition is specified.
* Fix dynamic large object manifests getting versioned. This was not intended and did not work. Now it is properly prevented.
* Fix the GET's response code when there is a missing segment in a large object manifest.
* Change black/white listing in ratelimit middleware to use sysmeta.
  Instead of using the config option, operators can set "X-Account-Sysmeta-Global-Write-Ratelimit: WHITELIST" or "X-Account-Sysmeta-Global-Write-Ratelimit: BLACKLIST" on an account to whitelist or blacklist it for ratelimiting.
  Note: the existing config options continue to work.
* Use TCP_NODELAY on outgoing connections.
* Improve object-replicator startup time.
* Implement OPTIONS verb for storage nodes.
* Various other minor bug fixes and improvements.

swift (2.2.1)

* Swift now rejects object names with Unicode surrogates.
* Return 403 (instead of 413) on unauthorized upload when over account quota.
* Fix a rare condition when a rebalance could cause swift-ring-builder to crash. This would only happen on old ring files when "rebalance" was the first command run.
* Storage node error limits now survive a ring reload.
* Speed up reading and writing xattrs for object metadata by using larger xattr value sizes. The change is moving from 254 byte values to 64KiB values. There is no migration issue with this.
* Deleted containers beyond the reclaim age are now properly reclaimed.
* Full Simplified Chinese translation (zh_CN locale) for errors and logs.
* Container quota is now properly enforced during cross-account COPY.
* ssync replication now properly uses the configured replication_ip.
* Fixed issue where ssync did not replicate custom object headers.
* swift-drive-audit now has the 'unmount_failed_device' config option (default to True) that controls if the process will unmount failed drives or not.
* swift-drive-audit will now dump drive error rates to a recon file. The file location is controlled by the 'recon_cache_path' config value and it includes each drive and its associated number of errors.
* When a filesystem doesn't support xattr, the object server now returns a 507 Insufficient Storage error to the proxy server.
* Clean up empty account and container partitions directories if they are empty. This keeps the system healthy and prevents a large number of empty directories from slowing down the replication process.
* Show the sum of every policy's amount of async pendings in swift-recon.
* Various other minor bug fixes and improvements.
```

there are jessie backports provided at http://liberty-jessie.pkgs.mirantis.com/ that we could use/import. Dependencies to backport are relatively self-contained: `python-eclib`, `liberasurecode`, `python-eventlet`
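
for reference, the "dispersion" metric from the 2.2.2 notes above (percentage of partitions with too many replicas in one failure domain) can be illustrated with a rough sketch. This is not Swift's ring-builder code: the function name `dispersion_percent`, the flat list-of-servers input, and the "at most ceil(replicas / servers) copies per server" threshold are all assumptions made for illustration, and the real calculation considers several failure-domain tiers (region, zone, server, device).

```python
from collections import Counter
from math import ceil


def dispersion_percent(assignments, servers):
    """Percentage of partitions with too many replicas on a single server.

    assignments: one list per partition, naming the server of each replica.
    servers: all server names in the (hypothetical) cluster.
    """
    if not assignments:
        return 0.0
    bad = 0
    for replicas in assignments:
        # Assumption: a server should hold at most ceil(replicas / servers)
        # copies of any one partition.
        allowed = ceil(len(replicas) / len(servers))
        if max(Counter(replicas).values(), default=0) > allowed:
            bad += 1
    return 100.0 * bad / len(assignments)


servers = ["s1", "s2", "s3"]
assignments = [
    ["s1", "s2", "s3"],  # one replica per server: fine
    ["s1", "s1", "s2"],  # two replicas on s1: counts towards dispersion
    ["s2", "s3", "s1"],  # fine
    ["s3", "s3", "s3"],  # all replicas on s3: counts towards dispersion
]
print(dispersion_percent(assignments, servers))  # 50.0
```

two of the four example partitions have more than one replica on the same server, so the sketch reports 50.0; as the changelog says, lower is better.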

current situation:
* codfw: swift 1.13.1/2.2.0 on trusty/jessie
* eqiad: swift 1.13.1/2.2.0 on trusty/jessie
* esams: swift 2.2.0 on jessie (cluster not used)

swift changelog: https://github.com/openstack/swift/blob/master/CHANGELOG

jessie ships with swift 2.2 (juno) though the latest version in backports is 2.7 (mitaka)

```lines=10,name=changelog diff 2.2.0 -> 2.7.0
swift (2.7.0, OpenStack Mitaka)

* Bump PyECLib requirement to >= 1.2.0
* Update container on fast-POST
  "Fast-POST" is the mode where `object_post_as_copy` is set to `False` in the proxy server config. This mode now allows for fast, efficient updates of metadata without needing to fully recopy the contents of the object. While the default still is `object_post_as_copy` as True, the plan is to change the default to False and then deprecate post-as-copy functionality in later releases. Fast-POST now supports container-sync functionality.
* Add concurrent reads option to proxy.
  This change adds 2 new parameters to enable and control concurrent GETs in Swift, these are `concurrent_gets` and `concurrency_timeout`.
  `concurrent_gets` allows you to turn on or off concurrent GETs; when on, it will set the GET/HEAD concurrency to the replica count. And in the case of EC HEADs it will set it to ndata. The proxy will then serve only the first valid source to respond. This applies to all account, container, and replicated object GETs and HEADs. For EC only HEAD requests are affected. The default for `concurrent_gets` is off.
  `concurrency_timeout` is related to `concurrent_gets` and is the amount of time to wait before firing the next thread. A value of 0 will fire at the same time (fully concurrent), but setting another value will stagger the firing allowing you the ability to give a node a short chance to respond before firing the next. This value is a float and should be somewhere between 0 and `node_timeout`. The default is `conn_timeout`, meaning by default it will stagger the firing.
* Added an operational procedures guide to the docs. It can be found at http://swift.openstack.org/ops_runbook/index.html and includes information on detecting and handling day-to-day operational issues in a Swift cluster.
* Make `handoffs_first` a more useful mode for the object replicator.
  The `handoffs_first` replication mode is used during periods of problematic cluster behavior (e.g. full disks) when replication needs to quickly drain partitions from a handoff node and move them to a primary node.
  Previously, `handoffs_first` would sort that handoff work before "normal" replication jobs, but the normal replication work could take quite some time and result in handoffs not being drained quickly enough.
  In order to focus on getting handoff partitions off the node `handoffs_first` mode will now abort the current replication sweep before attempting any primary suffix syncing if any of the handoff partitions were not removed for any reason - and start over with replication of handoffs jobs as the highest priority.
  Note that `handoffs_first` being enabled will emit a warning on start up, even if no handoff jobs fail, because of the negative impact it can have during normal operations by dog-piling on a node that was temporarily unavailable.
* By default, inbound `X-Timestamp` headers are now disallowed (except when in an authorized container-sync request). This header is useful for allowing data migration from other storage systems to Swift and keeping the original timestamp of the data. If you have this migration use case (or any other requirement on allowing the clients to set an object's timestamp), set the `shunt_inbound_x_timestamp` config variable to False in the gatekeeper middleware config section of the proxy server config.
* Requesting a SLO manifest file with the query parameters "?multipart-manifest=get&format=raw" will return the contents of the manifest in the format as was originally sent by the client. The "format=raw" is new.
* Static web page listings can now be rendered with a custom label. By default listings are rendered with a label of: "Listing of /v1/<account>/<container>/<path>". This change adds a new custom metadata key/value pair `X-Container-Meta-Web-Listings-Label: My Label` that when set, will cause the following: "Listing of My Label/<path>" to be rendered instead.
* Previously, static large objects (SLOs) had a minimum segment size (default to 1MiB). This limit has been removed, but small segments will be ratelimited. The config parameter `rate_limit_under_size` controls the definition of "small" segments (1MiB by default), and `rate_limit_segments_per_sec` controls how many segments per second can be served (default is 1). With the default values, the effective behavior is identical to the previous behavior when serving SLOs.
* Container sync has been improved to perform a HEAD on the remote side of the sync for each object being synced. If the object exists on the remote side, container-sync will no longer transfer the object, thus significantly lowering the network requirements to use the feature.
* The object auditor will now clean up any old, stale rsync temp files that it finds. These rsync temp files are left if the rsync process fails without completing a full transfer of an object. Since these files can be large, the temp files may end up filling a disk. The new auditor functionality will reap these rsync temp files if they are old. The new object-auditor config variable `rsync_tempfile_timeout` is the number of seconds old a tempfile must be before it is reaped. By default, this variable is set to "auto" or the rsync_timeout plus 900 seconds (falling back to a value of 1 day).
* The Erasure Code reconstruction process has been made more efficient by not syncing data files when only the durable commit file is missing.
* Fixed a bug where 304 and 416 response may not have the right Etag and Accept-Ranges headers when the object is stored in an Erasure Coded policy.
* Versioned writes now correctly stores the date of previous versions using GMT instead of local time.
* The deprecated Keystone middleware option is_admin has been removed.
* Fixed log format in object auditor.
* The zero-byte mode (ZBF) of the object auditor will now properly observe the `--once` option.
* Swift keeps track, internally, of "dirty" parts of the partition keyspace with a "hashes.pkl" file. Operations on this file no longer require a read-modify-write cycle and use a new "hashes.invalid" file to track dirty partitions. This change will improve end-user performance for PUT and DELETE operations.
* The object replicator's succeeded and failed counts are now logged.
* `swift-recon` can now query hosts by storage policy.
* The log_statsd_host value can now be an IPv6 address or a hostname which only resolves to an IPv6 address.
* Erasure coded fragments now properly call fallocate to reserve disk space before being written.
* Various other minor bug fixes and improvements.

swift (2.6.0)

* Dependency changes
  - Updated minimum version of eventlet to 0.17.4 to support IPv6.
  - Updated the minimum version of PyECLib to 1.0.7.
* The ring rebalancing algorithm was updated to better handle edge cases and to give better (more balanced) rings in the general case. New rings will have better initial placement, capacity adjustments will move less data for better balance, and existing rings that were imbalanced should start to become better balanced as they go through rebalance cycles.
* Added container and account reverse listings. A GET request to an account or container resource with a "reverse=true" query parameter will return the listing in reverse order. When iterating over pages of reverse listings, the relative order of marker and end_marker are swapped.
* Storage policies now support having more than one name.
  This allows operators to fix a typo without breaking existing clients, or, alternatively, have "short names" for policies. This is implemented with the "aliases" config key in the storage policy config in swift.conf. The aliases value is a list of names that the storage policy may also be identified by. The storage policy "name" is used to report the policy to users (eg in container headers). The aliases have the same naming restrictions as the policy's primary name.
* The object auditor learned the "interval" config value to control the time between each audit pass.
* `swift-recon --all` now includes the config checksum check.
* `swift-init` learned the --kill-after-timeout option to force a service to quit (SIGKILL) after a designated time.
* `swift-recon` now correctly shows timestamps in UTC instead of local time.
* Fixed bug where `swift-ring-builder` couldn't select device id 0.
* Documented the previously undocumented `swift-ring-builder pretend_min_part_hours_passed` command.
* The "node_timeout" config value now accepts decimal values.
* `swift-ring-builder` now properly removes devices with zero weight.
* `swift-init` return codes are updated via "--strict" and "--non-strict" options. Please see the usage string for more information.
* `swift-ring-builder` now reports the min_part_hours lockout time remaining
* Container sync has been improved to more quickly find and iterate over the containers to be synced. This reduced server load and lowers the time required to see data propagate between two clusters. Please see http://swift.openstack.org/overview_container_sync.html for more details about the new on-disk structure for tracking synchronized containers.
* A container POST will now update that container's put-timestamp value.
* TempURL header restrictions are now exposed in /info.
* Error messages on static large object manifest responses have been greatly improved.
* Closed a bug where an unfinished read of a large object would leak a socket file descriptor and a small amount of memory. (CVE-2016-0738)
* Fixed an issue where a zero-byte object PUT with an incorrect Etag would return a 503.
* Fixed an error when a static large object manifest references the same object more than once.
* Improved performance of finding handoff nodes if a zone is empty.
* Fixed duplication of headers in Access-Control-Expose-Headers on CORS requests.
* Fixed handling of IPv6 connections to memcache pools.
* Continued work towards python 3 compatibility.
* Various other minor bug fixes and improvements.

swift (2.5.0, OpenStack Liberty)

* Added the ability to specify ranges for Static Large Object (SLO) segments.
* Replicator configs now support an "rsync_module" value to allow for per-device rsync modules. This setting gives operators the ability to fine-tune replication traffic in a Swift cluster and isolate replication disk IO to a particular device. Please see the docs and sample config files for more information and examples.
* Significant work has gone in to testing, fixing, and validating Swift's erasure code support at different scales.
* Swift now emits StatsD metrics on a per-policy basis.
* Fixed an issue with Keystone integration where a COPY request to a service account may have succeeded even if a service token was not included in the request.
* Ring validation now warns if a placement partition gets assigned to the same device multiple times. This happens when devices in the ring are unbalanced (e.g. two servers where one server has significantly more available capacity).
* Various other minor bug fixes and improvements.

swift (2.4.0)

* Dependency changes
  - Added six requirement. This is part of an ongoing effort to add support for Python 3.
  - Dropped support for Python 2.6.
* Config changes
  - Recent versions of Python restrict the number of headers allowed in a request to 100. This number may be too low for custom middleware.
    The new "extra_header_count" config value in swift.conf can be used to increase the number of headers allowed.
  - Renamed "run_pause" setting to "interval" (current configs with run_pause still work). Future versions of Swift may remove the run_pause setting.
* Versioned writes middleware
  The versioned writes feature has been refactored and reimplemented as middleware. You should explicitly add the versioned_writes middleware to your proxy pipeline, but do not remove or disable the existing container server config setting ("allow_versions"), if it is currently enabled. The existing container server config setting enables existing containers to continue being versioned. Please see http://swift.openstack.org/middleware.html#how-to-enable-object-versioning-in-a-swift-cluster for further upgrade notes.
* Allow 1+ object-servers-per-disk deployment
  Enabled by a new > 0 integer config value, "servers_per_port" in the [DEFAULT] config section for object-server and/or replication server configs. The setting's integer value determines how many different object-server workers handle requests for any single unique local port in the ring. In this mode, the parent swift-object-server process continues to run as the original user (i.e. root if low-port binding is required), binds to all ports as defined in the ring, and forks off the specified number of workers per listen socket. The child, per-port servers drop privileges and behave pretty much how object-server workers always have, except that because the ring has unique ports per disk, the object-servers will only be handling requests for a single disk. The parent process detects dead servers and restarts them (with the correct listen socket), starts missing servers when an updated ring file is found with a device on the server with a new port, and kills extraneous servers when their port is found to no longer be in the ring.
  The ring files are stat'ed at most every "ring_check_interval" seconds, as configured in the object-server config (same default of 15s). In testing, this deployment configuration (with a value of 3) lowers request latency, improves requests per second, and isolates slow disk IO as compared to the existing "workers" setting. To use this, each device must be added to the ring using a different port.
* Do container listing updates in another (green)thread
  The object server has learned the "container_update_timeout" setting (with a default of 1 second). This value is the number of seconds that the object server will wait for the container server to update the listing before returning the status of the object PUT operation. Previously, the object server would wait up to 3 seconds for the container server response. The new behavior dramatically lowers object PUT latency when container servers in the cluster are busy (e.g. when the container is very large). Setting the value too low may result in a client PUT'ing an object and not being able to immediately find it in listings. Setting it too high will increase latency for clients when container servers are busy.
* TempURL fixes (closes CVE-2015-5223)
  Do not allow PUT tempurls to create pointers to other data. Specifically, disallow the creation of DLO object manifests via a PUT tempurl. This prevents discoverability attacks which can use any PUT tempurl to probe for private data by creating a DLO object manifest and then using the PUT tempurl to head the object.
* Ring changes
  - Partition placement no longer uses the port number to place partitions. This improves dispersion in small clusters running one object server per drive, and it does not affect dispersion in clusters running one object server per server.
  - Added ring-builder-analyzer tool to more easily test and analyze a series of ring management operations.
  - Stop moving partitions unnecessarily when overload is on.
* Significant improvements and bug fixes have been made to erasure code support. This feature is suitable for beta testing, but it is not yet ready for broad production usage.
* Bulk upload now treats user xattrs on files in the given archive as object metadata on the resulting created objects.
* Emit warning log in object replicator if "handoffs_first" or "handoff_delete" is set.
* Enable object replicator's failure count in swift-recon.
* Added storage policy support to dispersion tools.
* Support keystone v3 domains in swift-dispersion.
* Added domain_remap information to the /info endpoint.
* Added support for a "default_reseller_prefix" in domain_remap middleware config.
* Allow SLO PUTs to forgo per-segment integrity checks. Previously, each segment referenced in the manifest also needed the correct etag and bytes setting. These fields now allow the "null" value to skip those particular checks on the given segment.
* Allow rsync to use compression via a "rsync_compress" config. If set to true, compression is only enabled for an rsync to a device in a different region. In some cases, this can speed up cross-region replication data transfer.
* Added time synchronization check in swift-recon (the --time option).
* The account reaper now runs faster on large accounts.
* Various other minor bug fixes and improvements.

swift (2.3.0, OpenStack Kilo)

* Erasure Code support (beta)
  Swift now supports an erasure-code (EC) storage policy type. This allows deployers to achieve very high durability with less raw capacity as used in replicated storage. However, EC requires more CPU and network resources, so it is not good for every use case. EC is great for storing large, infrequently accessed data in a single region.
  Swift's implementation of erasure codes is meant to be transparent to end users. There is no API difference between replicated storage and EC storage.
  To support erasure codes, Swift now depends on PyECLib and liberasurecode.
  liberasurecode is a pluggable library that allows for the actual EC algorithm to be implemented in a library of your choosing.
  As a beta release, EC support is nearly fully feature complete, but it is lacking support for some features (like multi-range reads) and has not had a full performance characterization. This feature relies on ssync for durability. Deployers are urged to do extensive testing and not deploy production data using an erasure code storage policy.
  Full docs are at http://swift.openstack.org/overview_erasure_code.html
* Add support for container TempURL Keys.
* Make more memcache options configurable. connection_timeout, pool_timeout, tries, and io_timeout are all now configurable.
* Swift now supports composite tokens. This allows another service to act on behalf of a user, but only with that user's consent. See http://swift.openstack.org/overview_auth.html for more details.
* Multi-region replication was improved. When replicating data to a different region, only one replica will be pushed per replication cycle. This gives the remote region a chance to replicate the data locally instead of pushing more data over the inter-region network.
* Internal requests from the ratelimit middleware now properly log a swift_source. See http://swift.openstack.org/logs.html for details.
* Improved storage policy support for quarantine stats in swift-recon.
* The proxy log line now includes the request's storage policy index.
* Ring checker has been added to swift-recon to validate if rings are built correctly. As part of this feature, storage servers have learned the OPTIONS verb.
* Add support of x-remove- headers for container-sync.
* Rings now support hostnames instead of just IP addresses.
* Swift now enforces that the API version on a request is valid. Valid versions are configured via the valid_api_versions setting in swift.conf
* Various other minor bug fixes and improvements.
swift (2.2.2) * Data placement changes This release has several major changes to data placement in Swift in order to better handle different deployment patterns. First, with an unbalance-able ring, less partitions will move if the movement doesn't result in any better dispersion across failure domains. Also, empty (partition weight of zero) devices will no longer keep partitions after rebalancing when there is an unbalance-able ring. Second, the notion of "overload" has been added to Swift's rings. This allows devices to take some extra partitions (more than would normally be allowed by the device weight) so that smaller and unbalanced clusters will have less data movement between servers, zones, or regions if there is a failure in the cluster. Finally, rings have a new metric called "dispersion". This is the percentage of partitions in the ring that have too many replicas in a particular failure domain. For example, if you have three servers in a cluster but two replicas for a partition get placed onto the same server, that partition will count towards the dispersion metric. A lower value is better, and the value can be used to find the proper value for "overload". The overload and dispersion metrics have been exposed in the swift-ring-build CLI tools. See http://docs.openstack.org/developer/swift/overview_ring.html for more info on how data placement works now. * Improve replication of large out-of-sync, out-of-date containers. * Added console logging to swift-drive-audit with a new log_to_console config option (default False). * Optimize replication when a device and/or partition is specified. * Fix dynamic large object manifests getting versioned. This was not intended and did not work. Now it is properly prevented. * Fix the GET's response code when there is a missing segment in a large object manifest. * Change black/white listing in ratelimit middleware to use sysmeta. 
Instead of using the config option, operators can set "X-Account-Sysmeta-Global-Write-Ratelimit: WHITELIST" or "X-Account-Sysmeta-Global-Write-Ratelimit: BLACKLIST" on an account to whitelist or blacklist it for ratelimiting. Note: the existing config options continue to work. * Use TCP_NODELAY on outgoing connections. * Improve object-replicator startup time. * Implement OPTIONS verb for storage nodes. * Various other minor bug fixes and improvements. swift (2.2.1) * Swift now rejects object names with Unicode surrogates. * Return 403 (instead of 413) on unauthorized upload when over account quota. * Fix a rare condition when a rebalance could cause swift-ring-builder to crash. This would only happen on old ring files when "rebalance" was the first command run. * Storage node error limits now survive a ring reload. * Speed up reading and writing xattrs for object metadata by using larger xattr value sizes. The change is moving from 254 byte values to 64KiB values. There is no migration issue with this. * Deleted containers beyond the reclaim age are now properly reclaimed. * Full Simplified Chinese translation (zh_CN locale) for errors and logs. * Container quota is now properly enforced during cross-account COPY. * ssync replication now properly uses the configured replication_ip. * Fixed issue were ssync did not replicate custom object headers. * swift-drive-audit now has the 'unmount_failed_device' config option (default to True) that controls if the process will unmount failed drives or not. * swift-drive-audit will now dump drive error rates to a recon file. The file location is controlled by the 'recon_cache_path' config value and it includes each drive and its associated number of errors. * When a filesystem does't support xattr, the object server now returns a 507 Insufficient Storage error to the proxy server. * Clean up empty account and container partitions directories if they are empty. 
This keeps the system healthy and prevents a large number of empty directories from slowing down the replication process. * Show the sum of every policy's amount of async pendings in swift-recon. * Various other minor bug fixes and improvements. ``` Note that 2.10 at least fixes a bug introduced in 2.7 marked as "critical" https://bugs.launchpad.net/swift/+bug/1651530 so we might hold off on 2.7
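The "dispersion" metric from the 2.2.2 notes above can be illustrated with a small sketch. This is not Swift's ring-builder code. It assumes a single failure-domain tier (servers) of equal weight, and the names `dispersion`, `assignments` and `servers` are hypothetical. A partition counts as badly dispersed when one server holds more replicas than an even spread would allow.

```python
from collections import Counter
from math import ceil


def dispersion(assignments, servers):
    """Return the percentage of partitions with too many replicas on one server.

    assignments: dict mapping partition id -> list of server names, one per replica.
    servers: list of all server names in the (hypothetical) cluster.
    Simplification: equal weights, one failure-domain tier.
    """
    if not assignments:
        return 0.0
    bad = 0
    for replicas in assignments.values():
        # With an even spread, no server should hold more than this many replicas.
        allowed = ceil(len(replicas) / len(servers))
        if max(Counter(replicas).values(), default=0) > allowed:
            bad += 1
    return 100.0 * bad / len(assignments)


# Three servers, three replicas: partitions 1 and 3 stack replicas on one server.
example = {
    0: ["s1", "s2", "s3"],
    1: ["s1", "s1", "s2"],
    2: ["s2", "s3", "s1"],
    3: ["s3", "s3", "s3"],
}
print(dispersion(example, ["s1", "s2", "s3"]))
```

A lower value is better, matching the changelog wording. The real metric is reported by the ring-builder tooling and accounts for weights, zones and regions, which this sketch ignores.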

current situation:
* codfw: swift 1.13.1/2.2.0 on trusty/jessie
* eqiad: swift 1.13.1/2.2.0 on trusty/jessie
* esams: swift 2.2.0 on jessie (cluster not used)

swift changelog: https://github.com/openstack/swift/blob/master/CHANGELOG

jessie ships with swift 2.2 (juno) though the version in backports is 2.7 (mitaka)

```lines=10,name=changelog diff 2.2.0 -> 2.7.0
swift (2.7.0, OpenStack Mitaka) * Bump PyECLib requirement to >= 1.2.0 * Update container on fast-POST "Fast-POST" is the mode where `object_post_as_copy` is set to `False` in the proxy server config. This mode now allows for fast, efficient updates of metadata without needing to fully recopy the contents of the object. While the default still is `object_post_as_copy` as True, the plan is to change the default to False and then deprecate post-as-copy functionality in later releases. Fast-POST now supports container-sync functionality. * Add concurrent reads option to proxy. This change adds 2 new parameters to enable and control concurrent GETs in Swift, these are `concurrent_gets` and `concurrency_timeout`. `concurrent_gets` allows you to turn on or off concurrent GETs; when on, it will set the GET/HEAD concurrency to the replica count. And in the case of EC HEADs it will set it to ndata. The proxy will then serve only the first valid source to respond. This applies to all account, container, and replicated object GETs and HEADs. For EC only HEAD requests are affected. The default for `concurrent_gets` is off. `concurrency_timeout` is related to `concurrent_gets` and is the amount of time to wait before firing the next thread. A value of 0 will fire at the same time (fully concurrent), but setting another value will stagger the firing allowing you the ability to give a node a short chance to respond before firing the next.
This value is a float and should be somewhere between 0 and `node_timeout`. The default is `conn_timeout`, meaning by default it will stagger the firing. * Added an operational procedures guide to the docs. It can be found at http://swift.openstack.org/ops_runbook/index.html and includes information on detecting and handling day-to-day operational issues in a Swift cluster. * Make `handoffs_first` a more useful mode for the object replicator. The `handoffs_first` replication mode is used during periods of problematic cluster behavior (e.g. full disks) when replication needs to quickly drain partitions from a handoff node and move them to a primary node. Previously, `handoffs_first` would sort that handoff work before "normal" replication jobs, but the normal replication work could take quite some time and result in handoffs not being drained quickly enough. In order to focus on getting handoff partitions off the node `handoffs_first` mode will now abort the current replication sweep before attempting any primary suffix syncing if any of the handoff partitions were not removed for any reason - and start over with replication of handoffs jobs as the highest priority. Note that `handoffs_first` being enabled will emit a warning on start up, even if no handoff jobs fail, because of the negative impact it can have during normal operations by dog-piling on a node that was temporarily unavailable. * By default, inbound `X-Timestamp` headers are now disallowed (except when in an authorized container-sync request). This header is useful for allowing data migration from other storage systems to Swift and keeping the original timestamp of the data. If you have this migration use case (or any other requirement on allowing the clients to set an object's timestamp), set the `shunt_inbound_x_timestamp` config variable to False in the gatekeeper middleware config section of the proxy server config. 
* Requesting a SLO manifest file with the query parameters "?multipart-manifest=get&format=raw" will return the contents of the manifest in the format as was originally sent by the client. The "format=raw" is new. * Static web page listings can now be rendered with a custom label. By default listings are rendered with a label of: "Listing of /v1/<account>/<container>/<path>". This change adds a new custom metadata key/value pair `X-Container-Meta-Web-Listings-Label: My Label` that when set, will cause the following: "Listing of My Label/<path>" to be rendered instead. * Previously, static large objects (SLOs) had a minimum segment size (default to 1MiB). This limit has been removed, but small segments will be ratelimited. The config parameter `rate_limit_under_size` controls the definition of "small" segments (1MiB by default), and `rate_limit_segments_per_sec` controls how many segments per second can be served (default is 1). With the default values, the effective behavior is identical to the previous behavior when serving SLOs. * Container sync has been improved to perform a HEAD on the remote side of the sync for each object being synced. If the object exists on the remote side, container-sync will no longer transfer the object, thus significantly lowering the network requirements to use the feature. * The object auditor will now clean up any old, stale rsync temp files that it finds. These rsync temp files are left if the rsync process fails without completing a full transfer of an object. Since these files can be large, the temp files may end up filling a disk. The new auditor functionality will reap these rsync temp files if they are old. The new object-auditor config variable `rsync_tempfile_timeout` is the number of seconds old a tempfile must be before it is reaped. By default, this variable is set to "auto" or the rsync_timeout plus 900 seconds (falling back to a value of 1 day). 
* The Erasure Code reconstruction process has been made more efficient by not syncing data files when only the durable commit file is missing. * Fixed a bug where 304 and 416 response may not have the right Etag and Accept-Ranges headers when the object is stored in an Erasure Coded policy. * Versioned writes now correctly stores the date of previous versions using GMT instead of local time. * The deprecated Keystone middleware option is_admin has been removed. * Fixed log format in object auditor. * The zero-byte mode (ZBF) of the object auditor will now properly observe the `--once` option. * Swift keeps track, internally, of "dirty" parts of the partition keyspace with a "hashes.pkl" file. Operations on this file no longer require a read-modify-write cycle and use a new "hashes.invalid" file to track dirty partitions. This change will improve end-user performance for PUT and DELETE operations. * The object replicator's succeeded and failed counts are now logged. * `swift-recon` can now query hosts by storage policy. * The log_statsd_host value can now be an IPv6 address or a hostname which only resolves to an IPv6 address. * Erasure coded fragments now properly call fallocate to reserve disk space before being written. * Various other minor bug fixes and improvements. swift (2.6.0) * Dependency changes - Updated minimum version of eventlet to 0.17.4 to support IPv6. - Updated the minimum version of PyECLib to 1.0.7. * The ring rebalancing algorithm was updated to better handle edge cases and to give better (more balanced) rings in the general case. New rings will have better initial placement, capacity adjustments will move less data for better balance, and existing rings that were imbalanced should start to become better balanced as they go through rebalance cycles. * Added container and account reverse listings. A GET request to an account or container resource with a "reverse=true" query parameter will return the listing in reverse order. 
When iterating over pages of reverse listings, the relative order of marker and end_marker are swapped. * Storage policies now support having more than one name. This allows operators to fix a typo without breaking existing clients, or, alternatively, have "short names" for policies. This is implemented with the "aliases" config key in the storage policy config in swift.conf. The aliases value is a list of names that the storage policy may also be identified by. The storage policy "name" is used to report the policy to users (eg in container headers). The aliases have the same naming restrictions as the policy's primary name. * The object auditor learned the "interval" config value to control the time between each audit pass. * `swift-recon --all` now includes the config checksum check. * `swift-init` learned the --kill-after-timeout option to force a service to quit (SIGKILL) after a designated time. * `swift-recon` now correctly shows timestamps in UTC instead of local time. * Fixed bug where `swift-ring-builder` couldn't select device id 0. * Documented the previously undocumented `swift-ring-builder pretend_min_part_hours_passed` command. * The "node_timeout" config value now accepts decimal values. * `swift-ring-builder` now properly removes devices with zero weight. * `swift-init` return codes are updated via "--strict" and "--non-strict" options. Please see the usage string for more information. * `swift-ring-builder` now reports the min_part_hours lockout time remaining * Container sync has been improved to more quickly find and iterate over the containers to be synced. This reduced server load and lowers the time required to see data propagate between two clusters. Please see http://swift.openstack.org/overview_container_sync.html for more details about the new on-disk structure for tracking synchronized containers. * A container POST will now update that container's put-timestamp value. * TempURL header restrictions are now exposed in /info. 
* Error messages on static large object manifest responses have been greatly improved. * Closed a bug where an unfinished read of a large object would leak a socket file descriptor and a small amount of memory. (CVE-2016-0738) * Fixed an issue where a zero-byte object PUT with an incorrect Etag would return a 503. * Fixed an error when a static large object manifest references the same object more than once. * Improved performance of finding handoff nodes if a zone is empty. * Fixed duplication of headers in Access-Control-Expose-Headers on CORS requests. * Fixed handling of IPv6 connections to memcache pools. * Continued work towards python 3 compatibility. * Various other minor bug fixes and improvements. swift (2.5.0, OpenStack Liberty) * Added the ability to specify ranges for Static Large Object (SLO) segments. * Replicator configs now support an "rsync_module" value to allow for per-device rsync modules. This setting gives operators the ability to fine-tune replication traffic in a Swift cluster and isolate replication disk IO to a particular device. Please see the docs and sample config files for more information and examples. * Significant work has gone in to testing, fixing, and validating Swift's erasure code support at different scales. * Swift now emits StatsD metrics on a per-policy basis. * Fixed an issue with Keystone integration where a COPY request to a service account may have succeeded even if a service token was not included in the request. * Ring validation now warns if a placement partition gets assigned to the same device multiple times. This happens when devices in the ring are unbalanced (e.g. two servers where one server has significantly more available capacity). * Various other minor bug fixes and improvements. swift (2.4.0) * Dependency changes - Added six requirement. This is part of an ongoing effort to add support for Python 3. - Dropped support for Python 2.6. 
* Config changes - Recent versions of Python restrict the number of headers allowed in a request to 100. This number may be too low for custom middleware. The new "extra_header_count" config value in swift.conf can be used to increase the number of headers allowed. - Renamed "run_pause" setting to "interval" (current configs with run_pause still work). Future versions of Swift may remove the run_pause setting. * Versioned writes middleware The versioned writes feature has been refactored and reimplemented as middleware. You should explicitly add the versioned_writes middleware to your proxy pipeline, but do not remove or disable the existing container server config setting ("allow_versions"), if it is currently enabled. The existing container server config setting enables existing containers to continue being versioned. Please see http://swift.openstack.org/middleware.html#how-to-enable-object-versioning-in-a-swift-cluster for further upgrade notes. * Allow 1+ object-servers-per-disk deployment Enabled by a new > 0 integer config value, "servers_per_port" in the [DEFAULT] config section for object-server and/or replication server configs. The setting's integer value determines how many different object-server workers handle requests for any single unique local port in the ring. In this mode, the parent swift-object-server process continues to run as the original user (i.e. root if low-port binding is required), binds to all ports as defined in the ring, and forks off the specified number of workers per listen socket. The child, per-port servers drop privileges and behave pretty much how object-server workers always have, except that because the ring has unique ports per disk, the object-servers will only be handling requests for a single disk. 
The parent process detects dead servers and restarts them (with the correct listen socket), starts missing servers when an updated ring file is found with a device on the server with a new port, and kills extraneous servers when their port is found to no longer be in the ring. The ring files are stat'ed at most every "ring_check_interval" seconds, as configured in the object-server config (same default of 15s). In testing, this deployment configuration (with a value of 3) lowers request latency, improves requests per second, and isolates slow disk IO as compared to the existing "workers" setting. To use this, each device must be added to the ring using a different port. * Do container listing updates in another (green)thread The object server has learned the "container_update_timeout" setting (with a default of 1 second). This value is the number of seconds that the object server will wait for the container server to update the listing before returning the status of the object PUT operation. Previously, the object server would wait up to 3 seconds for the container server response. The new behavior dramatically lowers object PUT latency when container servers in the cluster are busy (e.g. when the container is very large). Setting the value too low may result in a client PUT'ing an object and not being able to immediately find it in listings. Setting it too high will increase latency for clients when container servers are busy. * TempURL fixes (closes CVE-2015-5223) Do not allow PUT tempurls to create pointers to other data. Specifically, disallow the creation of DLO object manifests via a PUT tempurl. This prevents discoverability attacks which can use any PUT tempurl to probe for private data by creating a DLO object manifest and then using the PUT tempurl to head the object. * Ring changes - Partition placement no longer uses the port number to place partitions. 
This improves dispersion in small clusters running one object server per drive, and it does not affect dispersion in clusters running one object server per server. - Added ring-builder-analyzer tool to more easily test and analyze a series of ring management operations. - Stop moving partitions unnecessarily when overload is on. * Significant improvements and bug fixes have been made to erasure code support. This feature is suitable for beta testing, but it is not yet ready for broad production usage. * Bulk upload now treats user xattrs on files in the given archive as object metadata on the resulting created objects. * Emit warning log in object replicator if "handoffs_first" or "handoff_delete" is set. * Enable object replicator's failure count in swift-recon. * Added storage policy support to dispersion tools. * Support keystone v3 domains in swift-dispersion. * Added domain_remap information to the /info endpoint. * Added support for a "default_reseller_prefix" in domain_remap middleware config. * Allow SLO PUTs to forgo per-segment integrity checks. Previously, each segment referenced in the manifest also needed the correct etag and bytes setting. These fields now allow the "null" value to skip those particular checks on the given segment. * Allow rsync to use compression via a "rsync_compress" config. If set to true, compression is only enabled for an rsync to a device in a different region. In some cases, this can speed up cross-region replication data transfer. * Added time synchronization check in swift-recon (the --time option). * The account reaper now runs faster on large accounts. * Various other minor bug fixes and improvements. swift (2.3.0, OpenStack Kilo) * Erasure Code support (beta) Swift now supports an erasure-code (EC) storage policy type. This allows deployers to achieve very high durability with less raw capacity as used in replicated storage. However, EC requires more CPU and network resources, so it is not good for every use case. 
EC is great for storing large, infrequently accessed data in a single region. Swift's implementation of erasure codes is meant to be transparent to end users. There is no API difference between replicated storage and EC storage. To support erasure codes, Swift now depends on PyECLib and liberasurecode. liberasurecode is a pluggable library that allows for the actual EC algorithm to be implemented in a library of your choosing. As a beta release, EC support is nearly fully feature complete, but it is lacking support for some features (like multi-range reads) and has not had a full performance characterization. This feature relies on ssync for durability. Deployers are urged to do extensive testing and not deploy production data using an erasure code storage policy. Full docs are at http://swift.openstack.org/overview_erasure_code.html * Add support for container TempURL Keys. * Make more memcache options configurable. connection_timeout, pool_timeout, tries, and io_timeout are all now configurable. * Swift now supports composite tokens. This allows another service to act on behalf of a user, but only with that user's consent. See http://swift.openstack.org/overview_auth.html for more details. * Multi-region replication was improved. When replicating data to a different region, only one replica will be pushed per replication cycle. This gives the remote region a chance to replicate the data locally instead of pushing more data over the inter-region network. * Internal requests from the ratelimit middleware now properly log a swift_source. See http://swift.openstack.org/logs.html for details. * Improved storage policy support for quarantine stats in swift-recon. * The proxy log line now includes the request's storage policy index. * Ring checker has been added to swift-recon to validate if rings are built correctly. As part of this feature, storage servers have learned the OPTIONS verb. * Add support of x-remove- headers for container-sync. 
* Rings now support hostnames instead of just IP addresses. * Swift now enforces that the API version on a request is valid. Valid versions are configured via the valid_api_versions setting in swift.conf * Various other minor bug fixes and improvements. swift (2.2.2) * Data placement changes This release has several major changes to data placement in Swift in order to better handle different deployment patterns. First, with an unbalance-able ring, less partitions will move if the movement doesn't result in any better dispersion across failure domains. Also, empty (partition weight of zero) devices will no longer keep partitions after rebalancing when there is an unbalance-able ring. Second, the notion of "overload" has been added to Swift's rings. This allows devices to take some extra partitions (more than would normally be allowed by the device weight) so that smaller and unbalanced clusters will have less data movement between servers, zones, or regions if there is a failure in the cluster. Finally, rings have a new metric called "dispersion". This is the percentage of partitions in the ring that have too many replicas in a particular failure domain. For example, if you have three servers in a cluster but two replicas for a partition get placed onto the same server, that partition will count towards the dispersion metric. A lower value is better, and the value can be used to find the proper value for "overload". The overload and dispersion metrics have been exposed in the swift-ring-build CLI tools. See http://docs.openstack.org/developer/swift/overview_ring.html for more info on how data placement works now. * Improve replication of large out-of-sync, out-of-date containers. * Added console logging to swift-drive-audit with a new log_to_console config option (default False). * Optimize replication when a device and/or partition is specified. * Fix dynamic large object manifests getting versioned. This was not intended and did not work. 
Now it is properly prevented. * Fix the GET's response code when there is a missing segment in a large object manifest. * Change black/white listing in ratelimit middleware to use sysmeta. Instead of using the config option, operators can set "X-Account-Sysmeta-Global-Write-Ratelimit: WHITELIST" or "X-Account-Sysmeta-Global-Write-Ratelimit: BLACKLIST" on an account to whitelist or blacklist it for ratelimiting. Note: the existing config options continue to work. * Use TCP_NODELAY on outgoing connections. * Improve object-replicator startup time. * Implement OPTIONS verb for storage nodes. * Various other minor bug fixes and improvements. swift (2.2.1) * Swift now rejects object names with Unicode surrogates. * Return 403 (instead of 413) on unauthorized upload when over account quota. * Fix a rare condition when a rebalance could cause swift-ring-builder to crash. This would only happen on old ring files when "rebalance" was the first command run. * Storage node error limits now survive a ring reload. * Speed up reading and writing xattrs for object metadata by using larger xattr value sizes. The change is moving from 254 byte values to 64KiB values. There is no migration issue with this. * Deleted containers beyond the reclaim age are now properly reclaimed. * Full Simplified Chinese translation (zh_CN locale) for errors and logs. * Container quota is now properly enforced during cross-account COPY. * ssync replication now properly uses the configured replication_ip. * Fixed issue were ssync did not replicate custom object headers. * swift-drive-audit now has the 'unmount_failed_device' config option (default to True) that controls if the process will unmount failed drives or not. * swift-drive-audit will now dump drive error rates to a recon file. The file location is controlled by the 'recon_cache_path' config value and it includes each drive and its associated number of errors. 
* When a filesystem does't support xattr, the object server now returns a 507 Insufficient Storage error to the proxy server. * Clean up empty account and container partitions directories if they are empty. This keeps the system healthy and prevents a large number of empty directories from slowing down the replication process. * Show the sum of every policy's amount of async pendings in swift-recon. * Various other minor bug fixes and improvements.
```

There are jessie backports provided at http://liberty-jessie.pkgs.mirantis.com/ that we could use/import. Dependencies to backport are relatively self-contained: `python-eclib`, `liberasurecode`, `python-eventlet`.

Note that 2.10 at least fixes a bug introduced in 2.7 marked as "critical" (https://bugs.launchpad.net/swift/+bug/1651530), so we might hold off on 2.7.
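The hold-off reasoning (a bug introduced in 2.7 and fixed in 2.10) can be checked mechanically when picking an upgrade target. The sketch below is only an illustration: the helper names, the `RELEASES` list (taken from the release headers in the changelog above) and the `REGRESSIONS` table are assumptions made for this example, not part of any Swift tooling. Versions are compared as integer tuples so that 2.10 sorts after 2.7, which plain string comparison would get wrong.

```python
def parse(version):
    """Turn '2.10.0' into (2, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Release headers appearing in the changelog diff above (2.2.0 excluded).
RELEASES = ["2.2.1", "2.2.2", "2.3.0", "2.4.0", "2.5.0", "2.6.0", "2.7.0"]

# (introduced in, fixed in, Launchpad bug id), taken from the note above.
REGRESSIONS = [("2.7", "2.10", "1651530")]


def releases_between(current, target, releases=RELEASES):
    """List the releases whose notes apply when going from current to target."""
    lo, hi = parse(current), parse(target)
    return [r for r in releases if lo < parse(r) <= hi]


def affected_by(version, regressions=REGRESSIONS):
    """Return the bug ids of known regressions present in the given version."""
    v = parse(version)
    return [bug for intro, fix, bug in regressions if parse(intro) <= v < parse(fix)]


print(releases_between("2.2.0", "2.7.0"))
print(affected_by("2.7.0"))
```

Under these assumptions, `affected_by("2.7.0")` reports the critical bug while `affected_by("2.5.0")` and `affected_by("2.10.0")` report nothing, which matches the suggestion to hold off on 2.7.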