Introduction
Container Sync is a Swift feature that mirrors the entire contents of one container to another container, either in the same cluster or in a completely different one, through background synchronization. You can even sync your data from a private Swift cluster to Rackspace Cloud Files, which is powered by OpenStack Swift. This feature is a step towards greater availability and durability through geographically distinct replicas.
Container Sync
The container-sync feature takes a simple approach that is easy to understand from a user’s perspective. A user marks a container with the URL of the container to sync to; a daemon continuously monitors the marked containers and copies their objects to the destination container in the remote Swift cluster. The remote cluster treats these objects like any others and replicates them according to its own configured replica count.
Container-sync doesn’t alter fundamental Swift internals nor does it introduce completely new concepts. These containers are exactly the same as any other containers.
A side benefit is that you can actually synchronize containers within the same cluster too, which can be useful if you are migrating a container from one account to another.
Use Cases
Account Migration
The container-sync feature can be used to migrate data between accounts. Enable sync from every container in the old account to a matching container in the new account. Once you have verified that the sync is complete, set the old account to read-only and purge it.
Similarly, data can also be migrated between different service providers without being locked in to a particular provider. For example, data can be migrated from a private Swift cluster to a public Swift cloud by enabling container-sync for each container in the private cluster to the public cluster.
Different Geographical Locations
Containers can be synced to containers in other geographical locations. For example, containers in one Swift cluster can be synced to a second Swift cluster at a different site. The second copy can serve as a backup for disaster recovery, or it can make the data highly available from another location.
Sync to Rackspace Cloud Files
Data that is subject to security and compliance requirements can be synced to another private Swift cluster, while non-sensitive data in your Swift cluster can be synced to a public provider like Rackspace Cloud Files. This means you can take advantage of the CDN feature in Rackspace Cloud Files to serve your data globally if you so choose.
How It Works
The swift-container-sync daemon runs on every container server in the cluster and scans every container database, looking for containers that have sync enabled (that is, containers with the X-Container-Sync-To and X-Container-Sync-Key HTTP headers set). The daemon keeps track of the last sync point and sends the changes recorded since then (PUTs and DELETEs) in the container database to the proxy servers in the remote cluster.
To prevent one container from starving all the others, swift-container-sync can be configured to limit the time it spends syncing any given container.
If a container server crashes, the replacement container server gets its database copies from the other replicas. Because of the “all updates” algorithm, no updates are lost. Rebalancing the container ring results in similar behavior.
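The sync-point bookkeeping and the per-container time limit can be illustrated with a minimal Python sketch. This is a deliberate simplification and not Swift's actual implementation: the function name, the row layout, and the send callback are all hypothetical, and the real daemon's algorithm is more involved.

```python
import time


def sync_container(rows, sync_point, send, container_time=60.0,
                   clock=time.monotonic):
    """Send rows newer than sync_point and return the new sync point.

    rows: list of (row_id, op, name) tuples sorted by row_id, standing in
          for the change log in a container database (hypothetical layout).
    send: callable(op, name) -> bool, standing in for the HTTP request to
          the remote proxy server; returns True on success.
    """
    deadline = clock() + container_time
    for row_id, op, name in rows:
        if row_id <= sync_point:
            continue  # already sent during an earlier pass
        if clock() >= deadline:
            break  # time budget used up; the next pass resumes here
        if not send(op, name):
            break  # keep the sync point before the failed row so it is retried
        sync_point = row_id
    return sync_point
```

Because the function only advances the sync point past rows that were sent successfully, a pass that is cut short by the time limit or by a failed request simply picks up at the same row next time.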
Usage
Sync requests travel from the container servers of the first cluster to the proxy servers of the second cluster, so the container servers in every zone must be able to reach the proxy servers of the second cluster.
Syncing container1 in swift-cluster1 to container2 in swift-cluster2 is as easy as setting the sync attributes of the container, i.e. the container to sync to and a shared synchronization key.
swift post -t http://swift-cluster2:8080/v1/AUTH_9f00f7e/container2 -k secret container1
This will set two attributes on the sending container:
X-Container-Sync-To: http://swift-cluster2:8080/v1/AUTH_9f00f7e/container2
X-Container-Sync-Key: secret
This means that all objects in container1 of swift-cluster1, both existing and new, will be synced to container2 of swift-cluster2 using the synchronization key.
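For readers who script this step rather than use the CLI, the following Python sketch builds the same two headers. The helper name sync_headers is hypothetical, and the check that the URL path has exactly three parts (version, account, container) is an assumption based on the example URL above, not a rule taken from Swift itself.

```python
from urllib.parse import urlparse


def sync_headers(sync_to, sync_key):
    """Return the two container headers that enable sync to sync_to."""
    parsed = urlparse(sync_to)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("sync_to must be an absolute http(s) URL")
    # Assumed path layout: /v1/<account>/<container>
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 3:
        raise ValueError("sync_to must name an account and a container")
    return {
        "X-Container-Sync-To": sync_to,
        "X-Container-Sync-Key": sync_key,
    }


headers = sync_headers(
    "http://swift-cluster2:8080/v1/AUTH_9f00f7e/container2", "secret")
```

The resulting dictionary would then be sent in an authenticated POST to container1, for example through python-swiftclient's post_container call.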
Container2 of swift-cluster2 can in turn be synced back to container1 of swift-cluster1, so that the two containers mirror each other.
swift post -t http://swift-cluster1:8080/v1/AUTH_9f00f7e/container1 -k secret container2
Containers can also be chained: for example, container A syncs to B, which syncs to C, which syncs back to A.
The user gets no explicit guarantee about when a sync will be complete. A request to enable sync returns a successful response as soon as it is accepted, but the actual sync is an asynchronous operation that runs in the background.
There also needs to be enough bandwidth between the clusters to keep up with all the changes to the synchronized containers. If the synced container is large, the initial sync will use a burst of bandwidth. As the number of synced containers grows, the user has to keep track of which containers are synced and what their sync keys are, because sync cannot be configured at the cluster level or the account level.
Configuration
Enabling container-sync on a cluster is simple. The container server that initiates the sync must be configured with the set of hosts it is allowed to sync to:
container-server.conf
[DEFAULT]
# This is a comma separated list of hosts allowed in the X-Container-Sync-To
# field for containers.
# allowed_sync_hosts = 127.0.0.1
allowed_sync_hosts = swift-cluster2
[container-sync]
# Will sync, at most, each container once per interval
interval = 300
# Maximum amount of time to spend syncing each container per pass
container_time = 60
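To show how the host list relates to the X-Container-Sync-To URL, here is a small Python sketch that parses a configuration like the one above and checks a sync target against it. The helper names are hypothetical and this is not the validation code Swift itself runs; it only assumes that the host part of the URL must appear in allowed_sync_hosts.

```python
import configparser
from urllib.parse import urlparse

SAMPLE_CONF = """
[DEFAULT]
allowed_sync_hosts = swift-cluster2, 127.0.0.1

[container-sync]
interval = 300
container_time = 60
"""


def load_allowed_hosts(conf_text):
    """Return the set of hosts listed in allowed_sync_hosts."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Fall back to 127.0.0.1, the default shown in the commented sample line.
    raw = parser.defaults().get("allowed_sync_hosts", "127.0.0.1")
    return {host.strip() for host in raw.split(",") if host.strip()}


def host_is_allowed(sync_to, allowed_hosts):
    """Check whether the host in a sync-to URL is in the allowed set."""
    return urlparse(sync_to).hostname in allowed_hosts


hosts = load_allowed_hosts(SAMPLE_CONF)
```

Note that the comparison uses only the host name, so the port in the sync-to URL does not need to appear in the configuration in this sketch.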
Caveats
Swift timestamps each operation and these timestamps are used in conflict resolution. If an object is deleted on one cluster and overwritten on the other, whichever operation has the newest timestamp will win. So the Swift Cluster clocks need to be set reasonably close to one another.
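A toy Python example makes the clock requirement concrete. The stamp and resolve helpers are hypothetical and only model the "newest timestamp wins" rule stated above; the numbers are invented for illustration.

```python
def stamp(op, real_time, clock_skew=0):
    """Record an operation with the timestamp its cluster's clock reports."""
    return {"op": op, "timestamp": real_time + clock_skew}


def resolve(first, second):
    """Return whichever operation carries the newer timestamp."""
    return second if second["timestamp"] > first["timestamp"] else first


# The DELETE really happens first (t=100) on a cluster with an accurate clock.
delete_on_a = stamp("DELETE", real_time=100)
# The PUT really happens later (t=105), but that cluster's clock is 10 s slow.
put_on_b = stamp("PUT", real_time=105, clock_skew=-10)

winner = resolve(delete_on_a, put_on_b)
```

Here winner is the DELETE even though the PUT happened later in real time, which is the kind of surprise that keeping the cluster clocks closely synchronized avoids.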
Object POSTs are synced only when the proxy server is set to use “object_post_as_copy = true”, which is the default. When “object_post_as_copy = false”, the POSTs are handled as fast object posts, which do not update the container listings and are therefore not detected for synchronization.
A large file in Swift is stored as segments and a special manifest file is used to tie the segments together. Both containers need to be synced if the segment files and manifest files reside in different containers.
Syncing to Rackspace Cloud Files
Syncing to Rackspace Cloud Files from a private Swift cloud doesn’t work by default, since Cloud Files doesn’t use Keystone-based authentication. You can get it working, however, by modifying sync.py so that it authenticates properly to Cloud Files while syncing. Here is an example of a modified sync.py: https://github.com/dani4571/swift/commit/9fb626e39b2345215c821e192629a28a966b4200.
Essentially, you need to set the following headers for the originating container:
X-Container-Sync-To: https://auth.api.rackspacecloud.com/v2.0/tokens
X-Container-Sync-Key: API key for the Cloud Files account
X-Container-Meta-Rack-Sync-Info: username/container/region
The sync to public cloud works essentially the same way as other syncs except that public cloud authentication must take place first.
Additionally, the cluster should be configured with the proper allowed_sync_hosts.
container-server.conf
[DEFAULT]
allowed_sync_hosts = auth.api.rackspacecloud.com
Conclusion
As more and more Swift clusters are built around the world in both public and private clouds, container-sync provides an easy way to sync your data between two independent Swift clusters. The current implementation is very simple and lacks some features, but it is still useful: it allows data in a private Swift cloud to be backed up to public providers in different geographical locations, and it even supports complete migrations between providers. It is also a step towards what some call federated clouds.