Google Cloud Storage

Composite Objects and Parallel Uploads


To support parallel uploads and limited append / edit functionality, Google Cloud Storage offers users the ability to compose up to 32 existing objects into a new object without transferring additional object data.

Content

  1. Compose Operation
  2. Component Count Property
  3. Integrity Checking Composite Objects
  4. Parallel Uploads
  5. Limited Append and Edit
  6. Performing Object Composition with gsutil
  7. Performing Object Composition with the XML API
  8. Performing Object Composition with the JSON API

Compose Operation

The compose operation creates a new object whose contents are the concatenation of a given sequence of up to 32 component objects under the same bucket. The components are unaffected by the process, and the resulting composite does not change if its components are replaced or deleted. Composite objects may even be built from other existing composites, provided that the total component count does not exceed 1024.

Component Count Property

Each object maintains a component count property, which specifies the number of originally uploaded objects it was created from. Newly uploaded objects have a component count of 1, and composing a sequence of objects creates an object whose component count is equal to the sum of component counts in the sequence. Compose operations that yield component counts exceeding 1024 are not allowed. To add more objects to a composite object with a component count of 1024, you will first need to copy the composite object to a new object by downloading and then uploading the object. For an example, see Performing Object Composition with gsutil.
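
The counting rule above can be sketched in a few lines of Python. The function and constant names here are illustrative only and are not part of any Google Cloud Storage client library; the limits of 32 and 1024 are the ones stated in the text.

```python
MAX_SOURCES_PER_COMPOSE = 32   # most objects one compose request may list
MAX_COMPONENT_COUNT = 1024     # largest component count a composite may have


def composed_component_count(source_counts):
    """Return the component count that composing these sources would yield.

    source_counts holds the component count of each source object, in
    order. Raises ValueError if the compose would not be allowed.
    """
    if len(source_counts) > MAX_SOURCES_PER_COMPOSE:
        raise ValueError("a compose request takes at most 32 objects")
    total = sum(source_counts)
    if total > MAX_COMPONENT_COUNT:
        raise ValueError("component count %d exceeds 1024" % total)
    return total


# Three newly uploaded objects, each with a component count of 1.
print(composed_component_count([1, 1, 1]))
# An existing composite built from 32 uploads, plus one new upload.
print(composed_component_count([32, 1]))
```

Tracking the count on the client side like this lets an application decide in advance when a composite has to be copied by download and re-upload before more objects can be added.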

Integrity Checking Composite Objects

Google Cloud Storage uses CRC32c to check the integrity of each component object at upload time, and to let the caller check the integrity of the resulting composite object when it is downloaded. CRC32c is an error-detecting code whose value for a composite object can be efficiently calculated from the CRC32c values of its components. Your application should use CRC32c as follows:

  • When uploading component objects, you should calculate the CRC32c for each object using a CRC32c library such as one of those listed below, and include that value in your request.
  • For the compose operation, you should not include a CRC32c in the request. Google Cloud Storage will respond with the CRC32c of the composite object. Google Cloud Storage will not calculate MD5 values for composite objects.
  • At download time, you should calculate the CRC32c of the downloaded object, and compare that with the value included in the response.
  • If your application could change component objects between the time of uploading and composing those objects, use source generation preconditions on compose operations to enforce concurrency control.
  • You should change any code that assumes the ETag contains an MD5 so that it treats the ETag only as an opaque value, for example for cache invalidation.

Libraries for computing CRC32c values include Boost for C++, GoogleCloudPlatform crc32c for Java, crcmod for Python, and digest-crc for Ruby. Note also that CRC32c is supported in hardware in current Intel CPUs.
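
For illustration, the following is a minimal pure-Python sketch of CRC32c, written without any of the libraries listed above. It is far slower than those libraries or the hardware instruction and is meant only to show what is being computed. The function names are made up for this example, and the base64 helper assumes the checksum is exchanged as the base64 encoding of the four checksum bytes in big-endian order, as in the x-goog-hash header.

```python
import base64
import struct

CRC32C_POLY = 0x82F63B78  # reflected form of the Castagnoli polynomial


def _make_table():
    """Precompute the CRC32c remainder for every possible byte value."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data, crc=0):
    """Return the CRC32c of data, optionally continuing a previous value."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_base64(data):
    """Return the CRC32c of data as base64 of its big-endian bytes."""
    return base64.b64encode(struct.pack(">I", crc32c(data))).decode("ascii")


payload = b"123456789"
print(hex(crc32c(payload)))
print(crc32c_base64(payload))
```

An application would compute this value before uploading a component and again after downloading the composite, comparing each result with the value the service reports.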

In the past, Google Cloud Storage used MD5 to construct the ETag value. This is not true for composite objects. Client code should make no assumptions about composite object ETags except that they will change whenever the underlying object changes, per the IETF specification for HTTP/1.1.

Parallel Uploads

Object composition enables a simple mechanism for uploading an object in parallel: divide your data into as many chunks as required to fully utilize your available bandwidth (up to 32 per compose operation), upload each chunk to a distinct object, compose your final object, and delete any temporary objects.

In order to protect against changes to component objects between the upload and compose requests, users should provide an expected generation number for each component. For more information about object generations, see Generations and Preconditions.
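
The flow described above can be sketched as follows. To keep the example runnable without credentials or network access, it uses FakeBucket, a hypothetical in-memory stand-in written for this example; it is not a real client library, and its method names are assumptions. The temporary object naming scheme is likewise invented. The point is the order of operations: split, upload in parallel, compose with expected generations, then delete the temporary objects.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeBucket:
    """In-memory stand-in for a bucket (hypothetical, for illustration)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._objects = {}          # name -> (generation, data)
        self._next_generation = 1

    def upload(self, name, data):
        """Store data under name and return the new generation number."""
        with self._lock:
            generation = self._next_generation
            self._next_generation += 1
            self._objects[name] = (generation, bytes(data))
        return generation

    def compose(self, sources, destination):
        """Concatenate sources, a list of (name, expected_generation)."""
        if len(sources) > 32:
            raise ValueError("at most 32 components per compose")
        with self._lock:
            parts = []
            for name, expected in sources:
                generation, data = self._objects[name]
                if generation != expected:
                    raise RuntimeError("generation mismatch for " + name)
                parts.append(data)
            self._objects[destination] = (self._next_generation,
                                          b"".join(parts))
            self._next_generation += 1

    def delete(self, name):
        with self._lock:
            del self._objects[name]

    def download(self, name):
        with self._lock:
            return self._objects[name][1]

    def names(self):
        with self._lock:
            return sorted(self._objects)


def parallel_upload(bucket, name, data, chunk_count):
    """Upload data as chunk_count temporary objects, then compose them."""
    if not data or not 1 <= chunk_count <= 32:
        raise ValueError("need non-empty data and 1 to 32 chunks")
    chunk_size = -(-len(data) // chunk_count)  # ceiling division
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    temp_names = ["%s.part-%04d" % (name, i) for i in range(len(chunks))]
    # Upload every chunk concurrently, remembering each generation.
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        generations = list(pool.map(bucket.upload, temp_names, chunks))
    # Compose only the exact generations that were just uploaded.
    bucket.compose(list(zip(temp_names, generations)), name)
    for temp_name in temp_names:
        bucket.delete(temp_name)


bucket = FakeBucket()
payload = bytes(range(256)) * 100
parallel_upload(bucket, "big-object", payload, chunk_count=8)
print(bucket.names(), bucket.download("big-object") == payload)
```

Passing the generation returned by each upload into the compose call is what guards against a component being overwritten between the two requests.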

Limited Append and Edit

You can also use the compose operation to accomplish limited object appends and edits.

Append is accomplished by uploading data to a temporary new object, composing it with the object you wish to append to, and deleting the temporary object. Because each append increases the component count of the result, the number of appends is bounded by the limit of 1024 described under Component Count Property above.

Previously, editing an object could only be accomplished by overwriting a complete object, but a composite object can be edited more efficiently by replacing one component object and recomposing the composite. For example, you could compose an object X from the sequence {Y1, Y2, Y3}, replace the contents of Y2, and recompose X from those same components. Note that this requires that Y1, Y2, and Y3 be left undeleted, so you will be billed for those components and the composite.
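
The append and edit patterns can be modeled with a plain dictionary standing in for a bucket. This is only a sketch of the semantics described above: the compose and append helpers and the temporary object name are invented for the example and do not correspond to a real API.

```python
def compose(bucket, sources, destination):
    """Store the concatenation of the sources' current contents."""
    bucket[destination] = b"".join(bucket[name] for name in sources)


def append(bucket, name, data, temp_name="append-tmp"):
    """Append data to an object via a temporary object and a compose."""
    bucket[temp_name] = data                   # upload the new data
    compose(bucket, [name, temp_name], name)   # object + temporary
    del bucket[temp_name]                      # remove the temporary


# Edit: build X from {Y1, Y2, Y3}, replace Y2, then recompose X.
bucket = {"Y1": b"alpha-", "Y2": b"beta-", "Y3": b"gamma"}
compose(bucket, ["Y1", "Y2", "Y3"], "X")
before_edit = bucket["X"]

bucket["Y2"] = b"BETA-"      # replacing a component leaves X unchanged
unchanged = bucket["X"] == before_edit

compose(bucket, ["Y1", "Y2", "Y3"], "X")   # recompose to pick up the edit
after_edit = bucket["X"]

# Append: add a suffix to X without re-uploading its existing data.
append(bucket, "X", b"-delta")
print(before_edit, unchanged, after_edit, bucket["X"])
```

The edit only takes effect at the second compose call, which is why the components have to be kept around for as long as further edits are expected.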

Performing Object Composition with gsutil

The gsutil command line utility supports object composition with the compose command. For details, please see its built-in documentation by running:

gsutil help compose

For example, to compose three objects (component-obj-1, component-obj-2, component-obj-3) into one object (composite-object), you can use the following command:

gsutil compose gs://bucket/component-obj-1 gs://bucket/component-obj-2 gs://bucket/component-obj-3 gs://bucket/composite-object

Or, if the objects you are composing are the only ones with the prefix "component-obj-", then you can also use a wildcard in the compose command as shown in the following example:

gsutil compose gs://bucket/component-obj-* gs://bucket/composite-object

After the compose operation, you can check the component count with the following command:

gsutil ls -L gs://bucket/composite-object

In this example, the component count is 3. However, if you had reached the component count of 1024 and wanted to continue composing objects, you would need to copy the composite object to a new object by downloading and then uploading it. You can do this using the "daisy chain" (-D) option of the cp command. If you omit the -D option, gsutil copies the composite object "in the cloud", so its component count does not change and you still cannot add objects to it.

The following two commands copy composite-object to new-object and then move new-object back to the original name. Both commands use the -p option so that gsutil preserves object ACLs.

gsutil cp -D -p gs://bucket/composite-object gs://bucket/new-object
gsutil mv -p gs://bucket/new-object gs://bucket/composite-object

Performing Object Composition with the XML API

With the XML API, you compose objects by issuing a PUT object request with the compose query parameter, and including an XML body listing the component object names in order as shown in the example below.

PUT /bucket/composite-object?compose HTTP/1.1
Host: storage.googleapis.com
Content-Length: 153
Authorization: <OAuth2 Token>

<ComposeRequest>
  <Component>
    <Name>component-obj-1</Name>
  </Component>
  <Component>
    <Name>component-obj-2</Name>
    <Generation>1361471441094000</Generation>
  </Component>
  <Component>
    <Name>component-obj-3</Name>
    <IfGenerationMatch>1361471441094000</IfGenerationMatch>
  </Component>
</ComposeRequest>

No bucket is specified for the component objects because, as noted earlier, the source and destination objects must all be under the same bucket.

The example request above also specifies a generation number for component-obj-2, so this request will compose version 1361471441094000 of the object even if that version is no longer current.

The third component was supplied with a conditional generation using the IfGenerationMatch tag; this will cause the request to fail if the given generation number doesn't represent the component's current generation.
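
A request body of this shape can be generated with Python's standard xml.etree.ElementTree module, as in the sketch below. The helper name and the dictionary keys it accepts are assumptions made for this example; only the element names come from the request shown above.

```python
import xml.etree.ElementTree as ET


def build_compose_request(components):
    """Build a ComposeRequest XML body as bytes.

    components is an ordered list of dicts. Each needs a "name" key and
    may carry "generation" and/or "if_generation_match" (key names are
    this example's own convention).
    """
    root = ET.Element("ComposeRequest")
    for component in components:
        node = ET.SubElement(root, "Component")
        ET.SubElement(node, "Name").text = component["name"]
        if "generation" in component:
            ET.SubElement(node, "Generation").text = str(
                component["generation"])
        if "if_generation_match" in component:
            ET.SubElement(node, "IfGenerationMatch").text = str(
                component["if_generation_match"])
    return ET.tostring(root)


body = build_compose_request([
    {"name": "component-obj-1"},
    {"name": "component-obj-2", "generation": 1361471441094000},
    {"name": "component-obj-3", "if_generation_match": 1361471441094000},
])
# The Content-Length header must match the size of the body actually sent.
print(len(body))
print(body.decode("ascii"))
```

The generated body carries no indentation, so its length differs from the pretty-printed example; what matters is that Content-Length is computed from the bytes that are sent.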

The response to the above object composition request would look like:

Server: HTTP Upload Server Built on Mar 6 2013 16:24:27 (1362615867)
ETag: "-CKicn4fknbUCEAE="
x-goog-generation: 1362768951202000
x-goog-metageneration: 1
x-goog-hash: crc32c=fbWtZQ==
x-goog-component-count: 3
Vary: Origin
Date: Fri, 08 Mar 2013 18:55:51 GMT
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-Control: no-cache, no-store, must-revalidate
Content-Length: 0
Content-Type: text/html; charset=UTF-8

The x-goog-hash header reports the composite object's CRC32c value. You can validate it by combining the CRC32c values of the component objects from which the composite was built.
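
One way to derive a composite's CRC32c from its components without rereading their data is the matrix-based combine technique used by zlib's crc32_combine, adapted here to the CRC32c polynomial. This is a sketch of that general technique, not a description of how Google Cloud Storage computes the value internally. All function names are this example's own, and the combine step needs the byte length of each appended component in addition to its CRC32c.

```python
CRC32C_POLY = 0x82F63B78  # reflected CRC32c (Castagnoli) polynomial


def crc32c(data):
    """Bit-at-a-time CRC32c; slow, used here only to check the result."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def _matrix_times(matrix, vector):
    """Multiply a 32x32 GF(2) matrix (list of columns) by a bit vector."""
    result = 0
    index = 0
    while vector:
        if vector & 1:
            result ^= matrix[index]
        vector >>= 1
        index += 1
    return result


def _matrix_square(matrix):
    return [_matrix_times(matrix, column) for column in matrix]


def crc32c_combine(crc1, crc2, len2):
    """Return the CRC32c of A + B given crc(A), crc(B) and len(B)."""
    if len2 <= 0:
        return crc1
    # Operator that advances a CRC register past one zero bit.
    odd = [CRC32C_POLY] + [1 << n for n in range(31)]
    even = _matrix_square(odd)    # two zero bits
    odd = _matrix_square(even)    # four zero bits
    while True:
        even = _matrix_square(odd)    # first pass: one zero byte
        if len2 & 1:
            crc1 = _matrix_times(even, crc1)
        len2 >>= 1
        if len2 == 0:
            break
        odd = _matrix_square(even)
        if len2 & 1:
            crc1 = _matrix_times(odd, crc1)
        len2 >>= 1
        if len2 == 0:
            break
    return crc1 ^ crc2


parts = [b"component one ", b"component two ", b"component three"]
combined = crc32c(parts[0])
for part in parts[1:]:
    combined = crc32c_combine(combined, crc32c(part), len(part))
print(combined == crc32c(b"".join(parts)))
```

In practice an application would record the CRC32c and size of each component at upload time, combine them in component order, and compare the result with the value in the compose response.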

The component count of the new composite object is the value of the x-goog-component-count response header.

Performing Object Composition with the JSON API

With the JSON API, you compose objects by issuing a compose request with a JSON body listing the component object names in order as shown in the example below.

POST /storage/v1beta2/b/bucket/o/composite-object/compose
Host: www.googleapis.com
Content-Length: 216
Content-Type: application/json
Authorization: <OAuth2 Token>

{
  "sourceObjects": [
    {
      "name": "component-obj-1"
    },
    {
      "name": "component-obj-2"
    },
    {
      "name": "component-obj-3"
    }
  ],
  "destination": {
    "contentType": "application/octet-stream"
  }
}

The response to the above object composition request contains an object resource that reports the component count:

{
 "kind": "storage#object",
 "id": "bucket/composite-object/1388778813188000",
 "selfLink": "https://www.googleapis.com/storage/v1beta2/b/bucket/o/composite-object",
 "name": "composite-object",
 "bucket": "bucket",
 "generation": "1388778813188000",
 "metageneration": "1",
 "contentType": "application/octet-stream",
 "updated": "2014-01-03T19:53:33.188Z",
 "size": "524052",
 "mediaLink": "https://www.googleapis.com/storage/v1beta2/b/bucket/o/composite-object?generation=1388778813188000&alt=media",
 "crc32c": "V9kcXg==",
 "componentCount": 3,
 "etag": "CKDP057k4rsCEAE="
}
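
As a small sketch of handling these payloads in client code, the snippet below builds a request body with the same fields as the example request and reads the component count out of a response like the one shown. The helper names are invented for this example, and the response used here is a trimmed copy of the one above rather than live output.

```python
import json


def build_compose_body(source_names, content_type):
    """Return the JSON body for a compose request as a string."""
    return json.dumps({
        "sourceObjects": [{"name": name} for name in source_names],
        "destination": {"contentType": content_type},
    })


def component_count(response_text):
    """Extract componentCount from an object resource in JSON form."""
    return int(json.loads(response_text)["componentCount"])


request_body = build_compose_body(
    ["component-obj-1", "component-obj-2", "component-obj-3"],
    "application/octet-stream",
)

# Trimmed copy of the example response shown above.
sample_response = """
{
 "kind": "storage#object",
 "name": "composite-object",
 "bucket": "bucket",
 "crc32c": "V9kcXg==",
 "componentCount": 3
}
"""
print(request_body)
print(component_count(sample_response))
```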
