Google Cloud Storage

Python Library

This tutorial shows you how to write a simple Python program that performs basic Google Cloud Storage operations. This document assumes you are familiar with Python and the Google Cloud Storage concepts and operations presented in the Hello Google Cloud Storage! guide.

In this tutorial

  1. Setting up your environment
  2. Setting up your Python source file
  3. Creating buckets
  4. Listing buckets
  5. Uploading objects
  6. Listing objects
  7. Downloading and copying objects
  8. Changing object ACLs
  9. Reading bucket and object metadata
  10. Deleting objects and buckets

Setting up your environment

Before starting this tutorial, you must do the following:

  1. Install gsutil on your computer.

    gsutil contains everything you need to run the code on this page, including a copy of boto, an open-source Python library that provides an interface to Google Cloud Storage. Even if boto is already installed on your system, use the version bundled with gsutil: the Google Cloud Storage team regularly adds features to boto that gsutil depends on, and releases new gsutil versions before those changes reach a boto release.

  2. Modify your PYTHONPATH environment variable.

    After you download gsutil, you must modify PYTHONPATH to include the location where you installed gsutil as well as modules inside the gsutil directory. Add the following line to your .bashrc or .bash_profile. If you installed gsutil in a directory other than your home directory, replace $HOME with the directory where gsutil is located.

    export PYTHONPATH=${PYTHONPATH}:$HOME/gsutil:\
    $HOME/gsutil/third_party/boto:\
    $HOME/gsutil/third_party/retry-decorator:\
    $HOME/gsutil/third_party/socksipy-branch:\
    $HOME/gsutil/third_party/httplib2/python2:\
    $HOME/gsutil/third_party/httplib2/python2/httplib2:\
    $HOME/gsutil/third_party/google-api-python-client:\
    $HOME/gsutil/third_party/google-api-python-client/oauth2client
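If you prefer not to edit your shell profile, you can extend sys.path at runtime instead. The sketch below is not part of the original tutorial; it assumes gsutil is installed under your home directory, so adjust GSUTIL_HOME for other locations:

```python
import os
import sys

GSUTIL_HOME = os.path.expanduser('~/gsutil')  # adjust if gsutil lives elsewhere

# Mirror the PYTHONPATH entries from the export line above.
for sub in ('',
            'third_party/boto',
            'third_party/retry-decorator',
            'third_party/socksipy-branch',
            'third_party/httplib2/python2',
            'third_party/httplib2/python2/httplib2',
            'third_party/google-api-python-client',
            'third_party/google-api-python-client/oauth2client'):
    path = os.path.join(GSUTIL_HOME, sub)
    if path not in sys.path:
        sys.path.append(path)
```

Running this before the import statements in the next section has the same effect as the .bashrc change, but only for the current process.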
    
  3. Set up your boto configuration file to use OAuth2.0.

    One way to do this is to run the gsutil config command:

    $ gsutil config
    

    The command shown above prompts you to authorize gsutil to access Google Cloud Storage based on your user credentials. However, you can specify options with the gsutil config command to use service account credentials instead.
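gsutil config writes the resulting credentials to a boto configuration file (by default, ~/.boto). The fragment below is a sketch of the relevant section; the token value is a placeholder, not a real credential:

```ini
[Credentials]
gs_oauth2_refresh_token = <your-refresh-token>
```

The boto library reads this file automatically, so the code in this tutorial does not need to handle credentials itself.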

Setting up your Python source file

To start this tutorial, use your favorite text editor to create a new Python file. Then add the shebang line, import statements, lock setup, and constant assignments shown here:

#!/usr/bin/python

import boto
import multiprocessing
import os
import shutil
import StringIO
import tempfile
import threading
import time
from gslib.third_party.oauth2_plugin import oauth2_plugin
from gslib.third_party.oauth2_plugin import oauth2_client

try:
  oauth2_client.token_exchange_lock = multiprocessing.Manager().Lock()
except Exception:
  # Fall back to a thread lock if a multiprocessing manager
  # cannot be started (e.g., in restricted environments).
  oauth2_client.token_exchange_lock = threading.Lock()


# URI scheme for Google Cloud Storage.
GOOGLE_STORAGE = 'gs'
# URI scheme for accessing local files.
LOCAL_FILE = 'file'

Creating buckets

This code creates two buckets. Because bucket names must be globally unique (see the naming guidelines), a timestamp is appended to each bucket name to help guarantee uniqueness.

If these bucket names are already in use, you'll need to modify the code to generate unique bucket names.

now = time.time()
CATS_BUCKET = 'cats-%d' % now
DOGS_BUCKET = 'dogs-%d' % now

# Your project ID can be found at https://console.developers.google.com/
# If there is no domain for your project, then project_id = 'YOUR_PROJECT'
project_id = 'YOUR_DOMAIN:YOUR_PROJECT'

for name in (CATS_BUCKET, DOGS_BUCKET):
  # Instantiate a BucketStorageUri object.
  uri = boto.storage_uri(name, GOOGLE_STORAGE)
  # Try to create the bucket.
  try:
    # If the default project is defined,
    # you do not need the headers.
    # Just call: uri.create_bucket()
    header_values = {"x-goog-project-id": project_id}
    uri.create_bucket(headers=header_values)

    print 'Successfully created bucket "%s"' % name
  except boto.exception.StorageCreateError, e:
    print 'Failed to create bucket:', e
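If timestamp-based names still collide (for example, when two readers run the tutorial in the same second), a random suffix is more collision-resistant. The helper below is a hypothetical alternative, not part of the original tutorial:

```python
import uuid

def unique_bucket_name(prefix):
    # Bucket names must be globally unique; a random 12-hex-digit
    # suffix makes accidental collisions very unlikely. The result
    # uses only lowercase letters, digits, and dashes, which are
    # allowed in bucket names.
    return '%s-%s' % (prefix, uuid.uuid4().hex[:12])

CATS_BUCKET = unique_bucket_name('cats')
DOGS_BUCKET = unique_bucket_name('dogs')
```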

Listing buckets

To retrieve a list of all buckets, call storage_uri() to instantiate a BucketStorageUri object, specifying the empty string as the URI. Then, call the get_all_buckets() instance method.

uri = boto.storage_uri('', GOOGLE_STORAGE)
# header_values is defined in the "Creating buckets" section above.
# If the default project is defined, call get_all_buckets() without arguments.
for bucket in uri.get_all_buckets(headers=header_values):
  print bucket.name

Uploading objects

To upload objects, create a file object (opened for read) that points to your local file and a storage URI object that points to the destination object on Google Cloud Storage. Then create a new key with the new_key() instance method and call its set_contents_from_file() method, passing the file handle as the argument.

# Make some temporary files.
temp_dir = tempfile.mkdtemp(prefix='googlestorage')
tempfiles = {
    'labrador.txt': 'Who wants to play fetch? Me!',
    'collie.txt': 'Timmy fell down the well!'}
for filename, contents in tempfiles.iteritems():
  with open(os.path.join(temp_dir, filename), 'w') as fh:
    fh.write(contents)

# Upload these files to DOGS_BUCKET.
for filename in tempfiles:
  with open(os.path.join(temp_dir, filename), 'r') as localfile:

    dst_uri = boto.storage_uri(
        DOGS_BUCKET + '/' + filename, GOOGLE_STORAGE)
    # The key-related functions are a consequence of boto's
    # interoperability with Amazon S3 (which employs the
    # concept of a key mapping to localfile).
    dst_uri.new_key().set_contents_from_file(localfile)
  print 'Successfully created "%s/%s"' % (
      dst_uri.bucket_name, dst_uri.object_name)

shutil.rmtree(temp_dir)  # Don't forget to clean up!

Listing objects

To list all objects in a bucket, call storage_uri(), specifying the bucket's URI and the Google Cloud Storage URI scheme as the arguments. Then iterate over the bucket object returned by the get_bucket() instance method.

uri = boto.storage_uri(DOGS_BUCKET, GOOGLE_STORAGE)
for obj in uri.get_bucket():
  print '%s://%s/%s' % (uri.scheme, uri.bucket_name, obj.name)
  print '  "%s"' % obj.get_contents_as_string()

Downloading and copying objects

This code reads objects in DOGS_BUCKET and copies them to both your home directory and CATS_BUCKET. It also demonstrates that you can use the boto library to operate against both local files and Google Cloud Storage objects using the same interface.

dest_dir = os.getenv('HOME')
for filename in ('collie.txt', 'labrador.txt'):
  src_uri = boto.storage_uri(
      DOGS_BUCKET + '/' + filename, GOOGLE_STORAGE)

  # Create a file-like object for holding the object contents.
  object_contents = StringIO.StringIO()

  # The unintuitively-named get_file() doesn't return the object
  # contents; instead, it actually writes the contents to
  # object_contents.
  src_uri.get_key().get_file(object_contents)

  local_dst_uri = boto.storage_uri(
      os.path.join(dest_dir, filename), LOCAL_FILE)

  bucket_dst_uri = boto.storage_uri(
      CATS_BUCKET + '/' + filename, GOOGLE_STORAGE)

  for dst_uri in (local_dst_uri, bucket_dst_uri):
    object_contents.seek(0)
    dst_uri.new_key().set_contents_from_file(object_contents)

  object_contents.close()
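The seek(0) call above matters: after get_file() writes the object contents into the buffer, the buffer's position is at the end, so a subsequent read or upload from that position would transfer nothing. A standalone illustration using io.BytesIO and hypothetical data (no boto required):

```python
import io

buf = io.BytesIO()
buf.write(b'Who wants to play fetch? Me!')

# The position is now at the end, so reading yields nothing.
assert buf.read() == b''

# Rewind before each reuse of the buffer.
buf.seek(0)
assert buf.read() == b'Who wants to play fetch? Me!'
```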

Changing object ACLs

This code grants the specified Google account FULL_CONTROL permissions for labrador.txt. Remember to replace valid-email-address with a valid Google account email address.

uri = boto.storage_uri(DOGS_BUCKET + '/labrador.txt', GOOGLE_STORAGE)
print str(uri.get_acl())
uri.add_email_grant('FULL_CONTROL', 'valid-email-address')
print str(uri.get_acl())

Reading bucket and object metadata

This code retrieves and prints the metadata associated with a bucket and an object.

# Print ACL entries for DOGS_BUCKET.
bucket_uri = boto.storage_uri(DOGS_BUCKET, GOOGLE_STORAGE)
for entry in bucket_uri.get_bucket().get_acl().entries.entry_list:
  entry_id = entry.scope.id
  if not entry_id:
    entry_id = entry.scope.email_address
  print 'SCOPE: %s' % entry_id
  print 'PERMISSION: %s\n' % entry.permission

# Print object metadata and ACL entries.
object_uri = boto.storage_uri(DOGS_BUCKET + '/labrador.txt', GOOGLE_STORAGE)
key = object_uri.get_key()
print ' Object size:\t%s' % key.size
print ' Last mod:\t%s' % key.last_modified
print ' MIME type:\t%s' % key.content_type
print ' MD5:\t%s' % key.etag.strip('"\'') # Remove surrounding quotes
for entry in key.get_acl().entries.entry_list:
  entry_id = entry.scope.id
  if not entry_id:
    entry_id = entry.scope.email_address
  print 'SCOPE: %s' % entry_id
  print 'PERMISSION: %s\n' % entry.permission
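For ordinary (non-composite) objects, the ETag that boto returns is typically the object's MD5 hash wrapped in quote characters, which is why the code above strips them. A standalone illustration with a hypothetical ETag value:

```python
# A hypothetical ETag as returned by the service, including the quotes.
etag = '"5eb63bbbe01eeed093cb22bb8f5acdc3"'

# Strip surrounding double or single quotes to recover the bare hex digest.
md5_hex = etag.strip('"\'')
print(md5_hex)
```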

Deleting objects and buckets

To conclude this tutorial, this code deletes the objects and buckets that you have created. Because a bucket must be empty before it can be deleted, the code deletes each bucket's objects first.

for bucket in (CATS_BUCKET, DOGS_BUCKET):
  uri = boto.storage_uri(bucket, GOOGLE_STORAGE)
  for obj in uri.get_bucket():
    print 'Deleting object: %s...' % obj.name
    obj.delete()
  print 'Deleting bucket: %s...' % uri.bucket_name
  uri.delete_bucket()
