Google Cloud Storage

Access Logs & Storage Data

This document discusses how to download and review access logs and storage data for your Google Cloud Storage buckets, and analyze the logs using Google BigQuery.

Contents

  1. Introduction
  2. Setting Up Log Delivery
  3. Checking Logging Status
  4. Downloading Logs
  5. Analyzing Logs in BigQuery
  6. Disabling Logging
  7. Access and Storage Log Format

Introduction

Google Cloud Storage offers access logs and storage data as CSV files that you can download and view. CSV is currently the only available format. Access logs are created hourly and provide information about all of the requests made on a specified bucket. Storage logs are created daily and provide information about the storage consumption of that bucket for the previous day. Both kinds of log files are automatically created as new objects in a bucket that you specify.

When you configure a Google Cloud Storage bucket to simulate the behavior of a static website, you might want to log how resources in the website are being used. You can configure access logs and storage data files for any Google Cloud Storage bucket, not only for website buckets.


Setting Up Log Delivery

The following steps describe how to set up log delivery for a specific bucket using the gsutil tool, the XML API, and the JSON API. If you don't have the gsutil tool, download and install it.

gsutil

  1. Create a bucket to store your logs.

    Create a bucket to store your logs using the following command:

    gsutil mb gs://my_logs
  2. Grant Google Cloud Storage WRITE permission on the bucket.

    Google Cloud Storage must have WRITE permission to create and store your logs as new objects. To grant Google Cloud Storage WRITE access to your bucket, grant the "cloud-storage-analytics@google.com" group write access with the following command:

    gsutil acl ch -g cloud-storage-analytics@google.com:W gs://my_logs
  3. Set the default object ACL of the log bucket.

    Log objects inherit the default object ACL of the log bucket, which you can set using gsutil. For example, to set the default object ACL to project-private:

    gsutil defacl set project-private gs://my_logs
  4. Enable logging for your bucket.

    You can enable logging for your bucket using the logging command:

    gsutil logging set on -b gs://my_logs [-o log_object_prefix] gs://bucket_name

    Optionally, you can set the log_object_prefix object prefix for your log objects. The object prefix forms the beginning of the log object name. It can be at most 900 characters and must be a valid object name. By default, the object prefix is the name of the bucket for which the logs are enabled.
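If you script this setup, a small helper can assemble the logging command and enforce the 900-character prefix limit before anything runs. The following Python sketch is illustrative only: `build_enable_logging_cmd` is a hypothetical name, it checks only the length limit (not the other object-naming rules), and it builds the argument list without executing gsutil.

```python
from typing import List, Optional

MAX_PREFIX_LEN = 900  # maximum log_object_prefix length stated in this document


def build_enable_logging_cmd(bucket: str, log_bucket: str,
                             prefix: Optional[str] = None) -> List[str]:
    """Return the argv for `gsutil logging set on` (hypothetical helper)."""
    cmd = ["gsutil", "logging", "set", "on", "-b", f"gs://{log_bucket}"]
    if prefix is not None:
        if len(prefix) > MAX_PREFIX_LEN:
            raise ValueError(f"log_object_prefix exceeds {MAX_PREFIX_LEN} characters")
        cmd += ["-o", prefix]
    # Without -o, log object names start with the logged bucket's name.
    cmd.append(f"gs://{bucket}")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_enable_logging_cmd("finance-data", "my_logs")))
```

On a machine where gsutil is installed, you could pass the returned list to `subprocess.run(cmd, check=True)`.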

XML API

  1. Create a bucket to store your logs.

    Create a bucket to store your logs using the following request:

    PUT /my_logs HTTP/1.1
    Host: storage.googleapis.com
    
  2. Grant Google Cloud Storage WRITE permission on the bucket.

    Google Cloud Storage must have WRITE permission to create and store your logs as new objects. To grant Google Cloud Storage WRITE access to your bucket, add an ACL entry for the bucket that grants the "cloud-storage-analytics@google.com" group write access. Be sure to include all existing ACLs for the bucket, in addition to the new ACL, in the request.

    PUT /my_logs?acl HTTP/1.1
    Host: storage.googleapis.com
    
    <AccessControlList>
      <Entries>
        <Entry>
          <Scope type="GroupByEmail">
            <EmailAddress>cloud-storage-analytics@google.com</EmailAddress>
          </Scope>
          <Permission>WRITE</Permission>
        </Entry>
        <!-- include other existing ACL entries here-->
      </Entries>
    </AccessControlList>
    
  3. Enable logging for your bucket.

    You can enable logging for your bucket using the logging query parameter:

    PUT /bucket_name?logging HTTP/1.1
    Host: storage.googleapis.com
    
    <Logging>
        <LogBucket>my_logs</LogBucket>
        <LogObjectPrefix>log_object_prefix</LogObjectPrefix>
    </Logging>
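Because bucket and prefix names are inserted into XML, it is safer to build the `<Logging>` body with an XML library than with string formatting. A minimal Python sketch using the standard `xml.etree.ElementTree` module follows; `logging_config_xml` is a hypothetical helper, and treating `<LogObjectPrefix>` as optional is an assumption that mirrors the optional `-o` flag of gsutil.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def logging_config_xml(log_bucket: str, prefix: Optional[str] = None) -> bytes:
    """Build the <Logging> body for PUT /bucket_name?logging (sketch)."""
    root = ET.Element("Logging")
    ET.SubElement(root, "LogBucket").text = log_bucket
    if prefix is not None:
        # Assumption: the prefix element may be omitted when no prefix is wanted.
        ET.SubElement(root, "LogObjectPrefix").text = prefix
    # ElementTree escapes characters such as & and < automatically.
    return ET.tostring(root)
```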
    

JSON API

  1. Create a bucket to store your logs.

    Create a bucket to store your logs using the following request:

    POST /storage/v1beta2/b?project=project-id
    Host: www.googleapis.com
    
    {
      "name": "my_logs"
    }
    
  2. Grant Google Cloud Storage WRITE permission on the bucket.

    Google Cloud Storage must have WRITE permission to create and store your logs as new objects. To grant Google Cloud Storage WRITE access to your bucket, add an ACL entry for the bucket that grants the "cloud-storage-analytics@google.com" group write access. You can do this with the following request to the BucketAccessControls resource for the logging bucket:

    POST /storage/v1beta2/b/my_logs/acl
    Host: www.googleapis.com

    {
     "entity": "group-cloud-storage-analytics@google.com",
     "role": "WRITER"
    }
    
  3. Enable logging for your bucket.

    You can enable logging for your bucket using the following request:

    PATCH /storage/v1beta2/b/bucket_name
    Host: www.googleapis.com
    
    {
     "logging": {
      "logBucket": "my_logs",
      "logObjectPrefix": "log_object_prefix"
     }
    }
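The same request can be prepared in code. This Python sketch returns the method, path, and JSON body for the PATCH request above; `enable_logging_request` is a hypothetical helper, it does not send anything or handle authentication, and omitting `logObjectPrefix` when no prefix is given is an assumption.

```python
import json
from typing import Any, Dict, Optional, Tuple


def enable_logging_request(bucket: str, log_bucket: str,
                           prefix: Optional[str] = None) -> Tuple[str, str, str]:
    """Return (method, path, JSON body) that enables logging on `bucket`."""
    logging_cfg: Dict[str, Any] = {"logBucket": log_bucket}
    if prefix is not None:
        logging_cfg["logObjectPrefix"] = prefix
    body = json.dumps({"logging": logging_cfg})
    # Path of the bucket resource, as in the PATCH request above.
    return "PATCH", f"/storage/v1beta2/b/{bucket}", body
```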
    

Checking Logging Status

gsutil

With gsutil, you can check the logging configuration of a bucket by using the logging get command:

gsutil logging get gs://bucket_name

You can also save the logging configurations to a file:

gsutil logging get gs://bucket_name > your_logging_configuration_file

If logging is enabled, the server sends the <Logging> XML element in response. A response might look similar to the following:

<?xml version="1.0" ?>
<Logging>
    <LogBucket>
        my_logs
    </LogBucket>
    <LogObjectPrefix>
        log_object_prefix
    </LogObjectPrefix>
</Logging>

If logging is not enabled, an empty <Logging> element is returned:

<?xml version="1.0" ?>
<Logging/>
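The sample output pretty-prints the element values, so surrounding whitespace must be stripped when you read them programmatically. The following Python sketch (hypothetical helper `parse_logging_config`, standard library only) returns the log bucket and prefix, or `None` for an empty `<Logging/>` element. The XML API response has the same shape.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def parse_logging_config(xml_text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (log_bucket, log_object_prefix), or None if logging is off."""
    root = ET.fromstring(xml_text)
    log_bucket = root.findtext("LogBucket")
    if log_bucket is None:
        return None  # empty <Logging/> element: logging is not enabled
    prefix = root.findtext("LogObjectPrefix")
    # Values may be pretty-printed across lines, so strip the whitespace.
    return log_bucket.strip(), (prefix.strip() if prefix is not None else None)
```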

XML API

Using the Google Cloud Storage XML API, you can send a GET request for the bucket's logging configuration as shown in the following example.

GET /bucket_name?logging HTTP/1.1
Host: storage.googleapis.com

If logging is enabled, the server sends the configuration in the response. A response might look similar to the following:

<?xml version="1.0" ?>
<Logging>
    <LogBucket>
        my_logs
    </LogBucket>
    <LogObjectPrefix>
        log_object_prefix
    </LogObjectPrefix>
</Logging>

If logging is not enabled, an empty configuration is returned:

<?xml version="1.0" ?>
<Logging/>

JSON API

Using the Google Cloud Storage JSON API, you can send a GET request for the bucket's logging configuration as shown in the following example.

GET /storage/v1beta2/b/bucket_name?fields=logging
Host: www.googleapis.com

If logging is enabled, the server sends the configuration in the response. A response might look similar to the following:

{
 "logging": {
  "logBucket": "my_logs",
  "logObjectPrefix": "log_object_prefix"
 }
}

If logging is not enabled, an empty configuration is returned:

{
}
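For the JSON API, a missing or empty `logging` member means logging is not enabled. A minimal Python sketch that reads the response body follows; `parse_logging_json` is a hypothetical helper name.

```python
import json
from typing import Optional, Tuple


def parse_logging_json(body: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (logBucket, logObjectPrefix), or None if logging is off."""
    logging_cfg = json.loads(body).get("logging")
    if not logging_cfg:
        return None  # "{}" response: no logging configuration
    return logging_cfg["logBucket"], logging_cfg.get("logObjectPrefix")
```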

Downloading Logs

Storage logs are generated once a day and contain the storage usage for the previous day. They are typically created before 10:00 am PST.

Usage logs are generated hourly when there is activity to report in the monitored bucket. Usage logs are typically created 15 minutes after the end of the hour. Here are several things to keep in mind when working with usage logs:

  • Any log processing of usage logs should take into account the possibility that they may be delivered later than 15 minutes after the end of an hour.
  • Usually, hourly usage log object(s) contain records for all usage that occurred during that hour. Occasionally, an hourly usage log object contains records for an earlier hour, but never for a later hour.
  • Google Cloud Storage may write multiple log objects for the same hour.
  • Occasionally, a single record may appear twice in the usage logs. While we make our best effort to remove duplicate records, your log processing should be able to remove them if it is critical to your log analysis. You can use the s_request_id field to detect duplicates.
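The points above suggest two safeguards when combining usage logs: drop repeated `s_request_id` values, and order records by their own `time_micros` rather than by log object name. The following Python sketch does both for CSV text you have already downloaded. It assumes the first row of each file holds the field names listed in Access and Storage Log Format; `merge_usage_logs` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, Iterable, List


def merge_usage_logs(log_texts: Iterable[str]) -> List[Dict[str, str]]:
    """Merge usage-log CSV texts, dropping duplicates and sorting by time."""
    seen = set()
    records: List[Dict[str, str]] = []
    for text in log_texts:
        # Assumes the header row uses the field names (s_request_id, ...).
        for row in csv.DictReader(io.StringIO(text)):
            request_id = row["s_request_id"]
            if request_id in seen:
                continue  # the same request was logged more than once
            seen.add(request_id)
            records.append(row)
    # A log object may hold records for an earlier hour, so order by each
    # record's completion time instead of by the log object it came from.
    records.sort(key=lambda r: int(r["time_micros"]))
    return records
```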

Access to your logs is controlled by the ACL on the log objects. Log objects inherit the default object ACL of the log bucket.

The easiest way to download your logs and storage data is through the Google Developers Console or with the gsutil tool. Access logs are CSV files named using the following convention, where <bucket_name> is the log bucket:

gs://<bucket_name>/<object_prefix>_usage_<timestamp>_<id>_v0

For example, the following is an access logs object for a bucket named gs://finance-data, created on June 18, 2013 at 14:00 UTC and stored in the bucket gs://my_logs:

gs://my_logs/finance-data_usage_2013_06_18_14_00_00_1702e6_v0

Storage data logs are named using the following convention:

gs://<bucket_name>/<object_prefix>_storage_<timestamp>_<id>_v0

For example, the following is a storage data logs object for the same bucket on June 18, 2013:

gs://my_logs/finance-data_storage_2013_06_18_07_00_00_1702e6_v0
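If you process many log objects, it can help to split each name into its parts. The following Python sketch parses object names (without the `gs://<bucket_name>/` part) that follow the two conventions above. `parse_log_name` and `LogObject` are hypothetical, and the sketch assumes that the timestamp is in UTC, as in the access log example, and that the ID consists of letters and digits.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional

LOG_NAME = re.compile(
    r"^(?P<prefix>.+)_(?P<kind>usage|storage)_"
    r"(?P<ts>\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2})_(?P<id>[0-9A-Za-z]+)_v0$"
)


class LogObject(NamedTuple):
    prefix: str        # object prefix, by default the logged bucket's name
    kind: str          # "usage" (access log) or "storage"
    hour: datetime     # timestamp from the name, assumed to be UTC
    log_id: str


def parse_log_name(name: str) -> Optional[LogObject]:
    """Split a log object name into its parts, or return None if it doesn't match."""
    m = LOG_NAME.match(name)
    if m is None:
        return None
    hour = datetime.strptime(m["ts"], "%Y_%m_%d_%H_%M_%S").replace(tzinfo=timezone.utc)
    return LogObject(m["prefix"], m["kind"], hour, m["id"])
```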

gsutil

To download logs using gsutil, run the following command:

gsutil cp <logs_object> <destination_uri>

Google Developers Console

To download logs using Google Developers Console:

  1. Log in to the Google Developers Console.
  2. Select the project that contains the logs.
  3. Click on the Google Cloud Storage service.
  4. Select your log bucket.
  5. Download or view your logs by clicking on the appropriate log object.

Analyzing Logs in BigQuery

To query your Google Cloud Storage usage and storage logs, you can use Google BigQuery, which enables fast, SQL-like queries against append-only tables. The BigQuery Command-Line Tool (bq) is a Python-based tool that allows you to access BigQuery from the command line. For information about downloading and using bq, see the bq Command-Line Tool reference page.

Loading Logs into BigQuery

  1. Select a default project.

    For details about selecting a project, see Working With Projects.

  2. Create a new dataset.
    $ bq mk storageanalysis
    Dataset 'storageanalysis' successfully created.
    

    List the datasets in the project:

    $ bq ls
    
      datasetId
    -----------------
     storageanalysis
    
  3. Save the usage and storage schemas to your local computer for use in the load command.

    You can find the schemas to use at these locations: cloud_storage_usage_schema_v0 and cloud_storage_storage_schema_v0. The schemas are also described in the section Access and Storage Log Format.

  4. Load the access and storage logs into the dataset.
    $ bq load --skip_leading_rows=1 storageanalysis.usage \
              gs://my_logs/bucket_usage_2014_01_15_14_00_00_1702e6_v0 \
              ./cloud_storage_usage_schema_v0.json
    $ bq load --skip_leading_rows=1 storageanalysis.storage \
              gs://my_logs/bucket_storage_2014_01_05_14_00_00_091c5f_v0 \
              ./cloud_storage_storage_schema_v0.json
    

    These commands do the following:

    • Load usage and storage logs from the bucket my_logs.
    • Create tables usage and storage in the dataset storageanalysis.
    • Read schema data (.json file) from the same directory where the bq command runs.
    • Skip the first row of each log file because it contains column descriptions.

    Because this was the first time you ran the load command in this example, the tables usage and storage were created. You can continue to append to these tables with subsequent load commands, using different log object names or wildcards. For example, the following command appends data from all logs whose names start with "bucket_usage_2014" to the usage table:

    $ bq load --skip_leading_rows=1 storageanalysis.usage \
              gs://my_logs/bucket_usage_2014* \
              ./cloud_storage_usage_schema_v0.json
    

    When using wildcards, you might want to move logs already loaded into BigQuery to another location (for example, gs://my_logs/processed) to avoid loading data from a log more than once.
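Instead of moving objects, you could also keep your own record of which log objects have been loaded. The following Python sketch builds one `bq load` command per new object, using the same flags and schema file as the commands above, and skips objects already in a `processed` set. The helper name and this bookkeeping approach are illustrative assumptions, and the sketch does not run bq.

```python
from typing import Iterable, List, Set


def bq_load_commands(objects: Iterable[str], processed: Set[str],
                     table: str, schema: str) -> List[List[str]]:
    """Return bq load argv lists for log objects not loaded yet (sketch)."""
    commands = []
    for uri in sorted(objects):
        if uri in processed:
            continue  # loading this object again would duplicate its rows
        commands.append(["bq", "load", "--skip_leading_rows=1", table, uri, schema])
    return commands
```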

BigQuery functionality is also available through the BigQuery Browser Tool. With the browser tool, you can load data through the create table process.

For additional information about loading data from Google Cloud Storage, including programmatically loading data, see Loading data from Google Cloud Storage.

Modifying the Access Log Schema

In some scenarios, you may find it useful to pre-process access logs before loading them into BigQuery. For example, you can add information to the logs to make your query analysis in BigQuery easier. This section shows how to add the file name of each storage access log to the log itself. This requires modifying the existing schema and each log file.

  1. Modify the existing schema, cloud_storage_storage_schema_v0, to add a filename field as shown below. Give the new schema a new name, for example, cloud_storage_storage_schema_custom.json, to distinguish it from the original.
    [  {"name": "bucket", "type": "string", "mode": "REQUIRED"},
       {"name": "storage_byte_hours","type": "integer","mode": "REQUIRED"},
       {"name": "filename","type": "string","mode": "REQUIRED"}
    ]
    
  2. Pre-process storage access log files based on the new schema, before loading them into BigQuery.

    For example, the following commands can be used in a Linux/Mac OS X or Windows (Cygwin) environment:

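    gsutil cp gs://my_logs/bucket_storage* .
    # -i.bak works with both GNU and BSD (Mac OS X) sed; each backup is deleted after a successful edit.
    for f in bucket_storage*; do sed -i.bak -e "1s/$/,\"filename\"/" -e "2s/$/,\"$f\"/" "$f" && rm "$f.bak"; done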
    

    The gsutil command copies the files into your working directory. The second command loops through the log files and adds "filename" to the description row (first row) and the actual file name to the data row (second row). Here's an example of a modified log file:

    "bucket","storage_byte_hours","filename"
    "example-bucket","5532482018","bucket_storage_2014_01_05_08_00_00_021fd_v0"
    
  3. When you load the storage access logs into BigQuery, load your locally modified logs and use the customized schema.
    for f in bucket_storage*; \
      do bq load --skip_leading_rows=1 storageanalysis.storage "$f" ./cloud_storage_storage_schema_custom.json; done
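If sed is unavailable, the same preprocessing can be done in Python. This sketch (hypothetical helper `add_filename_column`) uses the standard `csv` module so that quoting matches the example above. Unlike the sed command, which edits only the second line, it appends the file name to every data row; storage logs in this document's examples have a single data row, so the result is the same.

```python
import csv
import io
import pathlib


def add_filename_column(text: str, filename: str) -> str:
    """Append a "filename" column to a storage log's CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(rows[0] + ["filename"])  # description (header) row
    for row in rows[1:]:
        if row:  # skip blank lines
            writer.writerow(row + [filename])
    return out.getvalue()


if __name__ == "__main__":
    # Rewrites local copies in place; running it twice adds the column twice.
    for path in pathlib.Path(".").glob("bucket_storage*"):
        path.write_text(add_filename_column(path.read_text(), path.name))
```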
    

Querying Logs in BigQuery

Once your logs are loaded into BigQuery, you can query them to return information about your logged bucket(s). The following example shows how to use the bq tool in a scenario where you have access logs for a bucket over several days and have loaded the logs as shown in Loading Logs into BigQuery. You can also run the queries below in the BigQuery Browser Tool.

  1. In the bq tool, enter the interactive mode.
    $ bq shell
    
  2. Run a query against the storage log table.

    For example, the following query shows how the storage of a logged bucket changes in time. It assumes that you modified the storage access logs as described in Modifying the Access Log Schema and that the log files are named "log_storage_*".

    project-name>SELECT SUBSTRING(filename, 13, 10) as day, storage_byte_hours/24 as size FROM [storageanalysis.storage] ORDER BY filename LIMIT 100
    

    Example output from the query:

    Waiting on bqjob_r36fbf5c164a966e8_0000014379bc199c_1 ... (0s) Current status: DONE
    +------------+----------------------+
    |    day     |         size         |
    +------------+----------------------+
    | 2014_01_05 | 2.3052008408333334E8 |
    | 2014_01_06 | 2.3012297245833334E8 |
    | 2014_01_07 | 3.3477797120833334E8 |
    | 2014_01_08 | 4.4183686058333334E8 |
    +------------+----------------------+
    

    If you did not modify the schema and are using the default schema, you can run the following query:

    project-name>SELECT storage_byte_hours FROM [storageanalysis.storage] LIMIT 100
    
  3. Run a query against the usage log table.

    For example, the following query shows how to summarize the request methods that clients use to access resources in the logged bucket.

    project-name>SELECT cs_method, COUNT(*) AS count FROM [storageanalysis.usage] GROUP BY cs_method
    

    Example output from the query:

    Waiting on bqjob_r1a6b4596bd9c29fb_000001437d6f8a52_1 ... (0s) Current status: DONE
    +-----------+-------+
    | cs_method | count |
    +-----------+-------+
    | PUT       |  8002 |
    | GET       | 12631 |
    | POST      |  2737 |
    | HEAD      |  2173 |
    | DELETE    |  7290 |
    +-----------+-------+
    
  4. Quit the interactive shell of the bq tool.
    project-name> quit
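For a quick check on a few downloaded usage logs without BigQuery, the GROUP BY query from step 3 can be approximated locally. The following Python sketch counts requests per `cs_method`; `count_methods` is a hypothetical helper, and it assumes the first row of each file holds the field names.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def count_methods(log_texts: Iterable[str]) -> Counter:
    """Count requests per HTTP method across usage-log CSV texts."""
    counts: Counter = Counter()
    for text in log_texts:
        for row in csv.DictReader(io.StringIO(text)):
            counts[row["cs_method"]] += 1
    return counts
```

Unlike BigQuery, this reads everything into memory on one machine, so it suits spot checks rather than large log sets.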
    

Disabling Logging

gsutil

With gsutil, disable logging with the logging set off command:

gsutil logging set off gs://bucket_name

To check that logging was successfully disabled, perform a logging get request:

gsutil logging get gs://bucket_name

If logging is disabled, the response should be an empty <Logging> element:

<?xml version="1.0" ?>
<Logging/>

XML API

Using the Google Cloud Storage XML API, disable logging by sending a PUT request to the bucket's logging configuration as shown in the following example:

PUT /bucket_name?logging HTTP/1.1
Host: storage.googleapis.com

<Logging/>

JSON API

Using the Google Cloud Storage JSON API, disable logging by sending a PATCH request to the bucket that sets its logging property to null, as shown in the following example:

PATCH /storage/v1beta2/b/bucket_name
Host: www.googleapis.com

{
 "logging": null
}
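The following Python sketch returns the method, path, and body for this request. Python's `json` module serializes `None` as the `null` shown above. `disable_logging_request` is hypothetical, and the path follows the bucket resource path used to enable logging in the JSON API steps of Setting Up Log Delivery.

```python
import json
from typing import Tuple


def disable_logging_request(bucket: str) -> Tuple[str, str, str]:
    """Return (method, path, JSON body) that clears the bucket's logging config."""
    body = json.dumps({"logging": None})  # None becomes JSON null
    return "PATCH", f"/storage/v1beta2/b/{bucket}", body
```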

Access and Storage Log Format

The access logs and storage data files can provide an overwhelming amount of information. You can use the following tables to help you identify all the information provided in these logs.

Access log fields:

Field | Type | Description
time_micros | integer | The time that the request was completed, in microseconds since the Unix epoch.
c_ip | string | The IP address from which the request was made. The "c" prefix indicates that this is information about the client.
c_ip_type | integer | The type of IP address in the c_ip field: 1 indicates an IPv4 address; 2 indicates an IPv6 address.
c_ip_region | string | Reserved for future use.
cs_method | string | The HTTP method of this request. The "cs" prefix indicates that this information was sent from the client to the server.
cs_uri | string | The URI of the request.
sc_status | integer | The HTTP status code the server sent in response. The "sc" prefix indicates that this information was sent from the server to the client.
cs_bytes | integer | The number of bytes sent in the request.
sc_bytes | integer | The number of bytes sent in the response.
time_taken_micros | integer | The time it took to serve the request, in microseconds.
cs_host | string | The host in the original request.
cs_referer | string | The HTTP referrer for the request.
cs_user_agent | string | The User-Agent of the request. For requests made by lifecycle management, the value is "GCS Lifecycle Management".
s_request_id | string | The request identifier.
cs_operation | string | The Google Cloud Storage operation, for example, GET_Object.
cs_bucket | string | The bucket specified in the request. For a list buckets request, this can be null.
cs_object | string | The object specified in this request. This can be null.
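Several access log fields use raw units. The following Python sketch converts a parsed row (for example, one produced by `csv.DictReader`) into more readable values: `time_micros` becomes a UTC datetime, `c_ip_type` becomes an IP version label, and `time_taken_micros` becomes milliseconds. The helper name `describe_request` and the output keys are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
IP_TYPES = {1: "IPv4", 2: "IPv6"}  # c_ip_type values from the table above


def describe_request(row: Dict[str, str]) -> Dict[str, object]:
    """Convert a few raw access-log fields into friendlier Python values."""
    return {
        # timedelta keeps full microsecond precision (no float rounding).
        "completed_at": EPOCH + timedelta(microseconds=int(row["time_micros"])),
        "client_ip": row["c_ip"],
        "ip_version": IP_TYPES.get(int(row["c_ip_type"]), "unknown"),
        "method": row["cs_method"],
        "status": int(row["sc_status"]),
        "latency_ms": int(row["time_taken_micros"]) / 1000,
    }
```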

Storage data fields:

Field | Type | Description
bucket | string | The name of the bucket.
storage_byte_hours | integer | The bucket's storage over a 24-hour period, in byte-hours. To get the bucket's average size in bytes over that period, divide by 24.
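As a worked example of the byte-hours conversion, the following Python sketch reads a storage log and returns each bucket's average size in bytes. Applied to the sample value 5532482018 from the modified log file earlier in this document, it gives about 230,520,084 bytes, the figure shown for 2014_01_05 in the BigQuery query output. The helper name `average_sizes` is hypothetical.

```python
import csv
import io
from typing import List, Tuple


def average_sizes(storage_csv: str) -> List[Tuple[str, float]]:
    """Return (bucket, average size in bytes) for each row of a storage log."""
    result = []
    for row in csv.DictReader(io.StringIO(storage_csv)):
        byte_hours = int(row["storage_byte_hours"])
        result.append((row["bucket"], byte_hours / 24))  # byte-hours / 24 h = bytes
    return result
```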
