Outputting Data from the Data Export API to CSV Format

Alexander Lucas, Google Analytics API Team – August 2010


Introduction

This article shows you how to take data from any query made to the Google Analytics Data Export API and output the results in the popular CSV format. This is one of the most common tasks people perform with Analytics data pulled from the Data Export API, so automating the process is an easy way to save tons of time on a regular basis. In addition, once you have some code for printing out CSV documents from queries, you'll be able to integrate it into larger projects, like automatic report generators, mailers, and "export" functions for custom dashboards you've written.

Before You Start

You'll get the most out of this article if you have the following:

Program Overview

The code covered in this article will do the following:

  1. Enable choosing at runtime whether the code prints to the console or to a file stream.
  2. Given a DataFeed object as a parameter, print the data out in CSV format:
    • Print row headers.
    • Print data rows, where each DataEntry makes up one row in the resulting output.
    • Run each value through a sanitizing method for CSV-safe output.
  3. Write a "Sanitizer" method that makes all input CSV-safe.
  4. Provide you with a Java class that can take any Data Export API query and turn it into a CSV file.


Allow for Configurable Output Streams

The first thing to do is set up a configurable output stream for your class to print to. This way, any code using your class can decide whether output should go to standard out or directly to a file. All you need to do here is set up getter and setter methods for a PrintStream object. That object will be the target of all printing done by the class.

private PrintStream printStream = System.out;

public PrintStream getPrintStream() {
  return printStream;
}

public void setPrintStream(PrintStream printStream) {
  this.printStream = printStream;
}

Setting the output to a file is also very easy. One needs only the filename to create a PrintStream object for that file.

FileOutputStream fstream = new FileOutputStream(filename);
PrintStream stream = new PrintStream(fstream);
csvprinter.setPrintStream(stream);
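
The snippet above leaves closing the stream to the caller. As a minimal sketch of one way to handle that, the following uses try-with-resources so the file is flushed and closed once printing finishes. The class name FileStreamSketch, the helper writeLines, and the choice of UTF-8 are assumptions made for illustration; they are not part of the article's sample class.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class FileStreamSketch {

  // Hypothetical helper: writes each line to the named file, encoded as UTF-8.
  static void writeLines(String filename, List<String> lines) throws IOException {
    // try-with-resources flushes and closes the stream even if printing fails.
    try (PrintStream stream = new PrintStream(filename, "UTF-8")) {
      for (String line : lines) {
        stream.println(line);
      }
      // PrintStream does not throw on write errors; checkError() reports them.
      if (stream.checkError()) {
        throw new IOException("Failed writing to " + filename);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Made-up sample rows, written to a temporary file and read back.
    Path file = Files.createTempFile("report", ".csv");
    writeLines(file.toString(), Arrays.asList("ga:source,ga:visits", "google,42"));
    System.out.print(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
  }
}
```

In the article's class, the same pattern would wrap the calls that print the feed, with the PrintStream passed to setPrintStream inside the try block.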


Iterating through the data

The first row of the CSV file is the row of column names. Each column represents a dimension or metric from the data feed, so to print this first row, do the following:

  1. Grab the first entry from the feed.
  2. Iterate through a list of dimensions using that entry's getDimensions method.
  3. Print the name of each dimension using the Dimension.getName() method, followed by a comma.
  4. Do the same thing for metrics using the getMetrics() method. Print commas after all but the last metric.

Here's one implementation of the method to print row headers. Note that this code doesn't return a string representing the complete row: it prints to an output stream as it processes values.

public void printRowHeaders(DataFeed feed) {
  // Nothing to print if the feed has no entries.
  if (feed.getEntries().isEmpty()) {
    return;
  }

  // The first entry supplies the column names.
  DataEntry firstEntry = feed.getEntries().get(0);

  // Dimensions come first; each name is followed by a comma.
  Iterator<Dimension> dimensions = firstEntry.getDimensions().iterator();
  while (dimensions.hasNext()) {
    printStream.print(sanitizeForCsv(dimensions.next().getName()));
    printStream.print(",");
  }

  // Metrics come last; print a comma after all but the final one.
  Iterator<Metric> metrics = firstEntry.getMetrics().iterator();
  while (metrics.hasNext()) {
    printStream.print(sanitizeForCsv(metrics.next().getName()));
    if (metrics.hasNext()) {
      printStream.print(",");
    }
  }
  printStream.println();
}

Printing the "body" of the CSV file (everything below the row of column names) is very similar. There are only two key differences. First, it's not just the first entry being evaluated. The code needs to loop through all the entries in the feed object. Second, instead of using the getName() method to pull the value to be sanitized and printed, use getValue() instead.

public void printBody(DataFeed feed) {
  // Nothing to print if the feed has no entries.
  if (feed.getEntries().isEmpty()) {
    return;
  }

  // Each entry becomes one row of output.
  for (DataEntry entry : feed.getEntries()) {
    printEntry(entry);
  }
}

public void printEntry(DataEntry entry) {
  // Dimension values come first; each is followed by a comma.
  Iterator<Dimension> dimensions = entry.getDimensions().iterator();
  while (dimensions.hasNext()) {
    printStream.print(sanitizeForCsv(dimensions.next().getValue()));
    printStream.print(",");
  }

  // Metric values come last; print a comma after all but the final one.
  Iterator<Metric> metrics = entry.getMetrics().iterator();
  while (metrics.hasNext()) {
    printStream.print(sanitizeForCsv(metrics.next().getValue()));
    if (metrics.hasNext()) {
      printStream.print(",");
    }
  }
  printStream.println();
}
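
The comma handling in these methods can be tried out without a live feed. The following is a minimal sketch that swaps DataFeed and DataEntry for plain lists of strings and captures the output in memory. The class RowPrintingSketch, its methods, and the sample values are all stand-ins invented for illustration. The values are assumed to be CSV-safe already, so no sanitizing is done here.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RowPrintingSketch {

  // Prints one CSV row: a comma after every value except the last.
  static void printRow(PrintStream out, List<String> values) {
    Iterator<String> it = values.iterator();
    while (it.hasNext()) {
      out.print(it.next());
      if (it.hasNext()) {
        out.print(",");
      }
    }
    out.println();
  }

  // Prints the header row, then one row per entry, and returns the result.
  static String toCsv(List<String> header, List<List<String>> rows) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    PrintStream out = new PrintStream(buffer);
    printRow(out, header);
    for (List<String> row : rows) {
      printRow(out, row);
    }
    out.flush();
    return buffer.toString();
  }

  public static void main(String[] args) {
    // Made-up sample data standing in for a feed's entries.
    List<String> header = Arrays.asList("ga:source", "ga:visits");
    List<List<String>> rows = Arrays.asList(
        Arrays.asList("google", "42"),
        Arrays.asList("(direct)", "17"));
    System.out.print(toCsv(header, rows));
  }
}
```

Printing into a ByteArrayOutputStream is also a convenient way to unit test the real class: pass a PrintStream wrapped around the buffer to setPrintStream and compare the captured text with the expected CSV.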

This code breaks up your feed into entries, and your entries into values to be printed to output. But how do we make those values CSV-friendly? What if a value in the "comma-separated-values" file has a comma in it? Those values must be sanitized.


How to Sanitize Data for CSV Compatibility

CSV is a straightforward format. A CSV file represents a data table, and each line represents a row in that table. The values in that row are separated by commas. A new line means a new row of data.

Unfortunately, this straightforward format makes it deceptively easy to throw things off with bad data. What if your value has a comma in it? What if one of your values has line breaks within it? What should happen with space between commas and values? All these situations can be accounted for using a few simple rules.

  • If the string contains a doublequote character, escape it with a second doublequote character.
  • If there's a comma in the string, wrap the whole string in double quotes (unless you already have).
  • If there's a line break in the string, wrap the whole string in double quotes (unless you already have).
  • If the string begins or ends with any sort of white space, wrap the whole string in double quotes (unless you already have).

It can be a little tricky to visualize what your values should look like at this point, so here are some examples. Remember, each example represents a single value, and is escaped as such. For clarity, leading spaces are shown as a _ character.

Before                            After
------                            -----
unchanged                         unchanged

random " doublequote              random "" doublequote

comma,separated                   "comma,separated"

Two                               "Two
lines                             lines"

_leading space, and a comma       "_leading space, and a comma"

"leading quote, comma             """leading quote, comma"

_space, comma                     "_space, comma
second line, and double quote"    second line, and double quote"""

The easiest way to handle all these conditions is to write a sanitizing method. Questionable data goes in, and good, clean, CSV values come out. Here's a good sample implementation of just such a method.

private String sanitizeForCsv(String cellData) {
  // An empty value needs no escaping. Returning early also keeps the
  // charAt() calls below from throwing on a zero-length string.
  if (cellData.isEmpty()) {
    return cellData;
  }

  StringBuilder resultBuilder = new StringBuilder(cellData);

  // Look for doublequotes, escape as necessary.
  int lastIndex = 0;
  while (resultBuilder.indexOf("\"", lastIndex) >= 0) {
    int quoteIndex = resultBuilder.indexOf("\"", lastIndex);
    resultBuilder.replace(quoteIndex, quoteIndex + 1, "\"\"");
    lastIndex = quoteIndex + 2;
  }

  char firstChar = cellData.charAt(0);
  char lastChar = cellData.charAt(cellData.length() - 1);

  if (cellData.contains(",") || // Check for commas.
      cellData.contains("\n") || // Check for line breaks.
      Character.isWhitespace(firstChar) || // Check for leading whitespace.
      Character.isWhitespace(lastChar)) { // Check for trailing whitespace.
    resultBuilder.insert(0, "\"").append("\""); // Wrap in doublequotes.
  }
  return resultBuilder.toString();
}

The method starts out by checking for existing double quotes. This should be done before all the other checks, because they involve wrapping a string in double quotes, and it would be bothersome to tell apart double quotes that were part of the value from double quotes that were added previously by this method. These are easy to escape: they just need to be doubled up. Every " becomes a "", every "" becomes a """", and so on.

Once the double quotes have been escaped, all the other conditions (untrimmed whitespace, commas, and line breaks) can be checked for. If any of them are present, simply wrap the value in double quotes.
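
The examples from the table earlier in this section make a handy check for any sanitizer. The sketch below copies the same two-step logic (escape double quotes, then wrap if needed) into a standalone class and runs each example through it. The class name SanitizerCheck and the check helper are hypothetical, and returning empty input unchanged is an assumption rather than something the rules above specify.

```java
public class SanitizerCheck {

  static String sanitizeForCsv(String cellData) {
    // Assumption: empty input is returned as-is.
    if (cellData.isEmpty()) {
      return cellData;
    }
    StringBuilder result = new StringBuilder(cellData);

    // Double every doublequote by inserting a second one in front of it.
    int from = 0;
    int quoteIndex;
    while ((quoteIndex = result.indexOf("\"", from)) >= 0) {
      result.insert(quoteIndex, '"');
      from = quoteIndex + 2; // Skip past both quotes.
    }

    // Wrap in doublequotes for commas, line breaks, or untrimmed whitespace.
    boolean needsWrapping = cellData.contains(",")
        || cellData.contains("\n")
        || Character.isWhitespace(cellData.charAt(0))
        || Character.isWhitespace(cellData.charAt(cellData.length() - 1));
    if (needsWrapping) {
      result.insert(0, '"').append('"');
    }
    return result.toString();
  }

  // Throws if the sanitized input differs from the expected output.
  static void check(String input, String expected) {
    String actual = sanitizeForCsv(input);
    if (!actual.equals(expected)) {
      throw new AssertionError("Expected [" + expected + "] but got [" + actual + "]");
    }
  }

  public static void main(String[] args) {
    check("unchanged", "unchanged");
    check("random \" doublequote", "random \"\" doublequote");
    check("comma,separated", "\"comma,separated\"");
    check("Two\nlines", "\"Two\nlines\"");
    check(" leading space, and a comma", "\" leading space, and a comma\"");
    check("\"leading quote, comma", "\"\"\"leading quote, comma\"");
    check(" space, comma\nsecond line, and double quote\"",
        "\" space, comma\nsecond line, and double quote\"\"\"");
    System.out.println("All examples match.");
  }
}
```

Keeping checks like these next to the sanitizer makes it easier to change the rules later, for example to also wrap values that contain carriage returns.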

Note that the above uses a StringBuilder object, never directly manipulating a raw string. This is because the StringBuilder lets you freely manipulate the string without making interim copies in memory. Because strings in Java are immutable, every minor tweak you make would create a brand new string. When chugging through spreadsheet data, this can add up very quickly.

Number of rows x Values per row x Changes to value = Total new strings created
10,000         x 10             x 3                = 300,000
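
To make the difference concrete, here is a small sketch that reproduces the arithmetic above and shows the object behavior behind it: appending to a String produces a new object, while appending to a StringBuilder modifies and returns the same one. The class and method names are hypothetical and exist only for this illustration.

```java
public class StringCopySketch {

  // Interim strings created if every change to every value makes a new String.
  static long totalNewStrings(long rows, long valuesPerRow, long changesPerValue) {
    return rows * valuesPerRow * changesPerValue;
  }

  // Appending to a String yields a different object; the original is untouched.
  static boolean stringAppendMakesNewObject(String value) {
    String changed = value.concat("\"");
    return changed != value && !changed.equals(value);
  }

  // Appending to a StringBuilder modifies and returns the same object.
  static boolean builderAppendReusesObject(String value) {
    StringBuilder builder = new StringBuilder(value);
    StringBuilder returned = builder.append('"');
    return returned == builder;
  }

  public static void main(String[] args) {
    System.out.println(totalNewStrings(10000, 10, 3));
    System.out.println(stringAppendMakesNewObject("cell"));
    System.out.println(builderAppendReusesObject("cell"));
  }
}
```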


What Next?

Now that you've been given a golden hammer, it's only natural to go hunting for nails. Here are some ideas to get you started.

  • Take a look at the sample application source code, which uses this class to print out a CSV file based on a sample query. It takes an output filename as a command-line parameter, and prints to standard out by default. Use it as a starting point and build something awesome!
  • CSV is just one of many popular formats. Tweak the class to output to a different format, like TSV, YAML, JSON, or XML.
  • Write an application that generates CSVs and mails them when done. Easy automated monthly reporting!
  • Write an application that lets you enter queries interactively, for a powerful interface for digging around inside your data.