Google Apps Script

Tutorial: Removing Duplicate Rows in a Spreadsheet

Editor's Note: Romain Vialard is a Google Apps Script Top Contributor. He has contributed interesting articles and blog posts about Apps Script. - Jan Kleinert

December 2011, updated March 2012

Goal

This tutorial shows how to avoid duplicates when you want to automate the process of copying data in Google Apps and specifically how to remove duplicate rows in spreadsheet data.

Google Apps Script lets you copy email attachments from Gmail to a collection in Google Docs, sync spreadsheet data with a list page in Google Sites or with Google Calendar or with your contact list. The example shown below can be used to avoid duplicates in each of those cases.

Time to Complete

Approximately 5 minutes

Prerequisites

Before beginning this tutorial, you should feel comfortable using the Script Editor and have experience using the most basic Spreadsheet functions.

Overview

This tutorial is divided into the following sections:

  1. Running a simple example
  2. How the script works
  3. Variation
  4. Reuse the method
  5. Summary

Running a simple example

  1. Open a new spreadsheet in Google Docs or use an existing spreadsheet containing duplicates.
  2. If the spreadsheet is empty, add a few rows of data (e.g. a list of contacts, parts inventory, etc.) and duplicate some of them.


  3. Choose the menu Tools > Script Editor.
  4. Copy and paste the following script:
    function removeDuplicates() {
      var sheet = SpreadsheetApp.getActiveSheet();
      var data = sheet.getDataRange().getValues();
      var newData = new Array();
      for(i in data){
        var row = data[i];
        var duplicate = false;
        for(j in newData){
          if(row.join() == newData[j].join()){
            duplicate = true;
          }
        }
        if(!duplicate){
          newData.push(row);
        }
      }
      sheet.clearContents();
      sheet.getRange(1, 1, newData.length, newData[0].length).setValues(newData);
    }
  5. Save the script.
  6. Select the function removeDuplicates in the function combo box and click "Run".
  7. Take a look at your spreadsheet. There should be no more duplicates.


How the script works

First, we do a single call to the spreadsheet to retrieve all the data. We could have read our sheet row by row, but JavaScript operations are considerably faster than talking to other services like Spreadsheet. The fewer calls you make, the faster it will go. This is important because each script execution has a maximum run time of 6 minutes.

var sheet = SpreadsheetApp.getActiveSheet();
var data = sheet.getDataRange().getValues();

Our variable data is a JavaScript 2-dimensional array that contains all the values in our sheet. newData is an empty array where we will put all rows which are not duplicates.

var newData = new Array();

The first for loop iterates over each row in the data 2-dimensional array. For each row, the second loop tests if another row with matching data already exists in the newData array. If it is not a duplicate, the row is pushed into the newData array.

if(!duplicate){
  newData.push(row);
}

To finish, the script deletes the existing content of the sheet and inserts the content of the newData array.

sheet.clearContents();
sheet.getRange(1, 1, newData.length, newData[0].length).setValues(newData);

Variation

In the example above, the script finds a duplicate when there are two identical rows, but you may also want to remove rows with matching data in just one or two of the columns. To do that, you can change the conditional statement.

Instead of:

if(row.join() == newData[j].join()){
  duplicate = true;
}

You can write:

if(row[0] == newData[j][0] && row[1] == newData[j][1]){
  duplicate = true;
}

This conditional statement will find duplicates each time two rows have the same data in the first and the second column of the sheet.

Reuse the method

Each time you need to copy data from one point to another you may want to check if this data has already been copied. For example, if you want to sync a label in Gmail to a collection in Google Docs to automatically retrieve important attachments, the solution shown above can be used.

  1. First, retrieve the data you want to copy (from Gmail, Calendar, a spreadsheet, etc.)
  2. Then, retrieve the data already stored in the targeted folder, spreadsheet, site, etc.
  3. For each item you want to copy:
    • Make the assumption that you want to copy this item by the use of a boolean, for example, var toCopy = true;.
    • For each item stored in the targeted folder, check if it looks similar to the item you want to copy.
    • If it looks like the item has already been copied, then you don't want to copy it again, so set toCopy = false;.
    • Once you have read all items stored in the targeted folder, check if toCopy is equal to true or not. If it's true, then copy the item.

There are other solutions to avoid duplicates. For example, you can tag each item as ‘already processed’ once they have been copied. An example can be found in this tutorial: Sending emails from a Spreadsheet. Or, you can remove the item from the list of items to copy. The tutorial Embedding Google DocList in a Google Site uses the example provided in this tutorial to avoid duplicates in a list page.

Summary

Congratulations, you've completed this tutorial. You should now be able to avoid duplicates when manipulating data with Apps Script.

Authentication required

You need to be signed in with Google+ to do that.

Signing you in...

Google Developers needs your permission to do that.