Editor's Note: Romain Vialard is a Google Apps Script Top Contributor. He has contributed interesting articles and blog posts about Apps Script. - Jan Kleinert
December 2011, updated March 2018
Goal
This tutorial shows how to avoid duplicates when you want to automate the process of copying data in Google Workspace and specifically how to remove duplicate rows in spreadsheet data.
Google Apps Script lets you copy email attachments from Gmail to a collection in Google Docs, sync spreadsheet data with a list page in Google Sites or with Google Calendar or with your contact list. The example shown below can be used to avoid duplicates in each of those cases.
Time to Complete
Approximately 5 minutes.
Prerequisites
Before beginning this tutorial, you should feel comfortable using the Script Editor and have experience using the most basic Spreadsheet functions.
Running a simple example
- Open a new spreadsheet in Google Docs or use an existing spreadsheet containing duplicates.
If the spreadsheet is empty, add a few rows of data (for example, a list of contacts, parts inventory, etc.) and duplicate some of them.
Choose the menu Tools > Script Editor.
Copy and paste the following script:
Save the script.
Select the function
removeDuplicates
in the function dropdown list and click Run.Take a look at your spreadsheet. There should be no more duplicates.
How the script works
First, we do a single call to the spreadsheet to retrieve all the data. We could have read our sheet row by row, but JavaScript operations are considerably faster than talking to other services like Spreadsheet. The fewer calls you make, the faster it goes. This is important because each script execution has a maximum run time of 6 minutes.
Our variable data
is a JavaScript 2-dimensional array that contains all the
values in our sheet. newData
is an empty array where we put all rows
which are not duplicates.
The first for loop iterates over each row in the data
2-dimensional array.
For each row, the second loop tests if another row with matching data already
exists in the newData
array. If it is not a duplicate, the row is pushed
into the newData
array.
To finish, the script deletes the existing content of the sheet and inserts
the content of the newData
array.
Variation
In the example above, the script finds a duplicate when there are two identical rows, but you may also want to remove rows with matching data in just one or two of the columns. To do that, you can change the conditional statement.
Instead of:
if(row.join() == newData[j].join()){
duplicate = true;
}
You can write:
if(row[0] == newData[j][0] && row[1] == newData[j][1]){
duplicate = true;
}
This conditional statement finds duplicates each time two rows have the same data in the first and the second column of the sheet.
Reuse the method
Each time you need to copy data from one point to another you may want to check if this data has already been copied. For example, if you want to sync a label in Gmail to a collection in Google Docs to automatically retrieve important attachments, the solution shown above can be used.
- First, retrieve the data you want to copy (from Gmail, Calendar, a spreadsheet, etc.)
- Then, retrieve the data already stored in the targeted folder, spreadsheet, site, etc.
- For each item you want to copy:
- Make the assumption that you want to copy this item by the use of
a boolean, for example:
var toCopy = true;
. - For each item stored in the targeted folder, check if it looks similar to the item you want to copy.
- If it looks like the item has already been copied, then you don't
want to copy it again, so set
toCopy = false;
. - Once you have read all items stored in the targeted folder, check
if
toCopy
is equal to true or not. If it's true, then copy the item.
- Make the assumption that you want to copy this item by the use of
a boolean, for example:
There are other solutions to avoid duplicates. For example, you can tag each item as ‘already processed’ once they have been copied. An example can be found in this tutorial: Sending emails from a Spreadsheet. Or, you can remove the item from the list of items to copy.
Summary
Congratulations, you've completed this tutorial. You should now be able to avoid duplicates when manipulating data with Apps Script.