Remove Duplicates

The Remove Duplicates transformation removes rows from its input that have duplicate values in the selected comparison fields. The data is first sorted, either by a field you choose or, if no sort order is specified, by row number in the data source. The transform then compares the values of the comparison fields, keeps the top row of each group of duplicates based on the sort order, and removes the rest.

Remove Duplicates Illustration

Configuration

After you add a Remove Duplicates command to the ETL designer and connect an input to it, configure the following:

The comparison fields to check for duplicates.
For each comparison field, whether the comparison is case-sensitive.
The sort order applied before comparison. The top row after the sort is kept, and the remaining duplicate rows are removed.
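
The behavior described by these settings can be sketched in a few lines of Python. This is only an illustration of the logic, not the product's implementation. The function name remove_duplicates and its parameters (compare_fields, ignore_case, sort_field, descending) are hypothetical, and rows are assumed to be dictionaries keyed by field name.

```python
def remove_duplicates(rows, compare_fields, ignore_case=(), sort_field=None, descending=False):
    """Keep the first row per comparison key after an optional sort."""
    # With no sort field, rows stay in source order (row number).
    ordered = list(rows)
    if sort_field is not None:
        # sorted() is stable, so rows with equal sort values keep source order.
        ordered = sorted(ordered, key=lambda r: r[sort_field], reverse=descending)

    seen = set()
    kept = []
    for row in ordered:
        # Build the comparison key, lowercasing fields marked case-insensitive.
        key = tuple(
            row[f].lower() if f in ignore_case and isinstance(row[f], str) else row[f]
            for f in compare_fields
        )
        if key not in seen:
            seen.add(key)      # first (top) row for this key is kept
            kept.append(row)   # later rows with the same key are dropped
    return kept
```

For instance, calling remove_duplicates(rows, ["name"], ignore_case=["name"]) would treat "Ann" and "ann" as duplicates and keep whichever row comes first in the chosen order.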

Example

Remove Duplicates

In this example, the Leads CSV contains duplicate leads from different sources, so the Remove Duplicates command is used to keep one record per lead. Duplicates are identified by email address, and rows are sorted by timestamp in descending order so that the latest contact is kept.
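
To make the example concrete, the same result can be reproduced outside the ETL designer with a short Python sketch. The sample rows and the column names email, source, and timestamp are invented for illustration, since the actual layout of the Leads CSV is not shown here. The sketch also assumes a case-sensitive email comparison and ISO 8601 timestamps, which sort chronologically as plain strings.

```python
import csv
import io

# Hypothetical sample data standing in for the Leads CSV.
LEADS_CSV = """email,source,timestamp
ann@example.com,webinar,2024-01-05T10:00:00
bob@example.com,website,2024-01-07T09:30:00
ann@example.com,trade show,2024-02-11T14:15:00
"""

leads = list(csv.DictReader(io.StringIO(LEADS_CSV)))

# Sort by timestamp in descending order so the latest contact comes first.
leads.sort(key=lambda row: row["timestamp"], reverse=True)

latest_by_email = {}
for row in leads:
    # setdefault stores only the first row seen per email: the latest contact.
    latest_by_email.setdefault(row["email"], row)

latest_leads = list(latest_by_email.values())
```

After this runs, latest_leads holds one row per email address, and the row kept for ann@example.com is the more recent trade show contact.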