SQL Finding Duplicates across Two Tables One Column

This post discusses the query needed to check if there are any duplicate values in a certain column if you were to add the records in the first table to the records in the second table.

As an example, suppose you had a History table that shows all records over a period of time. Periodically you need to add records to that table from a table called New. However, mistakes can be made in the New table. Before you add those records in the New table to the History table you want to run a query that checks the History table for any values in a particular column that are also found in the New table. There is a column in both tables called UniqueNumber. That number should always be unique. If a duplicate is found, you know that you have a bad data in the New table and you will need to revert back to the source of the data in the New table and find out what happened.

MS Access

Using Access gives us a good visual representation of the query we need. First, here are the two data table’s data. The first table is the History table, with column names of: FirstName, Amount, UniqueNumber.

John      12        12987     
Frank     121       23456     
Sally     647       11211     
Bob       24        98700     
Sally     7876      22222

This table is the New table.

Sally     345       12555     
Frank     232       23456

Here is the query in MS Access.

Here is the SQL Language code, in Access.

SELECT New.FirstName, New.Amount, New.UniqueNumber
FROM New LEFT JOIN History ON New.[UniqueNumber] = History.[UniqueNumber]
WHERE (((History.UniqueNumber) Is Not Null));

Here is the output in MS Access.

Duplicates Defined with Two Columns

Let’s change the conditions. Suppose no one person can use the same UniqueNumber more than once, and further suppose that it is fine if more than one person has used the same UniquenNumber. So, it’s okay if both Polly and Jill have used the same UniqueNumber 23418 at some point. However, a person cannot use the same UniqueNumber more that one time, ever. A duplicate is now defined as a combination of Name and UniqueNumber. I like to think about it this way: If you concatenate the Name with UniqueNumber you get something like John12987 or Frank23456. Now you are comparing two tables with the concatenated string and a duplicate is found when the concatenated strings are equivalent.

BeginCodingNow.com

for data analysts & software developers

for data analysts & software developers

SQL Finding Duplicates across Two Tables One Column

MS Access

Duplicates Defined with Two Columns