SQL Joins Introduction


This entry is part 1 of 2 in the series SQL Joins

For this post we assume you have some basic knowledge of SQL. A JOIN clause is used to combine rows from two or more tables, based on a related column between them. There are four types of joins: inner join, left join, right join and full outer join. The website w3schools has an introductory series of articles on SQL joins.

The graphics below are from w3schools.com website which show the four types of joins using Venn diagrams.

The four types of joins in Microsoft SQL Server are:

  • Inner join or Equi join
  • Self Join
  • Outer Join
  • Cross join

Let’s consider a specific example that we can set up and define in SQL Server. Consider the Venn diagram below. Set1 has the numbers 1, 2, 3, 4 and 5. Set2 has the numbers 4, 5, 6, 7 and 8. We can start asking questions such as what SQL query will give us a result set of {1, 2, 3}? Could we use a left join and then further eliminate all those that are in the intersection? To eliminate them we could use a WHERE clause.

Inner Join

In SQL Server, the inner join is the default. Therefore when you are writing your SQL code, you may just type join and SQL Server will default it to an inner join. Here is the code for the inner join in SQL Server. The result set is the numbers 4 and 5. An inner join returns results that are in both tables.

select Set1.Number from Set1
inner join Set2 on Set1.Number = Set2.Number

Here are the results.

For information on joins in R language, have a look at the book R for Data Science, chapter 13 Relational Data.

Left Join

Here below is the code that gives the result {1, 2, 3}.

select Set1.Number as S1Number from Set1
left join Set2 on Set1.Number = Set2.Number
where Set2.Number is null

Here is a similar query but we can see the results a little more clearly. The results from SSMS are shown below the code.

select Set1.Number S1Number, Set2.Number S2Number from Set1
left join Set2 on Set1.Number = Set2.Number
where Set2.Number is null

Here is the query and the results of the left join without the where clause and without displaying any Set2 data..

select Set1.Number as S1Number from Set1
left join Set2 on Set1.Number = Set2.Number

The result set is all of the rows in Set1: {1, 2, 3, 4, 5}.

Here is a query that does display the Set2 data, and without the where clause.

select Set1.Number as S1Number, Set2.Number as S2Number from Set1
left join Set2 on Set1.Number = Set2.Number

For an example of a left join in an example using geographic data (country, state, city) please have a look at SQL Joins using Geographic Data

Cross Join

The SQL CROSS JOIN produces a result set which is the number of rows in the first table multiplied by the number of rows in the second table if no WHERE clause is used along with CROSS JOIN. This kind of result is called as Cartesian Product. In our example each table has 5 rows so the result set will have 25 rows as long as there is no where clause. I added an order by clause so we could more easily see the results.

select Set1.Number as S1Number, Set2.Number as S2Number 
from Set1 cross join Set2 
order by S1Number, S2Number

Self Join

A self join is a regular join, but the table is joined with itself. Imaging taking you table, making a copy of it and sticking the copy beside the original table. Why? For what purpose would you need to do this? Imagine a table that lists items that somehow relate to each other. Perhaps one item has another item that comes before (time) it or falls under it (hierarchy) or reports to it (chain of command).

To list a table two times in the same query, you must provide a table alias for at least one of instance of the table name. This table alias helps the query processor determine whether columns should present data from the right or left version of the table.

One practical example is an employee table that has an EmployeeId column and a ReportsTo column. The reports to column refers back to the EmployeeId column. Here we have a table that relates to itself.

Here is the SQL query.

select e.EmpName as Employee, 
       m.EmpName as Manager 
from Employee e
inner join Employee m on e.ReportsTo = m.EmployeeId

We could have used a left join if we wanted to include all employees in the results.

select e.EmpName as Employee, 
       m.EmpName as Manager 
from Employee e
left join Employee m on e.ReportsTo = m.EmployeeId

Here are the results.

YouTube Tutorial

Here is a video on SQL joins called SQL Joins Explained |¦| Joins in SQL |¦| SQL Tutorial. Here is another one called SQL Joins Examples |¦| Joins in SQL |¦| SQL Tutorial.

Series NavigationSQL Server Multi-Join Queries >>