Joins allow us to re-construct our separated database tables back into the relationships that power our applications.
In this article, we'll look at each of the different join types in SQL and how to use them. Here's what we'll cover:
(Spoiler alert: we'll cover five different types, but you really only need to know two of them!)

What is a join?

A join is an operation that combines two rows together into one row. These rows are usually from two different tables, but they don't have to be.

Before we look at how to write the join itself, let's look at what the result of a join would look like. Let's take for example a system that stores information about users and their addresses. The rows from the table that stores user information might look like this:
And the rows from the table that stores address information might look like this:
We could write separate queries to retrieve both the user information and the address information—but ideally we could write one query and receive all of the users and their addresses in the same result set. This is exactly what a join lets us do! We'll look at how to write these joins soon, but if we joined our user information to our address information we could get a result like this:
Here we see all of our users and their addresses in one nice result set.

Besides producing a combined result set, another important use of joins is to pull extra information into our query that we can filter against. For example, if we wanted to send some physical mail to all users who live in Oklahoma City, we could use this joined-together result set and filter on the column that holds the city.

Now that we know the purpose of joins, let's start writing some!

Setting up your database

Before we can write our queries we need to set up our database. For these examples we'll be using PostgreSQL, but the queries and concepts shown here will easily translate to any other modern database system (like MySQL, SQL Server, etc.).

To work with our PostgreSQL database, we can use psql, the interactive PostgreSQL command line program. If you have another database client that you enjoy working with that's fine too.

To begin, let's create our database. With PostgreSQL already installed, we can run the command createdb <database-name> at our terminal to create a new database. I called mine fcc:
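Next let's start the interactive console by using the command psql and connect to the database we just made using \c <database-name>:

$ psql
psql (11.5)
Type "help" for help.

john=# \c fcc
You are now connected to database "fcc" as user "john".
fcc=#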
Note: I've cleaned up the psql output in these examples to make it easier to read, so don't worry if the output shown here isn't exactly what you've seen in your terminal.

I encourage you to follow along with these examples and run these queries for yourself. You will learn and remember far more by working through these examples rather than just reading them.

Now onto the joins!

CROSS JOIN

The simplest kind of join we can do is a CROSS JOIN or "Cartesian product."

This join takes each row from one table and joins it with each row of the other table. If we had two lists, one containing 1, 2, 3 and the other containing A, B, C, the Cartesian product of those two lists would be this:

1A, 1B, 1C
2A, 2B, 2C
3A, 3B, 3C
Each value from the first list is paired with each value of the second list. Let's write this same example as a SQL query. First let's create two very simple tables and insert some data into them:
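As a rough sketch of that setup: the table names letters and numbers come from the article's own CROSS JOIN query, but the column names (letter, number) are assumptions made here for illustration.

```sql
-- Hypothetical column names; only the table names appear in the article.
CREATE TABLE letters (
  letter TEXT
);

INSERT INTO letters (letter) VALUES ('A'), ('B'), ('C');

CREATE TABLE numbers (
  number TEXT
);

-- Stored as text to match the "simple text field" description.
INSERT INTO numbers (number) VALUES ('1'), ('2'), ('3');
```

With three rows in each table, a Cartesian product of the two contains nine rows.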
Our two tables, letters and numbers, just have one column: a simple text field.

Now let's join them together with a CROSS JOIN:

SELECT * FROM letters CROSS JOIN numbers;
This is the simplest type of join we can do, but even in this simple example we can see the join at work: the two separate rows (one from letters and one from numbers) have been joined together to form one row.

While this type of join is often discussed as a mere academic example, it does have at least one good use case: covering date ranges.

CROSS JOIN with date ranges

One good use case of a CROSS JOIN is to take each row from a table and apply it to every day within a date range.

Say for example you were building an application that tracked daily tasks, things like brushing your teeth, eating breakfast, or showering. If you wanted to generate a record for every task and for each day of the past week, you could use a CROSS JOIN against a date range.

To make this date range, we can use the generate_series function:

SELECT generate_series(
  (CURRENT_DATE - INTERVAL '5 day'),
  CURRENT_DATE,
  INTERVAL '1 day'
)::DATE AS day;
The generate_series function takes three parameters.

The first parameter is the starting value. In this example we use CURRENT_DATE - INTERVAL '5 day'. This returns the current date minus five days, or "five days ago."

The second parameter is the ending value, which here is the current date (CURRENT_DATE).

The third parameter is the "step interval," or how much we want to increment the value each time. Since these are daily tasks we'll use an interval of one day (INTERVAL '1 day').

Putting it all together, this generates a series of dates starting five days ago, ending today, and going one day at a time. Finally we remove the time portion by casting the output of these values to a date using ::DATE, and we alias this column using AS day to make the output a little nicer.

The output of this query is the past five days plus today:

Going back to our tasks-per-day example, let's create a simple table to hold the tasks we want to complete and insert a few tasks:

Our tasks table just has one column, name, and we inserted four tasks into this table.

Now let's CROSS JOIN our tasks with the query to generate the dates:

(Since our date generation query is not an actual table we just write it as a subquery.)

From this query we return the task name and the day, and the result set looks like this:

Like we expected, we get a row for each task for every day in our date range.

The CROSS JOIN is the simplest join we can do, but to look at the next few types we'll need a more realistic table setup.

Creating directors and movies

To illustrate the following join types, we'll use the example of movies and movie directors. In this situation, a movie has one director, but a movie isn't required to have a director. Imagine a new movie being announced before the choice of director has been confirmed.

Our directors table will store the name of each director, and the movies table will store the name of the movie as well as a reference to the director of the movie (if it has one).

Let's create those two tables and insert some data into them:

We have five directors, five movies, and three of those movies have directors assigned to them.
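A setup consistent with that description might look like the sketch below. The column names (id, name, director_id), the use of SERIAL keys, and all of the sample names other than "John Smith" are assumptions made for illustration, not the article's original data.

```sql
-- Assumed schema: each movie optionally references one director.
CREATE TABLE directors (
  id   SERIAL PRIMARY KEY,
  name TEXT NOT NULL
);

-- On a freshly created table, SERIAL assigns ids 1 through 5 in insert order.
INSERT INTO directors (name) VALUES
  ('John Smith'),
  ('Director B'),
  ('Director C'),
  ('Director D'),
  ('Director E');

CREATE TABLE movies (
  id          SERIAL PRIMARY KEY,
  name        TEXT NOT NULL,
  director_id INTEGER REFERENCES directors (id)  -- NULL means "no director yet"
);

-- Three movies have a director assigned, two do not.
INSERT INTO movies (name, director_id) VALUES
  ('Movie 1', 1),
  ('Movie 2', 1),
  ('Movie 3', 2),
  ('Movie 4', NULL),
  ('Movie 5', NULL);
```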
Director ID 1 has two movies, and director ID 2 has one.

FULL OUTER JOIN

Now that we have some data to work with let's look at the FULL OUTER JOIN.

A FULL OUTER JOIN has some similarities to a CROSS JOIN, but it has a couple of key differences.

The first difference is that a FULL OUTER JOIN requires a join condition.

A join condition specifies how the rows between the two tables are related to each other and on what criteria they should be joined together. In our example, our movies table has a reference to the director via the director_id column, and this column matches the id column of the directors table. These are the two columns that we will use as our join condition.

Here's how we write this join between our two tables:

Notice the join condition we specified that matches the movie to its director: ON movies.director_id = directors.id.

Our result set looks like an odd Cartesian product of sorts:

The first rows we see are ones where the movie had a director, and our join condition evaluated to true. However, after those rows we see each of the remaining rows from each table, but with NULL values where the other table didn't have a match.

Note: if you're unfamiliar with NULL values, they are explained in this SQL operator tutorial.

We also see another difference between the CROSS JOIN and FULL OUTER JOIN here. A FULL OUTER JOIN pairs rows only where the join condition matches, so each row appears once per match (or once, padded with NULL values, if it has no match). A CROSS JOIN instead pairs every row with every row of the other table.

INNER JOIN

The next join type, INNER JOIN, is one of the most commonly used join types.

An inner join only returns rows where the join condition is true. In our example, an inner join between our movies and directors tables would only return records where the movie has been assigned a director.

The syntax is basically the same as before:

Our result shows the three movies that have a director:

Since an inner join only includes rows that match the join condition, the order of the two tables in the join doesn't matter.
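As a minimal sketch of that inner join, written with either table first. It assumes tables named movies and directors with movies.director_id referencing directors.id; these identifiers are assumptions for illustration.

```sql
-- Movies first: only movies that have a director are returned.
SELECT *
FROM movies
INNER JOIN directors
  ON directors.id = movies.director_id;

-- Directors first: the same rows come back; only the column order differs.
SELECT *
FROM directors
INNER JOIN movies
  ON movies.director_id = directors.id;
```

Changing INNER JOIN to FULL OUTER JOIN in the first query would also return the unmatched rows from both tables, padded with NULL values.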
If we reverse the order of the tables in the query we get the same result:

Since we listed the directors table first in this query and we selected all columns (*), we see the directors column data first and then the columns from movies, but the resulting data is the same.

This is a useful property of inner joins, but it's not true for all join types, like our next type.

LEFT JOIN / RIGHT JOIN

These next two join types use a modifier (LEFT or RIGHT) that affects which table's data is included in the result set.

Note: the LEFT JOIN and RIGHT JOIN can also be referred to as LEFT OUTER JOIN and RIGHT OUTER JOIN.

These joins are used in queries where we want to return all of a particular table's data and, if it exists, the associated table's data as well. If the associated data doesn't exist, we still get back all of the "primary" table's data.

It's a query for information about a particular thing and bonus information if that bonus information exists.

This will be simple to understand with an example. Let's find all movies and their directors, but we don't care if they have a director or not; it's a bonus:

The query follows our same pattern as before; we've just specified the join as a LEFT JOIN.

In this example, the movies table is the "left" table.

If we write the query on one line it makes this a little easier to see:

A left join returns all records from the "left" table. It also returns any rows from the "right" table that match the join condition. Where a row from the "left" table has no match, the columns from the "right" table are returned as NULL.

Looking at that result set, we can see why this type of join is useful for "all of this and, if it exists, some of that" type queries.

RIGHT JOIN

The RIGHT JOIN works exactly like the LEFT JOIN, except the rules about the two tables are reversed.

In a right join, all of the rows from the "right" table are returned. The "left" table is conditionally returned based on the join condition.

Let's use the same query as above but replace LEFT JOIN with RIGHT JOIN:

Our result set now returns every directors row and, if it exists, the movies data.

All we've done is switch which table we're considering the "primary" one, the table we want to see all of the data from regardless of whether its associated data exists.

LEFT JOIN / RIGHT JOIN in production applications

In a production application, I only ever use LEFT JOIN and I never use RIGHT JOIN.

I do this because, in my opinion, a LEFT JOIN makes the query easier to read and understand.

When I'm writing queries I like to think of starting with a "base" result set, say all movies, and then bring in (or subtract out) groups of things from that base. Because I like to start with a base, the LEFT JOIN fits this line of thinking. I want all of the rows from my base table (the "left" table), and I conditionally want the rows from the "right" table.

In practice, I don't think I've ever even seen a RIGHT JOIN in a production application. There's nothing wrong with a RIGHT JOIN; I just think it makes the query more difficult to understand.

Re-writing RIGHT JOIN

If we wanted to flip our scenario above and instead return all directors and conditionally their movies, we can easily re-write the RIGHT JOIN into a LEFT JOIN.

All we need to do is flip the order of the tables in the query, and change RIGHT to LEFT:

Note: I like to put the table that is being joined on (the "right" table, which is movies in the example above) first in the join condition.

Filtering using LEFT JOIN

There are two use cases for a LEFT JOIN (or RIGHT JOIN).

The first use case we've already covered: to return all of the rows from one table and conditionally from another.

The second use case is to return rows from the first table where the data from the second table isn't present.

The scenario would look like this: find directors who don't belong to a movie.

To do this we'll start with a LEFT JOIN and our directors table will be the primary or "left" table:

For a director that doesn't belong to a movie, the columns from the movies table are NULL:

In our example, director ID 3, 4, and 5 don't belong to a movie.

To filter our result set just to these rows, we can add a WHERE clause to only return rows where the movie data is NULL:

And there are our three movie-less directors!

It's common to use the id column of the table to filter against (WHERE movies.id IS NULL), but all columns from the movies table are NULL, so any of them would work.

(Since we know that all the columns from the movies table will be NULL, in the query above we could select only the directors columns, for example directors.*, instead of * to return just the director's information.)

Using LEFT JOIN to find matches

In our previous query we found directors that didn't belong to movies.
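For reference, that "directors without movies" filter could be sketched as follows. The table and column names (directors.id, movies.id, movies.director_id) and the aliases d and m are assumptions for illustration.

```sql
-- Keep every director, attach movies where they exist,
-- then keep only the rows where no movie was attached.
SELECT d.*
FROM directors AS d
LEFT JOIN movies AS m
  ON m.director_id = d.id
WHERE m.id IS NULL;
```

Filtering on m.id works here because a primary key is never NULL in a real movies row, so a NULL can only come from a missing match.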
Using our same structure, we could find directors that do belong to movies by changing our WHERE condition to look for rows where the movie data is not NULL:

This may seem handy, but we've actually just re-implemented INNER JOIN!

Multiple joins

We've seen how to join two tables together, but what about multiple joins in a row?

It's actually quite simple, but to illustrate this we need a third table: tickets.

This table will represent tickets sold for a movie:

The tickets table just has an id and a reference to the movie: movie_id.

We've also inserted two tickets sold for movie ID 1, and one ticket sold for movie ID 3.

Now, let's join directors to movies, and then movies to tickets!

Since these are inner joins, the order in which we write the joins doesn't matter. We could have started with tickets, then joined on movies, and then joined on directors.

It again comes down to what you're trying to query and what makes the query the most understandable.

In our result set, we'll notice that we've further narrowed down the rows that are returned:

This makes sense because we've added another INNER JOIN. In effect this adds another "AND" condition to our query.

Our query essentially says: "return all directors that belong to movies that also have ticket sales."

If instead we wanted to find directors that belong to movies that may not have ticket sales yet, we could replace our last INNER JOIN with a LEFT JOIN:

We can see that the movie without ticket sales is now back in the result set:

This movie didn't have any ticket sales, so it was previously excluded from the result set due to the INNER JOIN.

I'll leave this as an Exercise For The Reader™, but how would you find directors that belong to movies that don't have any ticket sales?

Join execution order

In the end, we don't really care in what order the joins are executed.

One of the key differences between SQL and other modern programming languages is that SQL is a declarative language. This means that we specify the outcome we want, but we don't specify the execution details; those details are left up to the database query planner.
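The chained joins from this section might be sketched like this. The tickets schema and all identifiers (tickets.movie_id, movies.director_id, and so on) are assumptions for illustration, and the sketch presumes movies.id is a primary key so it can be referenced.

```sql
-- Assumed tickets table: one row per ticket sold.
CREATE TABLE tickets (
  id       SERIAL PRIMARY KEY,
  movie_id INTEGER REFERENCES movies (id)
);

-- Two tickets for movie ID 1 and one for movie ID 3, as described above.
INSERT INTO tickets (movie_id) VALUES (1), (1), (3);

-- Directors whose movies have ticket sales.
SELECT *
FROM directors
INNER JOIN movies
  ON movies.director_id = directors.id
INNER JOIN tickets
  ON tickets.movie_id = movies.id;

-- Directors with movies, whether or not tickets have been sold yet.
SELECT *
FROM directors
INNER JOIN movies
  ON movies.director_id = directors.id
LEFT JOIN tickets
  ON tickets.movie_id = movies.id;

-- PostgreSQL's EXPLAIN prints the plan the query planner chose,
-- including the order in which it decided to join the tables.
EXPLAIN
SELECT *
FROM directors
INNER JOIN movies
  ON movies.director_id = directors.id
INNER JOIN tickets
  ON tickets.movie_id = movies.id;
```

The plan shown by EXPLAIN depends on table sizes and statistics, so the join order it reports may differ from the order written in the query.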
We specify the joins we want and the conditions on them and the query planner handles the rest.

But, in reality, the database is not joining three tables together at the same time. Instead, it will likely join the first two tables together into one intermediary result, and then join that intermediary result set to the third table.

(Note: This is a somewhat simplified explanation.)

So, as we're working with multiple joins in queries we can just think of them as a series of joins between two tables, although one of those tables can get quite large.

Joins with extra conditions

The last topic we'll cover is a join with extra conditions.

Similar to a WHERE clause, we can add as many conditions as we want to our join conditions.

For example, if we wanted to find movies with directors that are not named "John Smith", we could add that extra condition to our join with an AND:

We can use any operators we would put in a WHERE clause in this join condition.

We also get the same result from this query if we put the condition in a WHERE clause instead:

There are some subtle differences happening under the hood here, but for the purpose of this article the result set is the same.

(If you're unfamiliar with all of the ways you can filter a SQL query, check out the previously mentioned article here.)

The reality about writing queries with joins

In reality, I find myself only using joins in three different ways: