Execution plan operations – joins

It’s about time I picked this series up again.

I’m not going to go into too much detail on joins. There are some very good articles elsewhere on joins. The important thing to notice about joins, in the context of an execution plan, is that there are six logical join operators and three physical join operators. The logical operators are what you ask for in the context of the query, the physical operators are what the optimiser picks to do the join.

The six logical operators are:

  • Inner Join
  • Outer Join
  • Cross Join
  • Cross Apply (new in SQL 2005)
  • Semi-Join
  • Anti Semi-Join

Craig Freedman wrote a long article on the logical join operators – Introduction to Joins

The semi-joins are the exception, in that they cannot be specified in a query. Nonetheless, they are present in disguise. They’re the logical operators for EXISTS, IN, NOT EXISTS and NOT IN. They’re used when matching is required, but not a complete join.

The three physical operators are what the optimiser uses to evaluate the logical join. There are various conditions that affect the physical operator the will be used for a particular join. The three operators are:

  • Nested Loop Join
  • Merge Join
  • Hash join

Nested Loop

The nested loop join works by looping through all the rows of one input and for each row looping through all the rows of the other input, looking for matches. The nested loop join works best when one or both of the input row sets is small. Since the input that is chosen as the second is read as many times as there are rows in the outer, this join can get very expensive as the size of the inputs increases.

For more detail on the nested loop, see Craig Freedman’s post

Merge Join

The merge join works by running through the two inputs, comparing rows and outputting matched rows. Both inputs must be sorted on the joining columns for this join to be possible. Since both inputs are only read once, this is an efficient join for larger row sets. This efficiency may be offset by the sorted requirement. If the join column is not indexed so as to retrieve the data already sorted, then an explicit sort is required.

For more detail on the merge join, see Craig Freedman’s post

Hash Join

The hash join is one of the more expensive join operations, as it requires the creation of a hash table to do the join. That said, it’s the join that’s best for large, unsorted inputs. It is the most memory-intensive of any of the joins

The hash join first reads one of the inputs and hashes the join column and puts the resulting hash and the column values into a hash table built up in memory. Then it reads all the rows in the second input, hashes those and checks the rows in the resulting hash bucket for the joining rows.

For more detail on the hash join, see Craig Freedman’s post

That’s pretty much that on joins. Next up, aggregates.

10 Comments

  1. Pingback: Some interesting articles.. « QuantuMatrix’s Weblog

  2. Garry

    Is the hash join not similar to the nested loop in that it compares each hashed value to each row in the hash table ? ie does it not have to loop to do this ? Trying to understand the difference besides the fact that one hashes the values whereas the other compares the actual values.

    Reply
  3. Gail (Post author)

    Not quite

    The one resultset is used to form a hash table (wikipedia should have a good coverage of how a hash table works), then the second resultset is hashed to figure out which bucket the value belongs in. It’s a single pass over each resultset.

    Nested loop is an O(n^2) algorithm, hash search is much lower than that.

    Reply
  4. Tim

    Great intro – just what I needed.

    Reply
  5. behnaz

    would you tell me what is the exclamation mark some times is seen in nested loops inner join pls?

    Reply
  6. Gail (Post author)

    If you hover the mouse over, the tooltip will tell you what the problem is.

    Reply
  7. Pingback: Merge join Vs Hash join Vs Nested loop join « SqlserverBlogForum

  8. SQLDuck

    I know this is old, but thanks for posting this, and for all the info you’ve contributed to the community in various forms. So much good stuff. 🙂

    Reply
  9. Anuj sharma

    Which one is Faster in which conditions ?

    Are their performance depend on
    1.)volume of data in 2 joining tables
    2.)the way rows are sorted in tbl
    3.)indexed joining columns

    Reply
    1. Gail (Post author)

      Which of what is faster?

      For logical join types, you choose the type that gets you the results you want. For the physical, the query optimiser chooses the type based on a bunch of things, some of which I discussed in this post.

      Reply

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.