{"id":575,"date":"2010-03-23T16:00:58","date_gmt":"2010-03-23T14:00:58","guid":{"rendered":"https:\/\/www.sqlinthewild.co.za\/?p=575"},"modified":"2010-03-23T16:00:58","modified_gmt":"2010-03-23T14:00:58","slug":"left-outer-join-vs-not-exists","status":"publish","type":"post","link":"https:\/\/www.sqlinthewild.co.za\/index.php\/2010\/03\/23\/left-outer-join-vs-not-exists\/","title":{"rendered":"Left outer join vs NOT EXISTS"},"content":{"rendered":"<p>And to wrap up the miniseries on IN, EXISTS and JOIN, a look at NOT EXISTS and LEFT OUTER JOIN for finding non-matching rows.<\/p>\n<p>For previous parts, see<\/p>\n<ul>\n<li><a href=\"https:\/\/www.sqlinthewild.co.za\/index.php\/2009\/08\/17\/exists-vs-in\/\">In vs Exists<\/a><\/li>\n<li><a href=\"https:\/\/www.sqlinthewild.co.za\/index.php\/2010\/01\/12\/in-vs-inner-join\/\">In vs Inner Join<\/a><\/li>\n<li><a href=\"https:\/\/www.sqlinthewild.co.za\/index.php\/2010\/02\/18\/not-exists-vs-not-in\/\">Not in vs Not Exists<\/a><\/li>\n<\/ul>\n<p>I&#8217;m looking at NOT EXISTS and LEFT OUTER JOIN, as opposed to NOT IN and LEFT OUTER JOIN, because, as shown in the <a href=\"https:\/\/www.sqlinthewild.co.za\/index.php\/2010\/02\/18\/not-exists-vs-not-in\/\">previous part<\/a> of this series, NOT IN behaves badly in the presence of NULLs. Specifically, if there are any NULLs in the result set, NOT IN returns 0 matches.<\/p>\n<p>The LEFT OUTER JOIN, like the NOT EXISTS can handle NULLs in the second result set without automatically returning no matches. It behaves the same regardless of whether the join columns are nullable or not. Seeing as NULL does not equal anything, any rows in the second result set that have NULL for the join column are eliminated by the join and have no further effect on the query.<\/p>\n<p>It is important, when using the LEFT OUTER JOIN \u2026 IS NULL, to carefully pick the column used for the IS NULL check. It should either be a non-nullable column (the primary key is a somewhat classical choice) or the join column (as nulls in that will be eliminated by the join)<\/p>\n<p>Onto the tests<\/p>\n<p>The usual test tables\u2026<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nCREATE TABLE BigTable (\nid INT IDENTITY PRIMARY KEY,\nSomeColumn char(4) NOT NULL,\nFiller CHAR(100)\n)\n\nCREATE TABLE SmallerTable (\nid INT IDENTITY PRIMARY KEY,\nLookupColumn char(4) NOT NULL,\nSomeArbDate Datetime default getdate()\n)\n\nINSERT INTO BigTable (SomeColumn)\nSELECT top 250000\nchar(65+FLOOR(RAND(a.column_id *5645 + b.object_id)*10)) + char(65+FLOOR(RAND(b.column_id *3784 + b.object_id)*12)) +\nchar(65+FLOOR(RAND(b.column_id *6841 + a.object_id)*12)) + char(65+FLOOR(RAND(a.column_id *7544 + b.object_id)*8))\nfrom master.sys.columns a cross join master.sys.columns b\n\nINSERT INTO SmallerTable (LookupColumn)\nSELECT DISTINCT SomeColumn\nFROM BigTable TABLESAMPLE (25 PERCENT)\n-- (3918 row(s) affected)\n<\/pre>\n<p>First without indexes<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">-- Query 1\nSELECT BigTable.ID, SomeColumn\n\tFROM BigTable LEFT OUTER JOIN SmallerTable ON BigTable.SomeColumn = SmallerTable.LookupColumn\n\tWHERE LookupColumn IS NULL\n\n-- Query 2\nSELECT ID, SomeColumn FROM BigTable\nWHERE NOT EXISTS (SELECT LookupColumn FROM SmallerTable WHERE SmallerTable.LookupColumn = BigTable.SomeColumn)\n<\/pre>\n<p>Let&#8217;s take a look at the execution plans<\/p>\n<p><a href=\"https:\/\/www.sqlinthewild.co.za\/wp-content\/uploads\/2010\/03\/LeftOuterJoinNotIN_NotIndexed.png\"><img loading=\"lazy\" decoding=\"async\" style=\"border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px\" title=\"LeftOuterJoinNotIN_NotIndexed\" src=\"https:\/\/www.sqlinthewild.co.za\/wp-content\/uploads\/2010\/03\/LeftOuterJoinNotIN_NotIndexed_thumb.png\" border=\"0\" alt=\"LeftOuterJoinNotIN_NotIndexed\" width=\"244\" height=\"143\" \/><\/a><\/p>\n<p><!--more--><\/p>\n<p>The plans are almost the same. There&#8217;s an extra filter in the JOIN and the logical join types are different. Why the different joins?<\/p>\n<p>If we look at the execution plan for the NOT EXISTS, the join type is Right Anti-Semi join (a bit of a mouthful). This is a special join type used by the NOT EXISTS and NOT IN and it&#8217;s the opposite of the semi-join that I discussed back when I looked at <a href=\"https:\/\/www.sqlinthewild.co.za\/index.php\/2010\/01\/12\/in-vs-inner-join\/\">IN and INNER JOIN<\/a><\/p>\n<p>An anti-semi join is a partial join. It does not actually join rows in from the second table, it simply checks for, in this case, the absence of matches. That&#8217;s why it&#8217;s an <strong>anti<\/strong>-semi join. A semi-join checks for matches, an anti-semi join does the opposite and checks for the absence of matches.<\/p>\n<p>The extra filter in the LEFT OUTER JOIN query is because the join in that execution plan is a complete right join, i.e. it&#8217;s returned matching rows (and possibly duplicates) from the second table. The filter operator is doing the IS NULL filter.<\/p>\n<p>That&#8217;s the major difference between these two. When using the LEFT OUTER JOIN \u2026 IS NULL technique, SQL can&#8217;t tell that you&#8217;re only doing a check for nonexistance. Optimiser&#8217;s not smart enough (yet). Hence it does the complete join and then filters. The NOT EXISTS filters as part of the join.<\/p>\n<p>Technical discussion done, now how did they actually perform?<\/p>\n<blockquote><p>&#8212; Query 1: LEFT OUTER JOIN<br \/>\nTable &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br \/>\nTable &#8216;BigTable&#8217;. Scan count 1, logical reads 3639, physical reads 0.<br \/>\nTable &#8216;SmallerTable&#8217;. Scan count 1, logical reads 15, physical reads 0.<\/p>\n<p>SQL Server Execution Times:<br \/>\nCPU time = 157 ms,\u00a0 elapsed time = 486 ms.<\/p>\n<p>&#8212; Query 2: NOT EXISTS<br \/>\nTable &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br \/>\nTable &#8216;BigTable&#8217;. Scan count 1, logical reads 3639, physical reads 0.<br \/>\nTable &#8216;SmallerTable&#8217;. Scan count 1, logical reads 15, physical reads 0.<\/p>\n<p>SQL Server Execution Times:<br \/>\nCPU time = 156 ms,\u00a0 elapsed time = 358 ms.<\/p><\/blockquote>\n<p>Can&#8217;t make a big deal out of that.<\/p>\n<p>Now, index on the join columns<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">CREATE INDEX idx_BigTable_SomeColumn\nON BigTable (SomeColumn)\n\nCREATE INDEX idx_SmallerTable_LookupColumn\nON SmallerTable (LookupColumn)<\/pre>\n<p>and the same queries<\/p>\n<p><a href=\"https:\/\/www.sqlinthewild.co.za\/wp-content\/uploads\/2010\/03\/LeftOuterJoinNotIN_Indexed.png\"><img loading=\"lazy\" decoding=\"async\" style=\"border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px\" title=\"LeftOuterJoinNotIN_Indexed\" src=\"https:\/\/www.sqlinthewild.co.za\/wp-content\/uploads\/2010\/03\/LeftOuterJoinNotIN_Indexed_thumb.png\" border=\"0\" alt=\"LeftOuterJoinNotIN_Indexed\" width=\"244\" height=\"130\" \/><\/a><\/p>\n<p>With indexes added, the execution plans are even more different. The LEFT OUTER JOIN is still doing the complete outer join with a filter afterwards. It&#8217;s interesting to note that it&#8217;s still a hash join, even though both inputs are sorted in the order of the join keys.<\/p>\n<p>The Not Exists now has a stream aggregate (because duplicate values are irrelevant for an EXISTS\/NOT EXISTS) and an anti-semi join. The join here is no longer hash, it&#8217;s now a merge join.<\/p>\n<p>This echoes what I found when looking at <a href=\"https:\/\/www.sqlinthewild.co.za\/index.php\/2010\/01\/12\/in-vs-inner-join\/\">IN vs Inner join<\/a>. When the columns were indexed, the inner join still went for a hash join but the IN changed to a merge join. At the time, I thought it to be a fluke, I&#8217;m not so sure any longer. More tests on this are required\u2026<\/p>\n<p>The costing of the plans indicates that the optimiser believes that the LEFT OUTER JOIN form is more expensive. Do the execution stats carry the same conclusion?<\/p>\n<blockquote><p>&#8212; Query 1: LEFT OUTER JOIN<br \/>\nTable &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br \/>\nTable &#8216;BigTable&#8217;. Scan count 1, logical reads 342, physical reads 0.<br \/>\nTable &#8216;SmallerTable&#8217;. Scan count 1, logical reads 8, physical reads 0.<\/p>\n<p>SQL Server Execution Times:<br \/>\nCPU time = 172 ms,\u00a0 elapsed time = 686 ms.<\/p>\n<p>&#8212; Query 2: NOT EXISTS<br \/>\nTable &#8216;BigTable&#8217;. Scan count 1, logical reads 342, physical reads 0.<br \/>\nTable &#8216;SmallerTable&#8217;. Scan count 1, logical reads 8, physical reads 0.<\/p>\n<p>SQL Server Execution Times:<br \/>\nCPU time = 78 ms,\u00a0 elapsed time = 388 ms.<\/p><\/blockquote>\n<p>Well, yes, they do.<\/p>\n<p>The reads (ignoring the existence of the worktable for the hash join) are the same. That&#8217;s to be expected, both queries executed with a single scan of each index.<\/p>\n<p>The CPU time figures are not. The CPU time of the LEFT OUTER JOIN form is almost twice that of the NOT EXISTS.<\/p>\n<h3>In conclusion\u2026<\/h3>\n<p>If you need to find rows that don&#8217;t have a match in a second table, and the columns are nullable, use NOT EXISTS. If you need to find rows that don&#8217;t have a match in a second table, and the columns are not nullable, use NOT EXISTS or NOT IN.<\/p>\n<p>The LEFT OUTER JOIN \u2026 IS NULL method is slower when the columns are indexed and it&#8217;s perhaps not as clear what&#8217;s happening. It&#8217;s reasonably clear what a NOT EXISTS predicate does, with LEFT OUTER JOIN it&#8217;s not immediately clear that it&#8217;s a check for non-matching rows, especially if there are several where clause predicates.<\/p>\n<p>I think that&#8217;s about that for this series. I&#8217;m going to do one more post summarising all the findings, probably in a week or two.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>And to wrap up the miniseries on IN, EXISTS and JOIN, a look at NOT EXISTS and LEFT OUTER JOIN for finding non-matching rows. For previous parts, see In vs Exists In vs Inner Join Not in vs Not Exists&#8230; <a class=\"read-more-button\" href=\"https:\/\/www.sqlinthewild.co.za\/index.php\/2010\/03\/23\/left-outer-join-vs-not-exists\/\">(Read more)<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[15,16,17],"tags":[],"class_list":["post-575","post","type-post","status-publish","format-standard","hentry","category-sql-server","category-syndication","category-t-sql"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p7h6n-9h","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/posts\/575","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/comments?post=575"}],"version-history":[{"count":0,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/posts\/575\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/media?parent=575"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/categories?post=575"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/tags?post=575"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}