{"id":394,"date":"2010-01-12T16:00:58","date_gmt":"2010-01-12T14:00:58","guid":{"rendered":"https:\/\/www.sqlinthewild.co.za\/?p=394"},"modified":"2011-08-22T14:26:32","modified_gmt":"2011-08-22T12:26:32","slug":"in-vs-inner-join","status":"publish","type":"post","link":"https:\/\/www.sqlinthewild.co.za\/index.php\/2010\/01\/12\/in-vs-inner-join\/","title":{"rendered":"IN vs INNER JOIN"},"content":{"rendered":"<p>Often in forum threads discussing query performance I&#8217;ll see people recommending replacing an INNER JOIN with an IN or EXISTS (or recommending replacing an IN or EXISTS with an INNER JOIN) for performance reasons. I&#8217;ve previously looked at how the <a href=\"https:\/\/www.sqlinthewild.co.za\/index.php\/2009\/08\/17\/exists-vs-in\/\">IN and EXISTS<\/a> compared, now I&#8217;m going to investigate and see how IN compares with the join.<\/p>\n<p>One very important thing to note right off is that they are not equivalent in all cases.<\/p>\n<p>An inner join between two tables does a complete join, it checks for matches and returns rows. This means, if there are multiple matching rows in the second table, multiple rows will be returned. Also, when two tables are joined, columns can be returned from either.\u00a0 As a quick example:<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nDECLARE @BigTable TABLE (\r\n id INT IDENTITY PRIMARY KEY,\r\n SomeColumn CHAR(4),\r\n Filler CHAR(100)\r\n)\r\n\r\nInsert into @BigTable(SomeColumn) Values (1)\r\nInsert into @BigTable(SomeColumn) Values (2)\r\nInsert into @BigTable(SomeColumn) Values (3)\r\nInsert into @BigTable(SomeColumn) Values (4)\r\nInsert into @BigTable(SomeColumn) Values (5)\r\n\r\n\r\nDECLARE @SomeTable TABLE (IntCol int)\r\nInsert into @SomeTable (IntCol) Values (1)\r\nInsert into @SomeTable (IntCol) Values (2)\r\nInsert into @SomeTable (IntCol) Values (2)\r\nInsert into @SomeTable (IntCol) Values (3)\r\nInsert into @SomeTable (IntCol) Values (4)\r\nInsert into @SomeTable (IntCol) Values (5)\r\nInsert into @SomeTable (IntCol) Values (5)\r\n\r\nSELECT *\r\n FROM @BigTable b INNER JOIN @SomeTable  s ON b.SomeColumn = s.IntCol<\/pre>\n<p>This returns 7 rows and returns columns from both tables. Because the values in @SomeTable are duplicated, the matching rows from BigTable are returned twice.<\/p>\n<p>With an IN, what is done is a semi-join, a join that checks for matches but does not return rows. This means if there are multiple matching tables in the resultset used for the IN, it doesn&#8217;t matter. Only one row from the first table will be returned. Also, because the rows are not returned, columns from the table referenced in the IN cannot be returned. As a quick example<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nDECLARE @BigTable TABLE (\r\n id INT IDENTITY PRIMARY KEY,\r\n SomeColumn CHAR(4),\r\n Filler CHAR(100)\r\n)\r\n\r\nInsert into @BigTable(SomeColumn) Values (1)\r\nInsert into @BigTable(SomeColumn) Values (2)\r\nInsert into @BigTable(SomeColumn) Values (3)\r\nInsert into @BigTable(SomeColumn) Values (4)\r\nInsert into @BigTable(SomeColumn) Values (5)\r\n\r\nDECLARE @SomeTable TABLE (IntCol int)\r\nInsert into @SomeTable (IntCol) Values (1)\r\nInsert into @SomeTable (IntCol) Values (2)\r\nInsert into @SomeTable (IntCol) Values (2)\r\nInsert into @SomeTable (IntCol) Values (3)\r\nInsert into @SomeTable (IntCol) Values (4)\r\nInsert into @SomeTable (IntCol) Values (5)\r\nInsert into @SomeTable (IntCol) Values (5)\r\n\r\nSELECT *\r\n FROM @BigTable\r\n WHERE SomeColumn IN (Select IntCol FROM @SomeTable)<\/pre>\n<p>This returns 5 rows and only columns from BigTable.<\/p>\n<p>So, that said, how does the performance of the two differ for the cases where the results are identical (no duplicates in the second table, no columns needed from the second table)? For that, I&#8217;m going to need larger tables to play with.<!--more--><\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nCREATE TABLE BigTable (\r\nid INT IDENTITY PRIMARY KEY,\r\nSomeColumn CHAR(4),\r\nFiller CHAR(100)\r\n)\r\n\r\nCREATE TABLE SmallerTable (\r\nid INT IDENTITY PRIMARY KEY,\r\nLookupColumn CHAR(4),\r\nSomeArbDate DATETIME DEFAULT GETDATE()\r\n)\r\n\r\nINSERT INTO BigTable (SomeColumn)\r\nSELECT top 250000 char(65+FLOOR(RAND(a.column_id *5645 + b.object_id)*10)) + char(65+FLOOR(RAND(b.column_id *3784 + b.object_id)*12)) + char(65+FLOOR(RAND(b.column_id *6841 + a.object_id)*12)) + char(65+FLOOR(RAND(a.column_id *7544 + b.object_id)*8))\r\nFROM master.sys.columns a CROSS JOIN master.sys.columns b\r\n\r\nINSERT INTO SmallerTable (LookupColumn)\r\nSELECT DISTINCT SomeColumn\r\nFROM BigTable TABLESAMPLE (25 PERCENT)\r\n-- (3819 row(s) affected)\r\n<\/pre>\n<p>That&#8217;s the setup done, now for the two test cases. Let\u2019s first try without indexes and see how the INNER JOIN and IN compare. I&#8217;m selecting from just the first table to ensure that the two queries are logically identical. The DISTINCT used to populate the smaller table ensures that there are no duplicate rows in the smaller table.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">SELECT BigTable.ID, SomeColumn\r\nFROM BigTable\r\nWHERE SomeColumn IN (SELECT LookupColumn FROM dbo.SmallerTable)\r\n\r\nSELECT BigTable.ID, SomeColumn\r\nFROM BigTable\r\nINNER JOIN SmallerTable ON dbo.BigTable.SomeColumn = dbo.SmallerTable.LookupColumn<\/pre>\n<p>Something of interest straight away, the execution plans are almost identical. Not completely identical, but the only difference is that the hash join for the IN shows a Hash Match (Right Semi Join) and the hash join for the INNER JOIN shows a Hash Match (Inner Join)<\/p>\n<p><a href=\"https:\/\/www.sqlinthewild.co.za\/wp-content\/uploads\/2010\/01\/InVsSelect-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-thumbnail wp-image-513\" style=\"border: 1px solid black;\" title=\"In Vs Select 1\" src=\"https:\/\/www.sqlinthewild.co.za\/wp-content\/uploads\/2010\/01\/InVsSelect-1-150x150.png\" alt=\"In Vs Select 1\" width=\"150\" height=\"150\" \/><\/a><\/p>\n<p>The IOs are the same and the durations are extremely similar. Here&#8217;s the IO results and durations for five tests.<\/p>\n<p>IN<\/p>\n<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br \/>\nTable &#8216;BigTable&#8217;. Scan count 1, logical reads 3639, physical reads 0.<br \/>\nTable &#8216;SmallerTable&#8217;. Scan count 1, logical reads 14, physical reads 0.<\/p>\n<p>SQL Server Execution Times:<br \/>\nCPU time = 156 ms,\u00a0 elapsed time = 2502 ms.<br \/>\nCPU time = 157 ms,\u00a0 elapsed time = 2323 ms.<br \/>\nCPU time = 156 ms,\u00a0 elapsed time = 2555 ms.<br \/>\nCPU time = 188 ms,\u00a0 elapsed time = 2381 ms.<br \/>\nCPU time = 203 ms,\u00a0 elapsed time = 2312 ms.<\/p><\/blockquote>\n<p>INNER JOIN<\/p>\n<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br \/>\nTable &#8216;BigTable&#8217;. Scan count 1, logical reads 3639, physical reads 0.<br \/>\nTable &#8216;SmallerTable&#8217;. Scan count 1, logical reads 14, physical reads 0.<\/p>\n<p>SQL Server Execution Times:<br \/>\nCPU time = 125 ms,\u00a0 elapsed time = 2922 ms.<br \/>\nCPU time = 140 ms,\u00a0 elapsed time = 2372 ms.<br \/>\nCPU time = 188 ms,\u00a0 elapsed time = 2530 ms.<br \/>\nCPU time = 203 ms,\u00a0 elapsed time = 2323 ms.<br \/>\nCPU time = 187 ms,\u00a0 elapsed time = 2512 ms.<\/p><\/blockquote>\n<p>Now let&#8217;s try with some indexes on the join columns.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">CREATE INDEX idx_BigTable_SomeColumn ON BigTable (SomeColumn)\r\nCREATE INDEX idx_SmallerTable_LookupColumn ON SmallerTable (LookupColumn)<\/pre>\n<p>Now when I run the two queries, the execution plans are different, and the costs of the two are no longer 50%. Both do a single index scan on each table, but the IN has a Merge Join (Inner Join) and the INNER JOIN has a Hash Match (Inner Join)<\/p>\n<p><a href=\"https:\/\/www.sqlinthewild.co.za\/wp-content\/uploads\/2010\/01\/InVsSelect-2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-thumbnail wp-image-514\" style=\"border: 1px solid black;\" title=\"InVsSelect 2\" src=\"https:\/\/www.sqlinthewild.co.za\/wp-content\/uploads\/2010\/01\/InVsSelect-2-150x150.png\" alt=\"InVsSelect 2\" width=\"150\" height=\"150\" \/><\/a><\/p>\n<p>The IOs are still identical, other than the WorkTable that only appears for the Hash Join.<\/p>\n<p>IN<\/p>\n<blockquote><p>Table &#8216;BigTable&#8217;. Scan count 1, logical reads 3639, physical reads 0.<br \/>\nTable &#8216;SmallerTable&#8217;. Scan count 1, logical reads 14, physical reads 0.<\/p><\/blockquote>\n<p>INNER JOIN<\/p>\n<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br \/>\nTable &#8216;BigTable&#8217;. Scan count 1, logical reads 3639, physical reads 0.<br \/>\nTable &#8216;SmallerTable&#8217;. Scan count 1, logical reads 14, physical reads 0.<\/p><\/blockquote>\n<p>So what about the durations? Honestly it&#8217;s hard to say anything completely conclusive, the durations of both queries are quite small and they are very close. To see if there is any measurable different, I&#8217;m going to run each one 100 times, use Profiler to log the duration and CPU and then average the results over the 100 executions. While running this, I&#8217;m also going to close\/disable everything else I can on the computer, to try and get reasonably accurate times.<\/p>\n<p>IN<\/p>\n<p>Average CPU: 130.<br \/>\nAvg duration: 2.78 seconds<\/p>\n<p>INNER JOIN<\/p>\n<p>Average CPU: 161.<br \/>\nAvg duration: 2.93 seconds<\/p>\n<p>Now is that enough to be significant? I&#8217;m not sure. However, looking at those results along with the IO and execution plans, I do have a recommendation for In vs Inner Join<\/p>\n<p>If all you need is to check for matching rows in the other table but don&#8217;t need any columns from that table, use IN. If you do need columns from the second table, use Inner Join.<\/p>\n<p>I still intend to go over <a href=\"https:\/\/www.sqlinthewild.co.za\/index.php\/2010\/02\/18\/not-exists-vs-not-in\/\">NOT IN and NOT EXISTS<\/a> and, after this one, I also want to take a look at the <a href=\"https:\/\/www.sqlinthewild.co.za\/index.php\/2010\/03\/23\/left-outer-join-vs-not-exists\/\">LEFT JOIN with IS NULL check vs NOT EXISTS<\/a> for when you want rows from Table1 that don&#8217;t have a match in Table 2.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Often in forum threads discussing query performance I&#8217;ll see people recommending replacing an INNER JOIN with an IN or EXISTS (or recommending replacing an IN or EXISTS with an INNER JOIN) for performance reasons. I&#8217;ve previously looked at how the&#8230; <a class=\"read-more-button\" href=\"https:\/\/www.sqlinthewild.co.za\/index.php\/2010\/01\/12\/in-vs-inner-join\/\">(Read more)<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[15,16,17],"tags":[],"class_list":["post-394","post","type-post","status-publish","format-standard","hentry","category-sql-server","category-syndication","category-t-sql"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p7h6n-6m","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/posts\/394","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/comments?post=394"}],"version-history":[{"count":0,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/posts\/394\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/media?parent=394"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/categories?post=394"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/tags?post=394"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}