{"id":287,"date":"2009-08-17T16:01:37","date_gmt":"2009-08-17T14:01:37","guid":{"rendered":"https:\/\/www.sqlinthewild.co.za\/?p=287"},"modified":"2010-10-22T17:20:59","modified_gmt":"2010-10-22T15:20:59","slug":"exists-vs-in","status":"publish","type":"post","link":"https:\/\/www.sqlinthewild.co.za\/index.php\/2009\/08\/17\/exists-vs-in\/","title":{"rendered":"EXISTS vs IN"},"content":{"rendered":"<p>This one&#8217;s come up a few times recently, so I&#8217;ll take a look at it.<\/p>\n<p>The EXISTS and IN clauses at first glance look fairly similar. They both use a subquery to evaluate rows, but they do it in a slightly different way<\/p>\n<p>IN does a direct match between the column specified before the IN keyword and the values returned by the subquery. When using IN there can only be a single column specified in the select clause of the subquery<\/p>\n<p>Let&#8217;s have a look at a quick example<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">DECLARE @SomeTable (IntCol int)\r\nInsert into @SomeTable (IntCol) Values (1)\r\nInsert into @SomeTable (IntCol) Values (2)\r\nInsert into @SomeTable (IntCol) Values (3)\r\nInsert into @SomeTable (IntCol) Values (4)\r\nInsert into @SomeTable (IntCol) Values (5)\r\n\r\nSELECT *\r\nFROM BigTable\r\nWHERE SomeColumn IN (Select IntCol FROM @SomeTable)<\/pre>\n<p>So this query returns all the rows from BigTable where SomeColumn has any of the values returned by the subquery, that is 1,2,3,4 or 5<\/p>\n<p>But what if there were duplicate values returned by the subquery?<\/p>\n<p><!--more-->Well, it actually doesn&#8217;t matter. All that SQL is looking for is what values the subquery returns to process the filter. It&#8217;s not joining the two resultsets together so it makes no difference to the results if there are duplicate values returned by the subquery.<\/p>\n<p>To put it more technically, SQL&#8217;s doing a semi-join, a join that can only eliminate or qualify rows from the first table, but cannot duplicate them.<\/p>\n<p>So that&#8217;s IN. What about EXISTS<\/p>\n<p>Exists doesn&#8217;t check for a match, it doesn&#8217;t care in the slightest what values are been returned from the expression, it just checks for whether a row exists or not. Because of that, if there&#8217;s no predicate in the WHERE clause of the subquery that compares rows in the subquery with rows in the outer query, EXISTS will either return true for all the rows in the outer query or it will return false for all the rows in the outer query<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">DECLARE @SomeTable (IntCol int)\r\nInsert into @SomeTable (IntCol) Values (1)\r\nInsert into @SomeTable (IntCol) Values (2)\r\nInsert into @SomeTable (IntCol) Values (3)\r\nInsert into @SomeTable (IntCol) Values (4)\r\nInsert into @SomeTable (IntCol) Values (5)\r\n\r\nSELECT *\r\nFROM BigTable\r\nWHERE EXISTS (Select IntCol FROM @SomeTable)<\/pre>\n<p>This will also return every single row in BigTable, because Select IntCol FROM @SomeTable returns 5 rows and hence the EXISTS predicate is always true.<\/p>\n<p>Hence, to use EXISTS to do the same kind of thing as IN, there must be a correlation predicate within the subquery<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">DECLARE @SomeTable (IntCol int)\r\nInsert into @SomeTable (IntCol) Values (1)\r\nInsert into @SomeTable (IntCol) Values (2)\r\nInsert into @SomeTable (IntCol) Values (3)\r\nInsert into @SomeTable (IntCol) Values (4)\r\nInsert into @SomeTable (IntCol) Values (5)\r\n\r\nSELECT *\r\nFROM BigTable bt\r\nWHERE EXISTS (Select IntCol FROM @SomeTable st WHERE bt.SomeColumn = st.IntCol)<\/pre>\n<p>Now this will behave like the IN because it&#8217;s checking for matching rows and only returning true when there is a match. This will return all the rows from BigTable where SomeColumn has values 1,2,3,4 or 5 because those are the<\/p>\n<p>Exists is better for when comparisons are needed on two or more columns. For eg, this cannot be done easily with an IN<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">DECLARE @SomeTable (IntCol int, charCol char(1))\r\nInsert into @SomeTable (IntCol, charCol) Values (1, 'a')\r\nInsert into @SomeTable (IntCol, charCol) Values (2, 'a')\r\nInsert into @SomeTable (IntCol, charCol) Values (3, 'a')\r\nInsert into @SomeTable (IntCol, charCol) Values (4, 'b')\r\nInsert into @SomeTable (IntCol, charCol) Values (5, 'b')\r\n\r\nSELECT *\r\nFROM BigTable bt\r\nWHERE EXISTS (Select IntCol FROM @SomeTable st\r\nWHERE bt.SomeColumn = st.IntCol AND bt.SomeOtherColumn = st.charCol)<\/pre>\n<p>So that covers how they work, but how do they perform in comparison with each other? To answer that question, first I need some fairly large tables.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">Create Table BigTable (\r\nid int identity primary key,\r\nSomeColumn char(4),\r\nFiller char(100)\r\n)\r\n\r\nCreate Table SmallerTable (\r\nid int identity primary key,\r\nLookupColumn char(4),\r\nSomeArbDate Datetime default getdate()\r\n)\r\n\r\nINSERT INTO BigTable (SomeColumn)\r\nSELECT top 250000\r\nchar(65+FLOOR(RAND(a.column_id *5645 + b.object_id)*10)) + char(65+FLOOR(RAND(b.column_id *3784 + b.object_id)*12)) +\r\nchar(65+FLOOR(RAND(b.column_id *6841 + a.object_id)*12)) + char(65+FLOOR(RAND(a.column_id *7544 + b.object_id)*8))\r\nfrom master.sys.columns a cross join master.sys.columns b\r\n\r\nINSERT INTO SmallerTable (LookupColumn)\r\nSELECT DISTINCT SomeColumn\r\nFROM BigTable TABLESAMPLE (25 PERCENT)\r\n-- (3955 row(s) affected)\r\n<\/pre>\n<p>Let&#8217;s first try without indexes and see how EXISTS and IN compare.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">-- Query 1\r\nSELECT ID, SomeColumn FROM BigTable\r\nWHERE SomeColumn IN (SELECT LookupColumn FROM SmallerTable)\r\n\r\n-- Query 2\r\nSELECT ID, SomeColumn FROM BigTable\r\nWHERE EXISTS (SELECT LookupColumn FROM SmallerTable WHERE SmallerTable.LookupColumn = BigTable.SomeColumn)<\/pre>\n<p>The first thing to note is that the execution plans are identical. Two clustered index scans and a hash join (right semi-join).<\/p>\n<p><a href=\"https:\/\/www.sqlinthewild.co.za\/wp-content\/uploads\/2009\/08\/exists1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-thumbnail wp-image-297\" title=\"Exists vs IN\" src=\"https:\/\/www.sqlinthewild.co.za\/wp-content\/uploads\/2009\/08\/exists1-150x150.png\" alt=\"\" width=\"150\" height=\"150\" \/><\/a><\/p>\n<p>The IOs are also identical.<\/p>\n<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br \/>\nTable &#8216;BigTable&#8217;. Scan count 1, logical reads 3639, physical reads 0.<br \/>\nTable &#8216;SmallerTable&#8217;. Scan count 1, logical reads 15, physical reads 0.<\/p><\/blockquote>\n<p>So these two queries are executed by SQL in exactly the same way. No performance differences here.<\/p>\n<p>Now, let me add indexes to both tables, on that join column and see what changes.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">CREATE INDEX idx_BigTable_SomeColumn\r\nON BigTable (SomeColumn)\r\nCREATE INDEX idx_SmallerTable_LookupColumn\r\nON SmallerTable (LookupColumn)<\/pre>\n<p>With those created, I&#8217;m going to run the above two queries again. Again the execution plans of the two are identical, though the hash join and clustered index scans are gone, replaced by index scans, a stream aggregate and a merge join (inner join)<\/p>\n<p><a href=\"https:\/\/www.sqlinthewild.co.za\/wp-content\/uploads\/2009\/08\/exists2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-thumbnail wp-image-298\" title=\"Exists vs IN\" src=\"https:\/\/www.sqlinthewild.co.za\/wp-content\/uploads\/2009\/08\/exists2-150x150.png\" alt=\"\" width=\"150\" height=\"150\" \/><\/a><\/p>\n<p>The IOs are again identical and the execution times very close.<\/p>\n<blockquote><p>Table &#8216;BigTable&#8217;. Scan count 1, logical reads 343, physical reads 0.<br \/>\nTable &#8216;SmallerTable&#8217;. Scan count 1, logical reads 8, physical reads 0.<\/p><\/blockquote>\n<p>So IN and EXISTS appear to perform identically both when there are no indexes on the matching columns and when there are, and this is true regardless of whether of not there are nulls in either the subquery or in the outer table.<\/p>\n<p>Next up, a look at how <a href=\"https:\/\/www.sqlinthewild.co.za\/index.php\/2010\/01\/12\/in-vs-inner-join\/\">IN compares to Inner Join<\/a> for the purposes of finding matching rows<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This one&#8217;s come up a few times recently, so I&#8217;ll take a look at it. The EXISTS and IN clauses at first glance look fairly similar. They both use a subquery to evaluate rows, but they do it in a&#8230; <a class=\"read-more-button\" href=\"https:\/\/www.sqlinthewild.co.za\/index.php\/2009\/08\/17\/exists-vs-in\/\">(Read more)<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[15,16,17],"tags":[],"class_list":["post-287","post","type-post","status-publish","format-standard","hentry","category-sql-server","category-syndication","category-t-sql"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p7h6n-4D","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/posts\/287","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/comments?post=287"}],"version-history":[{"count":0,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/posts\/287\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/media?parent=287"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/categories?post=287"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/tags?post=287"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}