{"id":553,"date":"2010-02-18T16:00:32","date_gmt":"2010-02-18T14:00:32","guid":{"rendered":"https:\/\/www.sqlinthewild.co.za\/?p=553"},"modified":"2016-12-31T12:37:11","modified_gmt":"2016-12-31T10:37:11","slug":"not-exists-vs-not-in","status":"publish","type":"post","link":"https:\/\/www.sqlinthewild.co.za\/index.php\/2010\/02\/18\/not-exists-vs-not-in\/","title":{"rendered":"NOT EXISTS vs NOT IN"},"content":{"rendered":"<p>Continuing with the mini-series on query operators, I want to have a look at NOT EXISTS and NOT IN.<\/p>\n<p>Previous parts of this miniseries are:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.sqlinthewild.co.za\/index.php\/2009\/08\/17\/exists-vs-in\/\">EXISTS vs IN<\/a><\/li>\n<li><a href=\"https:\/\/www.sqlinthewild.co.za\/index.php\/2010\/01\/12\/in-vs-inner-join\/\">IN vs INNER JOIN<\/a><\/li>\n<\/ul>\n<p>Just one note before diving into that. The examples I\u2019m using are fairly simplistic and that\u2019s intentional. I\u2019m trying to find what, if any, are the performance differences in a benchmark-style setup. I\u2019ll have some comments on more complex examples in a later post.<\/p>\n<p>The most important thing to note about NOT EXISTS and NOT IN is that, unlike EXISTS and IN,\u00a0 they are not equivalent in all cases. Specifically, when NULLs are involved they will return different results. To be totally specific, when the subquery returns even one null, NOT IN will not match any rows.<\/p>\n<p>The reason for this can be found by looking at the details of what the NOT IN operation actually means.<\/p>\n<p>Let\u2019s say, for illustration purposes that there are 4 rows in the table called t, there\u2019s a column called ID with values 1..4<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">WHERE SomeValue NOT IN (SELECT AVal FROM t)<\/pre>\n<p>is equivalent to<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">WHERE (\r\nSomeValue != (SELECT AVal FROM t WHERE ID=1)\r\nAND\r\nSomeValue != (SELECT AVal FROM t WHERE ID=2)\r\nAND\r\nSomeValue != (SELECT AVal FROM t WHERE ID=3)\r\nAND\r\nSomeValue != (SELECT AVal FROM t WHERE ID=4)\r\n)<\/pre>\n<p>Let\u2019s further say that AVal is NULL where ID = 4. Hence that != comparison returns UNKNOWN. The logical truth table for AND states that UNKNOWN and TRUE is UNKNOWN, UNKNOWN and FALSE is FALSE. There is no value that can be AND\u2019d with UNKNOWN to produce the result TRUE<\/p>\n<p>Hence, if any row of that subquery returns NULL, the entire NOT IN operator will evaluate to either FALSE or UNKNOWN and no records will be returned<\/p>\n<p>So what about EXISTS?<\/p>\n<p>Exists cannot return NULL. It\u2019s checking solely for the presence or absence of a row in the subquery and, hence, it can only return true or false. Since it cannot return NULL, there\u2019s no possibility of a single NULL resulting in the entire expression evaluating to UNKNOWN.<\/p>\n<p>Hence, when the column in the subquery that\u2019s used for comparison with the outer table can have nulls in it, consider carefully which of NOT EXISTS or NOT IN you want to use.<\/p>\n<p>Ok, but say there are no nulls in the column. How do they compare speed-wise. I\u2019m going to do two tests, one where the columns involved in the comparison are defined as NULL and one where they are defined as NOT NULL. There will be no NULL values in the columns in either case. In both cases, the join columns will be indexed. After all, we all index our join columns, right?<\/p>\n<p>So, first test, non-nullable columns. First some setup<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">CREATE TABLE BigTable (\r\nid INT IDENTITY PRIMARY KEY,\r\nSomeColumn char(4) NOT NULL,\r\nFiller CHAR(100)\r\n)\r\n\r\nCREATE TABLE SmallerTable (\r\nid INT IDENTITY PRIMARY KEY,\r\nLookupColumn char(4) NOT NULL,\r\nSomeArbDate Datetime default getdate()\r\n)\r\n\r\nINSERT INTO BigTable (SomeColumn)\r\nSELECT top 250000\r\nchar(65+FLOOR(RAND(a.column_id *5645 + b.object_id)*10)) + char(65+FLOOR(RAND(b.column_id *3784 + b.object_id)*12)) +\r\nchar(65+FLOOR(RAND(b.column_id *6841 + a.object_id)*12)) + char(65+FLOOR(RAND(a.column_id *7544 + b.object_id)*8))\r\nfrom master.sys.columns a cross join master.sys.columns b\r\n\r\nINSERT INTO SmallerTable (LookupColumn)\r\nSELECT DISTINCT SomeColumn\r\nFROM BigTable TABLESAMPLE (25 PERCENT)\r\n-- (3898 row(s) affected)\r\n\r\nCREATE INDEX idx_BigTable_SomeColumn\r\nON BigTable (SomeColumn)\r\nCREATE INDEX idx_SmallerTable_LookupColumn\r\nON SmallerTable (LookupColumn)<\/pre>\n<p>Then the queries<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">-- Query 1\r\nSELECT ID, SomeColumn FROM BigTable\r\nWHERE SomeColumn NOT IN (SELECT LookupColumn FROM SmallerTable)\r\n\r\n-- Query 2\r\nSELECT ID, SomeColumn FROM BigTable\r\nWHERE NOT EXISTS (SELECT LookupColumn FROM SmallerTable WHERE SmallerTable.LookupColumn = BigTable.SomeColumn)<\/pre>\n<p>The first thing to note is that the execution plans are identical.<\/p>\n<p><a href=\"https:\/\/www.sqlinthewild.co.za\/wp-content\/uploads\/2010\/02\/ExecPlansNOTNULL.png\"><img loading=\"lazy\" decoding=\"async\" style=\"display: inline; border-width: 0px;\" title=\"ExecPlansNOTNULL\" src=\"https:\/\/www.sqlinthewild.co.za\/wp-content\/uploads\/2010\/02\/ExecPlansNOTNULL_thumb.png\" alt=\"ExecPlansNOTNULL\" width=\"244\" height=\"130\" border=\"0\" \/><\/a><\/p>\n<p>The execution characteristics are also identical.<\/p>\n<blockquote><p><strong>Query 1<br \/>\n<\/strong>Table &#8216;BigTable&#8217;. Scan count 1, logical reads 342, physical reads 0.<br \/>\nTable &#8216;SmallerTable&#8217;. Scan count 1, logical reads 8, physical reads 0.<\/p>\n<p>SQL Server Execution Times:<br \/>\nCPU time = 156 ms,\u00a0 elapsed time = 221 ms.<\/p>\n<p><strong>Query 2<br \/>\n<\/strong>Table &#8216;BigTable&#8217;. Scan count 1, logical reads 342, physical reads 0.<br \/>\nTable &#8216;SmallerTable&#8217;. Scan count 1, logical reads 8, physical reads 0.<\/p>\n<p>SQL Server Execution Times:<br \/>\nCPU time = 156 ms,\u00a0 elapsed time = 247 ms.<\/p><\/blockquote>\n<p>So, at least for the case where the columns are defined as NOT NULL, these two perform the same.<\/p>\n<p>What about the case where the columns are defined as nullable? I&#8217;m going to simply alter the two columns involved without changing anything else, then test out the two queries again.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">ALTER TABLE BigTable\r\n ALTER COLUMN SomeColumn char(4) NULL\r\n\r\nALTER TABLE SmallerTable\r\n ALTER COLUMN LookupColumn char(4) NULL<\/pre>\n<p>And the same two queries<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">-- Query 1\r\n\r\nSELECT ID, SomeColumn FROM BigTable\r\nWHERE SomeColumn NOT IN (SELECT LookupColumn FROM SmallerTable)\r\n\r\n-- Query 2\r\nSELECT ID, SomeColumn FROM BigTable\r\nWHERE NOT EXISTS (SELECT LookupColumn FROM SmallerTable WHERE SmallerTable.LookupColumn = BigTable.SomeColumn)<\/pre>\n<p>And as for their performance\u2026<\/p>\n<p><a href=\"https:\/\/www.sqlinthewild.co.za\/wp-content\/uploads\/2010\/02\/ExecPlansNull.png\"><img loading=\"lazy\" decoding=\"async\" style=\"display: inline; border-width: 0px;\" title=\"ExecPlansNull\" src=\"https:\/\/www.sqlinthewild.co.za\/wp-content\/uploads\/2010\/02\/ExecPlansNull_thumb.png\" alt=\"ExecPlansNull\" width=\"244\" height=\"123\" border=\"0\" \/><\/a><\/p>\n<blockquote><p><strong>Query 1<\/strong><br \/>\nTable &#8216;SmallerTable&#8217;. Scan count 3, logical reads 500011, physical reads 0.<br \/>\nTable &#8216;BigTable&#8217;. Scan count 1, logical reads 437, physical reads 0.<\/p>\n<p>SQL Server Execution Times:<br \/>\nCPU time = 827 ms,\u00a0 elapsed time = 825 ms.<\/p>\n<p><strong>Query 2<br \/>\n<\/strong>Table &#8216;BigTable&#8217;. Scan count 1, logical reads 437, physical reads 0.<br \/>\nTable &#8216;SmallerTable&#8217;. Scan count 1, logical reads 9, physical reads 0.<\/p>\n<p>SQL Server Execution Times:<br \/>\nCPU time = 156 ms,\u00a0 elapsed time = 228 ms.<\/p><\/blockquote>\n<p>Radically different execution plans, radically different performance characteristics. The NOT IN took over 5 times longer to execute and did thousands of times more reads.<\/p>\n<p>Why is that complex execution plan required when there may be nulls in the column? I can&#8217;t answer that one, probably only one of the query optimiser developers can, however the results are obvious. When the columns allow nulls but has none, the NOT IN performs significantly worse than NOT EXISTS.<\/p>\n<p>So, take-aways from this?<\/p>\n<p>Most importantly, NOT EXISTS and NOT IN do not have the same behaviour when there are NULLs involved. Chose carefully which you want.<\/p>\n<p>Columns that will never contain NULL values should be defined as NOT NULL so that SQL knows there will never be NULL values in them and so that it doesn\u2019t have to produce complex plans to handle potential nulls.<\/p>\n<p>On non-nullable columns, the behaviour and performance of NOT IN and NOT EXISTS are the same, so use whichever one works better for the specific situation.<\/p>\n<p>One more to go on this: <a href=\"https:\/\/www.sqlinthewild.co.za\/index.php\/2010\/03\/23\/left-outer-join-vs-not-exists\/\">LEFT OUTER JOIN with the IS NULL check vs NOT IN<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Continuing with the mini-series on query operators, I want to have a look at NOT EXISTS and NOT IN. Previous parts of this miniseries are: EXISTS vs IN IN vs INNER JOIN Just one note before diving into that. The&#8230; <a class=\"read-more-button\" href=\"https:\/\/www.sqlinthewild.co.za\/index.php\/2010\/02\/18\/not-exists-vs-not-in\/\">(Read more)<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":true,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[15,16,17],"tags":[],"class_list":["post-553","post","type-post","status-publish","format-standard","hentry","category-sql-server","category-syndication","category-t-sql"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p7h6n-8V","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/posts\/553","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/comments?post=553"}],"version-history":[{"count":3,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/posts\/553\/revisions"}],"predecessor-version":[{"id":2054,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/posts\/553\/revisions\/2054"}],"wp:attachment":[{"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/media?parent=553"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/categories?post=553"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sqlinthewild.co.za\/index.php\/wp-json\/wp\/v2\/tags?post=553"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}