In, Exists and join – a roundup

Over the last several months I’ve had a look at IN, Exists, Join and their opposites to see how they perform and whether there’s any truth in the advice that is often seen on forums and blogs advocating replacing one with the other.

Previous parts of this series can be found:

In this roundup post, I’m going to do multiple tests on the 6 query forms, with different numbers of rows, indexes, no indexes and, for the negative forms (NOT IN, NOT EXISTS), nullable and non-nullable join columns.

In the individual tests, I used 250000 rows in the first table and around 3000 rows in the secondary table. In this roundup, I’m going to use 3 different row counts, 1000000 rows, 50000 rows and 2500 rows. That should give a reasonable idea for performance at various table sizes. (Not much point in going smaller than 2500 rows. Everything’s fast on 100 rows)

Some notes on the tests.

  • The version of SQL is SQL Server 2008 SP1 x64 Developer Edition.
  • The tests were run on a laptop. Core-2 Duo, 3 GB memory. SQL limited to 1 processor, so no parallelism possible.
  • Each query will be run 10 times, reads, cpu and duration measured by profiler and averaged.
  • Each query will be run once before the tests start to ensure that the data is in cache and the execution plans are generated and cached.
  • Reproduction scripts will be available for download.

Exists vs. In vs. Inner Join

First, no indexes on the join columns

Table Size Operator CPU Reads Duration
Large IN 1293 14585 9649
Exists 1260 14585 9573
Inner Join 1302 14585 9716
Medium IN 59 747 538
Exists 78 747 574
Inner Join 69 747 523
Small IN 7 41 65
Exists 3 41 91
Inner Join 4 41 65

Now with indexes on the join columns

Table Size Operator CPU Reads Duration
Large IN 973 1760 9707
Exists 956 1760 9483
Inner Join 1173 1760 9539
Medium IN 43 100 516
Exists 53 100 548
Inner Join 59 100 498
Small IN 3 9 64
Exists 1 9 80
Inner Join 4 9 67

Not Exists vs. Not In vs. Left Outer Join … Is Null

First test with the columns join columns nullable, no indexes

Table Size Operator CPU Reads Duration
Large NOT IN 3194 2014622 3251
NOT Exists 820 14585 837
Outer Join 962 14585 1025
Medium NOT IN 174 100765 217
NOT Exists 54 747 121
Outer Join 53 747 79
Small NOT IN 12 5043 13
NOT Exists 4 41 6
Outer Join 3 41 5

Then with join columns nullable with indexes

Table Size Operator CPU Reads Duration
Large NOT IN 2677 2001762 2726
NOT Exists 569 1760 586
Outer Join 949 1760 1029
Medium NOT IN 137 100102 164
NOT Exists 40 100 104
Outer Join 48 100 69
Small NOT IN 11 5011 12
NOT Exists 3 9 4
Outer Join 6 9 6

Now, let’s make the join columns not nullable. Again, no indexes to start with.

Table Size Operator CPU Reads Duration
Large NOT IN 741 14585 753
NOT Exists 784 14585 790
Outer Join 884 14585 937
Medium NOT IN 43 747 103
NOT Exists 49 747 120
Outer Join 53 747 74
Small NOT IN 4 41 4
NOT Exists 1 41 5
Outer Join 1 41 5

and finally, join columns not nullable, with indexes

Table Size Operator CPU Reads Duration
Large NOT IN 578 1382 588
NOT Exists 585 1382 597
Outer Join 953 1382 1006
Medium NOT IN 37 80 79
NOT Exists 34 80 79
Outer Join 39 80 84
Small NOT IN 3 8 4
NOT Exists 1 8 5
Outer Join 4 8 5

These results seem to pretty much confirm the earlier conclusions.

Exists and IN perform much the same, whether there are indexes on the join column or not. When there are indexes on the join columns, the INNER JOIN is slightly (very slightly) slower, which is more noticeable on the large tables, much less on the medium or small ones. (Note I’m mostly looking at CPU time, as the duration is also affected by sending of results to client, in this case, lots and lots of results)

When it comes to NOT In and NOT Exists they perform much the same when the columns involved are not nullable. If the columns are nullable, Not In is significantly slower because it has a different behaviour when nulls are present.

The join is slightly slower than Not Exists (or Not In on non-nullable columns), again only noticeable on the large table, probably because the optimiser has to do a full join with a secondary filter rather than the anti-semi join that it can use for Not Exists and Not In.

My conclusion from earlier posts stands. If all you are doing is looking for matching or non-matching rows and you don’t need any columns from the second table, use IN or Exists (or their negations), as appropriate for the situation. Only when you need columns from the second table should Join be used.

I think (and hope) that this adequately concludes the discussion on the Exists and In and joins, both behaviour and performance.

Reproduction scripts

6 Comments

  1. Brad Schulz

    Nice summary, Gail, to a great series of posts.

    For yet another weird wrinkle in the NOT EXISTS and OUTER JOIN saga, check this out:

    http://bradsruminations.blogspot.com/2010/04/looking-under-hood.html

    Reply
  2. Josh Meares

    Great Job. I’m curious how parallel processing would change the mix. However, the results as they stand make sense to me.

    The order of the day: make sure your foreign keys are indexed and for Pete’s sake, make them non-nullable if they don’t need nulls!

    Reply
  3. Gail

    Absolutely. I think I need a rant on data types and nullability.

    What I noticed on earlier tests (when I first started writing the series) was that not all the query operators would parallel at the same point. That’s the main reason I forced everything to run without parallelism, so I could be sure of comparing apples with apples, not watermelons.

    Reply
  4. LC

    Interesting information. Thanks for going to the trouble to research and post on this subject.

    LC

    Reply
  5. Mark

    I’m not disputing your data Gail, and I might be wrong, but through experience I tend to avoid EXISTS and IN as much as possible and use JOINs instead. I’ll pay more attention to this from now on though. Maybe my environment’s just odd or something.

    Reply
  6. Pingback: Qual é mais rápido: NOT IN ou NOT EXISTS? – Porto SQL

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.