Channel: Forum SQL Server Database Engine

Strange JOIN performance issue


Hi,

I've found a strange performance issue when executing a simple JOIN, on both SQL Server 2008 R2 and SQL Server 2012.

Let's create the following tables:

CREATE TABLE TestJoinParent (
 ID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
 Name nvarchar(50) NOT NULL,
 JoinKey int NULL
)

CREATE TABLE TestJoinChild (
 ID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
 Name nvarchar(50) NOT NULL,
 JoinKey int NULL
)

Then fill them with sample data so that half of the rows in each table have NULL in the JoinKey column:

-- populate the child table
declare @i int = 0
-- 50000 rows with not null JoinKey
while @i < 50000 begin
 set @i = @i + 1
 insert into TestJoinChild(Name, JoinKey)
 values(N'Child #' + cast(@i as nvarchar(10)), @i)
end
-- 50000 rows with NULL JoinKey
while @i < 100000 begin
 set @i = @i + 1
 insert into TestJoinChild(Name, JoinKey)
 values(N'Child #' + cast(@i as nvarchar(10)), null)
end

-- populate the parent table
declare @j int = 0
-- 2000 rows with not null JoinKey
while @j < 2000 begin
 set @j = @j + 1
 insert into TestJoinParent(Name, JoinKey)
 values(N'Parent #' + cast(@j as nvarchar(10)), @j)
end
-- 2000 rows with NULL JoinKey
while @j < 4000 begin
 set @j = @j + 1
 insert into TestJoinParent(Name, JoinKey)
 values(N'Parent #' + cast(@j as nvarchar(10)), null)
end
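The WHILE loops work, but 104,000 single-row inserts are slow. Below is a minimal set-based sketch that should produce the same keys and NULL pattern. It assumes both tables were just created and are still empty, so IDENTITY starts at 1. The CTE name Numbers is hypothetical, and the cross join of sys.all_objects is used only as a convenient source of enough rows. It also labels parent rows 'Parent #'.

```sql
-- Hypothetical set-based alternative to the WHILE loops above.
-- Assumes both tables were just created and are empty.
set nocount on;

-- Child table: JoinKey 1..50000, then 50000 rows with NULL JoinKey.
with Numbers as (
    select top (100000) row_number() over (order by (select null)) as n
    from sys.all_objects a
    cross join sys.all_objects b   -- only a source of enough rows
)
insert into TestJoinChild (Name, JoinKey)
select N'Child #' + cast(n as nvarchar(10)),
       case when n <= 50000 then n end   -- CASE without ELSE yields NULL
from Numbers
order by n;   -- IDENTITY values are assigned in ORDER BY order

-- Parent table: JoinKey 1..2000, then 2000 rows with NULL JoinKey.
with Numbers as (
    select top (4000) row_number() over (order by (select null)) as n
    from sys.all_objects a
    cross join sys.all_objects b
)
insert into TestJoinParent (Name, JoinKey)
select N'Parent #' + cast(n as nvarchar(10)),
       case when n <= 2000 then n end
from Numbers
order by n;
```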

Then execute a simple join query:

select c.Name as ChildName, p.Name as ParentName
from TestJoinChild c
left outer join TestJoinParent p on p.JoinKey = c.JoinKey

It takes about 3 seconds on my desktop, and the actual execution plan shows a Hash Match join.

Now let's slightly modify the query to keep NULL values out of the join logic by mapping NULL parent keys to 0:

select c.Name as ChildName, p.Name as ParentName
from TestJoinChild c
left outer join TestJoinParent p on isnull(p.JoinKey, 0) = c.JoinKey

Now it takes less than 1 second, and the actual execution plan still shows a Hash Match join.
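On this sample data the two predicates should return exactly the same rows, because no child row has JoinKey = 0. Here is a quick sketch to confirm that. It compares (child ID, parent ID) pairs in both directions with EXCEPT, which treats NULLs as equal when comparing rows. The column aliases are illustrative only.

```sql
-- Rows produced by the plain predicate but not by the ISNULL predicate, and vice versa.
-- Both counts are expected to be 0 for the sample data above.
select
    (select count(*) from (
        select c.ID as ChildID, p.ID as ParentID
        from TestJoinChild c
        left outer join TestJoinParent p on p.JoinKey = c.JoinKey
        except
        select c.ID, p.ID
        from TestJoinChild c
        left outer join TestJoinParent p on isnull(p.JoinKey, 0) = c.JoinKey
    ) as d1) as OnlyInPlainJoin,
    (select count(*) from (
        select c.ID as ChildID, p.ID as ParentID
        from TestJoinChild c
        left outer join TestJoinParent p on isnull(p.JoinKey, 0) = c.JoinKey
        except
        select c.ID, p.ID
        from TestJoinChild c
        left outer join TestJoinParent p on p.JoinKey = c.JoinKey
    ) as d2) as OnlyInIsnullJoin;
```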

Forcing different join algorithms with a query-level OPTION hint gives the following timings:

  • on p.JoinKey = c.JoinKey, option (hash join): ~3 s
  • on p.JoinKey = c.JoinKey, option (merge join): ~17 s
  • on p.JoinKey = c.JoinKey, option (loop join): ~84 s
  • on isnull(p.JoinKey, 0) = c.JoinKey, option (hash join): < 1 s
  • on isnull(p.JoinKey, 0) = c.JoinKey, option (merge join): < 1 s
  • on isnull(p.JoinKey, 0) = c.JoinKey, option (loop join): ~33 s
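To reproduce measurements like these, the sketch below assumes the timings are read from SET STATISTICS TIME output (in SSMS it appears on the Messages tab). Only the hint changes between runs. Elapsed time also includes sending 100,000 rows to the client, so CPU time may be the steadier number to compare.

```sql
-- Report CPU/elapsed time and logical reads for each statement.
set statistics time on;
set statistics io on;

-- Plain equality predicate with the join algorithm forced by a query hint.
select c.Name as ChildName, p.Name as ParentName
from TestJoinChild c
left outer join TestJoinParent p on p.JoinKey = c.JoinKey
option (hash join);   -- swap for (merge join) or (loop join)

-- ISNULL variant with the same hint.
select c.Name as ChildName, p.Name as ParentName
from TestJoinChild c
left outer join TestJoinParent p on isnull(p.JoinKey, 0) = c.JoinKey
option (hash join);

set statistics time off;
set statistics io off;
```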

This result is confusing. It looks as if the join algorithm internally performs a lookup into the TestJoinParent table for each NULL value found in TestJoinChild.JoinKey. That makes no sense, because NULL = NULL is never true, so those rows can never match.
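One hypothesis that fits these numbers is an assumption, not something confirmed by the plans above. The idea is that the hash join keeps the 2,000 NULL-keyed parent rows in its hash table, where they all land in the same bucket. Each of the 50,000 NULL-keyed child rows would then walk that whole bucket without finding a match. The query below only counts the NULLs on each side and the worst-case number of comparisons that hypothesis implies. The column aliases are illustrative.

```sql
-- Count NULL join keys on each side of the join.
-- WorstCaseNullComparisons assumes (hypothetically) that every NULL child row
-- is compared with every NULL parent row.
select
    (select count(*) from TestJoinChild  where JoinKey is null) as ChildNullKeys,
    (select count(*) from TestJoinParent where JoinKey is null) as ParentNullKeys,
    cast((select count(*) from TestJoinChild where JoinKey is null) as bigint)
        * (select count(*) from TestJoinParent where JoinKey is null) as WorstCaseNullComparisons;
-- With the sample data: 50000, 2000, 100000000
```

Under the same hypothesis, the ISNULL predicate turns those parent keys into 0, which no child key equals. That could explain why the variant runs in under a second.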

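Replacing the clear p.JoinKey = c.JoinKey condition with the less readable isnull(p.JoinKey, 0) = c.JoinKey just for performance is a bad idea in general. It obscures the intent, and it is only correct as long as no child row has JoinKey = 0, since such a row would match every parent whose JoinKey is NULL.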

Can anybody explain the reason for this strange behavior, and how I can improve performance for this kind of query?
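One alternative that keeps the plain equality predicate is offered here as a sketch to test, not a guaranteed fix: remove the NULL-keyed parent rows before the join. Those rows can never satisfy p.JoinKey = c.JoinKey, so the LEFT JOIN returns the same rows. Whether the optimizer then applies the filter before building the hash table should be confirmed in the actual plan. The filtered index and its name IX_TestJoinParent_JoinKey_NotNull are hypothetical and optional; filtered indexes are available from SQL Server 2008.

```sql
-- Filter out parent rows whose JoinKey is NULL; they can never match the
-- equality predicate, so the LEFT JOIN result is unchanged.
select c.Name as ChildName, p.Name as ParentName
from TestJoinChild c
left outer join (
    select Name, JoinKey
    from TestJoinParent
    where JoinKey is not null
) p on p.JoinKey = c.JoinKey;

-- Optional, hypothetical: a filtered covering index that stores only non-NULL keys.
create index IX_TestJoinParent_JoinKey_NotNull
on TestJoinParent (JoinKey)
include (Name)
where JoinKey is not null;
```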

