Hi,
I've found a strange performance issue when executing a simple JOIN on both SQL Server 2008 R2 and SQL Server 2012.
Let's create the following tables:
CREATE TABLE TestJoinParent
(
    ID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Name nvarchar(50) NOT NULL,
    JoinKey int NULL
)

CREATE TABLE TestJoinChild
(
    ID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Name nvarchar(50) NOT NULL,
    JoinKey int NULL
)
And fill them with sample data so that half of the rows in each table have NULL in the JoinKey column:
-- populating child table
declare @i int = 0

-- 50000 rows with not null JoinKey
while @i < 50000
begin
    set @i = @i + 1
    insert into TestJoinChild(Name, JoinKey)
    values(N'Child #' + cast(@i as nvarchar(10)), @i)
end

-- 50000 rows with NULL JoinKey
while @i < 100000
begin
    set @i = @i + 1
    insert into TestJoinChild(Name, JoinKey)
    values(N'Child #' + cast(@i as nvarchar(10)), null)
end

-- populating parent table
declare @j int = 0

-- 2000 rows with not null JoinKey
while @j < 2000
begin
    set @j = @j + 1
    insert into TestJoinParent(Name, JoinKey)
    values(N'Child #' + cast(@j as nvarchar(10)), @j)
end

-- 2000 rows with NULL JoinKey
while @j < 4000
begin
    set @j = @j + 1
    insert into TestJoinParent(Name, JoinKey)
    values(N'Child #' + cast(@j as nvarchar(10)), null)
end
Then execute a simple join query:
select c.Name as ChildName, p.Name as ParentName
from TestJoinChild c
left outer join TestJoinParent p on p.JoinKey = c.JoinKey
It takes about 3 seconds on my desktop, and the actual execution plan shows a Hash Match join.
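For anyone reproducing this, one way to capture comparable timings is sketched below. It assumes the query is run from SSMS or another client that displays the messages output; the figures quoted in this post were not necessarily measured this way.

```sql
-- Assumption: timings are read from the "Messages" output of the session.
SET STATISTICS TIME ON;   -- reports parse/compile and execution CPU and elapsed time
SET STATISTICS IO ON;     -- reports logical and physical reads per table

select c.Name as ChildName, p.Name as ParentName
from TestJoinChild c
left outer join TestJoinParent p on p.JoinKey = c.JoinKey;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Comparing CPU time against elapsed time helps separate time spent in the join itself from time spent returning 100000+ rows to the client.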
Let's slightly modify the query to exclude NULL values from the join logic:
select c.Name as ChildName, p.Name as ParentName
from TestJoinChild c
left outer join TestJoinParent p on isnull(p.JoinKey, 0) = c.JoinKey
It now takes less than 1 second, and the actual execution plan shown is the same.
Forcing different join methods (using OPTION on the query) gives the following results:
- on p.JoinKey = c.JoinKey, OPTION(hash join): ~3s
- on p.JoinKey = c.JoinKey, OPTION(merge join): ~17s
- on p.JoinKey = c.JoinKey, OPTION(loop join): ~84s
- on isnull(p.JoinKey, 0) = c.JoinKey, OPTION(hash join): < 1s
- on isnull(p.JoinKey, 0) = c.JoinKey, OPTION(merge join): < 1s
- on isnull(p.JoinKey, 0) = c.JoinKey, OPTION(loop join): ~33s
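For reference, a forced-join run looks roughly like the sketch below. Only the hint keyword changes between runs (HASH JOIN, MERGE JOIN or LOOP JOIN), and the ON clause is swapped for the isnull variant in the last three cases.

```sql
-- Query-level join hint: applies to every join in the statement.
select c.Name as ChildName, p.Name as ParentName
from TestJoinChild c
left outer join TestJoinParent p on p.JoinKey = c.JoinKey
option (hash join);   -- replace with (merge join) or (loop join) for the other runs
```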
This result is confusing. It looks as if the join algorithm internally performs a lookup into the TestJoinParent table for each NULL value found in TestJoinChild.JoinKey, which does not make sense because a NULL key can never satisfy the equality condition.
Replacing the clear join condition p.JoinKey = c.JoinKey with the awkward isnull(p.JoinKey, 0) = c.JoinKey just for the sake of performance is a bad idea in the general case.
Can anybody explain the reason for this strange behavior, and how I can improve performance for this kind of query?