Monday, March 12, 2012
Help needed to improve the performance of the query
I have the following setup and I would appreciate any help in improving
the performance of the query.
BigTable:
Column1 (indexed)
Column2 (indexed)
Column3 (no index)
Column4 (no index)
select
    [time] =
        CASE
            when BT.Column3 = 'value1' then DateAdd(...)
            when BT.Column3 in ('value2', 'value3') then DateAdd(...)
        END,
    Duration =
        CASE
            when BT.Column3 = 'value1' then DateDiff(...)
            when BT.Column3 in ('value2', 'value3') then
                DateDiff(ss, BT.OrigTime,
                         (select TOP 1 X.OrigTime
                          from BigTable X
                          where X.Column1 > BT.Column1
                            and X.Column3 <> 'value4'
                          order by X.Column1))
        END,
    ...   -- further output columns (see below)
FROM BigTable BT
where BT.Column3 = 'value1'
   OR (BT.Column3 in ('value2', 'value3')
       and BT.Column4 <> (select X.Column4
                          from BigTable X
                          where X.Column1 = BT.Column1
                            and X.Column3 = 'Value1'))
Apart from the above, there are a few more columns that are just part of the SELECT list and are not used in any conditions.
BigTable has around 1 million records, and the response time is very poor: it takes around 3 minutes to retrieve the records (which would be around 500K rows).
With SET STATISTICS IO ON,
I get the following:
Table 'BigTable'. Scan count 2, logical reads 44184, physical reads 0,
read-ahead reads 0.
Table 'WorkTable'. Scan count 541221, logical reads 4873218, physical
reads 0, read-ahead reads 0.
Is there any way to improve the performance so that I can get the
result in under a minute?
Any help would be appreciated.
P.S.: I tried indexing Column3, but saw no improvement.
-SR|||Whoops, this might not work. Let me test.|||OK. Instead of cycling through your entire big table in your WHERE clause for every record, you should do something like this. Mine is a little more dumbed down, but you should get the concept:
-- Small demo table
CREATE TABLE test2
(
    [ID] int IDENTITY,
    col1 int,
    col2 int,
    col3 int
)

INSERT INTO test2 (col1, col2, col3) VALUES (1, 2, 3)
INSERT INTO test2 (col1, col2, col3) VALUES (3, 2, 1)
INSERT INTO test2 (col1, col2, col3) VALUES (2, 3, 3)

-- Join the table to itself through aliases instead of running a
-- correlated subquery in the WHERE clause for every row
SELECT A.col1, A.col2, A.col3
FROM test2 A
LEFT OUTER JOIN test2 B
    ON A.[ID] = B.[ID]
WHERE A.col2 <> B.col3
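Applied to your actual query, the WHERE-clause part might become something like this. This is untested, the column names are just lifted from your post, and it assumes at most one 'value1' row exists per Column1 value:

-- Sketch: the correlated WHERE-clause subquery replaced by a self-join.
-- Assumes at most one row per Column1 has Column3 = 'value1'.
SELECT BT.Column1, BT.Column3, BT.Column4   -- plus your other columns
FROM BigTable BT
LEFT OUTER JOIN BigTable V1
    ON  V1.Column1 = BT.Column1
    AND V1.Column3 = 'value1'
WHERE BT.Column3 = 'value1'
   OR (BT.Column3 IN ('value2', 'value3') AND BT.Column4 <> V1.Column4)

Note this keeps the original behaviour of dropping rows that have no 'value1' match, because Column4 <> NULL is never true.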
I hate subqueries in the where clause.|||Is Column1 the primary key of the big table?|||Thanks a lot for your reply.
The problem is that I cannot change the database (I cannot add new tables);
I have to define a view over the existing tables.
I have created a view using the SELECT statement mentioned in the first post, and I am doing a SELECT on that view.
|||Nope,
Column2 is the primary key.
|||I didn't tell you to create any new tables. This was a demonstration of a self-join on the same table, without doing a subquery against a million records in the WHERE clause (which I think is the source of your performance issue). I didn't have time to rewrite your whole thing. Sorry.
If you take a look at the example again, I am joining the table to itself by using aliases.|||
My bad. Didn't read it completely.
I put the left outer join in as you mentioned, but haven't seen any improvement.|||What indices do you have on this table? I would create separate indices for columns 1, 3, and 4, because they are all being searched. Perhaps a clustered index on the column with the highest number of distinct values and nonclustered indices on the other two.
Have you looked at the execution plan? Where are your table scans occurring? I do not see why you have Column2 indexed; I do not see where it is joined on or searched.
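For example, something along these lines. The index names are made up, which column gets the clustered index depends on your data, and note you can only have one clustered index per table, so if the primary key on Column2 is currently clustered you would have to change that first:

-- Hypothetical names; choose the clustered key by selectivity.
CREATE CLUSTERED INDEX IX_BigTable_Column1
    ON BigTable (Column1)

CREATE NONCLUSTERED INDEX IX_BigTable_Column3
    ON BigTable (Column3)

CREATE NONCLUSTERED INDEX IX_BigTable_Column4
    ON BigTable (Column4)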
|||(rsarath@gmail.com) writes:
> I have the following setup and I would appreciate any help in improving
> the performance of the query.
> BigTable:
> Column1 (indexed)
> Column2 (indexed)
> Column3 (no index)
> Column4 (no index)
>
> select
> [time] =
> CASE
> when BT.Column3 = 'value1' then DateAdd(...)
> when BT.Column3 in ('value2', 'value3') then DateAdd(...)
> END,
> Duration =
> CASE
> when BT.Column3 = 'value1' then DateDiff(...)
> when BT.Column3 in ('value2', 'value3') then DateDiff(ss,
> BT.OrigTime, (select TOP 1 X.OrigTime from BigTable X where X.Column1 >
> BT.Column1 and X.Column3 <> 'value4' order by X.Column1 ))
> END,
> FROM
> BigTable BT where BT.Column3 = 'value1' OR (BT.Column3 in ('value2',
> 'value3') and BT.Column4 <> (select X.Column4 from BigTable X where
> X.Column1 = BT.Column1 and X.Column3 = 'Value1'))
It is difficult to say very much from this very abstract representation
of the query. What strikes my eye are the two correlated
subqueries. If value2 and value3 are frequent, these subqueries are
likely to be invoked many times.
It is often a good idea to replace correlated subqueries with derived
tables that you join to; a rough sketch of the idea follows after this
list. But I cannot say whether that is possible here. To suggest a
rewrite of the query I would need:
o CREATE TABLE statements (possibly simplified) for the table.
o INSERT statements with sample data, enough to demonstrate all
variations.
o The expected output for the sample.
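Purely to illustrate the shape of such a rewrite (this is only a sketch
against the abstract column names above, it assumes Column1 is unique,
and the triangular join it introduces can itself be expensive on a
million rows), the Duration lookup could become something like:

-- Rough, untested sketch: the per-row "TOP 1 ... ORDER BY Column1"
-- lookup expressed once as a derived table of each row's next Column1.
SELECT BT.Column1,
       Duration = DATEDIFF(ss, BT.OrigTime, NX.OrigTime)
FROM BigTable BT
JOIN (SELECT B1.Column1, NextCol1 = MIN(B2.Column1)
      FROM BigTable B1
      JOIN BigTable B2
        ON B2.Column1 > B1.Column1
       AND B2.Column3 <> 'value4'
      GROUP BY B1.Column1) N
  ON N.Column1 = BT.Column1
JOIN BigTable NX
  ON NX.Column1 = N.NextCol1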
From a performance point of view, it would also be useful to know
whether the indexes you have are clustered or not.
--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp