Monday, March 12, 2012

Help needed to improve the performance of the query

Hello,
I have the following setup and I would appreciate any help in improving
the performance of the query.

BigTable:
Column1 (indexed)
Column2 (indexed)
Column3 (no index)
Column4 (no index)

select
    [time] =
        CASE
            when BT.Column3 = 'value1' then DateAdd(...)
            when BT.Column3 in ('value2', 'value3') then DateAdd(...)
        END,
    Duration =
        CASE
            when BT.Column3 = 'value1' then DateDiff(...)
            when BT.Column3 in ('value2', 'value3') then
                DateDiff(ss, BT.OrigTime,
                         (select TOP 1 X.OrigTime
                          from BigTable X
                          where X.Column1 > BT.Column1
                            and X.Column3 <> 'value4'
                          order by X.Column1))
        END,

FROM BigTable BT
where BT.Column3 = 'value1'
   OR (BT.Column3 in ('value2', 'value3')
       and BT.Column4 <> (select X.Column4
                          from BigTable X
                          where X.Column1 = BT.Column1
                            and X.Column3 = 'Value1'))

Apart from the above, there are a few more columns that appear only in
the select list and are not used in any condition.

BigTable has around 1 million rows and the response time is very
poor: it takes around 3 minutes to retrieve the result (which is
around 500K rows).

With STATISTICS IO turned on,
I get the following:

Table 'BigTable'. Scan count 2, logical reads 44184, physical reads 0,
read-ahead reads 0.
Table 'WorkTable'. Scan count 541221, logical reads 4873218, physical
reads 0, read-ahead reads 0.

Is there any way to improve the performance, so that I can get the
result in under 1 minute?
Any help would be appreciated.

P.S. I tried indexing Column3, but there was no improvement.

(rsarath@.gmail.com) writes:
> I have the following setup and I would appreciate any help in improving
> the performance of the query.
> BigTable:
> Column1 (indexed)
> Column2 (indexed)
> Column3 (no index)
> Column4 (no index)
>
> select
> [time] =
> CASE
> when BT.Column3 = 'value1' then DateAdd(...)
> when BT.Column3 in ('value2', 'value3') then DateAdd(...)
> END,
> Duration =
> CASE
> when BT.Column3 = 'value1' then DateDiff(...)
> when BT.Column3 in ('value2', 'value3') then DateDiff(ss,
> BT.OrigTime, (select TOP 1 X.OrigTime from BigTable X where X.Column1 >
> BT.Column1 and X.Column3 <> 'value4' order by X.Column1 ))
> END,
> FROM
> BigTable BT where BT.Column3 = 'value1' OR (BT.Column3 in ('value2',
> 'value3') and BT.Column4 <> (select X.Column4 from BigTable X where
> X.Column1 = BT.Column1 and X.Column3 = 'Value1'))

With such an abstract representation of the query, it is difficult to
say much. What strikes me are the two correlated subqueries. If value2
and value3 are frequent, these subqueries are likely to be invoked
many times.

It is often a good idea to replace correlated subqueries with derived
tables that you join to. But I cannot say whether this is possible. To
suggest a rewrite of the query I would need:

o CREATE TABLE statements (possibly simplified) for the table.
o INSERT statements with sample data, enough to demonstrate all
variations.
o The expected output for the sample.
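
As a sketch of what such a script could look like: the column types, the
sample rows and the alias V1 below are all made up for illustration and
are not taken from the real table. The last statement also shows the
general shape of joining to a derived table in place of the subquery in
the WHERE clause. It assumes that there is at most one 'value1' row per
Column1 (the original subquery would fail otherwise), and whether it
actually runs faster would have to be tested against the real data.

```sql
-- Hypothetical, simplified schema: the column types are guesses.
CREATE TABLE BigTable (
    Column1  int         NOT NULL,
    Column2  int         NOT NULL,
    Column3  varchar(20) NOT NULL,
    Column4  varchar(20) NULL,
    OrigTime datetime    NOT NULL
)
go

-- Invented sample rows, one per variation of the WHERE clause.
INSERT INTO BigTable (Column1, Column2, Column3, Column4, OrigTime)
VALUES (1, 10, 'value1', 'A', '20040101 10:00:00')  -- kept: value1
INSERT INTO BigTable (Column1, Column2, Column3, Column4, OrigTime)
VALUES (1, 11, 'value2', 'A', '20040101 10:00:05')  -- dropped: same Column4 as its value1 row
INSERT INTO BigTable (Column1, Column2, Column3, Column4, OrigTime)
VALUES (2, 20, 'value1', 'A', '20040101 10:01:00')  -- kept: value1
INSERT INTO BigTable (Column1, Column2, Column3, Column4, OrigTime)
VALUES (2, 21, 'value3', 'B', '20040101 10:01:07')  -- kept: Column4 differs from its value1 row
INSERT INTO BigTable (Column1, Column2, Column3, Column4, OrigTime)
VALUES (3, 30, 'value2', 'C', '20040101 10:02:00')  -- dropped: no value1 row to compare with
INSERT INTO BigTable (Column1, Column2, Column3, Column4, OrigTime)
VALUES (4, 40, 'value4', 'D', '20040101 10:03:00')  -- dropped: value4
go

-- The WHERE-clause subquery expressed as an outer join to a derived table.
-- The post spells the literal 'Value1' in the subquery; that only matters
-- under a case-sensitive collation.
SELECT BT.Column1, BT.Column2, BT.Column3, BT.Column4
FROM   BigTable BT
LEFT   JOIN (SELECT X.Column1, X.Column4
             FROM   BigTable X
             WHERE  X.Column3 = 'value1') AS V1
       ON V1.Column1 = BT.Column1
WHERE  BT.Column3 = 'value1'
   OR  (BT.Column3 IN ('value2', 'value3') AND BT.Column4 <> V1.Column4)
ORDER  BY BT.Column2

-- Expected output for this sample: the rows with Column2 = 10, 20 and 21.
```

The TOP 1 subquery in the Duration column is left out here, since how to
rewrite it depends on what Column1 really means in the table.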

From a performance point of view it would also be useful to know
whether the indexes you have are clustered or not.
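
One way to check this, assuming the table name from the post, is the
system procedure sp_helpindex. The index_description column of its
result says for each index whether it is clustered or nonclustered.

```sql
-- Lists the indexes on the table with their description and key columns.
EXEC sp_helpindex 'BigTable'
```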

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp
