Monday, March 12, 2012

Help needed to improve the performance of the query

Hello,
I have the following setup and I would appreciate any help in improving
the performance of the query.

BigTable:
Column1 (indexed)
Column2 (indexed)
Column3 (no index)
Column4 (no index)

select
[time] =
CASE
when BT.Column3 = 'value1' then DateAdd(...)
when BT.Column3 in ('value2', 'value3') then DateAdd(...)
END,
Duration =
CASE
when BT.Column3 = 'value1' then DateDiff(...)
when BT.Column3 in ('value2', 'value3') then DateDiff(ss,
BT.OrigTime, (select TOP 1 X.OrigTime from BigTable X where X.Column1 >
BT.Column1 and X.Column3 <> 'value4' order by X.Column1 ))
END,

FROM
BigTable BT where BT.Column3 = 'value1' OR (BT.Column3 in ('value2',
'value3') and BT.Column4 <> (select X.Column4 from BigTable X where
X.Column1 = BT.Column1 and X.Column3 = 'Value1'))

Apart from the above mentioned, there are a few more columns which are
just a part of select statement and are not in any condition statments.

The BigTable has around 1 Mil records and the response time is very
poor, it takes around 3 mins to retrieve the records (which would be
around 500K)

With the Statistics ON,
I get the following:

Table 'BigTable'. Scan count 2, logical reads 44184, physical reads 0,
read-ahead reads 0.
Table 'WorkTable'. Scan count 541221, logical reads 4873218, physical
reads 0, read-ahead reads 0.

Is there any way to increase the performance, so that I can get the
result under 1 minute?
Any help would be appreciated.

P.S: I tried indexing the Column3, but no improvement.

-SRwhoops this might not owork. let me test.|||OK. Instead of cyling through your entire big table in you your where clause for every record you should do something like this. Mine is a little more dumbed down but you should get the concept

CREATE TABLE test2
(
[ID] int IDENTITY,
col1 int,
col2 int,
col3 int
)

INSERT INTO test2(col1,col2,col3)
VALUES (1,2,3)

INSERT INTO test2(col1,col2,col3)
VALUES (3,2,1)

INSERT INTO test2(col1,col2,col3)
VALUES (2,3,3)

SELECT A.col1,A.col2,A.col3
FROM test2 A
LEFT OUTER JOIN test2 B
ON A.[ID] = B.[ID]
WHERE A.col2 <> B.col3

I hate subqueries in the where clause.|||Is Column1 the primary key of the big table?|||Thanks a lot for your reply.
The problem is I cannot change the database (I cannot add new tables)
I have to define a view to use the existing tables.

I have created a view using the select statement mentioned in the first post, and doing a select on view.

OK. Instead of cyling through your entire big table in you your where clause for every record you should do something like this. Mine is a little more dumbed down but you should get the concept

CREATE TABLE test2
(
[ID] int IDENTITY,
col1 int,
col2 int,
col3 int
)

INSERT INTO test2(col1,col2,col3)
VALUES (1,2,3)

INSERT INTO test2(col1,col2,col3)
VALUES (3,2,1)

INSERT INTO test2(col1,col2,col3)
VALUES (2,3,3)

SELECT A.col1,A.col2,A.col3
FROM test2 A
LEFT OUTER JOIN test2 B
ON A.[ID] = B.[ID]
WHERE A.col2 <> B.col3

I hate subqueries in the where clause.|||Nope,
Column2 is the primary key.

Is Column1 the primary key of the big table?|||I did'nt tell you to create any new tables. This was a demonstration of recursive querying on the same table without doing a subquery against a million records in the where clause (which I think is the source of your performance issue). I did'nt have time to rewrite your whole thing. Sorry.

If you take a look at the example again I am joining the table to itself by using aliases.|||I did'nt tell you to create any new tables. This was a demonstration of recursive querying on the same table without doing a subquery against a million records in the where clause (which I think is the source of your performance issue). I did'nt have time to rewrite your whole thing. Sorry.

If you take a look at the example again I am joining the table to itself by using aliases.
My bad. Didn't read it completely.
I put the left outer join as you mentioned, but haven't seen any improvement.|||What indices do you have on this table? I would create seperate indices for columns 1,3, and for because they are all being searched. Perhaps a clustered index on the column with the highest number of distinct values and nonclustered on the other 2.

Have you looked at the execution? Where are your table scans occurring? I do not see why you have column2 indexed. I do not see where it is joined on or searched.

No comments:

Post a Comment