Friday, March 30, 2012

Help optimizing query for large data set

Hi all... I'm hoping someone can help me with optimizing a query that is
taking a really long time because the data set is so large... Basically I
have a large table and I need to select on some date ranges.
Will adding an index for a datetime column help performance for queries of
the form (dtCol >= DATE1) and (dtCol < DATE2)?
-mdbOn Fri, 13 Jul 2007 14:11:21 -0700, Michael Bray
<mbrayATctiusaDOTcom@.you.figure.it.out.com> wrote:
>Hi all... I'm hoping someone can help me with optimizing a query that is
>taking a really long time because the data set is so large... Basically I
>have a large table and I need to select on some date ranges.
>Will adding an index for a datetime column help performance for queries of
>the form (dtCol >= DATE1) and (dtCol < DATE2)?
>-mdb
The answer is the most common one here: It depends.
If the index were clustered on the date then it certainly will help.
But I would not rush to cluster on the date, the choice of a proper
clustered index is not that simple.
If the date ranges are very narrow compared to the overall table then
a non-clustered index on the date may very well be chosen by the
optimizer and save time. The optimizer is going to estimate the
number of pages it will have to read in random order to use the index,
and compare that to the number of pages it will have to read to scan
the entire table. The smaller number wins.
Without more information on the table, data and query I don't know
what else to say.
Roy Harvey
Beacon Falls, CT|||Roy Harvey <roy_harvey@.snet.net> wrote in
news:k4uf935qq5jqltamrffo96mgsemsnojorh@.4ax.com:
> If the index were clustered on the date then it certainly will help.
> But I would not rush to cluster on the date, the choice of a proper
> clustered index is not that simple.
> If the date ranges are very narrow compared to the overall table then
> a non-clustered index on the date may very well be chosen by the
> optimizer and save time. The optimizer is going to estimate the
> number of pages it will have to read in random order to use the index,
> and compare that to the number of pages it will have to read to scan
> the entire table. The smaller number wins.
> Without more information on the table, data and query I don't know
> what else to say.
>
Well I can tell you that the dates are very evenly spread out. Each row is
an entry from about 200 different clients where each event from a client is
separated by about 3-4 minutes, with very little variation. There is an
index on another column (an integer) and the query is selecting values from
that table where the int value equals a value and the date column is
between a range, and is inner joining that to 3 other tables through a
fourth common table (i.e. A-Z-B, A-Z-C, A-Z-D). Does that alter any of
what you said?
It sounds as if a clustered index would help in this case. Do you agree?
-mdb|||I really can't tell from the information available.
If you post the table definitions, including keys and indexes, as well
as sizes and the query that is performing poorly someone might be able
to give meaningful advice.
Roy Harvey
Beacon Falls, CT
On Mon, 16 Jul 2007 06:35:38 -0700, Michael Bray
<mbrayATctiusaDOTcom@.you.figure.it.out.com> wrote:
>Roy Harvey <roy_harvey@.snet.net> wrote in
>news:k4uf935qq5jqltamrffo96mgsemsnojorh@.4ax.com:
>> If the index were clustered on the date then it certainly will help.
>> But I would not rush to cluster on the date, the choice of a proper
>> clustered index is not that simple.
>> If the date ranges are very narrow compared to the overall table then
>> a non-clustered index on the date may very well be chosen by the
>> optimizer and save time. The optimizer is going to estimate the
>> number of pages it will have to read in random order to use the index,
>> and compare that to the number of pages it will have to read to scan
>> the entire table. The smaller number wins.
>> Without more information on the table, data and query I don't know
>> what else to say.
>Well I can tell you that the dates are very evenly spread out. Each row is
>an entry from about 200 different clients where each event from a client is
>separated by about 3-4 minutes, with very little variation. There is an
>index on another column (an integer) and the query is selecting values from
>that table where the int value equals a value and the date column is
>between a range, and is inner joining that to 3 other tables through a
>fourth common table (i.e. A-Z-B, A-Z-C, A-Z-D). Does that alter any of
>what you said?
>It sounds as if a clustered index would help in this case. Do you agree?
>-mdb

No comments:

Post a Comment