
Friday, March 30, 2012

Help optimizing query for large data set

Hi all... I'm hoping someone can help me with optimizing a query that is
taking a really long time because the data set is so large... Basically I
have a large table and I need to select on some date ranges.
Will adding an index for a datetime column help performance for queries of
the form (dtCol >= DATE1) and (dtCol < DATE2)?
-mdb
On Fri, 13 Jul 2007 14:11:21 -0700, Michael Bray
<mbrayATctiusaDOTcom@.you.figure.it.out.com> wrote:

>Hi all... I'm hoping someone can help me with optimizing a query that is
>taking a really long time because the data set is so large... Basically I
>have a large table and I need to select on some date ranges.
>Will adding an index for a datetime column help performance for queries of
>the form (dtCol >= DATE1) and (dtCol < DATE2)?
>-mdb
The answer is the most common one here: It depends.
If the index were clustered on the date then it certainly will help.
But I would not rush to cluster on the date, the choice of a proper
clustered index is not that simple.
If the date ranges are very narrow compared to the overall table then
a non-clustered index on the date may very well be chosen by the
optimizer and save time. The optimizer is going to estimate the
number of pages it will have to read in random order to use the index,
and compare that to the number of pages it will have to read to scan
the entire table. The smaller number wins.
Without more information on the table, data and query I don't know
what else to say.
Roy Harvey
Beacon Falls, CT|||Roy Harvey <roy_harvey@.snet.net> wrote in
news:k4uf935qq5jqltamrffo96mgsemsnojorh@.4ax.com:

> If the index were clustered on the date then it certainly will help.
> But I would not rush to cluster on the date, the choice of a proper
> clustered index is not that simple.
> If the date ranges are very narrow compared to the overall table then
> a non-clustered index on the date may very well be chosen by the
> optimizer and save time. The optimizer is going to estimate the
> number of pages it will have to read in random order to use the index,
> and compare that to the number of pages it will have to read to scan
> the entire table. The smaller number wins.
> Without more information on the table, data and query I don't know
> what else to say.
>
Well I can tell you that the dates are very evenly spread out. Each row is
an entry from about 200 different clients where each event from a client is
separated by about 3-4 minutes, with very little variation. There is an
index on another column (an integer) and the query is selecting values from
that table where the int value equals a value and the date column is
between a range, and is inner joining that to 3 other tables through a
fourth common table (i.e. A-Z-B, A-Z-C, A-Z-D). Does that alter any of
what you said?
It sounds as if a clustered index would help in this case. Do you agree?
-mdb|||I really can't tell from the information available.
If you post the table definitions, including keys and indexes, as well
as sizes and the query that is performing poorly someone might be able
to give meaningful advice.
Roy Harvey
Beacon Falls, CT
On Mon, 16 Jul 2007 06:35:38 -0700, Michael Bray
<mbrayATctiusaDOTcom@.you.figure.it.out.com> wrote:

>Roy Harvey <roy_harvey@.snet.net> wrote in
>news:k4uf935qq5jqltamrffo96mgsemsnojorh@.4ax.com:
>
>Well I can tell you that the dates are very evenly spread out. Each row is
>an entry from about 200 different clients where each event from a client is
>separated by about 3-4 minutes, with very little variation. There is an
>index on another column (an integer) and the query is selecting values from
>that table where the int value equals a value and the date column is
>between a range, and is inner joining that to 3 other tables through a
>fourth common table (i.e. A-Z-B, A-Z-C, A-Z-D). Does that alter any of
>what you said?
>It sounds as if a clustered index would help in this case. Do you agree?
>-mdb
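For readers following along, here is a minimal sketch of the kind of non-clustered index Roy describes, assuming SQL Server syntax and hypothetical names (EventLog for the table, ClientId for the integer column, dtCol for the datetime column), since the actual schema was never posted. Whether the optimizer uses it still depends on how narrow the date range is relative to the whole table.

Code Snippet

-- Hypothetical names; adjust to the real table and columns.
-- A composite non-clustered index with the equality column first and the
-- datetime column second supports
--   WHERE ClientId = @id AND dtCol >= @date1 AND dtCol < @date2
-- as a single range seek.
CREATE NONCLUSTERED INDEX IX_EventLog_ClientId_dtCol
    ON dbo.EventLog (ClientId, dtCol);

-- Clustering on the date also serves pure date-range scans, but as noted
-- above, choosing the clustered index for a large table is a bigger decision:
-- CREATE CLUSTERED INDEX IX_EventLog_dtCol ON dbo.EventLog (dtCol);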

help optimizing query

I have a query that is taking about 12 minutes to run. I'm sure there has to be a way to speed it up... I'm just not sure how. Any suggestions on how to rewrite it would be much appreciated. I think it has something to do with the "not in" join.

Select distinct T.TSNUMB
from TSR T
where T.TSSTAT = 'C'
  and T.TSCDAT >= '01-Oct-03'
  and T.TSCDAT <= sysdate
  and T.TSORGA in (Select distinct O.ORORGA
                   from OrgCode O
                   where O.ORTCON like '123456%'
                     and T.TSNUMB not in (select distinct w.wcnumb
                                          from wobaskdtl w
                                          where w.wcbask like '%-V'))

You could try using NOT EXISTS instead of NOT IN:

Select distinct T.TSNUMB
from TSR T
where T.TSSTAT = 'C'
  and T.TSCDAT >= '01-Oct-03'
  and T.TSCDAT <= sysdate
  and T.TSORGA in (Select distinct O.ORORGA
                   from OrgCode O
                   where O.ORTCON like '123456%'
                     and not exists (select null
                                     from wobaskdtl w
                                     where w.wcbask like '%-V'
                                       and w.wcnumb = t.tsnumb))

Also make sure your tables have been analyzed and the statistics are not stale (I'm inferring this is Oracle?)|||Thank you... not sure I completely understand what is going on there (I'll have to sit down and really look at it), but it works.
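As a side note on why the rewrite can matter beyond speed: NOT IN and NOT EXISTS are not always equivalent. If the subquery can return a NULL, the NOT IN predicate never evaluates to true, while NOT EXISTS simply checks for a matching row and usually optimizes into an anti-join. A small sketch using the thread's table names, assuming wcnumb is nullable (the schema was not posted):

Code Snippet

-- If any w.wcnumb is NULL, this returns zero rows, because
-- "x NOT IN (..., NULL)" can never evaluate to true:
select t.tsnumb
from TSR t
where t.tsnumb not in (select w.wcnumb from wobaskdtl w);

-- NOT EXISTS ignores NULLs and compares row by row instead:
select t.tsnumb
from TSR t
where not exists (select null
                  from wobaskdtl w
                  where w.wcnumb = t.tsnumb);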

Sunday, February 19, 2012

help me in optimizing this query

Hi,

I have a result set like this ..

ACCOUNTS  DATEBILLED  DUEDATE     TOTALBALANCE  PRIORBALANCE  NEWBALANCE  PAYMENTAMOUNT  PAYNC
--------  ----------  ----------  ------------  ------------  ----------  -------------  -----
2001      05/24/2007  06/21/2007  826.88        826.88        0.00        0.00           0.00
2001      05/24/2007  06/22/2007  826.88        826.88        0.00        0.00           0.00
2001      05/24/2007  06/23/2007  826.88        826.88        0.00        0.00           0.00

2002      05/24/2007  06/22/2007  826.88        826.88        0.00        0.00           0.00
2002      05/24/2007  06/24/2007  826.88        826.88        0.00        0.00           0.00

and the query is this..


select A.AccountNo as Accounts,
       A.DATEBILLED as DATEBILLED,
       A.DUEDATE as DUEDATE,
       A.TOTALBALANCE,
       A.PRIORBALANCE,
       (A.TOTALBALANCE - A.PRIORBALANCE) as NEWBALANCE,
       A.PAYMENTAMOUNT,
       (case when (A.PAYMENTAMOUNT - A.PRIORBALANCE) > 0
             then A.PAYMENTAMOUNT - A.PRIORBALANCE else 0 end) as PAYNC
from ACCTPAID A, ACCTTYPE B
where A.DUEDATE between '2007-06-19' and '2007-06-30'
  and (A.TOTALBALANCE - A.PRIORBALANCE) >= 0
  and A.ACCOUNTNO = B.ACCOUNTNO
  and A.duedate = (select max(duedate) from ACCTPAID
                   where ACCTPAID.accountno = A.accountno)

It takes about 30 minutes to execute, ONLY because of the correlated subquery "select max(duedate) from ACCTPAID where ACCTPAID.accountno = A.accountno"; without that it takes about 2 minutes.

Is there a way to rewrite this so that I pick only the maximum due date for each account?

Any help would be greatly appreciated

Thanks,
Praveen

How can we help if we have no idea about those tables' structure? Can you provide DDL, including constraints and indexes, sample data, and the expected result?

AMB

|||

Please find the DDL for the two tables below. There are no indexes.

ACCTPAID


Column name    Type name  Length  Scale  Nulls
-------------  ---------  ------  -----  -----
ACCOUNTNO      DECIMAL    13      0      No
DATEBILLED     DATE       4       0      No
DUEDATE        DATE       4       0      No
TOTALBALANCE   DECIMAL    13      2      No
PRIORBALANCE   DECIMAL    13      2      No
PAYMENTAMOUNT  DECIMAL    13      2      No

ACCTTYPE

Column name  Type name  Length  Scale  Nulls
-----------  ---------  ------  -----  -----
ACCOUNTNO    DECIMAL    13      0      No
DIVISIONNO   DECIMAL    2       0      No
ACCTTYPE     CHARACTER  1       0      No

Thanks,

Praveen Raju

|||

It would be easier to assist you if you could include the table DDL, the query strategy used so far, sample data in the form of INSERT statements, and an illustration of the desired results.

Click here for information that will explain how to prepare your question so that the 'volunteers' here can optimally spend their time assisting you.

|||

Try this

Code Snippet

Select A.AccountNo as Accounts,
       A.DATEBILLED as DATEBILLED,
       A.DUEDATE as DUEDATE,
       A.TOTALBALANCE,
       A.PRIORBALANCE,
       (A.TOTALBALANCE - A.PRIORBALANCE) as NEWBALANCE,
       A.PAYMENTAMOUNT,
       (case when (A.PAYMENTAMOUNT - A.PRIORBALANCE) > 0
             then A.PAYMENTAMOUNT - A.PRIORBALANCE
             else 0
        end) as PAYNC
From ACCTPAID A
Join ACCTTYPE B On A.ACCOUNTNO = B.ACCOUNTNO
Where A.TOTALBALANCE >= A.PRIORBALANCE
  and DUEDATE = (Select max(duedate) duedate from ACCTPAID as MaxDate
                 Where MaxDate.accountno = A.accountno
                   And DUEDATE Between '2007-06-19' AND '2007-06-30')
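
If the platform supports window functions (SQL Server 2005+, Oracle, and DB2 all do; the product was never confirmed in this thread), an alternative sketch that avoids running a correlated MAX subquery per row is to rank each account's rows by DUEDATE and keep only the latest one inside the requested window:

Code Snippet

-- Sketch only; same column names as above, adjust to the actual platform.
select Accounts, DATEBILLED, DUEDATE, TOTALBALANCE, PRIORBALANCE,
       NEWBALANCE, PAYMENTAMOUNT, PAYNC
from (select A.ACCOUNTNO as Accounts,
             A.DATEBILLED, A.DUEDATE, A.TOTALBALANCE, A.PRIORBALANCE,
             (A.TOTALBALANCE - A.PRIORBALANCE) as NEWBALANCE,
             A.PAYMENTAMOUNT,
             (case when (A.PAYMENTAMOUNT - A.PRIORBALANCE) > 0
                   then A.PAYMENTAMOUNT - A.PRIORBALANCE else 0 end) as PAYNC,
             row_number() over (partition by A.ACCOUNTNO
                                order by A.DUEDATE desc) as rn
      from ACCTPAID A
      join ACCTTYPE B on A.ACCOUNTNO = B.ACCOUNTNO
      where A.DUEDATE between '2007-06-19' and '2007-06-30') x
where rn = 1
  and TOTALBALANCE >= PRIORBALANCE;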

|||

What about primary key and foreign key constraints?

Based on the query, I would suggest creating indexes on [ACCOUNTNO] in both tables, and also a clustered index by [DUEDATE] on table [ACCTPAID]. Try these changes to see if the performance improves, and post back the execution plan.

AMB
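
A sketch of the indexes suggested above, with made-up names and SQL Server syntax as an assumption (the product was not stated in the thread):

Code Snippet

-- Made-up index names; syntax assumed to be SQL Server.
-- Clustered index so the DUEDATE range filter becomes a range scan:
CREATE CLUSTERED INDEX IX_ACCTPAID_DUEDATE ON ACCTPAID (DUEDATE);

-- Indexes on the join column in both tables:
CREATE NONCLUSTERED INDEX IX_ACCTPAID_ACCOUNTNO ON ACCTPAID (ACCOUNTNO);
CREATE NONCLUSTERED INDEX IX_ACCTTYPE_ACCOUNTNO ON ACCTTYPE (ACCOUNTNO);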

|||

Hi Manivannan,

Thanks for your query. By moving the "DUEDATE Between '2007-06-19' AND '2007-06-30'" condition from the start of the query into the subquery on the last line, the execution time dropped from 15 minutes to less than a minute...

I don't know why, but this has been a great help from you. Thanks a lot...

Praveen Raju