Showing posts with label analyzing. Show all posts
Showing posts with label analyzing. Show all posts

Wednesday, March 21, 2012

Help on analyzing rather large data set

underprocessableum, hello?
"JL Morrison" <sqlserverdba@.tampabay.rr.com> wrote in message
news:eCyD4C9aHHA.4520@.TK2MSFTNGP06.phx.gbl...
> This is posted to both newsgroups. I wasnt sure which one was more
> appropriate.
> I am new to SQL 2005 AS and I was looking for some guidance.
> I am trying to do a proof of concept for one of our clients.
> Company A has 60 million accounts in 2 divisions (C,D)
> Company B has 100 million accounts in 1 division (E)
> For each account, I have assigned a common individual id and household id
> based on name/address matching across all data.
> They want a cube that that allows one to analyze the overlap between the
> companies (or divisions) at the account, individual, or household level.
> For example, how many common households between divisions D and E.
> Something that would produce the data for a Venn diagram. I cant seem to
> produce something that the cube bowser could work with. Is it a matter of
> my query...do i need to solve this with a mdx query?
> This seems simple but ive had some trouble. The distinct count on the
> different levels is a concern. Can you do this in one cube, or will this
> require rolling the data at each level. (But that really seems to defeat
> the purpose of using a cube). The second stage of this project is
> adding other attributes that are account, individual, or household level.
> Some attribute overlap across all data while others are specific to a
> club.
> Also, im thinking about a fact table for each club...and full outer joins
> betweeen them? Would that be more efficient?
> I have attached a test script for an example for reference.
> Thanks in advance,
> JL
>|||um, hello?
"JL Morrison" <sqlserverdba@.tampabay.rr.com> wrote in message
news:eCyD4C9aHHA.4520@.TK2MSFTNGP06.phx.gbl...
> This is posted to both newsgroups. I wasnt sure which one was more
> appropriate.
> I am new to SQL 2005 AS and I was looking for some guidance.
> I am trying to do a proof of concept for one of our clients.
> Company A has 60 million accounts in 2 divisions (C,D)
> Company B has 100 million accounts in 1 division (E)
> For each account, I have assigned a common individual id and household id
> based on name/address matching across all data.
> They want a cube that that allows one to analyze the overlap between the
> companies (or divisions) at the account, individual, or household level.
> For example, how many common households between divisions D and E.
> Something that would produce the data for a Venn diagram. I cant seem to
> produce something that the cube bowser could work with. Is it a matter of
> my query...do i need to solve this with a mdx query?
> This seems simple but ive had some trouble. The distinct count on the
> different levels is a concern. Can you do this in one cube, or will this
> require rolling the data at each level. (But that really seems to defeat
> the purpose of using a cube). The second stage of this project is
> adding other attributes that are account, individual, or household level.
> Some attribute overlap across all data while others are specific to a
> club.
> Also, im thinking about a fact table for each club...and full outer joins
> betweeen them? Would that be more efficient?
> I have attached a test script for an example for reference.
> Thanks in advance,
> JL
>sql

Help on analyzing rather large data set

This is posted to both newsgroups. I wasnt sure which one was more
appropriate.
I am new to SQL 2005 AS and I was looking for some guidance.
I am trying to do a proof of concept for one of our clients.
Company A has 60 million accounts in 2 divisions (C,D)
Company B has 100 million accounts in 1 division (E)
For each account, I have assigned a common individual id and household id
based on name/address matching across all data.
They want a cube that that allows one to analyze the overlap between the
companies (or divisions) at the account, individual, or household level.
For example, how many common households between divisions D and E.
Something that would produce the data for a Venn diagram. I cant seem to
produce something that the cube bowser could work with. Is it a matter of
my query...do i need to solve this with a mdx query?
This seems simple but ive had some trouble. The distinct count on the
different levels is a concern. Can you do this in one cube, or will this
require rolling the data at each level. (But that really seems to defeat the
purpose of using a cube). The second stage of this project is adding
other attributes that are account, individual, or household level. Some
attribute overlap across all data while others are specific to a club.
Also, im thinking about a fact table for each club...and full outer joins
betweeen them? Would that be more efficient?
I have attached a test script for an example for reference.
Thanks in advance,
JL
um, hello?
"JL Morrison" <sqlserverdba@.tampabay.rr.com> wrote in message
news:eCyD4C9aHHA.4520@.TK2MSFTNGP06.phx.gbl...
> This is posted to both newsgroups. I wasnt sure which one was more
> appropriate.
> I am new to SQL 2005 AS and I was looking for some guidance.
> I am trying to do a proof of concept for one of our clients.
> Company A has 60 million accounts in 2 divisions (C,D)
> Company B has 100 million accounts in 1 division (E)
> For each account, I have assigned a common individual id and household id
> based on name/address matching across all data.
> They want a cube that that allows one to analyze the overlap between the
> companies (or divisions) at the account, individual, or household level.
> For example, how many common households between divisions D and E.
> Something that would produce the data for a Venn diagram. I cant seem to
> produce something that the cube bowser could work with. Is it a matter of
> my query...do i need to solve this with a mdx query?
> This seems simple but ive had some trouble. The distinct count on the
> different levels is a concern. Can you do this in one cube, or will this
> require rolling the data at each level. (But that really seems to defeat
> the purpose of using a cube). The second stage of this project is
> adding other attributes that are account, individual, or household level.
> Some attribute overlap across all data while others are specific to a
> club.
> Also, im thinking about a fact table for each club...and full outer joins
> betweeen them? Would that be more efficient?
> I have attached a test script for an example for reference.
> Thanks in advance,
> JL
>
|||um, hello?
"JL Morrison" <sqlserverdba@.tampabay.rr.com> wrote in message
news:eCyD4C9aHHA.4520@.TK2MSFTNGP06.phx.gbl...
> This is posted to both newsgroups. I wasnt sure which one was more
> appropriate.
> I am new to SQL 2005 AS and I was looking for some guidance.
> I am trying to do a proof of concept for one of our clients.
> Company A has 60 million accounts in 2 divisions (C,D)
> Company B has 100 million accounts in 1 division (E)
> For each account, I have assigned a common individual id and household id
> based on name/address matching across all data.
> They want a cube that that allows one to analyze the overlap between the
> companies (or divisions) at the account, individual, or household level.
> For example, how many common households between divisions D and E.
> Something that would produce the data for a Venn diagram. I cant seem to
> produce something that the cube bowser could work with. Is it a matter of
> my query...do i need to solve this with a mdx query?
> This seems simple but ive had some trouble. The distinct count on the
> different levels is a concern. Can you do this in one cube, or will this
> require rolling the data at each level. (But that really seems to defeat
> the purpose of using a cube). The second stage of this project is
> adding other attributes that are account, individual, or household level.
> Some attribute overlap across all data while others are specific to a
> club.
> Also, im thinking about a fact table for each club...and full outer joins
> betweeen them? Would that be more efficient?
> I have attached a test script for an example for reference.
> Thanks in advance,
> JL
>