Wednesday, March 21, 2012

Help on analyzing rather large data set

This is posted to both newsgroups. I wasnt sure which one was more
appropriate.
I am new to SQL 2005 AS and I was looking for some guidance.
I am trying to do a proof of concept for one of our clients.
Company A has 60 million accounts in 2 divisions (C,D)
Company B has 100 million accounts in 1 division (E)
For each account, I have assigned a common individual id and household id
based on name/address matching across all data.
They want a cube that that allows one to analyze the overlap between the
companies (or divisions) at the account, individual, or household level.
For example, how many common households between divisions D and E.
Something that would produce the data for a Venn diagram. I cant seem to
produce something that the cube bowser could work with. Is it a matter of
my query...do i need to solve this with a mdx query?
This seems simple but ive had some trouble. The distinct count on the
different levels is a concern. Can you do this in one cube, or will this
require rolling the data at each level. (But that really seems to defeat the
purpose of using a cube). The second stage of this project is adding
other attributes that are account, individual, or household level. Some
attribute overlap across all data while others are specific to a club.
Also, im thinking about a fact table for each club...and full outer joins
betweeen them? Would that be more efficient?
I have attached a test script for an example for reference.
Thanks in advance,
JL
um, hello?
"JL Morrison" <sqlserverdba@.tampabay.rr.com> wrote in message
news:eCyD4C9aHHA.4520@.TK2MSFTNGP06.phx.gbl...
> This is posted to both newsgroups. I wasnt sure which one was more
> appropriate.
> I am new to SQL 2005 AS and I was looking for some guidance.
> I am trying to do a proof of concept for one of our clients.
> Company A has 60 million accounts in 2 divisions (C,D)
> Company B has 100 million accounts in 1 division (E)
> For each account, I have assigned a common individual id and household id
> based on name/address matching across all data.
> They want a cube that that allows one to analyze the overlap between the
> companies (or divisions) at the account, individual, or household level.
> For example, how many common households between divisions D and E.
> Something that would produce the data for a Venn diagram. I cant seem to
> produce something that the cube bowser could work with. Is it a matter of
> my query...do i need to solve this with a mdx query?
> This seems simple but ive had some trouble. The distinct count on the
> different levels is a concern. Can you do this in one cube, or will this
> require rolling the data at each level. (But that really seems to defeat
> the purpose of using a cube). The second stage of this project is
> adding other attributes that are account, individual, or household level.
> Some attribute overlap across all data while others are specific to a
> club.
> Also, im thinking about a fact table for each club...and full outer joins
> betweeen them? Would that be more efficient?
> I have attached a test script for an example for reference.
> Thanks in advance,
> JL
>
|||um, hello?
"JL Morrison" <sqlserverdba@.tampabay.rr.com> wrote in message
news:eCyD4C9aHHA.4520@.TK2MSFTNGP06.phx.gbl...
> This is posted to both newsgroups. I wasnt sure which one was more
> appropriate.
> I am new to SQL 2005 AS and I was looking for some guidance.
> I am trying to do a proof of concept for one of our clients.
> Company A has 60 million accounts in 2 divisions (C,D)
> Company B has 100 million accounts in 1 division (E)
> For each account, I have assigned a common individual id and household id
> based on name/address matching across all data.
> They want a cube that that allows one to analyze the overlap between the
> companies (or divisions) at the account, individual, or household level.
> For example, how many common households between divisions D and E.
> Something that would produce the data for a Venn diagram. I cant seem to
> produce something that the cube bowser could work with. Is it a matter of
> my query...do i need to solve this with a mdx query?
> This seems simple but ive had some trouble. The distinct count on the
> different levels is a concern. Can you do this in one cube, or will this
> require rolling the data at each level. (But that really seems to defeat
> the purpose of using a cube). The second stage of this project is
> adding other attributes that are account, individual, or household level.
> Some attribute overlap across all data while others are specific to a
> club.
> Also, im thinking about a fact table for each club...and full outer joins
> betweeen them? Would that be more efficient?
> I have attached a test script for an example for reference.
> Thanks in advance,
> JL
>

No comments:

Post a Comment