Saturday, February 25, 2012

Data Warehouse Estimation

Hi
Looking out for data size estimation, and server size estimation templates.
To start with data size estimation templates will also do.
Any help will be appreciated. The requirement is kind of urgent.
Regards,
Bharat
Hello Bharat,
I wouldn't have expected a warehouse estimation process to be any different
to a normal database.
I always find that each project requires its own version, a generic one doesn't
cut it.
Simon Sabin
SQL Server MVP
http://sqlblogcasts.com/blogs/simons

> Hi
> Looking out for data size estimation, and server size estimation
> templates. To start with data size estimation templates will also do.
> Any help will be appreciated. The requirement is kind of urgent.
> Regards,
> Bharat

Data Warehouse Estimation

Hi
Looking out for data size estimation, and server size estimation templates.
To start with data size estimation templates will also do.
Any help will be appreciated. The requirement is kind of urgent.
Regards,
BharatHello Bharat,
I wouldn't have expected a warehouse estimation process to be any different
to a normal database.
I always find that each project requires its own version, a generic one does
n't
cut it.
Simon Sabin
SQL Server MVP
http://sqlblogcasts.com/blogs/simons

> Hi
> Looking out for data size estimation, and server size estimation
> templates. To start with data size estimation templates will also do.
> Any help will be appreciated. The requirement is kind of urgent.
> Regards,
> Bharat

Data warehouse data refresh/update

Hello,
Im a member of a team planning a data warehousing project. We have multiple data sources which are aggregated in a staging area. This is then denormalised and imported into the datawarehouse database.
I am looking at ideas for incremental data refresh, rather than a drop and re-import of all data. This would allow us to have historic data.
Does anyone have any tips that might be helpful for detecting changes in the source data for import? We have had some bad experiences with triggers on our source database in the past, so would rather not use these. I have considered replication and log sh
ipping, but these just give a replica of the source data and does not flag the updated/new data.
Any help would be greatly appreciated.
Thanks.
Ben.
Look in SQL Books Online for CHECKSUM_AGG to identify changes in a table. The difficulty then is in trying to identify what has changed. Another route is to try and identify fields within the production data that will identify when it was last changed. Ty
pically these tend to be datetime, timestamp or rowversion data types.
If you keep production keys in your fact table as additional attributes then this will give you another option for identifying new data.
"Ben" wrote:

> Hello,
> Im a member of a team planning a data warehousing project. We have multiple data sources which are aggregated in a staging area. This is then denormalised and imported into the datawarehouse database.
> I am looking at ideas for incremental data refresh, rather than a drop and re-import of all data. This would allow us to have historic data.
> Does anyone have any tips that might be helpful for detecting changes in the source data for import? We have had some bad experiences with triggers on our source database in the past, so would rather not use these. I have considered replication and log
shipping, but these just give a replica of the source data and does not flag the updated/new data.
> Any help would be greatly appreciated.
> Thanks.
> Ben.
|||Timestamp your source data when it gets changed. That is definately the best way.
Rgards
Jamie
"Ben" wrote:

> Hello,
> Im a member of a team planning a data warehousing project. We have multiple data sources which are aggregated in a staging area. This is then denormalised and imported into the datawarehouse database.
> I am looking at ideas for incremental data refresh, rather than a drop and re-import of all data. This would allow us to have historic data.
> Does anyone have any tips that might be helpful for detecting changes in the source data for import? We have had some bad experiences with triggers on our source database in the past, so would rather not use these. I have considered replication and log
shipping, but these just give a replica of the source data and does not flag the updated/new data.
> Any help would be greatly appreciated.
> Thanks.
> Ben.

Data warehouse data refresh/update

Hello,
Im a member of a team planning a data warehousing project. We have multiple
data sources which are aggregated in a staging area. This is then denormalis
ed and imported into the datawarehouse database.
I am looking at ideas for incremental data refresh, rather than a drop and r
e-import of all data. This would allow us to have historic data.
Does anyone have any tips that might be helpful for detecting changes in the
source data for import? We have had some bad experiences with triggers on o
ur source database in the past, so would rather not use these. I have consid
ered replication and log sh
ipping, but these just give a replica of the source data and does not flag t
he updated/new data.
Any help would be greatly appreciated.
Thanks.
Ben.Look in SQL Books Online for CHECKSUM_AGG to identify changes in a table. Th
e difficulty then is in trying to identify what has changed. Another route i
s to try and identify fields within the production data that will identify w
hen it was last changed. Ty
pically these tend to be datetime, timestamp or rowversion data types.
If you keep production keys in your fact table as additional attributes then
this will give you another option for identifying new data.
"Ben" wrote:

> Hello,
> Im a member of a team planning a data warehousing project. We have multipl
e data sources which are aggregated in a staging area. This is then denormal
ised and imported into the datawarehouse database.
> I am looking at ideas for incremental data refresh, rather than a drop and
re-import of all data. This would allow us to have historic data.
> Does anyone have any tips that might be helpful for detecting changes in the sourc
e data for import? We have had some bad experiences with triggers on our source data
base in the past, so would rather not use these. I have considered replication and l
og
shipping, but these just give a replica of the source data and does not flag the updated/new
data.
> Any help would be greatly appreciated.
> Thanks.
> Ben.|||Timestamp your source data when it gets changed. That is definately the best
way.
Rgards
Jamie
"Ben" wrote:

> Hello,
> Im a member of a team planning a data warehousing project. We have multipl
e data sources which are aggregated in a staging area. This is then denormal
ised and imported into the datawarehouse database.
> I am looking at ideas for incremental data refresh, rather than a drop and
re-import of all data. This would allow us to have historic data.
> Does anyone have any tips that might be helpful for detecting changes in the sourc
e data for import? We have had some bad experiences with triggers on our source data
base in the past, so would rather not use these. I have considered replication and l
og
shipping, but these just give a replica of the source data and does not flag the updated/new
data.
> Any help would be greatly appreciated.
> Thanks.
> Ben.

Data Warehouse backup/recover

Hello, everyone:
Does any body have the experience to execute data warehouse backup/recover? What I want to know is how to backup/recover database in data warehouse and cubes.
Thanks.
ZYTdo you refer to facts/cubes that may get out of sync?
you might want to do the backup when there's no loading of facts/dimensions or have the loading processes do the cubes prior the loading of facts. Also, in case some batches are not processed 100% have some automated processes that check this and are able to repair (delete/reload batches).|||Thanks for reply. The difficult I have to face is I cannot connect database of OLAP,such as FoodMark2000, and cubes from backup box in EM. Any suggestions?

ZYT|||Do you get an error message?|||No. I mean I cannot find cube and database of DW in SQL Server Backup box.|||ok, so either no backup is being done at all or a third party tool is installed that does the backup. I hope this tool is able to connect to the sql-server though, otherwise the backups may not be so usefull afterall.

Data warehouse and a data base? difference?

I need someone to give me a small briefieng on how a enterprise data warehouse (EDW)
differs from a regular database?

Currently we have an appplication that accesses a database with about 18 tables. We also have
a Data Warehouse. For some reason I was thinking that it would be possibly to migrate the
database into the data warehouse. The reason is that in looking at the schematic design for
the data warehouse there are some data tables that could also be used by our application
that uses the DB.

I guess I am confused because I am not sure if a data warehouse is used in the same way
as a database?

Datawarehouse and Databases setup and uses differ. A database is traditionally used to store normalized data for a transactional based system. A datawarehouse is a way to abstract the database tables in such a way as to allow denormalization. Think of it this way: I would use a warehouse to store data in such a way as to query the data for reporting purposes. I would use a database for this reason because I would want some user or users running a 20 minute report on my core database. Hope this helps.

|||

Ok, thank you. I also found this info which is pretty good.

http://www.lc.leidenuniv.nl/awcourse/oracle/server.920/a96520/concept.htm

Data Warehouse & OLAP interview questions

Hi, I have a tech interview coming up for a team lead position for the data
warehouse/OLAP team. I was wondering if some of you wouldn't mind posting
technical questions (anything related to SQL Server/Data Warehouse/OLAP) for
me to use as practice.
Thanks.Assuming it's more "tech lead" than "manager lead" (based on your email I
assume you're not in management)
Question: What are the benefits and risks of co-hosting analysis services
and sql server RDBMS on the same OS image?
"Terry" <spankme@.nospam.com> wrote in message
news:ODd8YwAwDHA.620@.TK2MSFTNGP10.phx.gbl...
quote:

> Hi, I have a tech interview coming up for a team lead position for the

data
quote:

> warehouse/OLAP team. I was wondering if some of you wouldn't mind posting
> technical questions (anything related to SQL Server/Data Warehouse/OLAP)

for
quote:

> me to use as practice.
> Thanks.
>
|||Hi Terry,
Here's one of my favorite questions, because it can go on for as long as
someone wants to talk.
Name some performance aspects of bulk loading and what can be done to
improve the process.
Here's what I listen for:
Hardware Issues like RAM issues, Networking issues, Hard Drive issues, RAID
levels, CPU speeds, Bus speeds
Database Level Issues like Table locking, Column Sizes and Data Typing,
Batch sizes
Architecture Issues like Where the transforms happen (on the server or a
distributed architecture to use other CPUs), Parallelism issues
Of course, the lists go on and on, but that's the general idea. Much can be
gained from asking a question like this, because one can determine, almost
immediately, the type of experience a candidate has. If they are DBAs that
have quite a bit of hardware experience, they'll stick to answering with
improvements that can be made in hardware. Same goes for database
improvements and architecture improvements. If they happen to remark on all
three subject areas, they're probably well rounded enough to be able to
think about all three aspects simultaneously and can probably handle a much
greater load than someone that sticks to one area, or one that answers...I
don't know.
HTH, good luck.
Regards,
Don R. Watters
Data Group Manager
PhotoWorks, Inc.
"Terry" <spankme@.nospam.com> wrote in message
news:ODd8YwAwDHA.620@.TK2MSFTNGP10.phx.gbl...
quote:

> Hi, I have a tech interview coming up for a team lead position for the

data
quote:

> warehouse/OLAP team. I was wondering if some of you wouldn't mind posting
> technical questions (anything related to SQL Server/Data Warehouse/OLAP)

for
quote:

> me to use as practice.
> Thanks.
>
|||I would rephrase the term "bulk loading" (which has a very specific context
in SQL Server) to "ETL Processes"
And you seem to imply parallelism is a side issue in terms of performance.
It is the core of any high performance ETL process.
Kevin Connell, MCDBA
----
The views expressed here are my own
and not of my employer.
----
"Don R. Watters" <DonW@.PhotoWorks.Com> wrote in message
news:#M3nzfbyDHA.1668@.TK2MSFTNGP10.phx.gbl...
quote:

> Hi Terry,
> Here's one of my favorite questions, because it can go on for as long as
> someone wants to talk.
> Name some performance aspects of bulk loading and what can be done to
> improve the process.
> Here's what I listen for:
> hardware Issues like RAM issues, Networking issues, Hard Drive issues,

RAID
quote:

> levels, CPU speeds, Bus speeds
> Database Level Issues like Table locking, Column Sizes and Data Typing,
> Batch sizes
> Architecture Issues like Where the transforms happen (on the server or a
> distributed architecture to use other CPUs), Parallelism issues
> Of course, the lists go on and on, but that's the general idea. Much can

be
quote:

> gained from asking a question like this, because one can determine, almost
> immediately, the type of experience a candidate has. If they are DBAs

that
quote:

> have quite a bit of hardware experience, they'll stick to answering with
> improvements that can be made in hardware. Same goes for database
> improvements and architecture improvements. If they happen to remark on

all
quote:

> three subject areas, they're probably well rounded enough to be able to
> think about all three aspects simultaneously and can probably handle a

much
quote:

> greater load than someone that sticks to one area, or one that answers...I
> don't know.
> HTH, good luck.
> --
> Regards,
> Don R. Watters
> Data Group Manager
> PhotoWorks, Inc.
>
> "Terry" <spankme@.nospam.com> wrote in message
> news:ODd8YwAwDHA.620@.TK2MSFTNGP10.phx.gbl...
> data
posting[QUOTE]
> for
>
|||Hey Kevin,
Isn't this a SQL Server Newsgroup? The question is specific to SQL
Server.
Parallelism is certainly an issue if you're using DTS. If you don't
specifically set up the package to use parallelism then your only choice is
to run multiple packages at once to get parallel processes running. If you
have abstracted the data load in such a way that a single package is used to
load multiple files dynamically, then parallelism needs to be considered in
order to get the package running mutliple threads within the same package.
There are different degrees of how and where parallelism can and should be
used. Ensuring that the query processor is setup correctly to handle
parallel processing of the data load is also a consideration. One can also
have multiple servers running to get the data loaded across federated
servers. This is also considered to be parallelism.
One more point, just bringing up parallelism means that the candidate
understands that it's something that needs to be considered. It doesn't
mean that it's not part of the process. Every one of the things that I gave
as examples are needed for high performance systems. You can't have high
performance with 256 MB of RAM, nor can you have a high performance system
with a 5400 RPM ATA drive with no cache. Sure, you can get the job done,
but it won't be very performant. The same goes for parallel processing. I
defintely was not implying that parallelism was a side issue.
Regards,
Don R. Watters
Data Group Manager
PhotoWorks, Inc.
"Kevin" <ReplyTo@.Newsgroups.only> wrote in message
news:O6B5gbmyDHA.2064@.TK2MSFTNGP10.phx.gbl...
quote:

> I would rephrase the term "bulk loading" (which has a very specific

context
quote:

> in SQL Server) to "ETL Processes"
> And you seem to imply parallelism is a side issue in terms of performance.
> It is the core of any high performance ETL process.
> --
> Kevin Connell, MCDBA
> ----
> The views expressed here are my own
> and not of my employer.
> ----
> "Don R. Watters" <DonW@.PhotoWorks.Com> wrote in message
> news:#M3nzfbyDHA.1668@.TK2MSFTNGP10.phx.gbl...
> RAID
can[QUOTE]
> be
almost[QUOTE]
> that
> all
> much
answers...I[QUOTE]
> posting
Warehouse/OLAP)[QUOTE]
>
|||Don
Your post is slightly confusing
Most people unserstand the word "Parallelism" to mean the process by which
SQL Server spawns another thread on another CPU to better execute a Query.
I believe the way you are using it here is for "In parallel". DTS has not a
right to do with "Parellelism" SQL Server will decide that / or not. For
instance no amount of DTS jiggery pokery will create "Parllelism" on a 1 CPU
box.
For Parallelism there were problems in 7 whereby a query using it would
execute far slower. I have also seen this in 2000 and therefore set (MAXDOP
1) in my queries but much less so than in 7.
mk:@.MSITStore:C:\Program%20Files\Microso
ft%20SQL%20Server\80\Tools\Books\arc
hitec.chm::/8_ar_sa_163x.htm
Please forgive me if I have misread your post.
--
Allan Mitchell MCSE,MCDBA, (Microsoft SQL Server MVP)
www.allisonmitchell.com - Expert SQL Server Consultancy.
www.SQLDTS.com - The site for all your DTS needs.
I support PASS - the definitive, global community
for SQL Server professionals - http://www.sqlpass.org
"Don R. Watters" <DonW@.PhotoWorks.Com> wrote in message
news:excxFymyDHA.3220@.tk2msftngp13.phx.gbl...
quote:

> Hey Kevin,
> Isn't this a SQL Server Newsgroup? The question is specific to SQL
> Server.
> Parallelism is certainly an issue if you're using DTS. If you don't
> specifically set up the package to use parallelism then your only choice

is
quote:

> to run multiple packages at once to get parallel processes running. If

you
quote:

> have abstracted the data load in such a way that a single package is used

to
quote:

> load multiple files dynamically, then parallelism needs to be considered

in
quote:

> order to get the package running mutliple threads within the same package.
> There are different degrees of how and where parallelism can and should be
> used. Ensuring that the query processor is setup correctly to handle
> parallel processing of the data load is also a consideration. One can

also
quote:

> have multiple servers running to get the data loaded across federated
> servers. This is also considered to be parallelism.
> One more point, just bringing up parallelism means that the candidate
> understands that it's something that needs to be considered. It doesn't
> mean that it's not part of the process. Every one of the things that I

gave
quote:

> as examples are needed for high performance systems. You can't have high
> performance with 256 MB of RAM, nor can you have a high performance system
> with a 5400 RPM ATA drive with no cache. Sure, you can get the job done,
> but it won't be very performant. The same goes for parallel processing.

I
quote:

> defintely was not implying that parallelism was a side issue.
> --
> Regards,
> Don R. Watters
> Data Group Manager
> PhotoWorks, Inc.
>
> "Kevin" <ReplyTo@.Newsgroups.only> wrote in message
> news:O6B5gbmyDHA.2064@.TK2MSFTNGP10.phx.gbl...
> context
performance.[QUOTE]
as[QUOTE]
Typing,[QUOTE]
a[QUOTE]
> can
> almost
with[QUOTE]
on[QUOTE]
to[QUOTE]
> answers...I
the[QUOTE]
> Warehouse/OLAP)
>
|||Hi Allan,
I don't think it's confusing if you read and understand the first post.
I think it's a matter of semantics. Nothing in my original post that was
questioned by a subsequent reader pointed to any flaw of SQL Server, whereby
that flaw had to be considered. My original post said that one thing that I
listen for in the interview is for the interviewee to bring up parallelism.
Regardless of if they bring it up as a matter of running processes in
parallel or if they being it up as query parallelism, to me they're one in
the same. I need to hear from the interviewee that they understand the
complexity of dealing with SMP.
Regards,
Don R. Watters
Data Group Manager
PhotoWorks, Inc.
"Allan Mitchell" <allan@.no-spam.sqldts.com> wrote in message
news:%23beEqVn6DHA.2432@.TK2MSFTNGP10.phx.gbl...
quote:

> Don
> Your post is slightly confusing
> Most people unserstand the word "Parallelism" to mean the process by which
> SQL Server spawns another thread on another CPU to better execute a Query.
> I believe the way you are using it here is for "In parallel". DTS has not

a
quote:

> right to do with "Parellelism" SQL Server will decide that / or not. For
> instance no amount of DTS jiggery pokery will create "Parllelism" on a 1

CPU
quote:

> box.
> For Parallelism there were problems in 7 whereby a query using it would
> execute far slower. I have also seen this in 2000 and therefore set

(MAXDOP
quote:

> 1) in my queries but much less so than in 7.
>

mk:@.MSITStore:C:\Program%20Files\Microso
ft%20SQL%20Server\80\Tools\Books\arc[QUO
TE]
> hitec.chm::/8_ar_sa_163x.htm
> Please forgive me if I have misread your post.
>
> --
> --
> Allan Mitchell MCSE,MCDBA, (Microsoft SQL Server MVP)
> www.allisonmitchell.com - Expert SQL Server Consultancy.
> www.SQLDTS.com - The site for all your DTS needs.
> I support PASS - the definitive, global community
> for SQL Server professionals - http://www.sqlpass.org
>
> "Don R. Watters" <DonW@.PhotoWorks.Com> wrote in message
> news:excxFymyDHA.3220@.tk2msftngp13.phx.gbl...
> is
> you
used
quote:

> to
> in
package.[QUOTE]
be[QUOTE]
> also
candidate[QUOTE]
> gave
high[QUOTE]
system[QUOTE]
done,[QUOTE]
> I
> performance.
long[QUOTE]
> as
to[QUOTE]
issues,[QUOTE]
> Typing,
or[QUOTE]
> a
Much[QUOTE]
DBAs[QUOTE]
> with
remark[QUOTE]
> on
> to
a[QUOTE]
> the
>
|||OK
Semantics it is then.
I interpret Parallelism and "In Parallel" to be two different things
especially when tied with DTS.
--
Allan Mitchell MCSE,MCDBA, (Microsoft SQL Server MVP)
www.allisonmitchell.com - Expert SQL Server Consultancy.
www.SQLDTS.com - The site for all your DTS needs.
I support PASS - the definitive, global community
for SQL Server professionals - http://www.sqlpass.org
"Don R. Watters" <DonW@.Photoworks.com> wrote in message
news:uSv6hrn6DHA.1428@.TK2MSFTNGP12.phx.gbl...
quote:

> Hi Allan,
> I don't think it's confusing if you read and understand the first

post.
quote:

> I think it's a matter of semantics. Nothing in my original post that was
> questioned by a subsequent reader pointed to any flaw of SQL Server,

whereby
quote:

> that flaw had to be considered. My original post said that one thing that

I
quote:

> listen for in the interview is for the interviewee to bring up

parallelism.
quote:

> Regardless of if they bring it up as a matter of running processes in
> parallel or if they being it up as query parallelism, to me they're one in
> the same. I need to hear from the interviewee that they understand the
> complexity of dealing with SMP.
> --
> Regards,
> Don R. Watters
> Data Group Manager
> PhotoWorks, Inc.
>
> "Allan Mitchell" <allan@.no-spam.sqldts.com> wrote in message
> news:%23beEqVn6DHA.2432@.TK2MSFTNGP10.phx.gbl...
which[QUOTE]
Query.[QUOTE]
not[QUOTE]
> a
For[QUOTE]
> CPU
> (MAXDOP
>

mk:@.MSITStore:C:\Program%20Files\Microso
ft%20SQL%20Server\80\Tools\Books\arc[QUO
TE]
SQL
quote:

don't[QUOTE]
choice[QUOTE]
If[QUOTE]
> used
considered[QUOTE]
> package.
should[QUOTE]
> be
> candidate
doesn't[QUOTE]
I[QUOTE]
> high
> system
> done,
processing.[QUOTE]
> long
> to
> issues,
server[QUOTE]
> or
> Much
determine,[QUOTE]
> DBAs
answering[QUOTE]
> remark
able[QUOTE]
handle[QUOTE]
> a
for[QUOTE]
mind[QUOTE]
>