Saturday, February 25, 2012

Data warehouse question

Hi,
I am building the data warehouse in SQL 2005 and have the following
questions want to clarify.
1. Any method and suggestion for incremental loading for data warehouse? If
I create a field in each record in my source table and select the one havent
been loaded in ETL, is it a method or any other better suggestion?
2. If there are schema changes in the source table and the data warehouse,
is it necessary (or best practise) to reload all data?
Thanks a lot.
Ryan
Ryan:
I am stumbling through my ETL systems initial design and have had to work
this one out as well. Take this following bit of free advice with caution:
I created a tracking table in a separate db with TableName and LastLoadDate
columns. After runnin a package that is my "Initial Load" package (ie:
truncating the whole Fact table), it runs a SQL task that populates the
LastLoadDate in the corresponding record.
For Incremental Loads, I run two data flows, one for 'changed' rows and one
for 'new' rows:
"Changed": SELECT * FROM x WHERE [created] < LastLoadDate and [Modified] >
LastLoadDate. That goes into a staging table. At the end of the package,
another SQL task runs a simple UPDATE query that updates the FACT table from
the statging table.
"NEW ROWS" SELECT * FROM x WHERE [created] > LastLoadDate. That data flow
gets appended directly to the FACT table.
Check out books by Ralph Kimball on Microsoft Data Warehouse Design.
HTH
Todd Chittenden
"Ryan" wrote:

> Hi,
> I am building the data warehouse in SQL 2005 and have the following
> questions want to clarify.
> 1. Any method and suggestion for incremental loading for data warehouse? If
> I create a field in each record in my source table and select the one havent
> been loaded in ETL, is it a method or any other better suggestion?
> 2. If there are schema changes in the source table and the data warehouse,
> is it necessary (or best practise) to reload all data?
> Thanks a lot.
> Ryan

No comments:

Post a Comment