Saturday, February 25, 2012

Data warehouse data refresh/update

Hello,
Im a member of a team planning a data warehousing project. We have multiple
data sources which are aggregated in a staging area. This is then denormalis
ed and imported into the datawarehouse database.
I am looking at ideas for incremental data refresh, rather than a drop and r
e-import of all data. This would allow us to have historic data.
Does anyone have any tips that might be helpful for detecting changes in the
source data for import? We have had some bad experiences with triggers on o
ur source database in the past, so would rather not use these. I have consid
ered replication and log sh
ipping, but these just give a replica of the source data and does not flag t
he updated/new data.
Any help would be greatly appreciated.
Thanks.
Ben.Look in SQL Books Online for CHECKSUM_AGG to identify changes in a table. Th
e difficulty then is in trying to identify what has changed. Another route i
s to try and identify fields within the production data that will identify w
hen it was last changed. Ty
pically these tend to be datetime, timestamp or rowversion data types.
If you keep production keys in your fact table as additional attributes then
this will give you another option for identifying new data.
"Ben" wrote:

> Hello,
> Im a member of a team planning a data warehousing project. We have multipl
e data sources which are aggregated in a staging area. This is then denormal
ised and imported into the datawarehouse database.
> I am looking at ideas for incremental data refresh, rather than a drop and
re-import of all data. This would allow us to have historic data.
> Does anyone have any tips that might be helpful for detecting changes in the sourc
e data for import? We have had some bad experiences with triggers on our source data
base in the past, so would rather not use these. I have considered replication and l
og
shipping, but these just give a replica of the source data and does not flag the updated/new
data.
> Any help would be greatly appreciated.
> Thanks.
> Ben.|||Timestamp your source data when it gets changed. That is definately the best
way.
Rgards
Jamie
"Ben" wrote:

> Hello,
> Im a member of a team planning a data warehousing project. We have multipl
e data sources which are aggregated in a staging area. This is then denormal
ised and imported into the datawarehouse database.
> I am looking at ideas for incremental data refresh, rather than a drop and
re-import of all data. This would allow us to have historic data.
> Does anyone have any tips that might be helpful for detecting changes in the sourc
e data for import? We have had some bad experiences with triggers on our source data
base in the past, so would rather not use these. I have considered replication and l
og
shipping, but these just give a replica of the source data and does not flag the updated/new
data.
> Any help would be greatly appreciated.
> Thanks.
> Ben.

No comments:

Post a Comment