Saturday, February 25, 2012

Data warehouse data refresh/update

Hello,
Im a member of a team planning a data warehousing project. We have multiple data sources which are aggregated in a staging area. This is then denormalised and imported into the datawarehouse database.
I am looking at ideas for incremental data refresh, rather than a drop and re-import of all data. This would allow us to have historic data.
Does anyone have any tips that might be helpful for detecting changes in the source data for import? We have had some bad experiences with triggers on our source database in the past, so would rather not use these. I have considered replication and log sh
ipping, but these just give a replica of the source data and does not flag the updated/new data.
Any help would be greatly appreciated.
Thanks.
Ben.
Look in SQL Books Online for CHECKSUM_AGG to identify changes in a table. The difficulty then is in trying to identify what has changed. Another route is to try and identify fields within the production data that will identify when it was last changed. Ty
pically these tend to be datetime, timestamp or rowversion data types.
If you keep production keys in your fact table as additional attributes then this will give you another option for identifying new data.
"Ben" wrote:

> Hello,
> Im a member of a team planning a data warehousing project. We have multiple data sources which are aggregated in a staging area. This is then denormalised and imported into the datawarehouse database.
> I am looking at ideas for incremental data refresh, rather than a drop and re-import of all data. This would allow us to have historic data.
> Does anyone have any tips that might be helpful for detecting changes in the source data for import? We have had some bad experiences with triggers on our source database in the past, so would rather not use these. I have considered replication and log
shipping, but these just give a replica of the source data and does not flag the updated/new data.
> Any help would be greatly appreciated.
> Thanks.
> Ben.
|||Timestamp your source data when it gets changed. That is definately the best way.
Rgards
Jamie
"Ben" wrote:

> Hello,
> Im a member of a team planning a data warehousing project. We have multiple data sources which are aggregated in a staging area. This is then denormalised and imported into the datawarehouse database.
> I am looking at ideas for incremental data refresh, rather than a drop and re-import of all data. This would allow us to have historic data.
> Does anyone have any tips that might be helpful for detecting changes in the source data for import? We have had some bad experiences with triggers on our source database in the past, so would rather not use these. I have considered replication and log
shipping, but these just give a replica of the source data and does not flag the updated/new data.
> Any help would be greatly appreciated.
> Thanks.
> Ben.

No comments:

Post a Comment