Saturday, February 25, 2012
Data Warehousing:Best way to write SP?
I'm dealing with 8 fact tables, 8 dimension tables and 9 keys. I'm currently doing:
SELECT COUNT(DISTINCT key)
FROM t_fact
WHERE key NOT IN
(
SELECT DISTINCT key FROM t_dimension
)
If I do one of those for each key-fact table combo, there are about 50 queries in total. Not every key exists in every fact table.
I'm a Stored Procedure novice. What is the best way to check all of the fact tables, aside from running 50 counts with subqueries? If I run the queries one fact table at a time, it will take about 30 minutes. I've tried to run one query per fact table, by counting all keys, and doing a subselect to each dimension table, but got misleading results.
Any tips will be greatly appreciated. Abandoning data warehousing isn't a current option!
MikeIn Oracle I would find missing keys like this:
SELECT key FROM t_fact
MINUS
SELECT key FROM t_dimension
BTW, why not use a foreign key constraint to ensure all fact keys are based on dimension keys?
Data Warehouse Structure and Indexing when using MOLAP
Hi there,
I am building a cube in AS2005 which is made up of one fact table (20 million rows) and 7 dimension tables. I am planning to use MOLAP storage and my question is this. Is it important to add indexes, primary keys, foreign keys to my data source to improve performance? I can appreciate that it might speed up processing time, but since all the data and aggregations would be stored in the cube, I can't see how it would have any imact on performance (querying the cube).
If there is a benefit to adding indexes, keys etc. then could somebody please tell the best strategy, i.e where to put them, clustered or nonclustered etc.
Many thanks,
David
I have the same opinion, with MOLAP indexes will only help when you process a cube. MOLAP is separated from the data source when the cube is finished/processed.
If you use the same datamart for Reporting Services you will benefit when running queries.
The strategy for index is clustered indexes for dimension primary keys and non-clustered on their foreign keys in the fact table. A non clustered composite index on all the foreigh keys in the fact table is also recommended. Clustered index in the fact table is not recommended.
You will pay for using indexes when you do inserts, deletes and updates so it is often recommended to drop indexes when you update your data mart or data warehouse, and recreate the when the update/load is finished.
Primary and foreign keys are good help if you like to receive errors when you are loading the data mart/warehouse. With them you know that all fact records have related dimension members Else these errors will appear when you process the cube.
Here is a web site I can recommend for performance issues:
http://www.sql-server-performance.com/default.asp
Regards
Thomas Ivarsson
Data Warehouse Structure and Indexing when using MOLAP
Hi there,
I am building a cube in AS2005 which is made up of one fact table (20 million rows) and 7 dimension tables. I am planning to use MOLAP storage and my question is this. Is it important to add indexes, primary keys, foreign keys to my data source to improve performance? I can appreciate that it might speed up processing time, but since all the data and aggregations would be stored in the cube, I can't see how it would have any imact on performance (querying the cube).
If there is a benefit to adding indexes, keys etc. then could somebody please tell the best strategy, i.e where to put them, clustered or nonclustered etc.
Many thanks,
David
I have the same opinion, with MOLAP indexes will only help when you process a cube. MOLAP is separated from the data source when the cube is finished/processed.
If you use the same datamart for Reporting Services you will benefit when running queries.
The strategy for index is clustered indexes for dimension primary keys and non-clustered on their foreign keys in the fact table. A non clustered composite index on all the foreigh keys in the fact table is also recommended. Clustered index in the fact table is not recommended.
You will pay for using indexes when you do inserts, deletes and updates so it is often recommended to drop indexes when you update your data mart or data warehouse, and recreate the when the update/load is finished.
Primary and foreign keys are good help if you like to receive errors when you are loading the data mart/warehouse. With them you know that all fact records have related dimension members Else these errors will appear when you process the cube.
Here is a web site I can recommend for performance issues:
http://www.sql-server-performance.com/default.asp
Regards
Thomas Ivarsson