Saturday, February 25, 2012

Data Warehousing: Best way to write SP?

I'm writing a set of queries designed to test the data quality of our data warehouse at the fact level. The intent is to ensure that there are no keys at the fact level that do not exist at the dimension level.

I'm dealing with 8 fact tables, 8 dimension tables and 9 keys. I'm currently doing:

SELECT COUNT(DISTINCT key)
FROM t_fact
WHERE key NOT IN
(
SELECT DISTINCT key FROM t_dimension
)

If I do one of those for each key-fact table combo, there are about 50 queries in total. Not every key exists in every fact table.

I'm a stored procedure novice. What is the best way to check all of the fact tables, aside from running 50 counts with subqueries? If I run the queries one fact table at a time, it will take about 30 minutes. I've also tried running a single query per fact table that counts all of the keys with a subselect against each dimension table, but that gave misleading results.

Any tips will be greatly appreciated. Abandoning data warehousing isn't a current option!

Mike

In Oracle I would find missing keys like this:

SELECT key FROM t_fact
MINUS
SELECT key FROM t_dimension
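The same anti-join idea can probably be stretched to cover several keys in one pass over a fact table, which may cut the 50 queries down to one per fact table. The following is only a sketch: t_fact_sales, t_dim_customer, t_dim_product, customer_key and product_key are made-up names standing in for the real tables and keys, and it assumes each dimension is joined on a single key column.

```sql
-- Hypothetical table and column names; substitute the real ones.
-- Each LEFT JOIN leaves the dimension column NULL when the fact key
-- has no matching dimension row, so the CASE keeps only orphan keys.
SELECT
    COUNT(DISTINCT CASE WHEN c.customer_key IS NULL
                        THEN f.customer_key END) AS orphan_customer_keys,
    COUNT(DISTINCT CASE WHEN p.product_key IS NULL
                        THEN f.product_key END)  AS orphan_product_keys
FROM t_fact_sales f
LEFT JOIN t_dim_customer c ON c.customer_key = f.customer_key
LEFT JOIN t_dim_product  p ON p.product_key  = f.product_key;
```

Each orphan count is computed independently, so a key that is missing from one dimension does not hide or inflate the count for another. NULL fact keys are not counted here, which matches the NOT IN version. Note also that NOT IN returns no rows at all if the subquery produces a NULL, so a NULL key in a dimension table could be one source of misleading counts.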

BTW, why not use a foreign key constraint to ensure all fact keys are based on dimension keys?
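As a rough illustration of what such a constraint could look like (the table, column and constraint names are invented for the example, and it assumes customer_key is the primary key or has a unique constraint in the dimension table):

```sql
-- Hypothetical names; the referenced column must be a primary key
-- or carry a unique constraint in the dimension table.
ALTER TABLE t_fact_sales
    ADD CONSTRAINT fk_sales_customer
    FOREIGN KEY (customer_key)
    REFERENCES t_dim_customer (customer_key);
```

If orphan keys already exist in the fact table, adding the constraint would be expected to fail until they are cleaned up, so a check like the ones above is still useful as a first step.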
