Saturday, February 25, 2012

Data Warehouse Structure and Indexing when using MOLAP

Hi there,

I am building a cube in AS2005 which is made up of one fact table (20 million rows) and 7 dimension tables. I am planning to use MOLAP storage, and my question is this: is it important to add indexes, primary keys, and foreign keys to my data source to improve performance? I can appreciate that it might speed up processing time, but since all the data and aggregations would be stored in the cube, I can't see how it would have any impact on query performance against the cube.

If there is a benefit to adding indexes, keys etc., could somebody please tell me the best strategy, i.e. where to put them, clustered or nonclustered, etc.

Many thanks,

David

I have the same opinion: with MOLAP, indexes will only help when you process the cube. Once the cube is processed, MOLAP storage is separate from the data source.

If you use the same data mart for Reporting Services, you will also benefit from the indexes when running relational queries.

The indexing strategy is clustered indexes on the dimension primary keys and nonclustered indexes on the corresponding foreign keys in the fact table. A nonclustered composite index covering all the foreign keys in the fact table is also recommended. A clustered index on the fact table is not recommended.
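As a sketch of that strategy (the table and column names here are illustrative, not from the original post):

```sql
-- Clustered index on each dimension table's primary (surrogate) key
CREATE UNIQUE CLUSTERED INDEX IX_DimProduct_ProductKey
    ON dbo.DimProduct (ProductKey);

-- Nonclustered index on each individual foreign key in the fact table
CREATE NONCLUSTERED INDEX IX_FactSales_ProductKey
    ON dbo.FactSales (ProductKey);
CREATE NONCLUSTERED INDEX IX_FactSales_DateKey
    ON dbo.FactSales (DateKey);

-- Nonclustered composite index covering all the foreign keys
CREATE NONCLUSTERED INDEX IX_FactSales_AllKeys
    ON dbo.FactSales (DateKey, ProductKey, CustomerKey);
```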

You pay for indexes on inserts, deletes and updates, so it is often recommended to drop the indexes before you load your data mart or data warehouse and recreate them when the load is finished.
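The drop-load-recreate pattern might look like this (index and table names are hypothetical):

```sql
-- Drop the fact table's indexes before the bulk load...
DROP INDEX IX_FactSales_ProductKey ON dbo.FactSales;

-- ...run the bulk load here...

-- ...then recreate them once the load is finished.
CREATE NONCLUSTERED INDEX IX_FactSales_ProductKey
    ON dbo.FactSales (ProductKey);
```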

Primary and foreign key constraints are a good help if you want to catch errors while loading the data mart/warehouse: with them, you know that every fact record has a related dimension member. Without them, those errors will only surface when you process the cube.
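For example, a foreign key constraint like the following (names illustrative) would make the load itself fail on an orphaned fact row, rather than the cube processing failing later:

```sql
-- Reject any fact row whose ProductKey has no matching dimension member
ALTER TABLE dbo.FactSales
    ADD CONSTRAINT FK_FactSales_DimProduct
    FOREIGN KEY (ProductKey) REFERENCES dbo.DimProduct (ProductKey);
```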

Here is a website I can recommend for performance issues:

http://www.sql-server-performance.com/default.asp

Regards

Thomas Ivarsson
Friday, February 24, 2012

Data Validation Challenge

We have a data migration process that transfers around 450 million records from one DB to another. A validation step then runs on the migrated data, and it is time consuming (a few days). The following validations take place on the migrated data:
NULL check
Length check
Numeric precision check
So it is looping 450 million (rows) * 30 (columns) times; it takes forever to complete, and the space requirements are also growing rapidly :-(. Is there a better approach for this kind of validation? We are planning to try a partitioning approach. If there is a better way, please help with your recommendations.
Regards,
Murali
Which tool are you using?
"Murali" wrote:

> We have data migration process that transfers around 450 million records from
> one DB to Another DB. There is a validation that takes place for the migrated
> data that is time consuming (few days). The following validations takes place
> on migrated data
> NULL check
> Length Check
> Numeric Precision Check
> So its looping through 450(rows) million * 30 (columns) times. so it takes
> forever to complete the validation process and moreover the space
> requirements also growing exponentially :-(. I would like to know is there a
> better approach for validation of this kind. we are planning to try partition
> approach. If there is any better way please help with your recommendations.
> Regards,
> Murali
>
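Whatever the tool, one common alternative to looping row by row is a single set-based pass per table that counts the violations of each rule; the engine scans the table once instead of 450 million * 30 times. A sketch, with hypothetical column names and limits:

```sql
-- One scan of the migrated table, counting violations per rule.
-- Column names and the length/precision limits are illustrative.
SELECT
    SUM(CASE WHEN CustomerName IS NULL THEN 1 ELSE 0 END)      AS NullViolations,
    SUM(CASE WHEN LEN(CustomerName) > 50 THEN 1 ELSE 0 END)    AS LengthViolations,
    SUM(CASE WHEN Amount <> ROUND(Amount, 2) THEN 1 ELSE 0 END) AS PrecisionViolations
FROM dbo.MigratedTable;
```

This also combines well with the partitioning idea: run the same query per partition so each batch stays a manageable size.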

Friday, February 17, 2012

Data type conversion issue

Hi guys,

I exported some data from a text file to SQL Server. The table has about 2 million rows. There is a date field which comes across as 'nvarchar' in SQL. When I try to convert it to 'datetime', I get an "operation timed out" error.

Here is the data from the text file...

Date dispensed Outliers Formulation ID Provider Number (dispensing) NSS flag Patient category Units dispensed Total days supply
1/01/2006 12:00:00 a.m. normal 106509.00 7952 I A 120.00 30.00
1/01/2006 12:00:00 a.m. normal 106509.00 8208 I A 360.00 90.00
1/01/2006 12:00:00 a.m. normal 106509.00 9460 I A 120.00 30.00
1/01/2006 12:00:00 a.m. normal 106509.00 10184 I A 120.00 60.00
1/01/2006 12:00:00 a.m. normal 106509.00 10291 I A 120.00 60.00
1/01/2006 12:00:00 a.m. normal 106509.00 11149 I A 120.00 30.00
1/01/2006 12:00:00 a.m. normal 106509.00 11294 I A 120.00 60.00
1/01/2006 12:00:00 a.m. normal 106509.00 11777 I A 120.00 30.00
1/01/2006 12:00:00 a.m. normal 106509.00 12048 I A 120.00 30.00

I have tried the bulk insert as well.

Here is the script for the create table ..

USE [Library]

GO

/****** Object: Table [dbo].[tablename] Script Date: 10/03/2006 14:45:59 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE TABLE [dbo].[NormalOutlier1](

[Datedispensed] [nvarchar](max) COLLATE Latin1_General_CI_AS NULL,

[Outliers] [nvarchar](max) COLLATE Latin1_General_CI_AS NULL,

[Formulation ID] [float] NULL,

[Provider Number (dispensing)] [nvarchar](max) COLLATE Latin1_General_CI_AS NULL,

[NSS flag] [nvarchar](max) COLLATE Latin1_General_CI_AS NULL,

[Patient category] [nvarchar](max) COLLATE Latin1_General_CI_AS NULL,

[Units dispensed] [float] NULL,

[Total days supply] [float] NULL

) ON [PRIMARY]

Hope this helps

Did you try using a DATETIME rather than NVARCHAR(MAX) in the table definition? An implicit conversion of the values might be possible. If not, you will have to convert the dates with your own logic, using either an ETL process in DTS/SSIS or string manipulation.
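As a sketch of the string-manipulation route (the REPLACE handling of the 'a.m./p.m.' suffix is an assumption about this particular data; style 103 is dd/mm/yyyy, matching the sample):

```sql
-- Add a real datetime column alongside the nvarchar one,
-- then populate it with an explicit, deterministic conversion.
ALTER TABLE dbo.NormalOutlier1 ADD DateDispensedDT datetime NULL;

UPDATE dbo.NormalOutlier1
SET DateDispensedDT = CONVERT(datetime,
        -- normalize 'a.m.'/'p.m.' into forms CONVERT understands
        REPLACE(REPLACE([Datedispensed], 'a.m.', 'AM'), 'p.m.', 'PM'),
        103);  -- style 103 = dd/mm/yyyy
```

On 2 million rows, running the UPDATE in batches (e.g. a WHERE clause on a key range in a loop) may also avoid the timeout the implicit conversion hit.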

HTH, Jens K. Suessmeyer.

http://www.sqlserver2005.de