
Wednesday, March 7, 2012

Database (loading)

I have a database that is in (loading) status. I don't know why; we don't have a job running on it. How do I resolve this issue?|||What does DBCC CHECKDB tell you?

My guess is that you had a failed load.

database (loading)

Hey All,
I came in this morning to find a few SQL databases that show in a 'loading'
state. What does this mean?
Thanks in advance!!
Wade|||Hi Wade,
It seems that someone restored the database and left it in the loading
state. Try issuing this for each database:
RESTORE DATABASE YourDbName WITH RECOVERY
HTH, Jens Suessmeyer.
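For anyone hitting this later, a minimal sketch of diagnosing and clearing the state (the sys.databases query assumes SQL Server 2005 or later; YourDbName is a placeholder):

```sql
-- List databases stuck in the RESTORING ("loading") state.
-- sys.databases is SQL Server 2005+; on SQL 2000 use
-- SELECT DATABASEPROPERTYEX('YourDbName', 'Status') instead.
SELECT name, state_desc
FROM sys.databases
WHERE state_desc = 'RESTORING';

-- Bring a database online without applying any further backups.
RESTORE DATABASE YourDbName WITH RECOVERY;
```

Note that if more transaction log backups still need to be applied, restore those first and save WITH RECOVERY for the last one; once recovered, no further backups can be applied.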


Saturday, February 25, 2012

Data, data, who's got the data? (basic theory question)

I'm trying to understand aggregations, storage theory (MOLAP, ROLAP, HOLAP), proactive caching, and loading data into my cube.

We are currently building both an OLTP db and OLAP db 100% in SQL Server.

The OLTP users will add several hundred thousand rows a day.

I'm going to create an SSIS package to initially take data from the OLTP db and load it into the OLAP db and then build my cube.

Now my understanding becomes fuzzy. I've built partitions, created aggregations, and enabled proactive caching, but I'm not sure why I am doing so. When talking about real-time MOLAP and proactive caching and MOLAP-to-ROLAP-to-MOLAP, the various literature refers to "changes in the relational data" triggering various actions based on my settings. Are the authors referring to changes in the original OLTP db or changes in the OLAP db? If the latter, then changes will only result when I run another SSIS package to update my data.

Or does this discussion assume I am using a UDM db which seems to say I can build my star schema on top of the OLTP db. But then who has that data?

Where the heck is my data?

You want to update your cubes after data has been added to your relational OLAP database. Your relational OLAP database is the source of the data for your cubes so there really isn't any reason to process your cubes until they have been updated.

Though data will be added to your OLTP system throughout the day, how frequently will you be adding data to your OLAP database? If it's on a fixed schedule, you may want to do without the proactive caching and just start the processing yourself as part of your SSIS package.

Regarding the UDM, I would not recommend building your DSV using references to tables in the OLTP system. Though this can be done, it just seems like this is bound to cause you problems.

Good luck,

Bryan

|||

My problem is that my users want "near real time" data, so I'm looking for solutions. Proactive caching seems like a piece of the puzzle, but I wasn't sure what the data source was for the proactive caching.

If all the hubbub is about detecting data that changes in the OLAP db, I don't get it. Data will only change in the OLAP db when I tell it to change (by running an SSIS package), so I would think I'd know about it and go ahead and process the cube.

|||

What's nice about the Proactive Caching is that you could have SSAS handle the processing on your behalf. You can configure it to detect a change but then wait for some period of time for things to settle down before processing. During that time, you might have 2 or 3 ETL cycles, some with changes and some without. Really just depends on your needs. If you can process quickly so that it doesn't interfere with your ETL cycles, then your suggested approach is probably the best for you.

Bryan

|||Does it "detect a change" in the OLAP db or the original OLTP db? I suspect OLAP, which is only going to change when I run an SSIS update, so I don't see much value in detecting a change since I am causing the change and am fully aware of when it happens. As part of my SSIS package I suppose I could trigger the reprocessing of my cube.|||

Just so anyone else reading this is clear on what we are talking about, let me define a few things. OLTP and OLAP refer to how data is used. OLTP systems typically support entity-relationship models while OLAP systems typically support dimensional models (though others are available).

Both entity-relationship and dimensional models can be supported through relational database technologies. The SQL Server Database Engine is an example of a relational database technology. Dimensional models are also supported by multidimensional database technologies, such as SQL Server Analysis Services, which store data in a different kind of structure and allow access through languages like MDX.

So, when you say "OLAP db", I interpret that to mean the relational database containing your dimensional model supporting OLAP activities, and when you say "OLTP db", I interpret that to mean the relational database containing your entity-relationship model supporting OLTP activities.

Whew!

With that out of the way, the answer to your question is that normally you are watching your OLAP db for changes. "Watching" isn't the most accurate way to describe this. You can certainly configure SSAS to evaluate the underlying tables for changes. However, you can also configure some relational database technologies to notify SSAS when a change occurs. And there are a ton of other options that make proactive caching really useful in some scenarios.

Still, my general philosophy is to keep your system as simple as possible while achieving your critical goals (one of which is supportability!). If you can avoid using proactive caching by taking a more straightforward approach while still meeting your goals, I would recommend doing so.

Good luck,

Bryan

|||

I'm sorry, I did leave out quite a bit, but yes, that is what I am talking about. In my case the only reason to have a "middle" OLAP db is to create some calculated fields and pull together some columns from different tables. I could probably do most everything during cube processing and build it right on my OLTP which is a single SQL Server db. But the main reason I'm using the OLAP "staging" db is so I am not placing an additional load on my OLTP db which is expected to receive up to a million new rows of data a day. I foresee huge latency and functionality issues and am trying to get ahead of the curve.

But, from your answer, it looks like I can use either db to trigger a cache update (if that's the proper SSAS term).

Thanks.

|||

Just wanted to mention one thing.... You refer to your OLAP db as a "staging" database. We often implement a full-blown dimensional model in a relational database. Lots of reasons for this. (The Kimball books go into this in detail.)

This database + the OLAP cube serve as the data warehouse layer of our BI infrastructure. You could build a cube off a DSV that assembles data right out of your OLTP database, but we don't often recommend this.

Good Luck,
Bryan

|||

Yes. The main reason I'm building an OLAP db is to offload data retrieval requests from the main db. So I tend to think of it as a "staging" db, though that term more properly refers to a database one might require when gathering data from diverse data sources.

We are currently building both the OLTP db and BI solution. In theory, I could build my cube right on top of the live db (OLTP) or a snapshot of it or some other replicated image. My "truth" lies in my OLTP db, changes in which I may be able to use to update caching. I was reading the Sam's book and it referred to changes in the relational database triggering updates to your cache and I just wondered WHICH relational db.

Given the constant updating that is going to be done on the OLTP source, I am probably better off updating caching and/or cube processing on a timed schedule. I realize there are a couple of configurable "windows," but my data updates will probably be constant throughout the day. Of course, clients want terabyte-size data cubes that generate instantaneous reports on data with nanosecond latency.

I'm trying to get my arms around the basic theory of this new (to me) data model (multidimensional). I can recite Codd's 12 rules in my sleep (or is it 13...I think I need a nap), which is perhaps part of my problem: I am trying to force everything through a relational prism.

But you've been quite helpful.

Thanks.

John

Data warehouse question

Hi,
I am building the data warehouse in SQL 2005 and have the following
questions want to clarify.
1. Any method or suggestion for incremental loading for a data warehouse? If
I create a field in each record in my source table and select the ones that
haven't been loaded by the ETL, is that a reasonable method, or is there a
better suggestion?
2. If there are schema changes in the source table and the data warehouse,
is it necessary (or best practice) to reload all data?
Thanks a lot.
Ryan
Ryan:
I am stumbling through my ETL system's initial design and have had to work
this one out as well. Take the following bit of free advice with caution:
I created a tracking table in a separate db with TableName and LastLoadDate
columns. After running a package that is my "Initial Load" package (i.e.
truncating the whole fact table), it runs a SQL task that populates the
LastLoadDate in the corresponding record.
For incremental loads, I run two data flows, one for 'changed' rows and one
for 'new' rows:
"Changed": SELECT * FROM x WHERE [Created] < LastLoadDate AND [Modified] >
LastLoadDate. That goes into a staging table. At the end of the package,
another SQL task runs a simple UPDATE query that updates the fact table from
the staging table.
"New rows": SELECT * FROM x WHERE [Created] > LastLoadDate. That data flow
gets appended directly to the fact table.
Check out books by Ralph Kimball on Microsoft Data Warehouse Design.
HTH
Todd Chittenden
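A hedged T-SQL sketch of the tracking-table approach Todd describes. All table and column names here (LoadTracking, FactSales, SourceTable, BusinessKey, Amount) are illustrative, not from the original post:

```sql
-- Tracking table holding the high-water mark per target table.
CREATE TABLE dbo.LoadTracking (
    TableName    sysname  NOT NULL PRIMARY KEY,
    LastLoadDate datetime NOT NULL
);

DECLARE @LastLoad datetime;
SELECT @LastLoad = LastLoadDate
FROM dbo.LoadTracking
WHERE TableName = 'FactSales';

-- New rows: created since the last load, appended directly to the fact table.
INSERT INTO dbo.FactSales (BusinessKey, Amount, Created, Modified)
SELECT BusinessKey, Amount, Created, Modified
FROM dbo.SourceTable
WHERE Created > @LastLoad;

-- Changed rows: existed before the last load but modified since.
-- Staged first, then applied with an UPDATE.
SELECT BusinessKey, Amount, Created, Modified
INTO #Staging
FROM dbo.SourceTable
WHERE Created < @LastLoad AND Modified > @LastLoad;

UPDATE f
SET f.Amount = s.Amount, f.Modified = s.Modified
FROM dbo.FactSales AS f
JOIN #Staging AS s ON s.BusinessKey = f.BusinessKey;

-- Record the new high-water mark. (Simplified: in practice, capture this
-- timestamp BEFORE reading the source, so rows arriving mid-load aren't skipped.)
UPDATE dbo.LoadTracking
SET LastLoadDate = GETDATE()
WHERE TableName = 'FactSales';
```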


Data Warehouse Load Hints?

(SQL 2000) So if we are loading large tables while no one else is logged in
and no other processes will occur, should we provide hints for the table
inserts and updates for a performance boost? If so, which hints? Can any
help avoid deadlocks? And will this be different for 2005?
Does setting the db to singleuser help speed-up updates?
Thanks
Greg C|||A deadlock is when two processes have locked resources (typically in an
uncommitted transaction) that the other needs to continue. How would this
happen if the ETL is the only process running?
http://support.microsoft.com/defaul...kb;en-us;169960
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/acdata/ac_8_con_7a_3hdf.asp
As for maximizing the performance of your ETL, I would recommend the "bulk
load" method, because it supports features such as minimal transaction
logging.
http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/incbulkload.mspx
Setting the database to DBO Use Only would also be a good idea; it may
improve performance, but mostly it ensures that no one and nothing else logs
in.
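As a hedged illustration of the minimally logged load JT recommends (file path, terminators, and table name are made up; minimal logging also depends on the recovery model and the target table's indexes):

```sql
-- Sketch of a minimally logged bulk load. Requires the Bulk-Logged (or
-- Simple) recovery model and, for minimal logging, a target table without
-- populated nonclustered indexes.
BULK INSERT dbo.FactSales
FROM 'D:\loads\factsales.txt'
WITH (
    FIELDTERMINATOR = '|',
    ROWTERMINATOR   = '\n',
    TABLOCK,            -- table-level lock is one prerequisite for minimal logging
    BATCHSIZE = 50000   -- commit in batches to keep the transaction log manageable
);
```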
|||Just a couple of tips that I've learned:
1. If you have a clustered index, make sure that it's located on a
monotonically increasing value. This will minimize page splitting.
2. The clustered index does not have to be your primary key; in fact,
if you use a natural key, then it should not be. Our data is very date
and time sensitive, so I index the date and time of load; others use a
sequential numbering system (like an identity column), but I needed the
date of load for other reasons, and I prefer not to add columns that
have no meaning.
3. Depending on the size of your data, it may be more appropriate to
drop all of your non-clustered indexes, load the data, and rebuild the
indexes.
HTH,
Stu|||"JT" <someone@.microsoft.com> wrote in
news:uTboj28PGHA.720@.TK2MSFTNGP14.phx.gbl:
Well...you are asking someone who really needs to ask the question. Did
that make sense? :-) I am not a dba, and have learned through trial and
error what seems to work, at least most of the time.
I thought 'bulk load' recovery was just beneficial when importing text
files or using 'select into', no?
In our ETL, we have a number of procedures (which can call other
procedures) to test and process the data after the raw data is imported
w/ DTS. The DTS part is always fast, some of the subsequent processing
will lock-up periodically...I assume this would be a deadlock or lock
escalation? (When I look at the locks I will see many pages on one
object). It just happened to occur on one machine...I stopped and
restarted w/o mods to the code and the next time it processed w/o
halting.
I drop indexes for the DTS, for the data transformation I optimize sql
for performance, individual statements are broken into transactions,
most of the largest updates/inserts are broken into smaller transactions,
temp tables are used when needed, I aggressively manage the multiple data
and log files on multiple raids and hard drives. But I have not been
able to understand the hints but more importantly why a process will run
fine most times then decide to freeze once (btw, I also check for open
transactions to make certain that there was no minor error leaving an
open tran).
thanks for the references

|||"Stu" <stuart.ainsworth@.gmail.com> wrote in news:1141535576.352832.290970@.e56g2000cwe.googlegroups.com:

> 1. If you have a clustered index, make sure that it's located on a
> monotonically increasing value. This will minimize page splitting.
Not sure that I understand this one.
If I load some data into the field of the clustered key, are you saying
that it makes a difference what order that I load the records?
If so, is that different whether the clustered index is the primary key or
not?

> 2. The clustered index does not have to be your primary key; in fact,
> if you use a natural key, then it should not be. Our data is very date
> and time sensitive, so I index the date and time of load; others use a
> sequential numbering system (like an identity column), but I needed the
> date of load for other reasons, and I prefer not to add columns that
> have no meaning.
I use natural keys for something like a client id, but convert almost all
of the dimensions to surrogate (identity) keys.

> 3 Depending on the size of your data, it may be more appropriate to
> drop all of your non-clustered indexes , load the data, and rebuild the
> indexes.
>
I do this every time. 1.5 out of 3... :-)|||You can know if a process is being blocked by another process by executing
sp_who2 and seeing if the [BlkBy] column for a spid (process id) contains
the spid of another blocking process.
When importing large amounts of data, use the bulk copy program (BCP.EXE) or
the BULK INSERT command.
http://msdn.microsoft.com/library/d...bcp_9esz.asp
Try reducing the batch size of your bulk copy process:
http://support.microsoft.com/defaul...=kb;en-us;81339
Consider what impact transaction logging is having on your process. There
are a few conditions (such as dropping indexes and specifying the TABLOCK
hint) that are required for minimal logging to take effect.
http://support.microsoft.com/defaul...=kb;en-us;59462
http://support.microsoft.com/defaul...kb;en-us;110139
|||Ugh. Page splitting is difficult to explain; think of your table as a
blank notebook. The clustered index is kind of like a page number for
your other indexes to go and retrieve information; it's a way for the
optimizer to go figure out where row X is located in your table.
As you write data to your notebook, it begins to fill up each page. As
you continue to write, it skips to the next page; if your clustered
index does not increase in a sequential fashion, when the optimizer
encounters a value that is out of order, it has to split the page at
the insertion point, and move records below that point on a page to a
new page; that's why you should cluster on a sequential value that is
independent of the order of the data outside of the warehouse. That's
why I recommend a datetime representation so it increases without
splitting the pages.
If your primary key is a natural key (like a combination of clientID
and rowID), then the likelihood is that your rows will be out of order
as they load into the target database, and hence some reordering will
be necessary if you cluster on that.
Clear as mud?
Stu|||Also consider that if a good candidate for a clustered index cannot be
found, then best not to implement one.
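A sketch of Stu's tips in T-SQL (all names are hypothetical; the DROP INDEX syntax shown is the SQL 2000 form):

```sql
-- Cluster the fact table on a monotonically increasing load datetime so
-- inserts go to the end of the table and avoid page splits.
CREATE CLUSTERED INDEX CIX_FactSales_LoadDate
    ON dbo.FactSales (LoadDate);

-- Drop the nonclustered indexes before a large load...
DROP INDEX dbo.FactSales.IX_FactSales_ClientID;

-- ... perform the bulk load here ...

-- ...and rebuild them afterward.
CREATE NONCLUSTERED INDEX IX_FactSales_ClientID
    ON dbo.FactSales (ClientID);
```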
|||gc (nospam@.hotmail.com) writes:
> In our ETL, we have a number of procedures (which can call other
> procedures) to test and process the data after the raw data is imported
> w/ DTS. The DTS part is always fast, some of the subsequent processing
> will lock-up periodically...I assume this would be a deadlock or lock
> escalation? (When I look at the locks I will see many pages on one
> object). It just happened to occurr on one machine...I stopped and
> restarted w/o mods to the code and the next time it processed w/o
> halting.
A deadlock is when two (or more) processes are waiting for each other
to release resources. SQL Server detects a deadlock, and will select one
of the processes as a deadlock victim and cancel execution for that
process, so that at least one of the processes can continue working.
If you processing "locks up" this can be due to blocking, but also due
to long-running queries. With the information you have provided, it is
difficult to tell. As JT said, use sp_who2 to check for blocking.
Another issue to watch out for is auto-grow of the database. By default,
SQL Server auto-grows a database by 10% when you run out of space. If
your database is huge - and data warehouses often are - then 10% can
take quite some time to grow. This can be dealt with by expanding the
database to a reasonable size in advance.
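A hedged example of pre-sizing the files so auto-grow doesn't fire mid-load (the logical file names are placeholders; check yours with sp_helpfile):

```sql
-- Pre-size the data and log files, and use a fixed growth increment
-- instead of the default 10%.
ALTER DATABASE YourDwDb
MODIFY FILE (NAME = YourDwDb_Data, SIZE = 50GB, FILEGROWTH = 1GB);

ALTER DATABASE YourDwDb
MODIFY FILE (NAME = YourDwDb_Log, SIZE = 10GB, FILEGROWTH = 512MB);
```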
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx