Thursday, March 29, 2012

Database compare

Language : VB.NET
Database : MSSQLserver2000
i have developed a tool to compare two databases, but it is taking long time to compare. First it will take the whole table into main memory(order by primary key). Same procedure for table2 of second database. Then it will start comparison by taking row by row. i tested with one third party tool, which is taking only 2 min to compare 800000 records of a table. My tool is taking 1 hour,40 mins to do that. Is there any other optimized method?

Regards,
Lokait is likely that you wrote inefficient code. without providing us the information in Brett's sticky at the top of this page, you are unlikely to get any replies you find helpful.|||Are you taking advantage of the CHECKSUM and BINARY_CHECKSUM functions for comparing records?|||BINARY_CHECKSUM rocks! We use it for this very same situation, but use a stored proc that does a full outer join (to grab missing rows from both tables also, in addition to catching data differences with the BINARY_CHECKSUM) and this morning's run took 4.26 minutes to pass/compare two tables on two different servers each with 12,224,072 rows.

yep, we bad, we bad...|||...as long as you don't need ABSOLUTE EXACT ACCURACY, binary_checksum is very efficient.|||I seem to recall some trouble with it, but your comment irritates the hell out of...err...*blush* I mean, piques my interest :)

How can you be more absolute, exactly accurate than a binary comparison? Guess I need to go look at BOL again ;0

I was tripped up a time or two because it actually seems to be a byte-by-byte comparison. As I recall, if you are comparing two columns named "price", and one is defined as a FLOAT, and one as a MONEY, and both contain 2.45, the BINARY COMPARE will fail.

Also, even if both your columns DATE contain '12/21/2006', but one is defined as a datetime, and the other a smalldatetime, the BINARY COMPARE will fail.

In both cases, if you think about it, it makes sense. from the BINARY perspective the bits involved are going to be different because the variable size is different. But I still got tripped up with them early on.|||Thanks for your quick response... We implemented a tool to compare the databases in VB.NET and we should not put any stored procedure in the any of databases. Is there any other way to compare long datatypes in VB.NET.
if there is 10 lakh records in each database then main emory is not at all sufficient to take the whole table at a time. Please suggest me a better approach to solve this issue..|||How can you be more absolute, exactly accurate than a binary comparison? Because it is possible for two different strings to yield the same binary checksum value. In some scenarios, it is almost likely to happen.

We implemented a tool to compare the databases in VB.NET and we should not put any stored procedure in the any of databases.Your system could create a temporary stored procedure in the database and then delete it upon completion.|||Binary Checksum does exactly what the name implies, it computes a binary checksum. You can compute a sum from a row that is almost 8 Kb long, and even longer if you use TEXT or IMAGE columns. The checksum is a whole bunch smaller than the actual data being summed, so there are some combinations of data that will produce the same checksum value.

Binary Checksum is good, but it is a long way from being perfect (no checksum can be perfect in that sense).

-PatP|||Is it possible to create temporary stored procedure in the remote server?
"Binary check sum" is database function or .Net function? what is the syntax?
If you have sample stored procedure to compare two databases, please give it to me.

Regards,
Loka|||First it will take the whole table into main memory
Very bad idea. What do you do if the two tables do not fit into memory?|||Yes that's a problem, how to overcome this?|||Is it possible to create temporary stored procedure in the remote server?
"Binary check sum" is database function or .Net function? what is the syntax?
If you have sample stored procedure to compare two databases, please give it to me.

Regards,
Loka
BINARY_CHECKSUM is a SQLSVR database function.
You can create a stored procedure on the database at the start of your process using a simple CREATE PROCEDURE statement, and then drop it when you are finished with DROP PROCEDURE. The login used will need to have sufficient permissions to create procedures.|||Yes that's a problem, how to overcome this?I would recommend a cursor.

...just funnin' ya guys.

That's the purpose of the stored procedure, which can just select and (outer) join the two tables to find any differences.

And yup, you guys that dis' the checksum, binary or otherwise...I suppose you gotta know thy data, and thy checksum algorithm, and see if the risk is worth the benefit. As Mr. Pat pointed out, as soon as you start using a checksum, you introduce the possibility/probability that a checksum comparison will not catch a difference in data. There is some thought out there that the more data you scrunch into your checksum fields/components, the less likely to have this problem show up, but I am not sure I buy that at face value.

As with most things in life, it depends.

...and that is as close to philosophical as I am willing to get on Christmas Eve Eve Eve.

No comments:

Post a Comment