Thursday, March 29, 2012

Database comparison

Language : VB.NET
Database : MS SQL Server 2000
I have developed a tool to compare two databases, but it is taking a long time. It first loads the whole table into main memory (ordered by primary key), does the same for the corresponding table in the second database, and then compares them row by row. I tested a third-party tool that compares 800,000 records of a table in only 2 minutes, while my tool takes 1 hour and 40 minutes. Is there a more optimized method?

Regards,

Loka

Did you consider that, instead of checking column by column, you might be able to compare a checksum() of the column, or even of the entire row?
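For example, if both databases live on the same server, a minimal sketch of the row-checksum idea (using hypothetical names db1, db2, and a table Customers keyed on Id) could run entirely in T-SQL:

    -- Compare one checksum per row instead of every column.
    -- BINARY_CHECKSUM can collide, so treat equal values as "probably equal".
    SELECT a.Id
    FROM (SELECT Id, BINARY_CHECKSUM(*) AS cs FROM db1.dbo.Customers) a
    JOIN (SELECT Id, BINARY_CHECKSUM(*) AS cs FROM db2.dbo.Customers) b
        ON a.Id = b.Id
    WHERE a.cs <> b.cs;

Rows that exist on only one side would need an outer join (or a separate key-only comparison) to be detected.

|||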

Hi,

The checksum idea is quite good - it should help you speed up the process. Just a couple more thoughts on the subject:

(a) Any implementation with a client application would require access to the data in some form - even if you use a server-side cursor, you would have to fetch the data for comparison, and it would travel (possibly over the network, if the app is not running locally to SQL Server) to the application. Thus the best approach is to implement the comparison inside SQL itself - as stored procedures, queries, views, etc. Be careful, though - the comparison may consume SQL Server resources and starve existing processes and clients. You should be able to find a balance between the server impact and the performance of the comparison task.

(b) You could take advantage of some of the new T-SQL enhancements in SQL 2005, such as INTERSECT and EXCEPT (see the sketch after this list).

(c) If in the end, a standalone application proves to be needed, use OLEDB or ODBC directly with SQL Native Client to achieve maximum performance. Plan carefully for the types of cursors you need (ideally a simple firehose should be the fastest), the potential for blocking or other isolation issues, the proximity and network utilization between client and server, etc.
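For (b), a sketch using the same hypothetical db1/db2/Customers names - on SQL 2005 or later, EXCEPT returns the rows of one result set that have no exact match in the other:

    -- Rows present (or different) in db1's copy but not in db2's.
    SELECT * FROM db1.dbo.Customers
    EXCEPT
    SELECT * FROM db2.dbo.Customers;

    -- Run the reverse direction as well, to catch rows missing from db1.
    SELECT * FROM db2.dbo.Customers
    EXCEPT
    SELECT * FROM db1.dbo.Customers;

Both SELECT lists must have the same number and compatible types of columns for EXCEPT to apply.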

HTH,

Jivko Dobrev - MSFT

--
This posting is provided "AS IS" with no warranties, and confers no rights.

|||

Some other random ideas:

1. bcp out the tables and do a file compare of the resulting output files; if nothing is different, this should be really fast (see the sketch after this list).

2. Comparing rows from the bcp output might actually be faster than scanning records through a recordset looking for sparse differences - it's hard to tell. For example, say you bcp out the tables into c:\foo1.txt and c:\foo2.txt. Then you can read a single line at a time from each file and memcmp the entire line in one call. If a pkey is missing on one side, you have to scan forward in one file to resync the keys, but in general it should work pretty fast, I think.

3. Computing a checksum on the server side should be faster than downloading the data and checksumming it on the client, I think (again, see the sketch below).
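To make ideas 1 and 3 concrete, with the same hypothetical names (myserver, db1, db2, Customers): for the bcp route, bcp's queryout mode lets you force a stable row order, which a plain table export does not guarantee:

    rem -c = character mode, -S = server, -T = trusted connection
    bcp "SELECT * FROM db1.dbo.Customers ORDER BY Id" queryout c:\foo1.txt -c -S myserver -T
    bcp "SELECT * FROM db2.dbo.Customers ORDER BY Id" queryout c:\foo2.txt -c -S myserver -T

And for the server-side checksum, one possibility is a single aggregate checksum per table - if the two values differ, the tables differ, while equal values are strong (but not conclusive, since checksums can collide) evidence that they match:

    -- CHECKSUM_AGG is order-independent, so no ORDER BY is needed here.
    SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM db1.dbo.Customers;
    SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM db2.dbo.Customers;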
