MySQL – sql server: consistency check

I’m trying to check the results of a data load between two databases. Unfortunately, I only have access to one database (MySQL) directly, the company managing MSSQL can expose it to us via an API.

What I would like to do is check the consistency of certain columns across rowsets. Originally, I had hoped to be able to run a CRC or hash check against the columns, but there doesn’t seem to be a compatible way of doing this.

  • Select where null only if a non-null value doesn't exist?
  • Preserving ORDER BY in SELECT INTO
  • SQL Injection puzzle - what constitutes a SQL Server 'batch'?
  • SQL - help on “top 1 by group”
  • Microsoft Master Data Services : How to get/set description of Model/Entity programmatically
  • Converting int primary key to bigint in Sql Server
  • For example, we can run CRC32 against a column in MySQL, but there isn’t a reliable way of doing the same on MSSQL. Alternatively, there’s CHECKSUM_AGG on MSSQL, but no alternative on MySQL.

    The end result is that I would like to do a binary search if the checksums differ to identify the rows that require changing.

    There is currently no bulk load interface, and SSIS is not available (the MSSQL servers are not part of my company).

    One Solution collect form web for “MySQL – sql server: consistency check”

    I thought I’d come back to this and describe the solution we ended up implementing. This was a major pain in the neck!

    Firstly, because of the fixed versions of MySQL on our server and MSSQL on the remote server, there were no common encoding methods.
    The MSSQL API returned data in UTF-16LE, the MySQL database had Unicode data stored in Latin-1 tables sigh

    Firstly, we concatenated the fields that we were comparing, then we MD5’d the result. In order to get the MySQL result to match the output of the MSSQL HASHBYTES function, we had to do this:

        IF(MID(MD5(CONC), -8 , 1) >= "8", "FFFFFFFF", ""),
        RIGHT(MD5(CONC), 8)
    ), 16, -10 )) AS CALC

    where CONC is the result of a subselect concatenating the fields we are interested in.

    On the MSSQL server, we had to do the following query:

        CONVERT(NVARCHAR(4000), FIELD1 ) + 
        CONVERT(NVARCHAR(4000), FIELD2 ) + ...

    Then, we took the sum of the MD5 across the entire range, modulo three large-ish primes(311,313,317), as per Chinese Remainder Theorem

    This gave us three numbers for the range we were checking. We could be reasonably certain that if all three numbers matched for a given range on each server, then the data was consistent.

    I’ll spare you the details of the munging we had to do to get Unicode in Latin-1 transliterated to UTF-16LE

    MS SQL Server is a Microsoft SQL Database product, include sql server standard, sql server management studio, sql server express and so on.