SQL Server Connected to Hadoop – Thoughts and Challenges of Implementation

I wanted to broach the issue of HDInsight, Microsoft’s Hadoop distribution.

Given that there is a connection provided to Hadoop, does anyone have experience with HDInsight, and in particular a comparison between the Hadoop / SQL Server connector and HDInsight / SQL Server, from a real-life DTP scenario or a personal single-node installation?

  • http://sqlmag.com/blog/use-ssis-etl-hadoop

    http://www.microsoft.com/en-us/download/details.aspx?id=27584

    http://www.microsoft.com/en-us/sqlserver/solutions-technologies/business-intelligence/big-data.aspx

  • One solution collected from the web for “SQL Server Connected to Hadoop – Thoughts and Challenges of Implementation”

    HDInsight is the distribution of Hadoop that Microsoft maintains for use in Azure. You could roughly compare this to Amazon Elastic MapReduce. They both serve the purpose of being a hosted Hadoop service that has almost no management overhead.

    The Hortonworks Data Platform for Windows contains the open source changes that Hortonworks and Microsoft have collaborated on to make Hadoop run well on Windows. HDP isn’t HDInsight.

    In short – you don’t need to use HDInsight if you want to run Hadoop in a Windows environment.

    While I can’t speak directly to using HDInsight and moving data back and forth between it and SQL Server, I’ve implemented a data processing solution using SQL Server, Hadoop, and Elastic MapReduce. Barring some data quality issues and BULK INSERT weirdness, the process was painless.
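
    To make the data quality point concrete, here is a minimal sketch of the kind of clean-up step that might sit between Hadoop output and BULK INSERT. It is not the pipeline described above. The function names, the tab-delimited layout, and the expected field count are all assumptions made for illustration.

```python
def clean_rows(lines, expected_fields, delimiter="\t"):
    """Split delimited lines into loadable rows and rejected lines.

    Hypothetical helper: assumes Hadoop wrote one record per line with a
    fixed number of delimiter-separated fields.
    """
    good, rejected = [], []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) != expected_fields:
            # Wrong column count: keep the raw line for later inspection.
            rejected.append((line_no, line))
            continue
        good.append([field.strip() for field in fields])
    return good, rejected


def to_bulk_insert_text(rows, delimiter="\t", terminator="\r\n"):
    """Serialize rows with explicit field and row terminators."""
    return "".join(delimiter.join(row) + terminator for row in rows)


if __name__ == "__main__":
    sample = ["1\talice\t10\n", "2\tbob\n", "3\t carol \t30\r\n"]
    rows, bad = clean_rows(sample, expected_fields=3)
    print(to_bulk_insert_text(rows), end="")
    print("rejected:", bad)
```

    The sketch writes CRLF row terminators because BULK INSERT treats a '\n' row terminator as a carriage return plus line feed. Files that end lines with a bare line feed, as Hadoop output typically does, generally need ROWTERMINATOR = '0x0a' instead.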

    Finally, you ask “do we really want to run Hadoop size datasets on Windows servers?” – Windows performs well and has solid tooling around it. I’ve been somewhat skeptical about running Hadoop and other Java platform software on Windows because of legacy Java I/O issues and a lack of community support, not because of any performance issues.

    The largest issue that Windows companies will find when moving to Hadoop is that there will be limited support in community forums and channels once the problem becomes a Hadoop + Windows issue. It’s very easy for people to throw their hands up and say “Nope, not helping out, don’t have Windows.” With time and adoption, this problem goes away. Besides, nothing says you have to finish on the same platform you start with. You could easily deploy with HDP on Windows and move to HDP on Linux at a later date.

    I have put together some SQL Server and Hadoop basics for DBAs that should be helpful.
