SQL Server Connected to Hadoop – Thoughts and Challenges of Implementation
I wanted to raise the topic of HDInsight, Microsoft’s Hadoop distribution.
Given that a connector between SQL Server and Hadoop already exists, does anyone have experience with HDInsight, and in particular with how the Hadoop / SQL Server connector compares to HDInsight / SQL Server, either in a real-life DTP scenario or on a personal single-node installation?
HDInsight is the distribution of Hadoop that Microsoft maintains for use in Azure. You could roughly compare this to Amazon Elastic MapReduce. They both serve the purpose of being a hosted Hadoop service that has almost no management overhead.
The Hortonworks Data Platform for Windows contains the open source changes that Hortonworks and Microsoft have collaborated on to make Hadoop run well on Windows. HDP isn’t HDInsight.
In short – you don’t need to use HDInsight if you want to run Hadoop in a Windows environment.
While I can’t speak directly to using HDInsight and moving data back and forth between it and SQL Server, I’ve implemented a data processing solution using SQL Server, Hadoop, and Elastic MapReduce. Barring some data quality issues and BULK INSERT weirdness, the process was painless.
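To make the data quality and BULK INSERT point a little more concrete, here is a minimal sketch of one way to sanity-check tab-delimited Hadoop output before loading it. It is not the pipeline described above. The function names, the table name dbo.Scores, and the staging path are all hypothetical, and it assumes the job writes tab-separated rows with LF line endings.

```python
def split_clean_rows(lines, expected_fields, sep="\t"):
    """Separate well-formed rows from rows whose field count is wrong."""
    good, bad = [], []
    for line in lines:
        row = line.rstrip("\r\n")
        if not row:
            continue  # skip blank lines
        target = good if len(row.split(sep)) == expected_fields else bad
        target.append(row)
    return good, bad


def bulk_insert_statement(table, path):
    """Build a T-SQL BULK INSERT for a tab-delimited file with LF line endings."""
    return (
        f"BULK INSERT {table} FROM '{path}' "
        "WITH (FIELDTERMINATOR = '\\t', ROWTERMINATOR = '0x0a');"
    )


# Hypothetical sample standing in for a Hadoop part file.
sample = ["1\talice\t3.5\n", "2\tbob\n", "\n", "3\tcarol\t4.0\n"]
good, bad = split_clean_rows(sample, expected_fields=3)
print(len(good), "rows to load;", len(bad), "rows rejected")

# Hypothetical table and path; adjust to your own schema and staging area.
print(bulk_insert_statement("dbo.Scores", r"C:\staging\scores.tsv"))
```

Rejecting short or long rows up front tends to be easier than debugging a failed load afterwards, and the explicit '0x0a' row terminator is there because Hadoop jobs usually write LF-only line endings rather than the CRLF that Windows tools expect.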
Finally, you ask “do we really want to run Hadoop size datasets on Windows servers?” – Windows performs well and has solid tooling around it. I’ve been somewhat skeptical about running Hadoop and other Java platform software on Windows because of legacy Java I/O issues and a lack of community support, not because of any performance issues.
The largest issue that Windows shops will find when moving to Hadoop is limited support in community forums and channels once the problem becomes a Hadoop + Windows issue. It’s very easy for people to throw their hands up and say “Nope, not helping out, don’t have Windows.” With time and adoption, this problem goes away. Besides, nothing says you have to finish on the same platform you started with. You could easily deploy with HDP on Windows and move to HDP on Linux at a later date.
I have put together some SQL Server and Hadoop basics for DBAs that should be helpful.