Analytics in the Microsoft Stack
February 24, 2015
Disclaimer: I really don’t know what I am talking about
I received an email from a coworker/friend yesterday with this in the body:
So, I have a friend who works for a major supermarket chain. In IT, they are straight out of the year 2000. They have tons and tons of data in SQL Server and, I think, Oracle. The industrial engineers (who do all of the planning) ask the IT group to run queries throughout the day, and those queries take hours to run. They use Excel for most of their processing. On the weekends, they run reporting queries that take hours and hours, all to get just basic information.
This got my wheels spinning about how I would approach the problem with the analytics toolset that I know is available. The supermarket chain has a couple of problems:
- Lots of data that takes too long to munge through
- The planners are dependent on the IT group for processing the data
I would expect the official Microsoft answer is that they should implement SQL Server Analysis Services with Power BI. I would assume that if the group threw enough resources at this solution, it would work. I then thought of a couple of alternative paths:
The first thing that comes to mind is using HDInsight (Microsoft’s Hadoop product) on Azure. That way the queries can run in a distributed manner, and they can provision machines as they need them; when they are not running their queries, they can de-allocate the machines.
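To make "run in a distributed manner" a little more concrete, here is a minimal sketch of the map/reduce pattern that Hadoop (and therefore HDInsight) is built around. It is plain Python, not any HDInsight API, and the sales rows and function names are hypothetical. On a real cluster each map step would run on a separate node against its own slice of the data; here the "workers" are just list slices processed one after another.

```python
from collections import Counter
from functools import reduce

# Hypothetical sales rows: (store_id, sku, quantity). On a Hadoop cluster
# these would be files spread across many nodes, not an in-memory list.
SALES = [
    ("store-1", "milk", 3),
    ("store-1", "bread", 1),
    ("store-2", "milk", 5),
    ("store-2", "eggs", 2),
    ("store-3", "bread", 4),
    ("store-3", "milk", 1),
]


def partition(rows, n):
    """Split rows into n chunks, one per simulated worker."""
    return [rows[i::n] for i in range(n)]


def map_partition(rows):
    """Map step: total quantity per SKU, looking only at this chunk."""
    totals = Counter()
    for _store, sku, qty in rows:
        totals[sku] += qty
    return totals


def reduce_totals(a, b):
    """Reduce step: merge two partial results by adding their counts."""
    # Note: Counter addition drops non-positive totals, which is fine
    # here because quantities are assumed to be positive.
    return a + b


def total_units_per_sku(rows, workers=3):
    """Run the map step per chunk, then fold the partial totals together."""
    partials = [map_partition(chunk) for chunk in partition(rows, workers)]
    return dict(reduce(reduce_totals, partials, Counter()))


print(total_units_per_sku(SALES))
```

The appeal is that each map step needs only its own chunk, so adding machines shortens the wall-clock time, and the reduce step merges small per-SKU totals instead of the raw rows.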
The second thought is using Azure ML to do their model generation. However, depending on the size of the datasets, Azure ML may not be able to scale. I have only used Azure ML on smaller datasets.
The third thought was R. I don’t think R is the best answer here. From everything I know, R is designed for data exploration and analysis of datasets that fit comfortably into the local machine’s memory. Performance in R is horrible, and scaling R is a real challenge.
What about F#? This might be a good answer. If you use the Hive Type Provider, you get the benefit of HDInsight doing the processing, plus the goodness of the language syntax and the REPL for data exploration. The group could also look at MBrace for some kick-butt distributed processing that can scale on Azure. Finally, if they do come up with some kind of insight that lends itself to building analytics or models into an app, you can take the code out of the script file and put it into a compilable assembly, all within Visual Studio.
What about Python? No idea; I don’t know enough about it.
What about MATLAB, SAS, etc.? No idea. I stopped using those tools when R showed up.
What about Watson? No idea. I think I will have a better idea once I go to this.