Wake County Restaurant Inspection Data with Azure ML and F#
September 30, 2014
With Azure ML now available, I was thinking about some of the analyses I did last year and how I could get even more out of the same data sets. One that came to mind was the restaurant inspection data. You can see the prior analysis here.
I uploaded the restaurant data into Azure and started with a simple question: can we predict inspection scores from some easily available data? This is an interesting dataset because some of its elements are categorical (zip code, restaurant type, etc.) and some are continuous (priority foundation, etc.).
Here is the base dataset:
I created a new experiment with a Boosted Decision Tree Regression and a Neural Network Regression, using a 70/30 train/test split.
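In Azure ML the split is handled by its own module, but the idea behind a 70/30 train/test split can be sketched in a few lines of Python. This is a minimal sketch, not the module's implementation; the function name `split_rows` and the fixed `seed` are assumptions for illustration.

```python
import random

def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists.

    train_fraction=0.7 gives a 70/30 split; the fixed seed is a
    hypothetical choice that makes the split repeatable between runs.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)                  # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)  # local RNG, no global state
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Usage with placeholder row ids standing in for inspection records.
train, test = split_rows(range(100))
print(len(train), len(test))  # 70 30
```

Shuffling before cutting matters: if the file happens to be sorted (by zip code, say), taking the first 70% unshuffled would leave whole zip codes out of the training set.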
After running the models and inspecting the evaluation results, I did not have a very good model.
I then decided to go back, pull some of the X variables out of the dataset, and concentrate on only a couple of them. I added a Project Columns module, selected Restaurant Type and Zip Code as the X variables, and left Inspection Score as the Y variable.
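What the column projection does to each row can be sketched in plain Python. The column names (`RestaurantType`, `ZipCode`, `PriorityFoundation`, `InspectionScore`) and the sample record are hypothetical placeholders, not the real dataset's headers or values.

```python
# Hypothetical column names; the real dataset's headers may differ.
FEATURES = ["RestaurantType", "ZipCode"]  # X variables kept
LABEL = "InspectionScore"                 # Y variable kept

def project_columns(rows, keep):
    """Return new rows containing only the columns named in keep, in that order."""
    return [{name: row[name] for name in keep} for row in rows]

# Made-up record, purely to exercise the function.
rows = [
    {"ZipCode": "27519", "RestaurantType": "Restaurant",
     "PriorityFoundation": 1, "InspectionScore": 96.5},
]
projected = project_columns(rows, FEATURES + [LABEL])
print(projected)
```

Everything not named in `keep` (here the hypothetical `PriorityFoundation` column) is simply dropped before the data reaches the models.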
With this done, I added a couple more models (Bayesian Linear Regression and Decision Forest Regression) and gave it a whirl.
Interestingly, adding these models did not give a better prediction, and dropping to two variables made the models less accurate. Without doing any more analysis, I picked the model with the lowest MAE (Boosted Decision Tree Regression) and published it as a web service.
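Mean absolute error is the average size of the miss, MAE = (1/n) Σ |actual − predicted|, so picking "the model with the lowest MAE" amounts to scoring every model on the same test rows and taking the minimum. A minimal sketch follows; `model_a`, `model_b`, and all of the numbers are made up to exercise the code and are not results from this experiment.

```python
def mean_absolute_error(actual, predicted):
    """MAE = (1/n) * sum(|actual_i - predicted_i|)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def pick_lowest_mae(actual, predictions_by_model):
    """Score every model on the same test rows; return (best_name, scores)."""
    scores = {name: mean_absolute_error(actual, preds)
              for name, preds in predictions_by_model.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical test-set scores and predictions.
actual = [90.0, 95.0, 100.0]
best, scores = pick_lowest_mae(actual, {
    "model_a": [91.0, 94.0, 100.0],  # misses by 1, 1, 0 -> MAE 0.67
    "model_b": [92.0, 97.0, 100.0],  # misses by 2, 2, 0 -> MAE 1.33
})
print(best)  # model_a
```

Because MAE is in the same units as the label, it can be read directly as "inspection points off, on average".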
With the service published, I can consume it from a client app. I used the code from my voting analysis, found here, as a template, and sure enough:
["27519","Restaurant","0","96.0897827148438"]
["27612","Restaurant","0","95.5728530883789"]
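The client call can be sketched in Python as well (the original client was F#, and this is not that code). It assumes the classic Azure ML request-response body shape (`Inputs` / `input1` / `ColumnNames` / `Values`) with a Bearer API key; the column names, URL, and key below are hypothetical placeholders, so check the API help page generated for your own service. The parser only assumes that each scored row looks like the ones printed above, with the predicted score as the last element.

```python
import json
import urllib.request

# Hypothetical column names; copy the real ones from the service's API help page.
INPUT_COLUMNS = ["ZipCode", "RestaurantType", "InspectionScore"]

def build_request(url, api_key, zip_code, restaurant_type):
    """Build (but do not send) a POST request scoring one restaurant.

    "0" is a placeholder for the label column, matching the "0"
    that appears in the rows the service echoed back.
    """
    body = {
        "Inputs": {
            "input1": {
                "ColumnNames": INPUT_COLUMNS,
                "Values": [[zip_code, restaurant_type, "0"]],
            }
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + api_key},
        method="POST",
    )

def parse_scored_row(row_json):
    """Turn a row like ["27519","Restaurant","0","96.0897827148438"]
    into (zip_code, restaurant_type, predicted_score)."""
    zip_code, restaurant_type, _label, scored = json.loads(row_json)
    return zip_code, restaurant_type, float(scored)

req = build_request("https://example.invalid/execute", "YOUR_API_KEY",
                    "27519", "Restaurant")
print(parse_scored_row('["27519","Restaurant","0","96.0897827148438"]'))
```

Sending the request would be `urllib.request.urlopen(req)` against the real endpoint; it is left out here so the sketch runs without network access or a key.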
So the model predicts a slightly higher inspection score for restaurants in Cary, NC (27519) than for those in Northwest Raleigh (27612). However, before we alert the Cary Chamber of Commerce to create a marketing campaign (“Eat in Cary, we are safer”), note that the difference, about 0.5 points, is within the MAE.
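The "within the MAE" check is simple arithmetic on the two predictions above. The post does not quote the MAE itself, so `mae` is left as a parameter you fill in from the model's evaluation output; the function name `within_mae` is hypothetical.

```python
def within_mae(score_a, score_b, mae):
    """True if two predicted scores differ by no more than the model's MAE."""
    return abs(score_a - score_b) <= mae

cary = 96.0897827148438        # predicted score for zip 27519
nw_raleigh = 95.5728530883789  # predicted score for zip 27612
gap = cary - nw_raleigh
print(round(gap, 2))  # 0.52
```

Any MAE of roughly half a point or more swallows the gap, so the two zip codes cannot be told apart by this model.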
In any event, it would be easy to create a phone app: if you don’t know a restaurant’s score, you can punch in the establishment type and the zip code and get a good idea of what the score is likely to be.
This is an academic exercise because establishments have to show you their inspection card and Yelp lists their scores, but it was a fun exercise nonetheless. Happy eating.