Restaurant Classification Via the Yellow Pages API Using F#
February 25, 2014
As part of the restaurant analysis I did for Open Data Day, I built a crude classifier to identify Chinese restaurants. The classifier looked at the name of the establishment, and if certain keywords were in the name, it tagged the establishment as a Chinese restaurant.
- member public x.IsEstablishmentAChineseRestraurant (establishmentName:string) =
-     let upperCaseEstablishmentName = establishmentName.ToUpper()
-     let numberOfMatchedWords =
-         upperCaseEstablishmentName.Split(' ')
-         |> Seq.map(fun x ->
-             match x with
-             | "ASIA" -> 1
-             | "ASIAN" -> 1
-             | "CHINA" -> 1
-             | "CHINESE" -> 1
-             | "PANDA" -> 1
-             | "PEKING" -> 1
-             | "WOK" -> 1
-             | _ -> 0)
-         |> Seq.sum
-     match numberOfMatchedWords with
-     | 0 -> false
-     | _ -> true
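The same idea can be sketched more compactly with a set lookup. This is an illustrative variant rather than the code used in the analysis: the function name `isChineseRestaurantByName` and the value `chineseKeywords` are made up for this example, and the keyword list is copied from the classifier above.

```fsharp
open System

// Keywords copied from the classifier above.
let chineseKeywords =
    set [ "ASIA"; "ASIAN"; "CHINA"; "CHINESE"; "PANDA"; "PEKING"; "WOK" ]

// Hypothetical helper: true when any space-separated word of the name
// is exactly one of the keywords (case-insensitive).
let isChineseRestaurantByName (establishmentName: string) =
    establishmentName.ToUpperInvariant().Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries)
    |> Array.exists (fun word -> chineseKeywords.Contains word)

printfn "%b" (isChineseRestaurantByName "Jumbo China #5")    // true
printfn "%b" (isChineseRestaurantByName "Chinatown Express") // false
```

The second call shows the weakness of exact word matching: "CHINATOWN" is not in the keyword list, so a name like that slips through.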
Although the keyword classifier worked well enough for the analysis, I wanted to see if there was a more precise approach. To that end, I thought of the Yellow Pages: they classify restaurants into categories, and assuming the restaurant is listed in the Yellow Pages, that category is a better way to determine the restaurant type than a name search alone.
The first thing I did was head over to the Yellow Pages (YP.com) website and sure enough, they have an API and a developer program. I signed up and had an API key within a couple of minutes.
Next, I tried searching for a restaurant in the browser. I picked the first restaurant I came across in the dataset – Jumbo China #5. I created a request URI based on their API like so:
When I plugged the name into the browser, I got this:
After screwing around with the code for about ten minutes thinking it was my API key (“Invalid Key” would lead you to believe that, no?), Mike Thomas came over and told me that the URL encoding was messing with my request – specifically the ‘#’ in Jumbo China #5. An unescaped ‘#’ starts the URL fragment, which is never sent to the server, so any parameters after it (presumably including the key) were dropped from the request. When I removed the # symbol, I got JSON back:
When I threw the JSON into Json2CSharp, the results looked great:
I then took this URL and tried to load it into an F# type provider, but I couldn’t understand why I was getting a red squiggly line of disapprobation (for both JSON and XML):
So I pulled out Fiddler and saw that I was getting a 400. Digging into the response, I found that “User Agent” was a required field.
The problem was then compounded because the F# JSON type provider does not let you pass a User Agent into the constructor. I headed over to Stack Overflow, where Tomas Petricek was kind enough to answer the question: basically, you have to use the FSharp.Net Http class to make the request (which lets you add the user agent) and then parse the response via the JsonProvider using the “Parse” method rather than the “Load” method. So spinning up the method like so:
This gave me the results back that I wanted. I then created a couple of methods to clean up any characters that might screw up the URL encoding, added some argument validation, and I had a pretty good module to consume the YP.com listings:
- namespace ChickenSoftware.RestaurantClassifier
- open System
- open FSharp.Data
- open FSharp.Net
- // YP.txt holds a sample JSON response that the provider infers its types from.
- type ypProvider = JsonProvider< @"YP.txt">
- type RestaurantCatagoryRepository() =
-     member this.GetCatagories(restaurantName: string, restaurantAddress: string) =
-         if(String.IsNullOrEmpty(restaurantName)) then
-             failwith("restaurantName cannot be null or empty.")
-         if(String.IsNullOrEmpty(restaurantAddress)) then
-             failwith("restaurantAddress cannot be null or empty.")
-         let cleanedName = this.CleanName(restaurantName)
-         let cleanedAddress = this.CleanAddress(restaurantAddress)
-         let uri = "http://pubapi.atti.com/search-api/search/devapi/search?term="+cleanedName+"&searchloc="+cleanedAddress+"&format=json&key=XXXXXX"
-         // The API rejects requests without a user agent, so make the request by hand.
-         let response = FSharp.Net.Http.Request(uri, headers=["user-agent", "None"])
-         let ypResult = ypProvider.Parse(response)
-         // A search with no hits returns a different JSON shape, so fall back to an empty string.
-         try
-             ypResult.SearchResult.SearchListings.SearchListing.[0].Categories
-         with
-         | ex -> String.Empty
-     member this.CleanName(name: string) =
-         name.Replace("#","").Replace(" ","+")
-     member this.CleanAddress(address: string)=
-         address.Replace("#","").Replace(" ","+")
-     member this.IsCatagoryInCatagories(catagories: string, catagory: string) =
-         if(String.IsNullOrEmpty(catagories)) then false
-         else if (String.IsNullOrEmpty(catagory)) then false
-         else catagories.Contains(catagory)
-     member this.IsRestaurantInCatagory(restaurantName: string, restaurantAddress: string, restaurantCatagory: string) =
-         if(String.IsNullOrEmpty(restaurantName)) then
-             failwith("restaurantName cannot be null or empty.")
-         if(String.IsNullOrEmpty(restaurantAddress)) then
-             failwith("restaurantAddress cannot be null or empty.")
-         if(String.IsNullOrEmpty(restaurantCatagory)) then
-             failwith("restaurantCatagory cannot be null or empty.")
-         System.Threading.Thread.Sleep(new System.TimeSpan(0,0,1))
-         let catagories = this.GetCatagories(restaurantName, restaurantAddress)
-         if(String.IsNullOrEmpty(catagories)) then false
-         else this.IsCatagoryInCatagories(catagories,restaurantCatagory)
-     // Note: GetCatagories blocks, so this wrapper does not make the HTTP call itself asynchronous.
-     member this.IsRestaurantInCatagoryAsync(restaurantName: string, restaurantAddress: string, restaurantCatagory: string) =
-         async {
-             if(String.IsNullOrEmpty(restaurantName)) then
-                 failwith("restaurantName cannot be null or empty.")
-             if(String.IsNullOrEmpty(restaurantAddress)) then
-                 failwith("restaurantAddress cannot be null or empty.")
-             if(String.IsNullOrEmpty(restaurantCatagory)) then
-                 failwith("restaurantCatagory cannot be null or empty.")
-             let catagories = this.GetCatagories(restaurantName, restaurantAddress)
-             if(String.IsNullOrEmpty(catagories)) then return false
-             else return this.IsCatagoryInCatagories(catagories,restaurantCatagory)
-         }
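As a hedged aside on the cleanup methods above: stripping ‘#’ works for this dataset, but a more general option might be to percent-encode each query value with .NET's `Uri.EscapeDataString`. The sketch below is not part of the module. `buildQuery` is a hypothetical helper, `example.com` is a placeholder host rather than the YP endpoint, and I have not checked how the YP API treats `%20` versus `+` or a search term that keeps the `#5`.

```fsharp
open System

// Hypothetical helper (not in the original module): percent-encodes both
// values so that reserved characters such as '#' and ' ' are sent as data.
let buildQuery (term: string) (searchLoc: string) =
    sprintf "term=%s&searchloc=%s&format=json"
        (Uri.EscapeDataString term)
        (Uri.EscapeDataString searchLoc)

let query = buildQuery "Jumbo China #5" "6108 Falls Of Neuse Rd 27609"
printfn "%s" query
// term=Jumbo%20China%20%235&searchloc=6108%20Falls%20Of%20Neuse%20Rd%2027609&format=json

// For contrast, an unescaped '#' ends the query and starts the fragment.
// example.com is a placeholder host.
let raw = Uri("http://example.com/search?term=Jumbo+China+#5&key=XXXXXX")
printfn "%s" raw.Query     // ?term=Jumbo+China+
printfn "%s" raw.Fragment  // #5&key=XXXXXX
```

The second half shows why the original browser request failed: everything from the ‘#’ onward, including the key parameter, ends up in the fragment instead of the query string.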
The associated unit and integration tests that I made in building this module look like this:
- [TestClass]
- public class CatagoryBuilderTests
- {
-     [TestMethod]
-     public void CleanName_ReturnsExpectedValue()
-     {
-         RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
-         String restaurantName = "Jumbo China #5";
-         String expected = "Jumbo+China+5";
-         String actual = repository.CleanName(restaurantName);
-         Assert.AreEqual(expected, actual);
-     }
-     [TestMethod]
-     public void CleanAddress_ReturnsExpectedValue()
-     {
-         RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
-         String restaurantAddress = "6108 Falls Of Neuse Rd 27609";
-         String expected = "6108+Falls+Of+Neuse+Rd+27609";
-         String actual = repository.CleanAddress(restaurantAddress);
-         Assert.AreEqual(expected, actual);
-     }
-     [TestMethod]
-     public void GetCatagories_ReturnsExpectedValue()
-     {
-         string restaurantName = "Jumbo China #5";
-         String restaurantAddress = "6108 Falls Of Neuse Rd 27609";
-         RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
-         var result = repository.GetCatagories(restaurantName, restaurantAddress);
-         Assert.IsNotNull(result);
-     }
-     [TestMethod]
-     public void CatagoryIsContainedInCatagoriesUsingValidTrueData_ReturnsExpectedValue()
-     {
-         RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
-         String catagories = "Chinese Restaurants|Restaurants|";
-         String catagory = "Chinese";
-         Boolean expected = true;
-         Boolean actual = repository.IsCatagoryInCatagories(catagories, catagory);
-         Assert.AreEqual(expected, actual);
-     }
-     [TestMethod]
-     public void CatagoryIsContainedInCatagoriesUsingValidFalseData_ReturnsExpectedValue()
-     {
-         RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
-         String catagories = "Chinese Restaurants|Restaurants|";
-         String catagory = "Seafood";
-         Boolean expected = false;
-         Boolean actual = repository.IsCatagoryInCatagories(catagories, catagory);
-         Assert.AreEqual(expected, actual);
-     }
-     [TestMethod]
-     public void IsJumboChinaAChineseRestaurant_ReturnsTrue()
-     {
-         RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
-         string restaurantName = "Jumbo China #5";
-         String restaurantAddress = "6108 Falls Of Neuse Rd 27609";
-         String restaurantCatagory = "Chinese";
-         Boolean expected = true;
-         Boolean actual = repository.IsRestaurantInCatagory(restaurantName, restaurantAddress, restaurantCatagory);
-         Assert.AreEqual(expected, actual);
-     }
-     [TestMethod]
-     public void IsJumboChinaAnItalianRestaurant_ReturnsFalse()
-     {
-         RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
-         string restaurantName = "Jumbo China #5";
-         String restaurantAddress = "6108 Falls Of Neuse Rd 27609";
-         String restaurantCatagory = "Italian";
-         Boolean expected = false;
-         Boolean actual = repository.IsRestaurantInCatagory(restaurantName, restaurantAddress, restaurantCatagory);
-         Assert.AreEqual(expected, actual);
-     }
-     [TestMethod]
-     public void IsUnknownAnItalianRestaurant_ReturnsFalse()
-     {
-         RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
-         string restaurantName = "Some Unknown Restaurant";
-         String restaurantAddress = "Some Address";
-         String restaurantCatagory = "Italian";
-         Boolean expected = false;
-         Boolean actual = repository.IsRestaurantInCatagory(restaurantName, restaurantAddress, restaurantCatagory);
-         Assert.AreEqual(expected, actual);
-     }
-     [TestMethod]
-     public void CatagoryIsContainedInCatagoriesUsingEmptyCatagory_ReturnsExpectedValue()
-     {
-         RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
-         String catagories = "Chinese Restaurants|Restaurants|";
-         String catagory = String.Empty;
-         Boolean expected = false;
-         Boolean actual = repository.IsCatagoryInCatagories(catagories, catagory);
-         Assert.AreEqual(expected, actual);
-     }
- }
The hardest test to get to run green was the negative test – passing in a restaurant name that is not recognized:
- [TestMethod]
- public void IsUnknownAnItalianRestaurant_ReturnsFalse()
- {
-     RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
-     string restaurantName = "Some Unknown Restaurant";
-     String restaurantAddress = "Some Address";
-     String restaurantCatagory = "Italian";
-     Boolean expected = false;
-     Boolean actual = repository.IsRestaurantInCatagory(restaurantName, restaurantAddress, restaurantCatagory);
-     Assert.AreEqual(expected, actual);
- }
To code around the fact that a different set of JSON came back while the original code expects a specific structure, I finally resorted to a try…with:
- try
-     ypResult.SearchResult.SearchListings.SearchListing.[0].Categories
- with
- | ex -> String.Empty
I feel dirty, but I don’t know how else to get around it. In any event, I then coded up a module that pulled the list of restaurants from Azure and put them through the classifier.
- namespace ChickenSoftware.RestaurantClassifier
- open FSharp.Data
- open System.Linq
- open System.Configuration
- open Microsoft.FSharp.Linq
- open Microsoft.FSharp.Data.TypeProviders
- type internal SqlConnection = SqlEntityConnection<ConnectionStringName="azureData">
- type public RestaurantBuilder () =
-     let connectionString = ConfigurationManager.ConnectionStrings.["azureData"].ConnectionString
-     member public this.GetRestaurants () =
-         SqlConnection.GetDataContext(connectionString).Restaurants
-         |> Seq.map(fun x -> x.EstablishmentName, x.EstablishmentAddress + " " + x.EstablishmnetZipCode)
-         |> Seq.toArray
-     member public this.GetChineseRestaurants () =
-         let catagoryRepository = new RestaurantCatagoryRepository()
-         let catagory = "Chinese"
-         this.GetRestaurants()
-         |> Seq.filter(fun (name, address) -> catagoryRepository.IsRestaurantInCatagory(name, address,catagory))
-         |> Seq.toList
This code is almost identical to the code I posted two weeks ago. I threw my integration tests at the functions and checked Fiddler.
Sure enough, I was getting responses. I ran into a problem on the 50th request, though.
To get around this occasional timeout issue, I threw in a one-second delay between each request, which seemed to solve the problem.
- System.Threading.Thread.Sleep(new System.TimeSpan(0,0,1))
- let catagories = this.GetCatagories(restaurantName, restaurantAddress)
- if(String.IsNullOrEmpty(catagories)) then false
- else this.IsCatagoryInCatagories(catagories,restaurantCatagory)
However, this then introduced a new problem. There are 4,000 or so restaurants, so at one second of delay per request that is over 66 minutes of running. Not good. Next week, I hope to add some parallelism to speed things up…
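As a rough sketch of where that could go (an illustration under assumptions, not the code from any follow-up), F#'s `Async.Parallel` can run a batch of async lookups together. Here `classifyAsync` is a made-up stand-in that sleeps for 100 ms instead of calling YP.com, and the restaurant names are placeholders. Note that firing real requests in parallel could well trip the same timeouts that the one-second delay was added to avoid.

```fsharp
open System.Diagnostics

// Stand-in for the real lookup (assumption): waits 100 ms to mimic a
// network call, then answers from the name alone.
let classifyAsync (name: string) =
    async {
        do! Async.Sleep 100
        return name.ToUpperInvariant().Contains "CHINA"
    }

// Placeholder names, not taken from the real dataset.
let names = [ for i in 1 .. 20 -> sprintf "Restaurant %d" i ]

let timer = Stopwatch.StartNew()

// Start all lookups together and wait for every result.
let results =
    names
    |> List.map classifyAsync
    |> Async.Parallel
    |> Async.RunSynchronously

timer.Stop()
printfn "%d lookups in %d ms" results.Length timer.ElapsedMilliseconds
```

Run one after another, 20 lookups at 100 ms each would take about two seconds; run together, the waits overlap, so the batch should finish in far less time.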