F# and the Open/Closed Principle

One of the advantages of using F# is that it is a .NET language.  Although F# is a functional-first language, it also supports object-oriented constructs.  One of the most powerful (indeed, the most powerful) technique in OO programming is using interfaces to follow the Open/Closed principle.  If you are not familiar, a good explanation of Open/Closed principle is found here.

As part of the F# for beginners dojo I am putting on next week, we are consuming and then analyzing Twitter.  The problem with always making calls to Twitter is that

1) The data changes every call

2) You might get throttled

Therefore, it makes good sense to have an in-memory representation of the data for testing and some Twitter data on disk so that different experiments can be run on the same data to see the result.  Using Interfaces in F# makes this a snap.

First, I created an interface:

  1. namespace NewCo.TwitterAnalysis
  2.  
  3. open System
  4. open System.Collections.Generic
  5.  
  6. type ITweeetProvider =
  7.    abstract member GetTweets : string -> IEnumerable<DateTime * int * string>

Next, I created the actual Twitter feed.  Note I am using TweetInvi (available on Nuget) and that this file has to be below the interface in the solution explorer:

  1. namespace NewCo.TwitterAnalysis
  2.  
  3. open System
  4. open System.Configuration
  5. open Tweetinvi
  6.  
  7. type TwitterProvider() =
  8.     interface ITweeetProvider with
  9.         member this.GetTweets(stockSymbol: string) =
  10.             let consumerKey = ConfigurationManager.AppSettings.["consumerKey"]
  11.             let consumerSecret = ConfigurationManager.AppSettings.["consumerSecret"]
  12.             let accessToken = ConfigurationManager.AppSettings.["accessToken"]
  13.             let accessTokenSecret = ConfigurationManager.AppSettings.["accessTokenSecret"]
  14.         
  15.             TwitterCredentials.SetCredentials(accessToken, accessTokenSecret, consumerKey, consumerSecret)
  16.             let tweets = Search.SearchTweets(stockSymbol);
  17.             tweets
  18.                 |> Seq.map(fun t -> t.CreatedAt, t.RetweetCount, t.Text)

 

I then hooked up a unit (integration, really) test

  1. [TestClass]
  2. public class UnitTest1
  3. {
  4.     [TestMethod]
  5.     public void GetTweetsUsingIBM_returnsExpectedValue()
  6.     {
  7.         ITweeetProvider provider = new TwitterProvider();
  8.         var actual = provider.GetTweets("IBM");
  9.         Assert.IsNotNull(actual);
  10.     }
  11. }

Sure enough, it ran green with actual Twitter data coming back:

image

I then created an In-Memory Tweet provider that can be used to:

1) Provide repeatable results

2) Have 0 external dependencies so that I can monkey with the code and a red unit test really does mean red

Here is its implementation:

  1. namespace NewCo.TwitterAnalysis
  2.  
  3. open System
  4. open System.Collections.Generic
  5.  
  6. type InMemoryProvider() =
  7.     interface ITweeetProvider with
  8.         member this.GetTweets(stockSymbol: string) =
  9.             let list = new List<(DateTime*int*string)>()
  10.             list.Add(DateTime.Now, 1,"Test1")
  11.             list.Add(DateTime.Now, 0,"Test2")
  12.             list :> IEnumerable<(DateTime*int*string)>

The only really interesting thing is the smiley/bird character (: >).  F# implements interfaces a bit differently than what I was used to –> F# implements interfaces explicitly.  I then fired up a true unit test and it also ran green:

  1. [TestClass]
  2. public class InMemoryProviderTests
  3. {
  4.     [TestMethod]
  5.     public void GetTweetsUsingValidInput_ReturnsExpectedValue()
  6.     {
  7.         ITweeetProvider provider = new InMemoryProvider();
  8.         var tweets = provider.GetTweets("TEST");
  9.         var tweetList = tweets.ToList();
  10.         Int32 expected = 2;
  11.         Int32 actual = tweetList.Count;
  12.         Assert.AreEqual(expected, actual);
  13.     }
  14. }

Finally, I created a file-system bound provider so that I can download and then hold static a large dataset.  Based on past experience dealing with on-line data sources, getting data local to run multiple tests against is generally a good idea.  Here is the implementation:

  1. namespace NewCo.TwitterAnalysis
  2.  
  3. open System
  4. open System.Collections.Generic
  5. open System.IO
  6.  
  7. type FileSystemProvider(filePath: string) =
  8.     interface ITweeetProvider with
  9.         member this.GetTweets(stockSymbol: string) =
  10.             let fileContents = File.ReadLines(filePath)
  11.                                 |> Seq.map(fun line -> line.Split([|'\t'|]))
  12.                                 |> Seq.map(fun values -> DateTime.Parse(values.[0]),int values.[1], string values.[2])
  13.             fileContents

And the covering unit (integration really) tests look like this:

  1. [TestClass]
  2. public class FileSystemProviderTests
  3. {
  4.     [TestMethod]
  5.     public void GetTweetsUsingValidInput_ReturnsExpectedValue()
  6.     {
  7.         var baseDir = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location);
  8.         var testFile = Path.Combine(baseDir, "TweetData.csv");
  9.         ITweeetProvider provider = new FileSystemProvider(testFile);
  10.         var tweets = provider.GetTweets("TEST");
  11.         var tweetList = tweets.ToList();
  12.         Int32 expected = 2;
  13.         Int32 actual = tweetList.Count;
  14.         Assert.AreEqual(expected, actual);
  15.     }
  16. }

Note that I had to add the actual file in the test project. 

image

Finally, the F# code needs to include try..catches for the external calls (web service and disk) and some argument validation for the strings come in.

In any event, I now have 3 different implementations that I can swap out depending on my needs.  I love having the power of Interfaces combined with benefits of using a functional-first language.

Consuming Twitter With F#

I set up a meetup for TRINUG’s F#/data analytics SIG to center around consuming and analyzing Tweets.  Since Twitter is just JSON, I assumed it would be easy enough to search Tweets for a given subjects in a given time period.  How wrong I was.  I spent several hours research different ways to consume Twitter to varying degrees of success.  My 1st stop was to investigate some of the more common libraries that C# developers use to consume Twitter.  Here is my survey of some of the more popular ones:

Twitterizer: No longer maintained

  1. // Install-Package twitterizer -Version 2.4.2
  2. // Update-Package Newtonsoft.Json -Reinstall
  3. open Twitterizer
  4.  
  5. type public TwitterProvider() =
  6.     member this.GetTweetsForDateRange(ticker:string, startDate: DateTime, endDate: DateTime) =
  7.         let consumerKey = ConfigurationManager.AppSettings.["consumerKey"]
  8.         let consumerSecret = ConfigurationManager.AppSettings.["consumerSecret"]
  9.         let accessToken = ConfigurationManager.AppSettings.["accessToken"]
  10.         let accessTokenSecret = ConfigurationManager.AppSettings.["accessTokenSecret"]
  11.         
  12.         let tokens = new OAuthTokens()
  13.         tokens.set_ConsumerKey(consumerKey)
  14.         tokens.set_ConsumerSecret(consumerSecret)
  15.         tokens.set_AccessToken(accessToken)
  16.         tokens.set_AccessTokenSecret(accessTokenSecret)
  17.  
  18.         let searchOptions = new SearchOptions()
  19.         searchOptions.SinceDate <- startDate
  20.         searchOptions.UntilDate <- endDate
  21.         let results = TwitterSearch.Search(tokens, ticker,searchOptions)
  22.         results.ResponseObject
  23.                     |> Seq.map(fun r -> r.CreatedDate, r.Text)

TweetSharp: No longer maintained

  1. open TweetSharp
  2.  
  3. type public TwitterProvider() =
  4.     member this.GetTweetsForDateRange(ticker:string, startDate: DateTime, endDate: DateTime) =
  5.         let consumerKey = ConfigurationManager.AppSettings.["consumerKey"]
  6.         let consumerSecret = ConfigurationManager.AppSettings.["consumerSecret"]
  7.         let accessToken = ConfigurationManager.AppSettings.["accessToken"]
  8.         let accessTokenSecret = ConfigurationManager.AppSettings.["accessTokenSecret"]
  9.         
  10.         let service = new TwitterService(consumerKey, consumerSecret)
  11.         service.AuthenticateWith(accessToken, accessTokenSecret)
  12.  
  13.         let searchOptions = new SearchOptions()
  14.         searchOptions.Q <- "IBM%20since%3A2014-03-01&src=typd"
  15.         service.Search(searchOptions).Statuses
  16.                                         |> Seq.map(fun s -> s.CreatedDate, s.Text)

Note that I did try and add a date range the way the Twitter API instructs, but it still came back with only 20 tweets.

LinqToTwitter: Active but nave to use Linq syntax.  Ugh!

Twitterinvi: Active but does not have date range functionality

  1. open System
  2. open System.Configuration
  3. open Tweetinvi
  4.  
  5. type public TwitterProvider() =
  6.     member this.GetTodaysTweets(ticker: string) =
  7.         let consumerKey = ConfigurationManager.AppSettings.["consumerKey"]
  8.         let consumerSecret = ConfigurationManager.AppSettings.["consumerSecret"]
  9.         let accessToken = ConfigurationManager.AppSettings.["accessToken"]
  10.         let accessTokenSecret = ConfigurationManager.AppSettings.["accessTokenSecret"]
  11.  
  12.         TwitterCredentials.SetCredentials(accessToken, accessTokenSecret, consumerKey, consumerSecret)
  13.         let tweets = Search.SearchTweets(ticker);
  14.         tweets |> Seq.map(fun t -> t.CreatedAt, t.RetweetCount)
  15.  
  16.     member this.GetTweetsForDateRange(ticker: string, startDate: DateTime)=
  17.         let consumerKey = ConfigurationManager.AppSettings.["consumerKey"]
  18.         let consumerSecret = ConfigurationManager.AppSettings.["consumerSecret"]
  19.         let accessToken = ConfigurationManager.AppSettings.["accessToken"]
  20.         let accessTokenSecret = ConfigurationManager.AppSettings.["accessTokenSecret"]
  21.  
  22.         TwitterCredentials.SetCredentials(accessToken, accessTokenSecret, consumerKey, consumerSecret)
  23.         let searchParameter = Search.GenerateSearchTweetParameter(ticker)
  24.         searchParameter.Until <- startDate;
  25.         let tweets = Search.SearchTweets(searchParameter);
  26.         tweets |> Seq.map(fun t -> t.CreatedAt, t.RetweetCount)

So without an out of the box API to use, I thought about using a Json Type Provider the way Lincoln Atkinson did.  The problem is that is example is for V1 of Twitter and V 1.1 uses Oauth.  If you run his code, you get

image

I then thought about a 3rd party API that captures Tweets.  I ran across gnip ($500!) and Topsy (no longer accepting new licenses b/c Apple bought them) so I am back to square one.

So finally I thought about rolling my own (with OAuth being the hard part) but I am quickly running out of time to get ready for the SIG and I don’t want to spend the time on only this part. 

Why isn’t there a Twitter type provider?  I’ll add it to the list….

Apriori Algorithm and F# Using Elevator Inspection Data

Now that I have the elevator dataset in a workable state, I wanted to see what I could see with the data.  I was reading Machine Learning In Action and the authors suggested that an Apriori Algorithm as a way to quantify associations among data points.  I read both Harrington’s code and Wikipedia’s description and I found both the be impenetrable – the former because their code was unreadable and the later because  the mathematical formulas depended on a level of algebra that I don’t have.

Fortunately, I found a C# project on Codeproject that had both an excellent example/introduction and C# code.  I used the examples on the website to formulate my F# implementation.

The first thing I did was create a class that matched the 1st grid in the example

image

  1. namespace ChickenSoftware.ElevatorChicken.Analysis
  2.  
  3. open System.Collections.Generic
  4.  
  5. type Transaction = {TID: string; Items: List<string> }
  6.  
  7. type Apriori(database: List<Transaction>, support: float, confidence: float) =
  8.     member this.Database = database
  9.     member this.Support = support
  10.     member this.Confidence = confidence

Note that because F# is immutable by default, the properties are read-only.  I then created a unit test project that makes sure the constructor works without exceptions.  The data matches the example:

  1. public AprioriTests()
  2. {
  3.     var database = new List<Transaction>();
  4.     database.Add(new Transaction("100", new List<string>() { "A", "C", "D" }));
  5.     database.Add(new Transaction("200", new List<string>() { "B", "C", "E" }));
  6.     database.Add(new Transaction("300", new List<string>() { "A", "B", "C", "E" }));
  7.     database.Add(new Transaction("400", new List<string>() { "B", "E" }));
  8.  
  9.     _apriori = new Apriori(database, .5, .80);
  10.  
  11. }
  12.  
  13. [TestMethod]
  14. public void ConstructorUsingValidArguments_ReturnsExpected()
  15. {
  16.     Assert.IsNotNull(_apriori);
  17. }

I then need a function to count up all of the items in the Itemsets.  I refused to use loops, so I first started using Seq.Fold, but I was having zero luck because I was trying to fold a Seq of List.  I then started experimenting with other functions when I found Seq.Collect – which was perfect.  So I created a function like this:

  1. member this.GetC1() =
  2.     database
  3.  
  4. member this.GetL1() =
  5.     let numberOfTransactions = this.GetC1().Count
  6.  
  7.     this.GetC1()
  8.         |> Seq.collect(fun d -> d.Items)
  9.         |> Seq.countBy(fun i -> i)
  10.         |> Seq.map(fun (t,i) -> t, i, float i/ float numberOfTransactions)
  11.         |> Seq.filter(fun (t,i,p) -> p >= support)
  12.         |> Seq.map(fun (t,i,p) -> t,i)
  13.         |> Seq.sort
  14.         |> Seq.toList

Note that the numberOfTransactions is for the database, not the individual items in the List<Item>.  And the results match the example:

imageimage

So this is great.  My next stop was to build a list of pair combinations of the remaining values

image

The trick is that is not a Cartesian join of the original sets – it is only the surviving sets that are needed.  My first attempt looked like:

  1. let C1 = database
  2.  
  3. let L1 = C1
  4.         |> Seq.map(fun t -> t.Items)
  5.         |> Seq.collect(fun i -> i)
  6.         |> Seq.countBy(fun i -> i)
  7.         |> Seq.map(fun (t,i) -> t, i, float i/ float numberOftransactions)
  8.         |> Seq.filter(fun (t,i,p) -> p >= support)
  9.         |> Seq.toArray
  10. let C2A = L1
  11.             |> Seq.map(fun (x,y,z) -> x)
  12.             |> Seq.toArray
  13. let C2B = L1
  14.             |> Seq.map(fun (x,y,z) -> x)
  15.             |> Seq.toArray
  16. let C2 = C2A |> Seq.collect(fun x -> C2B |> Seq.map(fun y -> x+y))
  17. C2   

With the output like this:

image

I was running out of Saturday morning so I went over to stack overflow and got a couple of responses.  I was on the right track with the concat, but I didn’t think about the List.Filter(), which would prune my list.  With this in mind, I copied Mark’s code and got what I was looking for

  1. member this.GetC2() =
  2.     let l1Itemset = this.GetL1()
  3.                     |> Seq.map(fun (i,s) -> i)
  4.  
  5.     let itemset =
  6.         l1Itemset
  7.             |> Seq.map(fun x -> l1Itemset |> Seq.map(fun y -> (x,y)))
  8.             |> Seq.concat
  9.             |> Seq.filter(fun (x,y) -> x < y)
  10.             |> Seq.sort
  11.             |> Seq.toList         
  12.     
  13.     let listContainsItem(l:List<string>, a,b) =
  14.             l.Contains(a) && l.Contains(b)
  15.     
  16.     let someFunctionINeedToRename(l1:List<string>, l2)=
  17.             l2 |> Seq.map(fun (x,y) -> listContainsItem(l1,x,y))
  18.  
  19.     let itemsetMatches = this.GetC1()
  20.                             |> Seq.map(fun t -> t.Items)
  21.                             |> Seq.map(fun i -> someFunctionINeedToRename(i,itemset))
  22.  
  23.     let itemSupport = itemsetMatches
  24.                             |> Seq.map(Seq.map(fun i -> if i then 1 else 0))
  25.                             |> Seq.reduce(Seq.map2(+))
  26.  
  27.     itemSupport
  28.         |> Seq.zip(itemset)
  29.         |> Seq.toList

So now I have C2 filling correctly:

image

 

Taking the results, I needed to get L2.

image

That was much simpler that getting C2 –> here is the code:

  1. member this.GetL2() =
  2.     let numberOfTransactions = this.GetC1().Count
  3.     
  4.     this.GetC2()
  5.             |> Seq.map(fun (i,n) -> i,n,float n/float numberOfTransactions)
  6.             |> Seq.filter(fun (i,n,p) -> p >= support)
  7.             |> Seq.map(fun (t,i,p) -> t,i)
  8.             |> Seq.sort
  9.             |> Seq.toList    

And when I run it – it matches this example exactly:

image

Finally, I added in a C# and L3.  This code is identical to the C2/L2 code with one exception: mapping a triple and not a tuple:  The C2 code maps like this

  1. let itemset =
  2.     l1Itemset
  3.         |> Seq.map(fun x -> l1Itemset |> Seq.map(fun y -> (x,y)))
  4.         |> Seq.concat
  5.         |> Seq.filter(fun (x,y) -> x < y)
  6.         |> Seq.sort
  7.         |> Seq.toList     

and the C3 code looks like this (took me 15 minutes to figure out line 3 below):

  1. let itemset =
  2.     l2Itemset
  3.         |> Seq.map(fun x -> l2Itemset |> Seq.map(fun y-> l2Itemset |> Seq.map(fun z->(fst x,fst y,snd z))))
  4.         |> Seq.concat
  5.         |> Seq.collect(fun d -> d)
  6.         |> Seq.filter(fun (x,y,z) -> x < y && y < z)
  7.         |> Seq.distinct
  8.         |> Seq.sort
  9.         |> Seq.toList    

With the C3 and L3 matching the example also:

image

image

 

I was now ready to put in the elevator data into the analysis.  I think I am getting better at F# because I did the mapping, filtering, and transformation of the data from the server without looking at any other material and it look only 15 minutes.

  1. type public ElevatorBuilder() =
  2.     let connectionString = ConfigurationManager.ConnectionStrings.["localData2"].ConnectionString;
  3.  
  4.     member public this.GetElevatorTransactions() =
  5.         let transactions = this.GetElevators()
  6.                               |> Seq.map(fun e ->this.ConvertElevatorToTransaction(e))
  7.         let transactionsList = new System.Collections.Generic.List<Transaction>(transactions)
  8.         transactionsList
  9.  
  10.     member public this.ConvertElevatorToTransaction(i: string, t:string, c:string, s:string) =
  11.         let items = new System.Collections.Generic.List<String>()
  12.         items.Add(t)
  13.         items.Add(c)
  14.         items.Add(s)
  15.         let transaction = {TID=i; Items=items}
  16.         transaction
  17.  
  18.     member public this.GetElevators () =
  19.         SqlConnection.GetDataContext(connectionString).ElevatorData201402
  20.             |> Seq.map(fun e -> e.ID, e.EquipType,e.Capacity,e.Speed)
  21.             |> Seq.filter(fun (i,et,c,s) -> not(String.IsNullOrEmpty(et)))
  22.             |> Seq.filter(fun (i,et,c,s) -> c.HasValue)
  23.             |> Seq.filter(fun (i,et,c,s) -> s.HasValue)
  24.             |> Seq.map(fun (i,t,c,s) -> i, this.CatagorizeEquipmentType(t),c,s)
  25.             |> Seq.map(fun (i,t,c,s) -> i,t,this.CatagorizeCapacity(c.Value),s)
  26.             |> Seq.map(fun (i,t,c,s) -> i,t,c,this.CatagorizeSpeed(s.Value))
  27.             |> Seq.map(fun (i,t,c,s) -> i.ToString(),t,c,s)

The longest part was aggregating the free-form text of the Equipment Type field (here is partial snip, you get the idea…)

  1. member public this.CatagorizeEquipmentType(et: string) =
  2.     match et.Trim() with
  3.         | "OTIS" -> "OTIS"
  4.         | "OTIS (1-2)" -> "OTIS"
  5.         | "OTIS (2-1)" -> "OTIS"
  6.         | "OTIS hydro" -> "OTIS"
  7.         | "OTIS, HYD" -> "OTIS"
  8.         | "OTIS/ ASHEVILLE " -> "OTIS"
  9.         | "OTIS/ MOUNTAIN " -> "OTIS"
  10.         | "OTIS/#1" -> "OTIS"
  11.         | "OTIS/#19 " -> "OTIS"

Assigning categories for speed and capacity was a snap using F#

  1. member public this.CatagorizeCapacity(c: int) =
  2.     let lowerBound = (c/25 * 25) + 1
  3.     let upperBound = lowerBound + 24
  4.     lowerBound.ToString() + "-" + upperBound.ToString()        
  5.  
  6. member public this.CatagorizeSpeed(s: int) =
  7.     let lowerBound = (s/50 * 50) + 1
  8.     let upperBound = lowerBound + 49
  9.     lowerBound.ToString() + "-" + upperBound.ToString()    

With this in hand, I created a Console app that takes the 27K records and pushes them though the apriori algorithm:

  1. private static void RunElevatorAnalysis()
  2. {
  3.     Stopwatch stopwatch = new Stopwatch();
  4.     stopwatch.Start();
  5.     ElevatorBuilder builder = new ElevatorBuilder();
  6.     var transactions = builder.GetElevatorTransactions();
  7.     stopwatch.Stop();
  8.     Console.WriteLine("Building " + transactions.Count + " transactions took: " + stopwatch.Elapsed.TotalSeconds);
  9.     var apriori = new Apriori(transactions, .1, .75);
  10.     var c2 = apriori.GetC2();
  11.     stopwatch.Reset();
  12.     stopwatch.Start();
  13.     var l1 = apriori.GetL1();
  14.     Console.WriteLine("Getting L1 took: " + stopwatch.Elapsed.TotalSeconds);
  15.     var l2 = apriori.GetL2();
  16.     Console.WriteLine("Getting L2 took: " + stopwatch.Elapsed.TotalSeconds);
  17.     var l3 = apriori.GetL3();
  18.     Console.WriteLine("Getting L3 took: " + stopwatch.Elapsed.TotalSeconds);
  19.     stopwatch.Stop();
  20.     Console.WriteLine("–L1");
  21.     foreach (var t in l1)
  22.     {
  23.         Console.WriteLine(t.Item1 + ":" + t.Item2);
  24.     }
  25.     Console.WriteLine("–L2");
  26.     foreach (var t in l2)
  27.     {
  28.         Console.WriteLine(t.Item1 + ":" + t.Item2);
  29.     }
  30.     Console.WriteLine("–L3");
  31.     foreach (var t in l3)
  32.     {
  33.         Console.WriteLine(t.Item1 + ":" + t.Item2);
  34.     }
  35. }

I then made an offering to the F# Gods and hit F5:

image

Doh!  The gods were not pleased.  I then went back to my initial filtering function and added a Seq.Take(25000) and the results:

image

So there a couple of things to draw from this exercise.

1) Apriori Algorithm is the wrong classification technique for this dataset.  I had to bring the support way down (10%) to even get any readings.  Also, there is too much dispersion of the values.  This kind of algorithm is much better with N number of a smaller set of data values versus a fixed number of large values.

2) Even so, how cool is this?  Compare the files just to make the C#/OO work versus with F#

imageimage

And the Total LOC is 539 for C# versus 120 for F# – and the F# can be optimized using a better way to create search and itemsets.  Hard-coding each level was a hack I did to get thing working and give me an understanding of how AA works.  I bet this can be consolidated to well under 75 lines without sacrificing readability

3) I think the StackOverflow exception is because I am doing a Cartesian join and then paring the result.  Using one of the other techniques suggested on SO will give much better results.

I any event, what a fun project!  I can’t wait to optimize this and perhaps throw a different algorithm at the dataset in the coming weeks.

 

 

 

Elevator App: Part 1 – Data Layer Using F#

 

At Open Data Day, fellow TRINUGER Elaine Cahill told me about a website where you can get all of the elevator inspection data for the state.  It is found here.  She went ahead and put the Wake County data onto Socrata.  I wanted to look at the entire state so I went to the report page like so:

 

image

Unfortunately, when you try and pull down the entire state, you cause a server exception:

 

image

 

So I split the download in half.  I then Imported it into Access and then SSISed it into Azure Sql.  I then created a project to server the data and I decided to use F# type providers as a replacement for Entity Framework for my ORM.  I could either use the SqlEntity TP or the SqlDataConnection TP to access the Sql Database on Azure.  Both do not work out of the box.

SqlDataConnection

I could not get  SqlDataConnection to work at all.  When I hooked it up to a standard connection string in the config file, I got:

image

So when I copy and paste the connection string into the TP directly, it does make the connection to Azure, but then it comes back with this exception:

image

Without looking at the source. my guess is that the TP has hard-coded a reference to ‘syscomments’ and alas, Azure does not have that table.

SqlEntity

I then headed over to the SlqEntityTP to see if I could have better luck.  Fortunately, the SqlEntity does work with both an Azure connection string in the .config file and can make a connection to an Azure database.

The problem I ran into was when I wanted to expose the SqlConnection the the WebAPI project that I wrote in C#.  You can not mark SqlEntityTPs as public:

image

Note that the SqlDataConnection can be marked as public. <sigh>.  I marked the SqlEntityTP as internal and then created a POCO to map between the SqlEntity type and a type that can be consumed by the outside world:

  1. type public Elevator ={
  2.         ID: int
  3.         County: string
  4.         StateId: string
  5.         Type: string
  6.         Operation: string
  7.         Owner: string
  8.         O_Address1: string
  9.         O_Address2: string
  10.         O_City: string
  11.         O_State: string
  12.         O_Zip: string
  13.         User: string
  14.         U_Address1: string
  15.         U_Address2: string
  16.         U_City: string
  17.         U_State: string
  18.         U_Zip: string
  19.         U_Lat: double
  20.         U_Long: double
  21.         Installed: DateTime
  22.         Complied: DateTime
  23.         Capacity: int
  24.         CertStatus: int
  25.         EquipType: string
  26.         Drive: string
  27.         Volts: string
  28.         Speed: int
  29.         FloorTo: string
  30.         FloorFrom: string
  31.         Landing: string
  32.         Entrances: string
  33.         Ropes: string
  34.         RopeSize: string
  35.     }
  36.  
  37. type public DataRepository() =
  38.     let connectionString = ConfigurationManager.ConnectionStrings.["azureData"].ConnectionString;
  39.  
  40.     member public this.GetElevators () =
  41.         SqlConnection.GetDataContext(connectionString).ElevatorData201402
  42.         |> Seq.map(fun x -> this.GetElevatorFromElevatorData(x))
  43.  
  44.     member public this.GetElevator (id: int) =
  45.         SqlConnection.GetDataContext(connectionString).ElevatorData201402
  46.         |> Seq.where(fun x -> x.ID = id)
  47.         |> Seq.map(fun x -> this.GetElevatorFromElevatorData(x))
  48.         |> Seq.head
  49.  
  50.     member internal this.GetElevatorFromElevatorData(elevatorData: SqlConnection.ServiceTypes.ElevatorData201402) =
  51.         let elevator = {ID= elevatorData.ID;
  52.             County=elevatorData.County;
  53.             StateId=elevatorData.StateID;
  54.             Type=elevatorData.Type;
  55.             Operation=elevatorData.Operation;
  56.             Owner=elevatorData.Owner;
  57.             O_Address1=elevatorData.O_Address1;
  58.             O_Address2=elevatorData.O_Address2;
  59.             O_City=elevatorData.O_City;
  60.             O_State=elevatorData.O_St;
  61.             O_Zip=elevatorData.O_Zip;
  62.             User=elevatorData.User;
  63.             U_Address1=elevatorData.U_Address1;
  64.             U_Address2=elevatorData.U_Address2;
  65.             U_City=elevatorData.U_City;
  66.             U_State=elevatorData.U_St;
  67.             U_Zip=elevatorData.U_Zip;
  68.             U_Lat=elevatorData.U_lat;
  69.             U_Long=elevatorData.U_long;
  70.             Installed=elevatorData.Installed.Value;
  71.             Complied=elevatorData.Complied.Value;
  72.             Capacity=elevatorData.Capacity.Value;
  73.             CertStatus=elevatorData.CertStatus.Value;
  74.             EquipType=elevatorData.EquipType;
  75.             Drive=elevatorData.Drive;
  76.             Volts=elevatorData.Volts;
  77.             Speed=int elevatorData.Speed;
  78.             FloorTo=elevatorData.FloorTo;
  79.             FloorFrom=elevatorData.FloorFrom;
  80.             Landing=elevatorData.Landing;
  81.             Entrances=elevatorData.Entrances;
  82.             Ropes=elevatorData.Ropes;
  83.             RopeSize=elevatorData.RopeSize
  84.         }
  85.         elevator

I am not happy about writing any of this code.  I have 84 lines of code for a single class.  I might have well used the code code gen of EF.  I could have taken the performance hit and used System.Reflection to map field of the same names (I have done that on other projects) , but that also feels like a hack.   In any event, I then added a reference to my F# project in my C# WebAPI project.  I did have to add a reference to FSharp.Core in the C# project (which further vexed me), but then I created a couple of GET methods to expose the data:

 

  1. public class ElevatorController : ApiController
  2. {
  3.     // GET api/Elevator
  4.     public IEnumerable<Elevator> Get()
  5.     {
  6.         DataRepository repository = new DataRepository();
  7.         return repository.GetElevators();
  8.     }
  9.  
  10.     // GET api/Elevator/5
  11.     public Elevator Get(int id)
  12.     {
  13.         DataRepository repository = new DataRepository();
  14.         return repository.GetElevator(id);
  15.     }
  16.  
  17. }

 

When I viewed the JSON from a handy browser, it looks like, well, junk:

image

So now I have to get rid of that random characters (x0040 suffix)– yet a 3rd POCO, this one in C#:

  1. public class ElevatorController : ApiController
  2. {
  3.     // GET api/Elevator
  4.     public IEnumerable<CS.Elevator> Get()
  5.     {
  6.         List<CS.Elevator> elevators = new List<CS.Elevator>();
  7.         FS.DataRepository repository = new FS.DataRepository();
  8.         var fsElevators = repository.GetElevators();
  9.         foreach (var fsElevator in fsElevators)
  10.         {
  11.             elevators.Add(GetElevatorFromFSharpElevator(fsElevator));
  12.         }
  13.         return elevators;
  14.     }
  15.  
  16.     // GET api/Elevator/5
  17.     public CS.Elevator Get(int id)
  18.     {
  19.         FS.DataRepository repository = new FS.DataRepository();
  20.         return GetElevatorFromFSharpElevator(repository.GetElevator(id));
  21.     }
  22.  
  23.     internal CS.Elevator GetElevatorFromFSharpElevator(FS.Elevator fsElevator)
  24.     {
  25.         CS.Elevator elevator = new CS.Elevator();
  26.         elevator.ID = fsElevator.ID;
  27.         elevator.County = fsElevator.County;
  28.         elevator.StateId = fsElevator.StateId;
  29.         elevator.Type = fsElevator.Type;
  30.         elevator.Operation = fsElevator.Operation;
  31.         elevator.Owner = fsElevator.Owner;
  32.         elevator.O_Address1 = fsElevator.O_Address1;
  33.         elevator.O_Address2 = fsElevator.O_Address2;
  34.         elevator.O_City = fsElevator.O_City;
  35.         elevator.O_State = fsElevator.O_State;
  36.         elevator.O_Zip = fsElevator.O_Zip;
  37.         elevator.User = fsElevator.User;
  38.         elevator.U_Address1 = fsElevator.U_Address1;
  39.         elevator.U_Address2 = fsElevator.U_Address2;
  40.         elevator.U_City = fsElevator.U_City;
  41.         elevator.U_State = fsElevator.U_State;
  42.         elevator.U_Zip = fsElevator.U_Zip;
  43.         elevator.Installed = fsElevator.Installed;
  44.         elevator.Complied = fsElevator.Complied;
  45.         elevator.Capacity = fsElevator.Capacity;
  46.         elevator.CertStatus = fsElevator.CertStatus;
  47.         elevator.EquipType = fsElevator.EquipType;
  48.         elevator.Drive = fsElevator.Drive;
  49.         elevator.Volts = fsElevator.Volts;
  50.         elevator.Speed = fsElevator.Speed;
  51.         elevator.FloorTo = fsElevator.FloorTo;
  52.         elevator.FloorFrom = fsElevator.FloorFrom;
  53.         elevator.Landing = fsElevator.Landing;
  54.         elevator.Entrances = fsElevator.Entrances;
  55.         elevator.Ropes = fsElevator.Ropes;
  56.         elevator.RopeSize = fsElevator.RopeSize;
  57.         return elevator;
  58.     }
  59.  
  60. }

 

So that gives me that I want…

image

As a side note, I learned the hard way that the only way to force the SqlEntityTP to update based on a schema  change in the DB is to change the connection string in the .config file

Finally, when I published the WebAPI project to Azure, I got an exception. 

<Error><Message>An error has occurred.</Message><ExceptionMessage>Could not load file or assembly 'FSharp.Core, Version=4.3.1.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. The system cannot find the file specified.</ExceptionMessage><ExceptionType>System.IO.FileNotFoundException</ExceptionType><StackTrace> at System.Web.Http.ApiController.<InvokeActionWithExceptionFilters>d__1.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Web.Http.Dispatcher.HttpControllerDispatcher.<SendAsync>d__0.MoveNext()</StackTrace

Turns out you need to not only add a reference to the F# project and FSharp.Core, you have to deploy the .dlls to Azure also.  Thanks to hocho on SO for that one.

In conclusion, I love the promise of TPs.  I want nothing more than to throw away all of the EF code-gen, .tt files, seeding for code-first nonsense, etc… and replace it with a single line TP.  I have done this on a local project, but when I did it with an Azure, things were harder than they should be.  Since it is easier to throw hand grenades than catch them, I made a list of the things I want to help the open source FSharp.Data project accomplish in the coming months:

1) SqlDatabaseConnection working with Azure Sql Storage

2) MSAccessConnection needed

3) ActiveDirectoryConnection needed

4) Json and WsdlService ability to handle proxies

5) SqlEntityConnection exposing classes publicly

Regardless of what the open-source community does, MSFT will still have to make a better commitment to F# on Azure, IMHO…

Restaurant Classifier: Async For Faster Performance?

Going back to my restaurant classifier using F# from last week, I decided to speed things up some.  Each request to the Yellow Pages API takes 1 second, so with the 5,682 records, I am looking at a little over 1.5 hours to pull down the data when running serial.

I first thought about making my methods async so I changed the API call method to async and used the Http.AsyncRequest method like so (line 10 below):

  1. member this.GetCatagoriesAsync(restaurantName: string, restaurantAddress: string) =
  2.          async{
  3.              if(String.IsNullOrEmpty(restaurantName)) then
  4.                  failwith("restaurantName cannot be null or empty.")
  5.              if(String.IsNullOrEmpty(restaurantAddress)) then
  6.                  failwith("restaurantAddress cannot be null or empty.")
  7.              let cleanedName = this.CleanName(restaurantName)
  8.              let cleanedAddress = this.CleanAddress(restaurantAddress);
  9.              let uri = "http://pubapi.atti.com/search-api/search/devapi/search?term=&quot;+cleanedName+"&searchloc="+cleanedAddress+"&format=json&key=qj5l8pphj5"
  10.              let! response = FSharp.Net.Http.AsyncRequest(uri, headers=["user-agent", "None"])
  11.              let ypResult = ypProvider.Parse(response)
  12.              try
  13.                  return ypResult.SearchResult.SearchListings.SearchListing.[0].Categories
  14.              with
  15.                  | ex -> return String.Empty
  16.          }

I then made the covering function async also (line 11 below)

  1. member this.IsRestaurantInCatagoryAsync(restaurantName: string, restaurantAddress: string, restaurantCatagory: string) =
  2.     async {
  3.         if(String.IsNullOrEmpty(restaurantName)) then
  4.             failwith("restaurantName cannot be null or empty.")
  5.         if(String.IsNullOrEmpty(restaurantAddress)) then
  6.             failwith("restaurantAddress cannot be null or empty.")
  7.         if(String.IsNullOrEmpty(restaurantCatagory)) then
  8.             failwith("restaurantCatagory cannot be null or empty.")
  9.  
  10.         System.Threading.Thread.Sleep(new System.TimeSpan(0,0,1))
  11.         let! catagories = this.GetCatagoriesAsync(restaurantName, restaurantAddress)
  12.         if(String.IsNullOrEmpty(catagories)) then return false
  13.         else return this.IsCatagoryInCatagories(catagories,restaurantCatagory)
  14.     }

The problem is that invoking the covering function via an anonymous method did not work easily.

image

After screwing around with the synax a bit, I went over to stack overflow where I found out two things:

  • There is not an easy way to do it (I was hoping for a Seq.FilterAsyc method)
  • Thomas Petricek is above my pay-grade. 

In any event, I decided to drop the async and just look at parallelism.   Turns out that there is a Parallel Seq class called PSeq, it is just not in the FSharp core library yet.   I created a PSeq file in my project, moved it to the top and dropped the code in.   I then changed the method call to use PSeq to invoke the serial methods:

  1. member public this.GetChineseRestaurants () =
  2.     let catagoryRepository = new RestaurantCatagoryRepository()
  3.     let catagory = "Chinese"
  4.     this.GetRestaurants()
  5.             |> PSeq.filter(fun (name, address) -> catagoryRepository.IsRestaurantInCatagory(name, address,catagory))
  6.             |> Seq.toList    

When I first invoked it and looked at Fiddler (OT: did anyone notice that Fiddler’s new logo looks alot like a FSharp one?  Probably just a coincidence), it was clear that things were running in parallel and that performance would improve.  I have two cores on this workstation so my time be cut in half. 

image

With the parallel method in my back pocket, I decided to see the ultimate result of the restaurant classification.  I created a quick console app

  1. class Program
  2. {
  3.     static void Main(string[] args)
  4.     {
  5.         Console.WriteLine("Start");
  6.  
  7.         Stopwatch stopwatch = new Stopwatch();
  8.         stopwatch.Start();
  9.         RestaurantBuilder builder = new RestaurantBuilder();
  10.         var restaurants = builder.GetChineseRestaurants();
  11.         
  12.         foreach (var restaurant in restaurants)
  13.         {
  14.             Console.WriteLine(restaurant.Item1 + ":" + restaurant.Item2);
  15.         }
  16.         
  17.         stopwatch.Stop();
  18.         Console.WriteLine("Number of Chinese Restaurants: " + restaurants.Count());
  19.         Console.WriteLine(stopwatch.Elapsed.ToString());
  20.         Console.WriteLine("End");
  21.         Console.ReadKey();
  22.     }
  23. }

I then ran the search on YP.com using my 4 core laptop and got the following results:

image

Compared to my original classifier based on name:

image

So the results make sense.  The YP serial search would take at least 94.7 minutes, the YP parallel search took 41 minutes, and the in-memory name search took 3 seconds.  The YP search(s) found restaurants that the name did not (Wang’s Kitchen, Crazy Fire Mongolian Grill, etc…) – 275 to 221, or 24% more restaurants.

I think that the next step is to look at the classifier and see how many restaurants are in both datasets and why the ones that are not in the YP one – where they are (did they even pay to be in the Yellow Pages?).  Perhaps there is another YP category that can be considered.  Also, it would be interesting to see of the restaurants that are in the name search and in the Yellow Pages that were not classified as Chinese – the false positive rate.  Finally, I did see some 500s in Fiddler that had “read time out” so there is room for improvement to account for the transient faults…

Restaurant Classification Via the Yellow Pages API Using F#

As part of the restaurant analysis I did for open data day, I built a crude classifier to identify Chinese restaurants.  The classifier looked at the name of the establishment and if certain key words were in the name, it was tagged as a Chinese restaurant.

  1. member public x.IsEstablishmentAChineseRestraurant (establishmentName:string) =
  2.     let upperCaseEstablishmentName = establishmentName.ToUpper()
  3.     let numberOfMatchedWords = upperCaseEstablishmentName.Split(' ')
  4.                                 |> Seq.map(fun x -> match x with
  5.                                                         | "ASIA" -> 1
  6.                                                         | "ASIAN" -> 1
  7.                                                         | "CHINA" -> 1
  8.                                                         | "CHINESE" -> 1
  9.                                                         | "PANDA" -> 1
  10.                                                         | "PEKING" -> 1
  11.                                                         | "WOK" -> 1
  12.                                                         | _ -> 0)
  13.                                 |> Seq.sum
  14.     match numberOfMatchedWords with
  15.         | 0 -> false
  16.         | _ -> true

Although this worked well enough for the analysis, I was interested in seeing if there was a way of using something that is more precise.  To that end, I thought of the Yellow Pages – they classify restaurants into categories and assuming that the restaurant is in the yellow pages, it is a better way to determine the restaurant category versus just a name search.

The first thing I did was head over to the Yellow Pages (YP.com) website and sure enough, they have an API and a developers program.  I signed up and had an API key within a couple of minutes.

The first thing I did was to try and search for a restaurant in the browser.  I picked the first restaurant I came across in the dataset – Jumbo China #5.  I created a request uri based on their API like so

http://pubapi.atti.com/search-api/search/devapi/search?term=Jumbo+China+5&searchloc=6108+Falls+Of+Neuse+Rd+27609&format=json&key=XXXXXXXXXX

When I plugged the name into the browser, I got this:

image

After screwing around with the code for about ten minutes thinking it was my API Key (Invalid Key would lead you to believe that, no?), Mike Thomas came over and told me that the url encoding was messing with my request – specifically the ‘#’ in Jumbo China #5.  When I removed the # symbol, I got Json back:

image

Throwing the Json into Json2CSharp, the results look great:

image

I then took this URL and tried to load it into a F# type provider, I couldn’t understand why I was getting a red squiggly line of approbation (Json and XML):

image

 

so I pulled out Fiddler to see I was getting a 400.  Digging into the response value, I found that “User Agent” was a required field. 

image

The problem was then compounded because the FSharp Json type provider does not allow you to enter a User Agent into the constructor.  I headed over to Stack Overflow where Thomas Petricek was kind enough to answer the question – basically you have to use the FSharp Http class to make the request (which you can add the user agent to) and then parse the response via the JsonProvider using the “Parse” versus the “Load” method.  So spinning up the method like so:

image

This gave me the results back that I wanted.  I then created a couple of methods to clean up any characters that might screw up the url encoding, added some argument validation, and I had a pretty good module to consume the YP.com listings:

  1. namespace ChickenSoftware.RestaurantClassifier
  2.  
  3. open System
  4. open FSharp.Data
  5. open FSharp.Net
  6.  
  7. type ypProvider = JsonProvider< @"YP.txt">
  8.  
  9. type RestaurantCatagoryRepository() =
  10.    member this.GetCatagories(restaurantName: string, restaurantAddress: string) =
  11.         if(String.IsNullOrEmpty(restaurantName)) then
  12.             failwith("restaurantName cannot be null or empty.")
  13.         if(String.IsNullOrEmpty(restaurantAddress)) then
  14.             failwith("restaurantAddress cannot be null or empty.")
  15.         let cleanedName = this.CleanName(restaurantName)
  16.         let cleanedAddress = this.CleanAddress(restaurantAddress);
  17.         let uri = "http://pubapi.atti.com/search-api/search/devapi/search?term=&quot;+cleanedName+"&searchloc="+cleanedAddress+"&format=json&key=XXXXXX"
  18.         let response = FSharp.Net.Http.Request(uri, headers=["user-agent", "None"])
  19.         let ypResult = ypProvider.Parse(response)
  20.         try
  21.             ypResult.SearchResult.SearchListings.SearchListing.[0].Categories
  22.         with
  23.             | ex -> String.Empty
  24.  
  25.     member this.CleanName(name: string) =
  26.                 name.Replace("#","").Replace(" ","+")
  27.  
  28.     member this.CleanAddress(address: string)=
  29.                 address.Replace("#","").Replace(" ","+")
  30.     
  31.     member this.IsCatagoryInCatagories(catagories: string, catagory: string) =
  32.         if(String.IsNullOrEmpty(catagories)) then false
  33.         else if (String.IsNullOrEmpty(catagory)) then false
  34.         else catagories.Contains(catagory)
  35.  
  36.     member this.IsRestaurantInCatagory(restaurantName: string, restaurantAddress: string, restaurantCatagory: string) =
  37.         if(String.IsNullOrEmpty(restaurantName)) then
  38.             failwith("restaurantName cannot be null or empty.")
  39.         if(String.IsNullOrEmpty(restaurantAddress)) then
  40.             failwith("restaurantAddress cannot be null or empty.")
  41.         if(String.IsNullOrEmpty(restaurantCatagory)) then
  42.             failwith("restaurantCatagory cannot be null or empty.")
  43.  
  44.         System.Threading.Thread.Sleep(new System.TimeSpan(0,0,1))
  45.         let catagories = this.GetCatagories(restaurantName, restaurantAddress)
  46.         if(String.IsNullOrEmpty(catagories)) then false
  47.         else this.IsCatagoryInCatagories(catagories,restaurantCatagory)
  48.  
  49.     member this.IsRestaurantInCatagoryAsync(restaurantName: string, restaurantAddress: string, restaurantCatagory: string) =
  50.         async {
  51.             if(String.IsNullOrEmpty(restaurantName)) then
  52.                 failwith("restaurantName cannot be null or empty.")
  53.             if(String.IsNullOrEmpty(restaurantAddress)) then
  54.                 failwith("restaurantAddress cannot be null or empty.")
  55.             if(String.IsNullOrEmpty(restaurantCatagory)) then
  56.                 failwith("restaurantCatagory cannot be null or empty.")
  57.  
  58.             let catagories = this.GetCatagories(restaurantName, restaurantAddress)
  59.             if(String.IsNullOrEmpty(catagories)) then return false
  60.             else return this.IsCatagoryInCatagories(catagories,restaurantCatagory)
  61.         }

The associated unit and integration tests that I made in building this module look like this:

  1. [TestClass]
  2. public class CatagoryBuilderTests
  3. {
  4.  
  5.     [TestMethod]
  6.     public void CleanName_ReturnsExpectedValue()
  7.     {
  8.         RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
  9.         String restaurantName = "Jumbo China #5";
  10.  
  11.         String expected = "Jumbo+China+5";
  12.         String actual = repository.CleanName(restaurantName);
  13.         Assert.AreEqual(expected, actual);
  14.     }
  15.  
  16.     [TestMethod]
  17.     public void CleanAddress_ReturnsExpectedValue()
  18.     {
  19.         RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
  20.         String restaurantAddress = "6108 Falls Of Neuse Rd 27609";
  21.  
  22.         String expected = "6108+Falls+Of+Neuse+Rd+27609";
  23.         String actual = repository.CleanAddress(restaurantAddress);
  24.         Assert.AreEqual(expected, actual);
  25.     }
  26.  
  27.  
  28.     [TestMethod]
  29.     public void GetCatagories_ReturnsExpectedValue()
  30.     {
  31.         string restaurantName = "Jumbo China #5";
  32.         String restaurantAddress = "6108 Falls Of Neuse Rd 27609";
  33.  
  34.         RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
  35.         var result = repository.GetCatagories(restaurantName, restaurantAddress);
  36.         Assert.IsNotNull(result);
  37.     }
  38.  
  39.     [TestMethod]
  40.     public void CatagoryIsContainedInCatagoriesUsingValidTrueData_ReturnsExpectedValue()
  41.     {
  42.         RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
  43.  
  44.         String catagories = "Chinese Restaurants|Restaurants|";
  45.         String catagory = "Chinese";
  46.  
  47.         Boolean expected = true;
  48.         Boolean actual = repository.IsCatagoryInCatagories(catagories, catagory);
  49.  
  50.         Assert.AreEqual(expected, actual);
  51.     }
  52.  
  53.     [TestMethod]
  54.     public void CatagoryIsContainedInCatagoriesUsingValidFalseData_ReturnsExpectedValue()
  55.     {
  56.         RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
  57.  
  58.         String catagories = "Chinese Restaurants|Restaurants|";
  59.         String catagory = "Seafood";
  60.  
  61.         Boolean expected = false;
  62.         Boolean actual = repository.IsCatagoryInCatagories(catagories, catagory);
  63.  
  64.         Assert.AreEqual(expected, actual);
  65.     }
  66.  
  67.     [TestMethod]
  68.     public void IsJumboChinaAChineseRestaurant_ReturnsTrue()
  69.     {
  70.         RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
  71.  
  72.         string restaurantName = "Jumbo China #5";
  73.         String restaurantAddress = "6108 Falls Of Neuse Rd 27609";
  74.         String restaurantCatagory = "Chinese";
  75.  
  76.         Boolean expected = true;
  77.         Boolean actual = repository.IsRestaurantInCatagory(restaurantName, restaurantAddress, restaurantCatagory);
  78.  
  79.         Assert.AreEqual(expected, actual);
  80.     }
  81.  
  82.     [TestMethod]
  83.     public void IsJumboChinaAnItalianRestaurant_ReturnsFalse()
  84.     {
  85.         RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
  86.  
  87.         string restaurantName = "Jumbo China #5";
  88.         String restaurantAddress = "6108 Falls Of Neuse Rd 27609";
  89.         String restaurantCatagory = "Italian";
  90.  
  91.         Boolean expected = false;
  92.         Boolean actual = repository.IsRestaurantInCatagory(restaurantName, restaurantAddress, restaurantCatagory);
  93.  
  94.         Assert.AreEqual(expected, actual);
  95.     }
  96.  
  97.     [TestMethod]
  98.     public void IsUnknownAnItalianRestaurant_ReturnsFalse()
  99.     {
  100.         RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
  101.  
  102.         string restaurantName = "Some Unknown Restaurant";
  103.         String restaurantAddress = "Some Address";
  104.         String restaurantCatagory = "Italian";
  105.  
  106.         Boolean expected = false;
  107.         Boolean actual = repository.IsRestaurantInCatagory(restaurantName, restaurantAddress, restaurantCatagory);
  108.  
  109.         Assert.AreEqual(expected, actual);
  110.     }
  111.  
  112.  
  113.  
  114.     [TestMethod]
  115.     public void CatagoryIsContainedInCatagoriesUsingEmptyCatagory_ReturnsExpectedValue()
  116.     {
  117.         RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
  118.  
  119.         String catagories = "Chinese Restaurants|Restaurants|";
  120.         String catagory = String.Empty;
  121.  
  122.         Boolean expected = false;
  123.         Boolean actual = repository.IsCatagoryInCatagories(catagories, catagory);
  124.  
  125.         Assert.AreEqual(expected, actual);
  126.     }

The hardest test to get run green was the negative test – passing in a restaurant name that is not recognized

  1. [TestMethod]
  2. public void IsUnknownAnItalianRestaurant_ReturnsFalse()
  3. {
  4.     RestaurantCatagoryRepository repository = new RestaurantCatagoryRepository();
  5.  
  6.     string restaurantName = "Some Unknown Restaurant";
  7.     String restaurantAddress = "Some Address";
  8.     String restaurantCatagory = "Italian";
  9.  
  10.     Boolean expected = false;
  11.     Boolean actual = repository.IsRestaurantInCatagory(restaurantName, restaurantAddress, restaurantCatagory);
  12.  
  13.     Assert.AreEqual(expected, actual);
  14. }

To code around the fact that a different set of Json came back and the original code is expecting a specific structure, I finally resorted to a try…catch

  1. try
  2.     ypResult.SearchResult.SearchListings.SearchListing.[0].Categories
  3. with
  4.     | ex -> String.Empty

I feel dirty, but I don’t know how else to get around it.  In any event, I then coded up a module that pulled the list of restaurants from Azure and put them through the classifier.

  1. namespace ChickenSoftware.RestaurantClassifier
  2.  
  3. open FSharp.Data
  4. open System.Linq
  5. open System.Configuration
  6. open Microsoft.FSharp.Linq
  7. open Microsoft.FSharp.Data.TypeProviders
  8.  
  9. type internal SqlConnection = SqlEntityConnection<ConnectionStringName="azureData">
  10.  
  11. type public RestaurantBuilder () =
  12.     
  13.     let connectionString = ConfigurationManager.ConnectionStrings.["azureData"].ConnectionString;
  14.     
  15.     member public this.GetRestaurants () =
  16.         SqlConnection.GetDataContext(connectionString).Restaurants
  17.             |> Seq.map(fun x -> x.EstablishmentName, x.EstablishmentAddress + " " + x.EstablishmnetZipCode)
  18.             |> Seq.toArray
  19.             
  20.     member public this.GetChineseRestaurants () =
  21.         let catagoryRepository = new RestaurantCatagoryRepository()
  22.         let catagory = "Chinese"
  23.         this.GetRestaurants()
  24.                 |> Seq.filter(fun (name, address) -> catagoryRepository.IsRestaurantInCatagory(name, address,catagory))
  25.                 |> Seq.toList

This code is almost identical to the code I posted 2 weeks ago.  Sure enough, When I threw my integration tests at the functions, check out fiddler. 

image

I was getting responses.  I ran into the problem on the 50th request though.

image

To get around this occasional timeout issue, I threw in a second delay between each request, which seemed the solve the problem.

  1. System.Threading.Thread.Sleep(new System.TimeSpan(0,0,1))
  2. let catagories = this.GetCatagories(restaurantName, restaurantAddress)
  3. if(String.IsNullOrEmpty(catagories)) then false
  4. else this.IsCatagoryInCatagories(catagories,restaurantCatagory)

However, this then introduced a new problem.  There are 4,000 or so restaurants, so that is over 66 minutes of running.  Not good.  Next week, I hope to add some parallelism to speed things up…

 

 

 

 

 

Trigrams and F#

Rob Seder wrote a great post of trigrams last week.  He then asked me how the same functionality would be implemented in F# – specifically dropping the for..each.  Challenge accepted!.

The first thing I did was hit Stack Overflow to see if there is a built in function to parse a string by groups and  I had a answer within minutes for exactly what I was looking for (thanks MattNewport).

So to match Rob’s BuildTrigram function, I wrote this:

  1. type TrigramBuilder() =
  2.     member this.BuildTrigrams(inputString: string) =
  3.         inputString
  4.             |> Seq.windowed 3
  5.             |> Seq.map(fun a -> System.String a)
  6.             |> Seq.toArray

And I had a covering unit test already created:

  1. [TestMethod]
  2. public void GetTrigrams_ReturnsExpectedValue()
  3. {
  4.     var builder = new TrigramBuilder();
  5.     String inputString = "ABCDEFG";
  6.  
  7.     String[] expected = new String[] { "ABC", "BCD", "CDE", "DEF", "EFG" };
  8.     String[] actual = builder.BuildTrigrams(inputString);
  9.  
  10.     CollectionAssert.AreEqual(expected, actual);
  11. }

I then Implemented a function that matches his double loops (can’t tell the function name from the code snippet on the blog post):

  1. member this.GetMatchPercent(baseString: string, compareString: string) =
  2.     let trigrams = this.BuildTrigrams(compareString)
  3.     let matchCount = trigrams
  4.                         |> Seq.map(fun t -> match baseString.Contains(t) with
  5.                                                 | true -> 1
  6.                                                 | false -> 0)
  7.                         |> Seq.sum
  8.     let totalCount = trigrams.Length
  9.     float matchCount/float totalCount

And throwing in some covering unit tests:

  1. public void GetMatchPercentageOfExactMatch_ReturnsExpectedValue()
  2. {
  3.     var builder = new TrigramBuilder();
  4.     
  5.     String baseString = "ABCDEF";
  6.     String compareString = "ABCDEF";
  7.  
  8.     double expected = 1.0;
  9.     double actual = builder.GetMatchPercent(baseString, compareString);
  10.  
  11.     Assert.AreEqual(expected, actual);
  12. }
  13.  
  14. [TestMethod]
  15. public void GetMatchPercentageOf50PercentMatch_ReturnsExpectedValue()
  16. {
  17.     var builder = new TrigramBuilder();
  18.  
  19.     String baseString = "ABCD";
  20.     String compareString = "ABCDEF";
  21.  
  22.     double expected = 0.5;
  23.     double actual = builder.GetMatchPercent(baseString, compareString);
  24.  
  25.     Assert.AreEqual(expected, actual);
  26. }

Sure enough, green across the board:

image

 

Analysis of Health Inspection Data using F#

As part of the TRINUG F#/Analytics SIG, I did a public records request from Wake County for all of the restaurant inspections in 2013.  If you are not familiar, the inspectors go out and then give a score to the restaurant.  The restaurant then has to display their score like this:

image

After some back and forth, I got the data as an Excel spreadsheet that looks like this

image

I then loaded the spreadsheet into a sql server and exposed it as some OData endpoints.

  1. // GET odata/Restaurant
  2. [Queryable]
  3. public IQueryable<Restaurant> GetRestaurant()
  4. {
  5.     return db.Restaurants;
  6. }
  7.  
  8. // GET odata/Restaurant(5)
  9. [Queryable]
  10. public SingleResult<Restaurant> GetRestaurant([FromODataUri] int key)
  11. {
  12.     return SingleResult.Create(db.Restaurants.Where(restaurant => restaurant.Id == key));
  13. }

I then dove into the data to see if there were any interesting conclusions to be found.  Following my pattern of doing analytics using F# and unit testing using C#, I created a project with the following code:

  1. namespace ChickenSoftware.RestraurantChicken.Analysis
  2.  
  3. open System.Linq
  4. open System.Configuration
  5. open Microsoft.FSharp.Linq
  6. open Microsoft.FSharp.Data.TypeProviders
  7.  
  8. type internal SqlConnection = SqlEntityConnection<ConnectionStringName="azureData">
  9.  
  10. type public RestaurantAnalysis () =
  11.     
  12.     let connectionString = ConfigurationManager.ConnectionStrings.["azureData"].ConnectionString;

Note that I am using the connection string in two places – the 1st for the type provider to do its magic at design time and the second for actually accessing the data at run time.  With that set up, the 1st question I had was “ is there seasonality in inspection scores like there are in traffic tickets?”  To that end, I created the following function:

  1. member public x.GetAverageScoreByMonth () =
  2.     SqlConnection.GetDataContext(connectionString).Restaurants
  3.         |> Seq.map(fun x -> x.InspectionDate.Value.Month, x.InspectionScore.Value)
  4.         |> Seq.groupBy(fun x -> fst x)
  5.         |> Seq.map(fun (x,y) -> (x,y |> Seq.averageBy snd))
  6.         |> Seq.map(fun (x,y) -> x, System.Math.Round(y,2))
  7.         |> Seq.toArray
  8.         |> Array.sort

This is pretty vanilla F# code, with the tricky part being the average by month (lines 4 and 5 here).  What the code is doing is grouping up the 4,000 or so tuples that were created on line 3 into another tuple – with the fst being the groupBy value (in this case month) and then the second tuple being a tuple with the month and score.  Then, by averaging up the score of the second tuple, we get an average for each month.  I create a unit (really integration) test like so:

  1. [TestMethod]
  2. public void GetAverageScoreByMonth_ReturnsTwelveItems()
  3. {
  4.     var analysis = new RestaurantAnalysis();
  5.     var scores = analysis.GetAverageScoreByMonth();
  6.     Int32 expected = 12;
  7.     Int32 actual = scores.Length;
  8.     Assert.AreEqual(expected, actual);
  9. }

And the result ran green. 

image

Putting a break on the Assert and a watch on scores, you can see the values:

image

A couple of things stand out

1) The overall average is around 96 and change

2) There does not seem to be any significant variance among months.

Since I am trying to also teach myself D3, I then added a MVC5 project to my solution and added an analysis controller that calls the function in the analysis module and serves the results as json:

  1. public JsonResult AverageScoreByMonth()
  2. {
  3.     var analysis = new RestaurantAnalysis();
  4.     var scores = analysis.GetAverageScoreByMonth();
  5.     return Json(scores,JsonRequestBehavior.AllowGet);
  6. }

I then made a page with a simple D3 chart that calls this controller

  1. @{
  2.     Layout = "~/Views/Shared/_Layout.cshtml";
  3. }
  4.  
  5. <svg class="chart"></svg>
  6.  
  7. <style>
  8.     .bar {
  9.         fill: steelblue;
  10.     }
  11.  
  12.         .bar:hover {
  13.             fill: brown;
  14.         }
  15.  
  16.     .axis {
  17.         font: 10px sans-serif;
  18.     }
  19.  
  20.         .axis path,
  21.         .axis line {
  22.             fill: none;
  23.             stroke: #000;
  24.             shape-rendering: crispEdges;
  25.         }
  26.  
  27.     .x.axis path {
  28.         display: none;
  29.     }
  30. </style>
  31.  
  32.  
  33.  
  34. <script>
  35.  
  36.     var margin = { top: 20, right: 20, bottom: 30, left: 40 },
  37.         width = 960 – margin.left – margin.right,
  38.         height = 500 – margin.top – margin.bottom;
  39.  
  40.     var x = d3.scale.ordinal()
  41.         .rangeRoundBands([0, width], .1);
  42.  
  43.     var y = d3.scale.linear()
  44.         .range([height, 0]);
  45.  
  46.     var xAxis = d3.svg.axis()
  47.         .scale(x)
  48.         .orient("bottom");
  49.  
  50.     var yAxis = d3.svg.axis()
  51.         .scale(y)
  52.         .orient("left")
  53.         .ticks(10, "%");
  54.  
  55.     var svg = d3.select("body").append("svg")
  56.         .attr("width", width + margin.left + margin.right)
  57.         .attr("height", height + margin.top + margin.bottom)
  58.       .append("g")
  59.         .attr("transform", "translate(" + margin.left + "," + margin.top + ")");
  60.  
  61.  
  62.  
  63.     $.ajax({
  64.         url: "http://localhost:3057/Analysis/AverageScoreByMonth/&quot;,
  65.         dataType: "json",
  66.         success: function (data) {
  67.             x.domain(data.map(function (d) { return d.Item1; }));
  68.             y.domain([0, d3.max(data, function (d) { return d.Item2; })]);
  69.  
  70.             svg.append("g")
  71.                 .attr("class", "x axis")
  72.                 .attr("transform", "translate(0," + height + ")")
  73.                 .call(xAxis);
  74.  
  75.             svg.append("g")
  76.                 .attr("class", "y axis")
  77.                 .call(yAxis)
  78.               .append("text")
  79.                 .attr("transform", "rotate(-90)")
  80.                 .attr("y", 6)
  81.                 .attr("dy", ".71em")
  82.                 .style("text-anchor", "end")
  83.                 .text("Frequency");
  84.  
  85.             svg.selectAll(".bar")
  86.                 .data(data)
  87.               .enter().append("rect")
  88.                 .attr("class", "bar")
  89.                 .attr("x", function (d) { return x(d.Item1); })
  90.                 .attr("width", x.rangeBand())
  91.                 .attr("y", function (d) { return y(d.Item2); })
  92.                 .attr("height", function (d) { return height – y(d.Item2); });
  93.  
  94.         },
  95.         error: function (e) {
  96.             alert("error");
  97.         }
  98.     });
  99.  
  100.     function type(d) {
  101.         d.Item2 = +d.Item2;
  102.         return d;
  103.     }
  104. </script>

And when I run it, a run-of-the mill barchart (I did have to adjust the F# to shift the decimal to the left two positions so that I could match the scale of the chart’s template.  For me, it is easier to alter the F# than the javascript:

image

Following this pattern, I did some other season analysis like average by DayOfMonth

image

DayOf Week.

image

So there does not seem to be any seasonality in inspection scores.

I then did an average of inspectors

image

And there looks to be some variance, but it is getting lost of the scale of the map.  The problem is that the range of the scores is not 0 to 100

Here is a function that counts the number of scores (rounded to 0)

  1. member public x.CountOfRoundedScores () =
  2.     SqlConnection.GetDataContext(connectionString).Restaurants
  3.         |> Seq.map(fun x -> System.Math.Round(x.InspectionScore.Value,0), x.InspectionID)
  4.         |> Seq.groupBy(fun x -> fst x)
  5.         |> Seq.map(fun (x,y) -> (x,y |> Seq.countBy snd))
  6.         |> Seq.map(fun (x,y) -> (x,y |> Seq.sumBy snd))
  7.         |> Seq.toArray

That graphically looks like:

image

So back to inspectors, I needed to adjust the scale from 0 to 100 to 80 to 100.  I also needed to remove the null inspection Ids and the records that were for the ‘test facility’ and the 6 records that were below 80.

  1. member public x.AverageScoreByInspector () =
  2.     SqlConnection.GetDataContext(connectionString).Restaurants
  3.         |> Seq.filter(fun x -> x.EstablishmentName <> "Test Facility")
  4.         |> Seq.filter(fun x -> x.InspectionScore.Value > 80.)
  5.         |> Seq.filter(fun x -> x.InspectionID <> null)
  6.         |> Seq.map(fun x -> x.InspectorID, x.InspectionScore.Value)
  7.         |> Seq.groupBy(fun x -> fst x)
  8.         |> Seq.map(fun (x,y) -> (x,y |> Seq.averageBy snd))
  9.         |> Seq.map(fun (x,y) -> x, y/100.)
  10.         |> Seq.map(fun (x,y) -> x, System.Math.Round(y,4))
  11.         |> Seq.toArray
  12.         |> Array.sort

I then adjusted the scale of the inspector graph to have to domain from 80 to 100 (versus 0 to 100) and the scale of the y axis.  This was a good article explaining Scales and Domains in D3.

  1. var yAxis = d3.svg.axis()
  2.     .scale(y)
  3.     .orient("left")
  4.     .ticks(10);

  1. $.ajax({
  2.     url: "http://localhost:3057/Analysis/AverageScoreByInspector/&quot;,
  3.     dataType: "json",
  4.     success: function (data) {
  5.         x.domain(data.map(function (d) { return d.Item1; }));
  6.         y.domain([80, d3.max(data, function (d) { return d.Item2; })]);

and now there is pretty good graph showing the variance among inspectors:

image

So the interesting this is that #1168 is 2 below the average – which of a domain of 10 is pretty significant.  Interestingly, 1168 is also the inspector who has all of the “Test facility” records – so they are probably the trainer and/or lead inspector.  With this analysis in the back pocket, ran a function that did the inspection score by establishment type:

image

This is kinda interesting (esp that pushcarts got the highest scores) but I wanted to see if there was any truth the the common perception that Chinese restaurants are less sanitary than other kinds of restaurants.  To that end, I created a rudimentary classifier that searched the name of the establishment to see if it had a name that is typically associated with fast-food Chinese:

  1. member public x.IsEstablishmentAChineseRestraurant (establishmentName:string) =
  2.     let upperCaseEstablishmentName = establishmentName.ToUpper()
  3.     let numberOfMatchedWords = upperCaseEstablishmentName.Split(' ')
  4.                                 |> Seq.map(fun x -> match x with
  5.                                                         | "ASIA" -> 1
  6.                                                         | "ASIAN" -> 1
  7.                                                         | "CHINA" -> 1
  8.                                                         | "CHINESE" -> 1
  9.                                                         | "PANDA" -> 1
  10.                                                         | "PEKING" -> 1
  11.                                                         | "WOK" -> 1
  12.                                                         | _ -> 0)
  13.                                 |> Seq.sum
  14.     match numberOfMatchedWords with
  15.         | 0 -> false
  16.         | _ -> true

I then created a function that returned the average and ran my unit tests.

  1. [TestMethod]
  2. public void IsEstablishmentAChineseRestraurantUsingWOK_ReturnsTrue()
  3. {
  4.     var analysis = new RestaurantAnalysis();
  5.     String establishmentName = "JAMIE'S WOK";
  6.  
  7.     var expected = true;
  8.     var actual = analysis.IsEstablishmentAChineseRestraurant(establishmentName);
  9.     Assert.AreEqual(expected, actual);
  10. }
  11.  
  12. [TestMethod]
  13. public void IsEstablishmentAChineseRestraurantUsingWok_ReturnsTrue()
  14. {
  15.     var analysis = new RestaurantAnalysis();
  16.     String establishmentName = "Jamie's Wok";
  17.  
  18.     var expected = true;
  19.     var actual = analysis.IsEstablishmentAChineseRestraurant(establishmentName);
  20.     Assert.AreEqual(expected, actual);
  21. }
  22.  
  23. [TestMethod]
  24. public void AverageScoreForChineseRestaurants_ReturnsExpected()
  25. {
  26.     var analysis = new RestaurantAnalysis();
  27.     var actual = analysis.AverageScoreForChineseRestaurants();
  28.     Assert.IsNotNull(actual);
  29. }

When a break was put on the value of the average, it was apparent that Chinese restaurants scored significantly lower than the average of 96

image

So then I applied 1 more segmentation: Chinese versus Non-Chinese scores by inspector:

  1. member public x.AverageScoresOfChineseAndNonChineseByInspector () =
  2.     let dataSet = SqlConnection.GetDataContext(connectionString).Restaurants
  3.                     |> Seq.map(fun x -> x.EstablishmentName, x.InspectorID,x.InspectionScore.Value)
  4.     let chineseRestraurants = dataSet
  5.                                 |> Seq.filter(fun (a,b,c) -> x.IsEstablishmentAChineseRestraurant(a))
  6.                                 |> Seq.map(fun (a,b,c) -> b,c)
  7.                                 |> Seq.groupBy(fun x -> fst x)
  8.                                 |> Seq.map(fun (x,y) -> (x,y |> Seq.averageBy snd))
  9.                                 |> Seq.map(fun (x,y) -> x, System.Math.Round(y,2))
  10.                                 |> Seq.toArray
  11.                                 |> Array.sort
  12.     let nonChineseRestraurants = dataSet
  13.                                 |> Seq.filter(fun (a,b,c) -> not(x.IsEstablishmentAChineseRestraurant(a)))
  14.                                 |> Seq.map(fun (a,b,c) -> b,c)
  15.                                 |> Seq.groupBy(fun x -> fst x)
  16.                                 |> Seq.map(fun (x,y) -> (x,y |> Seq.averageBy snd))
  17.                                 |> Seq.map(fun (x,y) -> x, System.Math.Round(y,2))
  18.                                 |> Seq.toArray
  19.                                 |> Array.sort
  20.     Seq.zip chineseRestraurants nonChineseRestraurants
  21.            |> Seq.map(fun ((a,b),(c,d)) -> a,b,d)
  22.            |> Seq.toList

And in graphics using a double-bar chart:

image

So this is kinda interesting.  The lead inspector (1168) who grades everyone lower actually gives Chinese restaurants higher marks.  Everyone else pretty much grades Chinese restaurants lower except for 1 inspector.  Also, 1708 must really not like Chinese restaurants – or their inspection list has a series of really bad Chinese restaurants.

Note that this may not be statistically significant (I didn’t control for sample size, etc..) – but further analysis might be warranted, no?  If you are interested, here is the endpoint: http://restaurantchicken.cloudapp.net/odata/Restaurant

Finally, when I presented this analysis to TRINUG last week, lots of people became interested in F# and analytics (ok, maybe 3).  You can see the comments here.  Also, I now have an appointment with the head of the health department department and the CIO of Wake County later this week – let’s see what they say…

 

 

Unit Testing F# Projects

I am a big believer of unit tests and as I write more and more code, I suffer what sociologists call “confirmation bias” whereby I keep finding more and more reasons to confirm that I am right.  But I am at the point where I don’t believe in developer documentation (sorry Sandcastle), reflector (sorry Redgate), code comments, or architectural diagrams.  I believe in the code, the whole code, and nothing but the code.  Or as Rasheed Wallace might say if he was a coder versus a professional basketball player: “Code Don’t Lie!”.

And the only code that tells you what a module is doing is the unit tests.  If you want to see how the module behaves, look at the green unit tests.  If you want to see how the module is supposed to behave but is not, look at the red unit tests.  If you want to see how the module might or might behave because the code is out of control, look for the non-existent unit tests.

So when I started writing code in F#, the unit tests went along for the ride.  F# folks will tell you to use the REPL to get your code working and/or use a unit test project in F#.  I don’t do either because:

1) The REPL is designed for quick prototyping, not for having a durable, cantonal example of module behavior. Having a full suite of unit tests gives you code coverage, tests at the build, and a fail-proof way of documenting the code’s behavior.

2) Most of the other developers I work with are CSharpers.  Having the unit tests in C# allows them to understand the behavior in a language in which they are familiar.  Since the tests need to communicate the working code’s intent, having that in a language they understand is critical.  They don’t have to understand F# to use a F# module.  Also, porting from C# in MSTest to NUnit in C# is a snap.

So when you add a C# Unit Test project to your solution that has a F# module you want to test, there are a couple of things you need to do.

1) Add a reference from the Unit Test Project to the F# Project (Right Click, Add Reference)

image

2) Add a reference to F#

image

3) If you are using type providers (and who doesn’t) and you have the connection string in the .config file of the working code project, add a .config file to the unit test project and copy over the connection string

image

Note that you code has to reflect that the connection string is being used in 2 different ways by the type provider, as explained in this post.  Your F# code needs to look like this:

  1. type internal SqlConnection = SqlEntityConnection<ConnectionStringName="azureData">
  2.  
  3.  
  4. type public RestaurantAnalysis () =
  5.     
  6.     let connectionString = ConfigurationManager.ConnectionStrings.["azureData"].ConnectionString;
  7.     member public x.GetScoresByMonth () =
  8.         SqlConnection.GetDataContext(connectionString).Restaurants
  9.             |> Seq.map(fun x -> x.InspectionDate.Value.Month, x.InspectionScore.Value)
  10.             |> Seq.groupBy(fun x -> fst x)
  11.             |> Seq.toList

4) Finally, you have to rebuild the F# project each time you want the unit tests to pick up changes.  That is different from a C# unit test project referencing a C# working code project.

Finally, not related to unit testing but too short for a blog post, if you are using the type providers (and who doesn’t) and you need to expose your classes publicly, you can’t use the SqlEntity provider – you need to use the SqlData provider.  The catch is that SqlData does not work with Azure Sql Storage as far as I can tell.  In my case, I used SqlEntity and exposed tuples and custom types publicly.  Not the best, but still better than using C# and Entity Framework…

Basic Insert Operation Using F#

So the more I use F#,  the more I come to understand the benefits and limitations of the language.  Since I spend a majority of my day job in C# and JavaScript, it is a natural comparison between these two languages and F#.  One of the tenants of F# is ‘less noise, more signal’.  After looking at some projects, I am coming to the conclusion that Entity Framework, LinqToSql, <Any other ORM> is just noise.  It is expensive noise at that – if you have worked on a production app using EF and you to do anything outside of the examples on MSDN, you know what I mean.

So can EF type providers replace the overhead, code bloat, and additional costs of Entity Framework?  I decided to do a small test to see.  I needed to load into SqlServer 27,481 records of crash data that I get from the North Carolina DOT.  The records came to me in an Excel Spreadsheet which I pumped into MSAccess.  Then, instead of using SQl Server SSIS/Bulk Data Load functions, I decided to create an app that pulls that data from the Access database and load it into the SqlServer database via a type provider.

My first step was to look for a MSAccess type provider.  No luck.  I then hit up Stack Overflow and found this and this article for working with Access.  I coded up a solution to get the data into a DataReader like so

  1. static member GetCrashData =
  2.     let connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=E:\Documents\Road Alert\WakeCountyCrashes.accdb; Persist Security Info=False;"
  3.     use connection = new OleDbConnection(connectionString)
  4.     let commandText = "Select * from Data"
  5.     use command = new OleDbCommand(commandText,connection)
  6.     connection.Open()
  7.     use reader = command.ExecuteReader()

My first attempt to get the data from the read was a tuple like so:

  1. [while reader.Read() do
  2.      yield reader.GetInt32(0),
  3.      reader.GetFieldValue(1).ToString(),
  4.      reader.GetFieldValue(2).ToString(),
  5.      reader.GetDouble(3),
  6.      reader.GetFieldValue(4).ToString(),
  7.      reader.GetFieldValue(5).ToString(),
  8.      reader.GetFieldValue(6).ToString(),
  9.      reader.GetDateTime(7),
  10.      reader.GetDateTime(8),
  11.      reader.GetFieldValue(9).ToString(),
  12.      reader.GetFieldValue(10).ToString()
  13.  ]

Sure enough, it works like a champ from my C# UI (once I ran AccessDatabaseEngine.exe on my machine – sigh)

  1. private static void GetCrashData()
  2. {
  3.     var results = CrashDataLoader.GetCrashData;
  4.     Console.WriteLine(results.Count().ToString());
  5. }

Gives

image

The next thing I did was to create a Type Provider

image

And then create the method to insert the data into the database:

  1. static member LoadCrashData  =
  2.     let targetDatabase = targetSchema.GetDataContext()
  3.     let rows = CrashDataLoader.GetCrashData
  4.     targetDatabase.TrafficCrashes.InsertAllOnSubmit(rows)
  5.     targetDatabase.DataContext.SubmitChanges()
  6.     true

The problem is that I ran into was that the GetCrashData was returning a Tuple and the LoadCrashData was expecting a Typed CrashData element.  I searched for a bit and then gave up trying to figure out how to map the two without explicitly assigning each field.  So then I did it the old fashion way like so:

  1. static member TrafficCrashFromReader(reader: OleDbDataReader) =
  2.     let trafficCrash = new targetSchema.ServiceTypes.TrafficCrashes()
  3.     trafficCrash.NCDmvCrashId <- System.Nullable<float> (float (reader.GetFieldValue(0).ToString()))
  4.     trafficCrash.Municipality <- reader.GetFieldValue(1).ToString()  
  5.     trafficCrash.OnRoad <- reader.GetFieldValue(2).ToString()  
  6.     trafficCrash.Miles <- System.Nullable<double> (double (reader.GetFieldValue(3).ToString()))
  7.     trafficCrash.Direction <- reader.GetFieldValue(4).ToString()
  8.     trafficCrash.FromRoad <- reader.GetFieldValue(5).ToString()
  9.     trafficCrash.TowardRoad <- reader.GetFieldValue(6).ToString()
  10.     trafficCrash.DateOfCrash <- System.Nullable<DateTime> (reader.GetDateTime(7))
  11.     trafficCrash.TimeOfCrash <- System.Nullable<DateTime> (reader.GetDateTime(8))
  12.     trafficCrash.CrashType <- reader.GetFieldValue(9).ToString()
  13.     trafficCrash.CrashSeverity <- reader.GetFieldValue(10).ToString()
  14.     trafficCrash

The fact that I am using the <- symbol is a code smell, but I am not sure how to get around it.

In any event, once I ran it in my Console app:

  1. private static void LoadCrashData()
  2. {
  3.     Stopwatch stopWatch = new Stopwatch();
  4.     stopWatch.Start();
  5.     CrashDataLoader.LoadCrashData();
  6.     stopWatch.Stop();
  7.     Console.WriteLine("The load took " + stopWatch.Elapsed.TotalSeconds + " seconds.");
  8. }

I got nothing after 30m minutes!  Yikes.

I then went back and wrote a function to insert 1 row at a time:

  1. static member LoadCrashDataRow dataRow =
  2.     let targetDatabase = targetSchema.GetDataContext()
  3.     targetDatabase.TrafficCrashes.InsertOnSubmit(dataRow)
  4.     targetDatabase.DataContext.SubmitChanges()
  5.     true

And the consuming app:

  1. private static void LoadCrashData()
  2. {
  3.  
  4.     var crashRows = CrashDataLoader.GetCrashData;
  5.     Stopwatch stopWatch = new Stopwatch();
  6.     stopWatch.Start();
  7.     foreach (var crashRow in crashRows)
  8.     {
  9.         CrashDataLoader.LoadCrashDataRow(crashRow);
  10.         Console.WriteLine(crashRow.NCDmvCrashId + " loaded.");
  11.     }
  12.     stopWatch.Stop();
  13.     Console.WriteLine("The load took " + stopWatch.Elapsed.TotalSeconds + " seconds.");
  14. }

Sure enough, it works like a champ.

image

So it is slow – though I am not sure EF is any faster.  But not having to deal with that .edmx files, the .tt files, the whatever-else-we-throw-in files, I think further research is definitely warranted.  Also, there are some other things I think F# Type Providers need to have:

1) Ability to handle proxies

2) Making Plural tables singular (The table name is Crashes, the type should be Crash)

3) An MS Access TP would be great

4) An Azure Sql Database TP would be doubly great