F# | Jamie Dixon's Home

Geocoding Using Texas A&M Service and F#

June 9, 2015 1 Comment

Geocoding is the technique of taking an address and turning it into a geocoordinate (latitude and longitude). There are plenty of geocoding services out there (notably Google and Bing), but I decided to try a lesser-known (though apparently just as good) service from Texas A&M found here.

It took about 30 seconds to create an account. The only downer is that they only give you 2,500 calls (and it is not clear if that is per day/month/forever). In any event, the documentation was easy enough to work through and, since they seem to offer the data in a variety of formats, I picked the JSON type provider. Also, their documentation starts off with POST examples, which are a bit harder to manage using a TP, but they do support GETs with the parameters in the query string.

I fired up Visual Studio and started a new FSharp project. The first thing I did was pull down their sample JSON result, which I saved as a local file in the project folder (I also put a sample XML in there in case the JSON provider didn’t work out. Ironically, I could not get the XmlProvider working, but the Json one worked like a champ):

I then added in some code to make the request. A majority of the code is creating the query string:

#r "../packages/FSharp.Data.2.2.2/lib/net40/FSharp.Data.dll"

open System.IO
open System.Text
open FSharp.Data

// Verbatim string (@"...") so the backslashes in the Windows path are taken literally
[<Literal>]
let sample = @"C:\Users\Dixon\Desktop\SampleApp_CSharp\ChickenSoftware.Geolocation.Solution\Data\TAMUHttpGet.json"

// The sample file drives the inferred types
type Context = JsonProvider<sample>

let streetAddress = "904 Strathorn Drive"
let city = "Cary"
let state = "NC"
let zip = "27519"
let apiKey = "XXXXXXX"

// Build the GET request: base URL plus query-string parameters
let stringBuilder = new StringBuilder()
stringBuilder.Append("https://geoservices.tamu.edu/Services/Geocode/WebService/GeocoderWebServiceHttpNonParsed_V04_01.aspx") |> ignore
stringBuilder.Append("?streetAddress=") |> ignore
stringBuilder.Append(streetAddress) |> ignore
stringBuilder.Append("&city=") |> ignore
stringBuilder.Append(city) |> ignore
stringBuilder.Append("&state=") |> ignore
stringBuilder.Append(state) |> ignore
stringBuilder.Append("&zip=") |> ignore
stringBuilder.Append(zip) |> ignore
stringBuilder.Append("&apiKey=") |> ignore
stringBuilder.Append(apiKey) |> ignore
stringBuilder.Append("&version=4.01") |> ignore
stringBuilder.Append("&format=json") |> ignore

let searchUri = stringBuilder.ToString()
let searchResult = Context.Load(searchUri)

// Take the first geocode returned and inspect it
let firstResult = searchResult.OutputGeocodes |> Seq.head
firstResult.OutputGeocode.Latitude
firstResult.OutputGeocode.Longitude
firstResult.OutputGeocode.MatchScore

And sure enough: data that is correct

FSharp made it stupid simple to consume this service and the only real gotchas I found were in the documentation itself:

1) The MatchScore is a decimal, but the JSON sample has it as “100”, so it was inferred as an int. I replaced the value with “98.4023668639053” to force the correct type.

2) The documentation’s formats are listed as this

But since they had samples in json below, I just added in

stringBuilder.Append("&format=json") |> ignore

and it worked fine.
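One thing the StringBuilder approach does not do is percent-encode the values, and the street address contains spaces. Below is a minimal sketch of an alternative, assuming the same endpoint and parameter names as above; `buildQuery` is a hypothetical helper of my own, not part of the service or of FSharp.Data.

```fsharp
open System

// Hypothetical helper (not from the original script): builds a GET URL from
// name/value pairs, percent-encoding each value.
let buildQuery (baseUrl: string) (parameters: (string * string) list) =
    let query =
        parameters
        |> List.map (fun (name, value) -> sprintf "%s=%s" name (Uri.EscapeDataString(value)))
        |> String.concat "&"
    sprintf "%s?%s" baseUrl query

let encodedSearchUri =
    buildQuery
        "https://geoservices.tamu.edu/Services/Geocode/WebService/GeocoderWebServiceHttpNonParsed_V04_01.aspx"
        [ ("streetAddress", "904 Strathorn Drive")
          ("city", "Cary")
          ("state", "NC")
          ("zip", "27519")
          ("apiKey", "XXXXXXX")
          ("version", "4.01")
          ("format", "json") ]

printfn "%s" encodedSearchUri
```

The resulting string can be passed to the type provider's Load method in the same way as the hand-built one.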

You can see the gist here.

Filed under F#

//Build Word Count!

June 3, 2015 2 Comments

I started working with HadoopFs last week to see if I could get a better understanding of how to write FSharp mappers. Since everyone uses word counts when doing a “Hello World” using hadoop, I thought I would also.

I decided to compare Satya’s //Build keynote from 2014 and 2015 to see if there was any shift in his focus between last year and this. Isaac Abraham managed to reduce the 20+ lines of catastrophic C# code in the Azure HDInsight tutorial into 2 lines of F# code

static void Main(string[] args)
{
    if (args.Length > 0)
    {
        Console.SetIn(new StreamReader(args[0]));
    }

    string line;
    string[] words;

    while ((line = Console.ReadLine()) != null)
    {
        words = line.Split(' ');

        foreach (string word in words)
            Console.WriteLine(word.ToLower());
    }
}

let result = testString.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries) |> Seq.countBy id
result

I added the data files to my solution and then added a way to locate those files via a relative path.

open System
open System.IO

let baseDirectory = __SOURCE_DIRECTORY__
let baseDirectory' = Directory.GetParent(baseDirectory)
// Verbatim string so the backslash is taken literally
let filePath = @"Data\Build_Keynote2014.txt"
let fullPath = Path.Combine(baseDirectory'.FullName, filePath)
let buildKeynote = File.ReadAllText(fullPath)

I then ran the mapper that Isaac created and got what I expected

buildKeynote.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries)
    |> Seq.countBy id
    |> Seq.sortBy(fun (w,c) -> c)
    |> Seq.toList
    |> List.rev

Interestingly, the 1st word that really jumps out is “Windows” at 26 times.

I then loaded in the 2015 Build keynote and ran the same function

let filePath' = @"Data\Build_Keynote2015.txt"
let fullPath' = Path.Combine(baseDirectory'.FullName, filePath')
let buildKeynote' = File.ReadAllText(fullPath')

buildKeynote'.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries)
    |> Seq.countBy id
    |> Seq.sortBy(fun (w,c) -> c)
    |> Seq.toList
    |> List.rev

And the 1st interesting word is “Platform” at 9 mentions. “Windows” fell to 2 mentions.

result |> Seq.filter(fun (w,c) -> w = "Windows")

And just because I couldn’t resist

result |> Seq.filter(fun (w,c) -> w = "F#")
result |> Seq.filter(fun (w,c) -> w = "C#")
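One caveat with splitting on a single space is that the counts are case- and punctuation-sensitive, so “Windows” and “Windows,” land in different buckets. Here is a minimal sketch of a normalizing variant; the sample sentence is made up and merely stands in for a keynote transcript, and `wordCounts` is a hypothetical name.

```fsharp
open System

// Hypothetical sample text standing in for a keynote transcript
let sampleText = "Windows is a platform. The platform powers Windows, and windows!"

// Split on whitespace, trim surrounding punctuation, lower-case, then count
let wordCounts (text: string) =
    text.Split([| ' '; '\t'; '\r'; '\n' |], StringSplitOptions.RemoveEmptyEntries)
    |> Seq.map (fun w -> w.Trim([| '.'; ','; '!'; '?'; ';'; ':'; '"' |]).ToLowerInvariant())
    |> Seq.filter (fun w -> w <> "")
    |> Seq.countBy id
    |> Seq.sortBy (fun (_, c) -> -c)   // most frequent first
    |> Seq.toList

wordCounts sampleText |> List.iter (fun (w, c) -> printfn "%s: %d" w c)
```

Whether lower-casing is desirable depends on the question being asked; for a product name like “Windows” it probably is.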

So I am feeling pretty good about HadoopFs and will now start trying to implement it on my instance of Azure this weekend.

Filed under F#, Hadoop

Business Logic and F#

May 26, 2015 2 Comments

One of the reasons I like F# so much is that it allows me to think about the problem I am trying to solve, not about the language syntax and coding around language constructs. Consider this example.

I am putting on an art show in my neighborhood and I managed to obtain 3 paintings of cultural significance:

Starry Night

Sunday Afternoon on the Island of La Grande Jatte

Dogs Playing Poker

Each painting is in its own room and due to the volume of people that the art gallery can support, a person can only visit 1 painting. 1,000 tickets sold and all 1,000 people are going to show up. This is a hot event.

I needed a way to forecast how many people will go into each room. Since all 3 paintings are immensely popular, I could assume that each room will have 1/3 the number of visitors. However, I wanted to be a bit more precise, and I know that each painting has a certain number of tags associated with it:

Tag	Starry Night	Afternoon	Poker
Impressionism	X	X
Nature	X	X
Leisure Activity		X	X
Modernism			X

I assumed that people will want to go see paintings with tags that interest them. That means paintings that have tag overlap will split visitors, paintings with no tag overlap will see more visitors, and paintings with more tags will draw more visitors. In Excel:

Tag	Starry Night	Afternoon	Poker	Total
Impressionism	1	1		2
Nature	1	1		2
Leisure Activity		1	1	2
Modernism			1	1


Tag	Starry Night	Afternoon	Poker
Impressionism	0.5	0.5
Nature	0.5	0.5
Leisure Activity		0.5	0.5
Modernism			1


Tag	People	Starry Night	Afternoon	Poker
Impressionism	250	125	125
Nature	250	125	125
Leisure Activity	250		125	125
Modernism	250			250
Total	1,000	250	375	375

Putting this to code, I opened up the F# REPL and created my art show like so:

type Painting = {id:int;name:string;tags:string}
type ArtShow = {id:int;name:string;expectedAttendance:int;paintings:Painting list}

let painting0 = {id=0;
                name="Starry Night";
                tags="Impressionism;Nature"}
let painting1 = {id=1;
                name="Sunday Afternoon on the Island of La Grande Jatte";
                tags="Impressionism;Nature;LeisureActivities"}
let painting2 = {id=2;
                name="Dogs Playing Poker";
                tags="Modernism;LeisureActivities"}
let paintings = [painting0;painting1;painting2]

let artShow = {id=0;
                name="Art Extravaganza";
                expectedAttendance=1000;
                paintings=paintings}

I then needed a way of uniquely identifying the tags. Enter the goodness of piping and higher-order functions:

// Each distinct tag paired with the number of paintings that carry it
let tagSet = artShow.paintings |> Seq.map(fun p -> p.tags)
                               |> Seq.collect(fun t -> t.Split(';'))
                               |> Seq.groupBy(fun t -> t)
                               |> Seq.map(fun (id,t) -> id, t |> Seq.length)

I then needed a way of assigning number of people to tags. Easy enough (this could have been part of the code block above but I split it for illustrative purposes)

// Every tag gets an equal share of the crowd, split across the paintings that carry it
let visitorsPerTag = artShow.expectedAttendance / (tagSet |> Seq.length)
let tagSet' = tagSet |> Seq.map(fun (id,c) -> id, visitorsPerTag / c)

And then a function that calculates the number of expected visitors for an individual painting:

let tagModifier(painting: Painting) =
    let tags = painting.tags.Split(';')
    tags |> Seq.map(fun pt -> tagSet' |> Seq.find(fun(t,c) -> pt = t))
         |> Seq.sumBy(fun(t,c) -> c)

And running it against my show’s paintings gives me the expected values:

artShow.paintings |> Seq.map(fun p -> p, tagModifier(p))
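As a cross-check against the spreadsheet totals (250 / 375 / 375), here is a compact, self-contained sketch of the same calculation. It restates the paintings as plain tuples and uses floats so that an attendance figure that does not divide evenly would not be truncated; `forecast` and `show` are hypothetical names, not part of the gist.

```fsharp
// Paintings as (name, tags); same data as above, restated so the block runs on its own
let show =
    [ ("Starry Night", [ "Impressionism"; "Nature" ])
      ("Sunday Afternoon on the Island of La Grande Jatte", [ "Impressionism"; "Nature"; "LeisureActivities" ])
      ("Dogs Playing Poker", [ "Modernism"; "LeisureActivities" ]) ]

let forecast (attendance: float) (paintings: (string * string list) list) =
    // How many paintings carry each tag
    let tagCounts = paintings |> List.collect snd |> List.countBy id |> Map.ofList
    // Each tag gets an equal share of the crowd
    let perTag = attendance / float tagCounts.Count
    // A painting receives its slice of every tag it carries
    paintings
    |> List.map (fun (name, tags) -> name, tags |> List.sumBy (fun t -> perTag / float tagCounts.[t]))

let expectedVisitors = forecast 1000.0 show
expectedVisitors |> List.iter (fun (name, n) -> printfn "%s: %.0f" name n)
```

Because every tag's share is fully distributed among the paintings that carry it, the per-painting numbers always add back up to the attendance.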

So this is why I love F#. The REPL and the language helped me reason and solve the problem. You can see the gist here.

After note: I sent the same challenge to some C# devs I know about how they would reason and then code the answer. No one took me up on it.

Filed under F#

System.AggregateException using Tweetinvi

May 26, 2015 2 Comments

Dear Future Jamie

If you are using TweetInvi in a new project and you get a System.AggregateException

And that exception contains a single inner exception of System.IO.FileNotFoundException and the exception reads “cannot load System.Net.Http.Primitives”

Install Microsoft.Net.Http in the calling project (in this case it was the unit test project).

Love

Current Jamie

PS You should really exercise more

Filed under F#

“Word Counts”: Using FSharp and HDInsight

May 12, 2015 1 Comment

I decided to learn a bit more about HDInsight, Microsoft’s implementation of Hadoop on Azure. I was surprised by the dearth of tutorials online (not even Pluralsight), with only this one seemingly having what I wanted. I started down the tutorial path and rewrote the map and reduce programs in F#.

Here is the original mapper code (in C#)

static void Main(string[] args)
{
    if (args.Length > 0)
    {
        Console.SetIn(new StreamReader(args[0]));
    }

    string line;
    string[] words;

    while ((line = Console.ReadLine()) != null)
    {
        words = line.Split(' ');

        foreach (string word in words)
            Console.WriteLine(word.ToLower());
    }
}

And here it is in F#

open System
open System.IO

[<EntryPoint>]
let main argv = 
    if argv.Length > 0 then
        let inputString = argv.[0]
        Console.SetIn(new StreamReader(inputString))
    let mutable continueLooping = true
    while continueLooping do
        // ReadLine returns null only at end of input; as in the C# version,
        // a blank line must not stop the loop
        match Console.ReadLine() with
        | null -> 
            continueLooping <- false
        | line ->
            let words = line.Split(' ')
            words |> Seq.iter(fun w -> Console.WriteLine(w.ToLower()))
    0

And here is the original reducer in C#

static void Main(string[] args)
{
    string word, lastWord = null;
    int count = 0;

    if (args.Length > 0)
    {
        Console.SetIn(new StreamReader(args[0]));
    }

    while ((word = Console.ReadLine()) != null)
    {
        if (word != lastWord)
        {
            if(lastWord != null)
                Console.WriteLine("{0}[{1}]", lastWord, count);

            count = 1;
            lastWord = word;
        }
        else
        {
            count += 1; 
        }
    }
    Console.WriteLine(count);
}

and here it is in F#

open System
open System.IO

[<EntryPoint>]
let main argv = 
    if argv.Length > 0 then
        let inputString = argv.[0]
        Console.SetIn(new StreamReader(inputString))
    let mutable continueLooping = true
    let mutable lastWord = String.Empty
    let mutable count = 0
    while continueLooping do
        let word = Console.ReadLine()
        // Only null marks end of input, matching the C# loop condition
        match (word = null), word = lastWord, String.IsNullOrEmpty(lastWord) with
        | true,_,_ -> 
            continueLooping <- false
        | false,true,_ ->
            count <- count + 1
        | false,false,true ->
            count <- 1
            lastWord <- word
        | false,false,false ->
            // A new word: emit the finished one, then start counting the new one
            Console.WriteLine("{0}[{1}]",lastWord,count)
            count <- 1
            lastWord <- word
    Console.WriteLine(count)
    0

The biggest difference is that the conditional if..thens of the imperative-style C# are replaced by pattern matching, which I feel makes the logic much more understandable. The use of the mutable keyword is a smell, but I am not sure how to loop user input in a Console app without it.
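On the mutable question, one possible answer is to wrap the reader in a lazy sequence so the loop disappears. This is only a sketch under my own assumptions: `readLines`, `mapWords`, and `reduceWords` are hypothetical names, and the reducer uses Seq.countBy, which buffers all the counts in memory instead of relying on sorted input the way the C# reducer does.

```fsharp
open System
open System.IO

// Lazily read lines from a TextReader until ReadLine returns null (end of input)
let readLines (reader: TextReader) =
    Seq.unfold (fun () ->
        match reader.ReadLine() with
        | null -> None
        | line -> Some(line, ())) ()

// Mapper: one lower-cased word per element
let mapWords (lines: seq<string>) =
    lines
    |> Seq.collect (fun line -> line.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries))
    |> Seq.map (fun w -> w.ToLower())

// Reducer: count each word and format it as word[count]
let reduceWords (words: seq<string>) =
    words |> Seq.countBy id |> Seq.map (fun (w, c) -> sprintf "%s[%d]" w c)

// Usage with an in-memory reader; pass Console.In to read real standard input
let output =
    readLines (new StringReader("The quick fox\nthe lazy dog"))
    |> mapWords
    |> reduceWords
    |> Seq.toList

output |> List.iter (printfn "%s")
```

The same `readLines` function works for both programs, since a mapper and a reducer in this streaming style differ only in what they do with each line.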

In any event, with the programs complete and pushed out to the Hadoop file system, I ran it via the Azure Powershell

And looking at the output, nothing is coming down.

Drat. I then tried to run the C# program and nothing is coming down. I wonder if it is a problem with the original code or perhaps the data I am using? The tutorial does not include a link to a dataset that works with the programs so I am a bit out of luck. More investigation needed, as it were.

Filed under F#, Hadoop, HDInsight

Set For List Comparisons in F#

May 12, 2015 2 Comments

Dear Jamie Of The Future:

Next time you want to see if there are common elements in 2 different lists, use Set

let tags0 = Set.ofList(["A";"B";"C"])
let tags1 = Set.ofList(["A";"D"])
let tags2 = Set.ofList(["A";"B"])
let tags3 = Set.ofList(["D"])

Set.intersect tags0 tags1
Set.intersect tags0 tags2
Set.intersect tags0 tags3
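If all you need is a yes/no answer, the intersection can be wrapped in a small predicate. A minimal sketch, where `overlaps` is a hypothetical helper name:

```fsharp
let a = Set.ofList [ "A"; "B"; "C" ]
let b = Set.ofList [ "A"; "D" ]
let d = Set.ofList [ "D" ]

// True when the two sets share at least one element
let overlaps x y = not (Set.isEmpty (Set.intersect x y))

let commonAB = Set.intersect a b   // the shared elements: just "A"
let shareAB = overlaps a b         // true
let shareAD = overlaps a d         // false
```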

Love, Jamie of May 2015

PS. You really should exercise more…

Filed under F#

Using the XML Type Provider

May 5, 2015 1 Comment

Dear Future Jamie:

If you want to use the XML Type Provider to read an XML document from the web and you see something like this:

You need to add a reference to System.Xml.Linq. The easiest way is to do Add Reference in the Solution Explorer and copy/paste the path from its property window into your script:

And then you should be cooking with gas:

Love,

Jamie of May 2015

PS: You really should exercise more…

Filed under F#

Global Azure Bootcamp: Car Lab Analysis

April 28, 2015 1 Comment

As part of the Global Azure Bootcamp, the organizers created a hands-on lab where individuals could install a racing game and compete against other drivers. The cool thing was the amount of telemetry that the game pushed to Azure (I assume using Event Hubs to Azure Tables). The lab also had a basic “hello world” web app that could read data from the Azure Table REST endpoints so newcomers could see how easy it was to create and then deploy a website on Azure.

I decided to take a bit of a jaunt through the data endpoint to see what analytics I could run on it using Azure ML. I went to the initial endpoint here and sure enough, the data comes down in the browser. Unfortunately, when I set it up in Azure ML using a data reader:

I got 0 records returned. I think this has something to do with how the datareader deals with XML. I quickly used F# in Visual Studio with the XML type provider:

#r "../packages/FSharp.Data.2.2.0/lib/net40/FSharp.Data.dll"

open FSharp.Data

[<Literal>]
let uri = "https://reddoggabtest-secondary.table.core.windows.net/TestTelemetryData0?tn=TestTelemetryData0&sv=2014-02-14&si=GabLab&sig=GGc%2BHEa9wJYDoOGNE3BhaAeduVOA4MH8Pgss5kWEIW4%3D"

type CarTelemetry = XmlProvider<uri>
let carTelemetry = CarTelemetry.Load(uri)

I reached out to the creator of the lab and he put a summary file on Azure Blob Storage that was very easy to consume with AzureML. You can find it here. I created a regression to predict the amount of damage a car will sustain based on the country and car type:

This was great, but I wanted to work on my R chops some, so I decided to play around with the data in R Studio. I imported the data into R Studio and then fired up the scripting window. The first question I wanted to answer was “how does each country stack up against each other in terms of car crashes?”

I did some basic data exploration like so:

summary(PlayerLapTimes)

aggregate(Damage ~ Country, PlayerLapTimes, sum)
aggregate(Damage ~ Country, PlayerLapTimes, FUN=length)

And then getting down to the business of answering the question:

# Total damage and number of laps per country
dfSum <- aggregate(Damage ~ Country, PlayerLapTimes, sum)
dfCount <- aggregate(Damage ~ Country, PlayerLapTimes, FUN=length)

# Join the two, name the columns, and compute the average
dfDamage <- merge(x=dfSum, y=dfCount, by.x="Country", by.y="Country")
names(dfDamage)[2] <- "Sum"
names(dfDamage)[3] <- "Count"
dfDamage$Avg <- dfDamage$Sum/dfDamage$Count
dfDamage2 <- dfDamage[order(dfDamage$Avg),]

So that is kinda interesting that France has the most damage per race. I have to ask Mathias Brandewinder about that.
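For anyone who would rather stay in F#, the same sum/count/average-by-country aggregation can be sketched in a few lines. The rows below are made up purely for illustration (the real lab data is not reproduced here), and `laps` and `avgDamageByCountry` are hypothetical names.

```fsharp
// Hypothetical rows of (country, damage); not the actual lab data
let laps =
    [ ("France", 30.0); ("US", 10.0); ("France", 50.0); ("US", 20.0); ("UK", 5.0) ]

// Group by country, average the damage, and order from least to most
let avgDamageByCountry =
    laps
    |> List.groupBy fst
    |> List.map (fun (country, rows) -> country, rows |> List.averageBy snd)
    |> List.sortBy snd

avgDamageByCountry |> List.iter (fun (country, avg) -> printfn "%s: %.1f" country avg)
```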

In any event, I then wanted to ask “what country finished first”. I decided to apply some R charting to the same boilerplate that I created earlier

dfSum <- aggregate(LapTimeMs ~ Country, PlayerLapTimes, sum)
dfCount <- aggregate(LapTimeMs ~ Country, PlayerLapTimes, FUN=length)
dfSpeed <- merge(x=dfSum, y=dfCount, by.x="Country", by.y="Country")
names(dfSpeed)[2] <- "Sum"
names(dfSpeed)[3] <- "Count"
dfSpeed$Avg <- dfSpeed$Sum/dfSpeed$Count
dfSpeed2 <- dfSpeed[order(dfSpeed$Avg),]
# Box plot of lap times per country (the question here is speed, not damage)
plot(as.factor(PlayerLapTimes$Country), PlayerLapTimes$LapTimeMs)

So even though France appears to have the slowest drivers, the average is skewed by 2 pretty bad races –> perhaps the person never finished.

In any event, this was a fun exercise and I hope to continue with the data to show the awesomeness of Azure, F#, and R…

Filed under Analytics, F#, R

Battlehack Raleigh

April 21, 2015 1 Comment

This last weekend, I was fortunate enough to be part of a team that competed in Battlehack, a world-wide hackathon sponsored by Paypal. The premise of the hackathon is that you are coding an application that uses Paypal and is for social good.

My team met one week before and decided that the social problem that the application should address is how to make teenage driving safer. This topic was inspired by this heat map that shows that there is a statistically significant increase of car crashes around certain local high schools. The common theme of these high schools is that they are over capacity.

This is also a personal issue for my daughter, who was friendly with a girl who died in an accident last year near Panther Creek High School. In fact, she still wears a bracelet with the victim’s name on it. Unfortunately, she could not come b/c of school and sports commitments that weekend.

The team approached safe driving as a “carrot/stick” issue with kids. The phone app will capture the speed at which they are driving. If they stay within a safe range for the week, they will receive a cash payment. If they engage in risky behavior (speeding, fast stops, etc.), they will have some money charged to them. We used the hackathon’s sponsors: Braintree for payment and SendGrid for email.

We divided the application into a couple of major sections and split the labor along each component. I really wanted to use Azure EventHubs and Stream Analytics, but the Api developer was not familiar with that and a hackathon is definitely not a place where you want to learn a new technology.

We set to work

Here is the part of the solution that I worked on:

The Api is a typical boilerplate MVC5/Web Api2 application and the Data Model holds all of the server data structures and interfaces. C# was the right choice there as the Api developer was a C# web dev and the C# data structures serialize nicely to Json.

I did all of the PoC in the F# REPL and then moved the code into a compilable assembly. The Braintree code was easy with their Nuget package:

open Braintree

type BrainTreeDebitService() = 
    interface IDebitService with 
        member this.DebitAccount(customerId, token, amount) = 
            // Sandbox gateway; the credentials below are placeholders
            let gateway = new BraintreeGateway()
            gateway.Environment <- Environment.SANDBOX
            gateway.MerchantId <- "aaaa"
            gateway.PublicKey <- "bbbbb"
            gateway.PrivateKey <- "cccc"

            // Charge the stored payment method
            let transaction = new TransactionRequest()
            transaction.Amount <- amount
            transaction.CustomerId <- customerId
            transaction.PaymentMethodToken <- token
            gateway.Transaction.Sale(transaction) |> ignore

The Google Maps Api does have a nice set of methods for calculating Speed Limit. Since I didn’t have the right account, I only had some demo Json –> enter the F# Type Provider:

open FSharp.Data

type SpeedLimit = JsonProvider<"../Data/GoogleSpeedLimit.json">

type GoogleMapsSpeedLimitProvider() = 
    interface ISpeedLimitProvider with 
        member this.GetSpeedLimit(latitude, longitude) = 
            // Demo data only: the coordinates are ignored and the first entry of the sample file is returned
            let speedLimits = SpeedLimit.Load("../Data/GoogleSpeedLimit.json")
            let lastSpeedLimit = speedLimits.SpeedLimits |> Seq.head
            lastSpeedLimit.SpeedLimit

Finally, we used MongoDb for our data store:

open System.Linq
// Assumes the legacy 1.x MongoDB driver, where AsQueryable lives in MongoDB.Driver.Linq
open MongoDB.Driver.Linq

type MongoDataProvider() = 
    member this.GetLatestDriverData(driverId) = 
        let connectionString = "aaa"
        let client = MongoDB.Driver.MongoClient(connectionString)
        let server = client.GetServer()
        let database = server.GetDatabase("battlehackraleigh")
        let collection = database.GetCollection<DriverPosition>("driverpositions")
        let collection' = collection.AsQueryable()
        let records = collection'.Where(fun x -> x.DriverId  = driverId)
        records |> Seq.head

    member this.GetCustomerData(customerId)=
        let connectionString = "aaa"
        let client = MongoDB.Driver.MongoClient(connectionString)
        let server = client.GetServer()
        let database = server.GetDatabase("battlehackraleigh")
        let collection = database.GetCollection<Customer>("customers")
        let collection' = collection.AsQueryable()
        let records = collection'.Where(fun x -> x.Id  = customerId)
        records |> Seq.head

    member this.GetCustomerDataFromDriverId(driverId)=
        let connectionString = "aaa"
        let client = MongoDB.Driver.MongoClient(connectionString)
        let server = client.GetServer()
        let database = server.GetDatabase("battlehackraleigh")
        let collection = database.GetCollection<Customer>("customers")
        let collection' = collection.AsQueryable()
        let records = collection'.Where(fun x -> x.Number  = driverId)
        records |> Seq.head

There were 19 teams in Raleigh’s hackathon and my team placed 3rd. I think the general consensus of our team (and the teams around us) is that we should have won with the idea but our presentation was very weak (the problem with coders presenting to non-coders). We had 2 minutes to present and 1 minute for QA. We packed our 2 minutes with technical details when we should have been spinning the ideas. Also, I completely blew the QA piece.

Question #1

Q: “How did you integrate IBM Watson?”

A: “We used it for the language translation service”

A I Wish I Had Said: “We baked machine learning into the app. Do you know how Uber does surge pricing? We tried a series of models that forecast a person’s driving based on their recent history. If we see someone creeping up the danger scale, we increase the reward payout for them for the week. The winning model was a linear regression; it had the best false-positive rate. It is machine learning because we continually train our model as new data comes in.”

Question #2

Q: “How will you make money on this?”

A: “Since we are taking money from poor drivers and giving it to good drivers, presumably we could keep a part for the company”

A I Wish I Had Said: “Making money is so far from our minds. Right now, there are too many kids driving around over-capacity schools and after talking to the chief of police, they are looking for some good ideas. This application is about social good first and foremost.”

Lesson learned –> I hate to say it, but if you are in a hack-a-thon, you need to know the judges’ backgrounds. There was not an obvious coder on the panel, so we should have gone with more high-level stuff and answered technical details in the QA. Unfortunately, the coaches at Battlehack said it was the other way around (technical details 1st) in our dry-run. In fact, we ditched the slide that showed a picture of the car crash at Panther Creek High School that started this app, as well as the heat map. That would have been much more effective in hindsight.

Filed under F#, Hack-A-Thons, MongoDb

Refactoring McCaffrey’s Regression to F#

April 14, 2015 1 Comment

James McCaffrey’s most recent MSDN article, about multi-class regression, is a great starting place for folks interested in the ins and outs of creating a regression. You can find the article here. He wrote the code in C# in a very imperative style, so the FSharp in me immediately wanted to rewrite it in F#.

Interestingly, Mathias Brandewinder also had the same idea and did a better (and more complete) job than me. You can see his post here.

I decided to duck into McCaffrey’s code and see where I could rewrite part of the code. My first step was to move his C# code to a more manageable format.

I changed the project from a console app to a .dll and then split the two classes into their own files. I then added some unit tests so that I could verify that my reworking was correct:

using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class CSLogisticMultiTests
{
    LogisticMulti _lc = null;
    double[][] _trainData;
    double[][] _testData;

    public CSLogisticMultiTests()
    {
        int numFeatures = 4;
        int numClasses = 3;
        int numRows = 1000;
        int seed = 42;
        var data = LogisticMultiProgram.MakeDummyData(numFeatures, numClasses, numRows, seed);
        LogisticMultiProgram.SplitTrainTest(data, 0.80, 7, out _trainData, out _testData);
        _lc = new LogisticMulti(numFeatures, numClasses);

        int maxEpochs = 100;
        double learnRate = 0.01;
        double decay = 0.10;
        _lc.Train(_trainData, maxEpochs, learnRate, decay);
    }

    [TestMethod]
    public void GetWeights_ReturnExpected()
    {
        double[][] bestWts = _lc.GetWeights();
        var expected = 13.939104508387803;
        var actual = bestWts[0][0];
        Assert.AreEqual(expected, actual);
    }

    [TestMethod]
    public void GetBiases_ReturnExpected()
    {
        double[] bestBiases = _lc.GetBiases();
        var expected = 11.795019237894717;
        var actual = bestBiases[0];
        Assert.AreEqual(expected, actual);
    }

    [TestMethod]
    public void GetTrainAccuracy_ReturnExpected()
    {
        var expected = 0.92125;
        var actual = _lc.Accuracy(_trainData);
        Assert.AreEqual(expected, actual);
    }

    [TestMethod]
    public void GetTestAccuracy_ReturnExpected()
    {
        var expected = 0.895;
        double actual = _lc.Accuracy(_testData);
        Assert.AreEqual(expected, actual);
    }
}

You will notice that this is the exact code that McCaffrey uses in his output for the Console app. In any event, they were running all green

I then went into the F# Project and fired up the REPL. I decided to start with the MakeDummyData method because it seemed beefy enough to demonstrate the differences between the languages, it is fairly self-contained, and its data is already testable. Here are the first 9 lines of code.

Random rnd = new Random(seed); 
double[][] wts = new double[numFeatures][];
for (int i = 0; i < numFeatures; ++i)
  wts[i] = new double[numClasses];
double hi = 10.0;
double lo = -10.0;
for (int i = 0; i < numFeatures; ++i)
  for (int j = 0; j < numClasses; ++j)
    wts[i][j] = (hi - lo) * rnd.NextDouble() + lo;

And here is the F# equivalent

let rnd = new Random(seed)
let hi = 10.0
let lo = -10.0
let wts = Array.create numFeatures (Array.create numClasses 1.)
let wts' = wts |> Array.map(fun row -> row |> Array.map(fun col -> (hi - lo) * rnd.NextDouble() + lo))

There is one obvious difference and 1 subtle difference. The obvious difference is that the F# code does not do any looping to create and populate the array of arrays data structure; rather, it uses the higher-order Array.map function. This reduces the idiomatic line count from 9 to 5, a decrease of roughly 50% (and a funny movie from the 1980s). (Note that I use the words “idiomatic line count” because you can reduce both examples to a single line of code but that makes it unworkable by humans. Both examples show the typical way you would write code in the language.) So with the fewer lines of code, which is more readable? That is a subjective opinion. A C#/Java/Javascript/Curly-Brace dev would say the C#. Everyone else in the world would say F#.

The less obvious difference is that F# emphasizes immutability, so there are two variables (wts and wts’) while the C# has 1 variable that is mutated. The implication is lost in such a small example, but if numFeatures was large, you would want to take advantage of multi-core processors and the F# code is ready for parallelism. The C# code would have to be reworked to use an immutable collection.
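To make the parallelism point concrete, here is a minimal sketch using Array.Parallel.init. One wrinkle is that System.Random is not thread-safe, so this variant gives each row its own generator seeded from the row index. That seeding scheme is my own assumption, `makeWeights` is a hypothetical name, and the numbers it produces will not match McCaffrey's single-generator output.

```fsharp
open System

let hi, lo = 10.0, -10.0

// Hypothetical variant: each row owns its Random (seeded from the row index),
// so no generator is shared between threads.
let makeWeights (seed: int) (numFeatures: int) (numClasses: int) =
    Array.Parallel.init numFeatures (fun i ->
        let rnd = Random(seed + i)
        Array.init numClasses (fun _ -> (hi - lo) * rnd.NextDouble() + lo))

let parallelWts = makeWeights 42 4 3
printfn "%d rows of %d weights" parallelWts.Length parallelWts.[0].Length
```

Because every row's values depend only on the seed and the row index, the result is the same no matter how the rows are scheduled across cores.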

The next lines create and populate the biases variable. The C# Code:

double[] biases = new double[numClasses];
for (int i = 0; i < numClasses; ++i)
  biases[i] = (hi - lo) * rnd.NextDouble() + lo;

And the F# Code

let biases = Array.create numClasses 1.
let biases' = biases |> Array.map(fun row -> (hi - lo) * rnd.NextDouble() + lo)

Same deal as before. No loops or mutation. Fewer lines of code and better readability.

The last set of code is a ball of string so it is very hard to separate out.

double[][] result = new double[numRows][]; // allocate result
for (int i = 0; i < numRows; ++i)
  result[i] = new double[numFeatures + numClasses];

for (int i = 0; i < numRows; ++i) // create one row at a time
{
  double[] x = new double[numFeatures]; // generate random x-values
  for (int j = 0; j < numFeatures; ++j)
    x[j] = (hi - lo) * rnd.NextDouble() + lo;

  double[] y = new double[numClasses]; // computed outputs storage
  for (int j = 0; j < numClasses; ++j) // compute z-values
  {
    for (int f = 0; f < numFeatures; ++f)
      y[j] += x[f] * wts[f][j];
    y[j] += biases[j];
  }

  // determine loc. of max (no need for 1 / 1 + e^-z)
  int maxIndex = 0;
  double maxVal = y[0];
  for (int c = 0; c < numClasses; ++c)
  {
    if (y[c] > maxVal)
    {
      maxVal = y[c];
      maxIndex = c;
    }
  }

  for (int c = 0; c < numClasses; ++c) // convert y to 0s or 1s
    if (c == maxIndex)
      y[c] = 1.0;
    else
      y[c] = 0.0;

  int col = 0; // copy x and y into result
  for (int f = 0; f < numFeatures; ++f)
    result[i][col++] = x[f];
  for (int c = 0; c < numClasses; ++c)
    result[i][col++] = y[c];
}

Note the use of code comments, which is typically considered a code smell, even in demonstration code.

Here is the F# Code:

let makeRow () =
    // Random feature values for this row
    let x = Array.create numFeatures 1.
    let x' = x |> Array.map(fun _ -> (hi - lo) * rnd.NextDouble() + lo)

    // One value per class: the sum over features of x.[f] * wts'.[f].[j], plus the class bias
    let y = Array.init numClasses (fun j ->
                Array.map2 (fun xf (wtRow: float[]) -> xf * wtRow.[j]) x' wts' |> Array.sum)
    let yBias = Array.zip y biases'
    let y' = yBias |> Array.map(fun (y,bias) -> y + bias)

    // The class with the largest value becomes 1, the others 0
    let maxVal = y' |> Array.max
    let y'' = y' |> Array.map(fun y -> if y = maxVal then 1. else 0.)

    Array.append x' y''

// Array.init calls makeRow once per row, so every row gets fresh random values
let result = Array.init numRows (fun _ -> makeRow())

This is pretty much the same as before: no loops, immutability, and a 50% reduction of code. Also, notice that using a more functional style breaks apart the ball of string. Individual values are on their own line to be individually evaluated and manipulated. Also, the if..then statement goes to a single line.

So I had a lot of fun working through these examples. The major differences were

Amount of Code and Code Readability
Immutability and ready for parallelism

I am not planning to refactor the rest of the project, but you can, as the project is found here. I am curious if using an array of arrays is the best way to represent the matrix –> I guess it is standard for the curly-brace community? I would think using Deedle would be better, but I don’t know enough about it (yet).

Filed under Analytics, F#
