Business Logic and F#

One of the reasons I like F# so much is that it allows me to think about the problem I am trying to solve, not about the language syntax and coding around language constructs.  Consider this example. 

I am putting on an art show in my neighborhood and I managed to obtain 3 paintings of cultural significance:

Starry Night

Sunday Afternoon on the Island of La Grande Jatte

Dogs Playing Poker

Each painting is in its own room and due to the volume of people that the art gallery can support, a person can only visit 1 painting.  1,000 tickets sold and all 1,000 people are going to show up.  This is a hot event.

I needed a way to forecast how many people will go into each room. Since all 3 paintings are immensely popular, I could assume that each room will get 1/3 of the visitors. However, I wanted to be a bit more precise, and I know that each painting has a certain number of tags associated with it:

Tag               Starry Night  Afternoon  Poker
Impressionism     X             X
Nature            X             X
Leisure Activity                X          X
Modernism                                  X

Assuming that people will want to go see paintings with tags that interest them, paintings that have tag overlap will split visitors, paintings with no tag overlap will see more visitors, and paintings with more tags will draw more visitors.  In Excel:

Tag               Starry Night  Afternoon  Poker  Total
Impressionism     1             1                 2
Nature            1             1                 2
Leisure Activity                1          1      2
Modernism                                  1      1

Tag               Starry Night  Afternoon  Poker
Impressionism     0.5           0.5
Nature            0.5           0.5
Leisure Activity                0.5        0.5
Modernism                                  1

Tag               People  Starry Night  Afternoon  Poker
Impressionism     250     125           125
Nature            250     125           125
Leisure Activity  250                   125        125
Modernism         250                              250
Total             1,000   250           375        375

Putting this to code, I opened up the F# REPL and created my art show like so:

    type Painting = {id:int;name:string;tags:string}
    type ArtShow = {id:int;name:string;expectedAttendance:int;paintings:Painting list}

    let painting0 = {id=0;
                     name="Starry Night";
                     tags="Impressionism;Nature"}
    let painting1 = {id=1;
                     name="Sunday Afternoon on the Island of La Grande Jatte";
                     tags="Impressionism;Nature;LeisureActivities"}
    let painting2 = {id=2;
                     name="Dogs Playing Poker";
                     tags="Modernism;LeisureActivities"}
    let paintings = [painting0;painting1;painting2]

    let artShow = {id=0;
                   name="Art Extravaganza";
                   expectedAttendance=1000;
                   paintings=paintings}

I then needed a way of uniquely identifying the tags and counting how many paintings carry each one. Enter the goodness of piping and higher-order functions:

    let tagSet =
        artShow.paintings
        |> Seq.map(fun p -> p.tags)
        |> Seq.collect(fun t -> t.Split(';'))
        |> Seq.groupBy(fun t -> t)
        |> Seq.map(fun (id,t) -> id, t |> Seq.length)

I then needed a way of assigning number of people to tags.  Easy enough (this could have been part of the code block above but I split it for illustrative purposes)

    let visitorsPerTag = artShow.expectedAttendance / (tagSet |> Seq.length)
    let tagSet' = tagSet |> Seq.map(fun (id,c) -> id, visitorsPerTag / c)

And then a function that calculates the number of expected visitors for an individual painting, based on its tags:

    let tagModifier(painting: Painting) =
        let tags = painting.tags.Split(';')
        tags |> Seq.map(fun pt -> tagSet' |> Seq.find(fun (t,c) -> pt = t))
             |> Seq.sumBy(fun (t,c) -> c)

And running it against my show’s paintings gives me the expected values:

    artShow.paintings |> Seq.map(fun p -> p, tagModifier(p))
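As a cross-check against the spreadsheet, the whole calculation can also be condensed into one script. This is only a sketch: it uses plain tuples instead of the records above, and the names `show`, `tagCounts`, `forecast`, and `total` are made up for illustration. Like the code above, it relies on integer division, which happens to be exact for these numbers.

```fsharp
// Tag lists per painting, copied from the records above (tuples used for brevity).
let show =
    [ ("Starry Night", [ "Impressionism"; "Nature" ])
      ("Sunday Afternoon on the Island of La Grande Jatte",
       [ "Impressionism"; "Nature"; "LeisureActivities" ])
      ("Dogs Playing Poker", [ "Modernism"; "LeisureActivities" ]) ]

let expectedAttendance = 1000

// How many paintings carry each tag.
let tagCounts =
    show
    |> List.collect snd
    |> List.countBy id
    |> Map.ofList

// Visitors are split evenly across tags, then evenly across the paintings sharing a tag.
let visitorsPerTag = expectedAttendance / tagCounts.Count

let forecast =
    show
    |> List.map (fun (name, tags) ->
        name, tags |> List.sumBy (fun t -> visitorsPerTag / tagCounts.[t]))

// The per-room forecasts should add back up to the number of tickets sold.
let total = forecast |> List.sumBy snd
```

Here `forecast` comes out as 250, 375, and 375 visitors for the three rooms, matching the bottom row of the Excel table, and `total` is 1000.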

So this is why I love F#.  The REPL and the language helped me reason and solve the problem.  You can see the gist here.

After note: I sent the same challenge to some C# devs I know, asking how they would reason about and then code the answer. No one took me up on it.

System.AggregateException using Tweetinvi

Dear Future Jamie

If you are using Tweetinvi in a new project and you get a System.AggregateException, and that exception contains a single inner exception of type System.IO.FileNotFoundException that reads “cannot load System.Net.Http.Primitives”, install Microsoft.Net.Http in the calling project (in this case it was the unit test project).

Love

Current Jamie

PS You should really exercise more

Global Azure Bootcamp Racing Game: More Analytics Using R and AzureML

Alan Smith, the creator and keeper of the Global Azure Bootcamp Racing Game, was kind enough to put the telemetry data from the races out on Azure Blob Storage. The data was already available as XML from Table Storage, but AzureML was choking on the format, so Alan turned it into CSV and put the files out here:

https://alanazuredemos.blob.core.windows.net/alan/TelemetryData0.csv
https://alanazuredemos.blob.core.windows.net/alan/TelemetryData1.csv
https://alanazuredemos.blob.core.windows.net/alan/TelemetryData2.csv
https://alanazuredemos.blob.core.windows.net/alan/PlayerLapTimes0.csv
https://alanazuredemos.blob.core.windows.net/alan/PlayerLapTimes1.csv
https://alanazuredemos.blob.core.windows.net/alan/PlayerLapTimes2.csv

Note that there are 3 races (race0, race1, and race2), each with 2 datasets. The TelemetryData is a reading for each car in the race every 10 ms or so, and the PlayerLapTimes is a summary of the demographics of the player as well as some final results.

I decided to do some unsupervised learning using Chapter 8 of Practical Data Science With R as my guide. I pulled down all 972,780 observations from the Race0 telemetry data into RStudio. It took a bit :-) I then ran the following script to produce a cluster dendrogram. Alas, I killed the job after several minutes (actually, the job killed my machine and I got an out-of-memory exception):

    summary(TelemetryData0)
    pmatrix <- scale(TelemetryData0[,])
    d <- dist(pmatrix, method="euclidean")
    pfit <- hclust(d,method="ward")
    plot(pfit)
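A back-of-the-envelope estimate suggests why the machine gave up. R's dist() keeps one distance for every pair of rows, so its result holds n(n-1)/2 numbers. The sketch below assumes 8 bytes per distance (a double) and ignores everything else R needs, such as the scaled copy of the data and whatever hclust allocates; `distBytes` is a made-up helper name.

```fsharp
// Bytes needed just for the result of dist(): n(n-1)/2 distances at 8 bytes each.
let distBytes (n: int64) = n * (n - 1L) / 2L * 8L

// All 972,780 telemetry rows: 3,785,199,822,480 bytes, roughly 3.8 TB.
let fullRace = distBytes 972780L

// A 10,000-row sample, for comparison: 399,960,000 bytes, roughly 400 MB.
let sampled = distBytes 10000L
```

So under these assumptions the full data set is far out of reach for a desktop, while a sample of ten thousand rows fits comfortably.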

I then tried to narrow my search down to damage and speed:

    damage <- TelemetryData0$Damage
    speed <- TelemetryData0$Speed

    plot(damage, speed, main="Damage and Speed",
         xlab="Damage ", ylab="Speed ", pch=20)

    abline(lm(speed~damage), col="red") # regression line (y~x)
    lines(lowess(damage,speed), col="blue") # lowess line (x,y)

(I added the red line manually)

[image: scatter plot of damage versus speed]

So that is interesting. It looks like there is a slight downward slope: the lower the speed, the more damage. So perhaps speed does not automatically mean more damage to the car. Anyone who drives in San Francisco can attest to that 🙂

I then went back and took a sample of the telemetry data

    telemetry <- TelemetryData0[sample(1:nrow(TelemetryData0),10000),]
    telemetry <- telemetry[1:10000,c("Damage","Speed")]
    summary(telemetry)
    pmatrix <- scale(telemetry[,])
    d <- dist(pmatrix, method="euclidean")
    pfit <- hclust(d,method="ward")
    plot(pfit)

And I got this:

[image: cluster dendrogram of the 10,000-row sample]

And the fact that it is not showing me anything made me think of this clip:

image

In any event, I decided to try a similar analysis using AzureML to see if AzureML can handle the 975K records better than my desktop.

I fired up AzureML and added a data reader to the original file and then added some cleaning:

image

The problem is that these steps would take 10-12 minutes to complete. I decided to give up and bring a copy of the data locally via the “Save As Dataset” context menu. This sped things up significantly. I added in a k-means module for speed and damage and ran the model:

image 

The first ten times or so I ran this, I got this:

image

After I added in the “Clean Missing Data” module before the normalization step,

image

I got some results. Note that removing the entire row is what R does by default when cleaning the data on import, so I thought I would keep it matching. In any event, the results look like this:

image

So I am not sure what this shows, other than there is overlap of speed and damage and there seems to be a relationship.

So there are some other questions I want to answer, like:

1) After a player sustains some damage, do they have a generic response (like braking, turning right, etc.)?

2) Are there certain “lines” that winning players take going through individual curves?

3) Do you really have to avoid damage to win?

I plan to try and answer these questions and more in the coming weeks.

“Word Counts”: Using FSharp and HDInsight

I decided to learn a bit more about HDInsight, Microsoft’s implementation of Hadoop on Azure. I was surprised by the dearth of tutorials online (not even on Pluralsight), with only this one seemingly having what I wanted. I started down the tutorial path and rewrote the map and reduce programs in F#.

Here is the original mapper code (in C#)

    static void Main(string[] args)
    {
        if (args.Length > 0)
        {
            Console.SetIn(new StreamReader(args[0]));
        }

        string line;
        string[] words;

        while ((line = Console.ReadLine()) != null)
        {
            words = line.Split(' ');

            foreach (string word in words)
                Console.WriteLine(word.ToLower());
        }
    }

And here it is in F#

    open System
    open System.IO

    [<EntryPoint>]
    let main (argv: string[]) =
        if argv.Length > 0 then
            let inputString = argv.[0]
            Console.SetIn(new StreamReader(inputString))
        let mutable continueLooping = true
        while continueLooping do
            let line = Console.ReadLine()
            // Note: unlike the C# version, this also stops at an empty line
            match String.IsNullOrEmpty(line) with
            | true ->
                continueLooping <- false
            | false ->
                let words = line.Split(' ')
                words |> Seq.iter(fun w -> Console.WriteLine(w.ToLower()))
        0

And here is the original reducer in C#

    static void Main(string[] args)
    {
        string word, lastWord = null;
        int count = 0;

        if (args.Length > 0)
        {
            Console.SetIn(new StreamReader(args[0]));
        }

        while ((word = Console.ReadLine()) != null)
        {
            if (word != lastWord)
            {
                if(lastWord != null)
                    Console.WriteLine("{0}[{1}]", lastWord, count);

                count = 1;
                lastWord = word;
            }
            else
            {
                count += 1;
            }
        }
        Console.WriteLine(count);
    }

and here it is in F#

    open System
    open System.IO

    [<EntryPoint>]
    let main (argv: string[]) =
        if argv.Length > 0 then
            let inputString = argv.[0]
            Console.SetIn(new StreamReader(inputString))
        let mutable continueLooping = true
        let mutable lastWord = String.Empty
        let mutable count = 0
        while continueLooping do
            let word = Console.ReadLine()
            match String.IsNullOrEmpty(word), word = lastWord, String.IsNullOrEmpty(lastWord) with
            | true,_,_ ->
                continueLooping <- false
            | false,true,_ ->
                count <- count + 1
            | false,false,true ->
                count <- 1
                lastWord <- word
            | false,false,false ->
                Console.WriteLine("{0}[{1}]",lastWord,count)
                // Start counting the new word, as the C# version does
                count <- 1
                lastWord <- word
        Console.WriteLine(count)
        0

The biggest difference is that the conditional if..thens of the imperative-style C# are replaced by pattern matching, which I feel makes the logic much more understandable. The use of the mutable keyword is a smell, but I am not sure how to loop user input in a Console app without it.
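One possible way around the mutable flag is to treat the input as a lazy sequence of lines and let Seq.unfold do the looping. The following is a minimal sketch, not the tutorial's code: `readLines`, `mapWords`, and `reduceWords` are hypothetical names, the reducer assumes its input is already sorted (which Hadoop streaming does between the map and reduce steps), and unlike the programs above it skips empty entries and reports the final word along with its count.

```fsharp
open System
open System.IO

// Lazily read lines until ReadLine returns null (end of input). No mutable needed.
let readLines (reader: TextReader) : seq<string> =
    Seq.unfold (fun () ->
        match reader.ReadLine() with
        | null -> None
        | line -> Some(line, ())) ()

// Mapper: one lower-cased word per element, skipping empty entries.
let mapWords (lines: seq<string>) : seq<string> =
    lines
    |> Seq.collect (fun line -> line.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries))
    |> Seq.map (fun w -> w.ToLower())

// Reducer: count runs of identical adjacent words (input assumed sorted).
let reduceWords (words: seq<string>) : (string * int) list =
    words
    |> Seq.fold (fun acc w ->
        match acc with
        | (last, n) :: rest when last = w -> (last, n + 1) :: rest
        | _ -> (w, 1) :: acc) []
    |> List.rev

// Demo on an in-memory reader; in the console apps this would be Console.In.
let demoInput = new StringReader("the cat\nThe dog") :> TextReader

let demoCounts =
    readLines demoInput
    |> mapWords
    |> Seq.sort
    |> reduceWords
```

For the two demo lines, `demoCounts` is `[("cat", 1); ("dog", 1); ("the", 2)]`. The state that the while loops kept in mutable variables now lives in the accumulator of the fold.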

In any event, with the programs complete and pushed out to the Hadoop file system, I ran them via Azure PowerShell:

image

image

And looking at the output, nothing is coming down.

image

Drat.  I then tried to run the C# program and nothing is coming down.  I wonder if it is a problem with the original code or perhaps the data I am using?  The tutorial does not include a link to a dataset that works with the programs so I am a bit out of luck.  More investigation needed, as it were.

Set For List Comparisons in F#

Dear Jamie Of The Future:

Next time you want to see if there are common elements in 2 different lists, use Set:

    let tags0 = Set.ofList(["A";"B";"C"])
    let tags1 = Set.ofList(["A";"D"])
    let tags2 = Set.ofList(["A";"B"])
    let tags3 = Set.ofList(["D"])

    Set.intersect tags0 tags1
    Set.intersect tags0 tags2
    Set.intersect tags0 tags3
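If all you need is a yes/no answer, the intersection can be wrapped in a small predicate. A sketch using the same four sets; `sharesAny` is a made-up helper name, not something from the F# core library.

```fsharp
let tags0 = Set.ofList ["A"; "B"; "C"]
let tags1 = Set.ofList ["A"; "D"]
let tags2 = Set.ofList ["A"; "B"]
let tags3 = Set.ofList ["D"]

// Hypothetical helper: true when the two sets have at least one element in common.
let sharesAny a b = not (Set.isEmpty (Set.intersect a b))

let common01 = Set.intersect tags0 tags1   // set ["A"]
let common02 = Set.intersect tags0 tags2   // set ["A"; "B"]
let common03 = Set.intersect tags0 tags3   // empty set

let overlap01 = sharesAny tags0 tags1      // true
let overlap03 = sharesAny tags0 tags3      // false
```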

Love, Jamie of May 2015

PS.  You really should exercise more…

Using the XML Type Provider

Dear Future Jamie:

If you want to use the XML Type Provider to read an XML document from the web and you see something like this:

image

You need to add a reference to System.Xml.Linq. The easiest way is to do Add Reference in the Solution Explorer and copy/paste the path from its property window into your script:

image

And then you should be cooking with gas:

image

Love,

Jamie of May 2015

PS: You really should exercise more…