May | 2015 | Jamie Dixon's Home

Business Logic and F#

May 26, 2015 2 Comments

One of the reasons I like F# so much is that it allows me to think about the problem I am trying to solve, not about the language syntax and coding around language constructs. Consider this example.

I am putting on an art show in my neighborhood and I managed to obtain 3 paintings of cultural significance:

Starry Night

Sunday Afternoon on the Island of La Grande Jatte

Dogs Playing Poker

Each painting is in its own room and, due to the volume of people that the art gallery can support, a person can only visit 1 painting. 1,000 tickets have been sold and all 1,000 people are going to show up. This is a hot event.

I needed a way to forecast how many people will go into each room. Since all 3 paintings are immensely popular, I could assume that each room will get 1/3 of the visitors. However, I wanted to be a bit more precise, and I know that each painting has a certain number of tags associated with it:

Tag	Starry Night	Afternoon	Poker
Impressionism	X	X
Nature	X	X
Leisure Activity		X	X
Modernism			X

I assumed that people will want to go see paintings with tags that interest them, so paintings that have tag overlap will split visitors, paintings with no tag overlap will see more visitors, and paintings with more tags will draw more visitors. In Excel:

Tag	Starry Night	Afternoon	Poker	Total
Impressionism	1	1		2
Nature	1	1		2
Leisure Activity		1	1	2
Modernism			1	1


Tag	Starry Night	Afternoon	Poker
Impressionism	0.5	0.5
Nature	0.5	0.5
Leisure Activity		0.5	0.5
Modernism			1


Tag	People	Starry Night	Afternoon	Poker
Impressionism	250	125	125
Nature	250	125	125
Leisure Activity	250		125	125
Modernism	250			250
Total	1,000	250	375	375

Putting this to code, I opened up the F# REPL and created my art show like so:

type Painting = {id:int;name:string;tags:string}
type ArtShow = {id:int;name:string;expectedAttendance:int;paintings:Painting list}

let painting0 = {id=0;
                 name="Starry Night";
                 tags="Impressionism;Nature"}
let painting1 = {id=1;
                 name="Sunday Afternoon on the Island of La Grande Jatte";
                 tags="Impressionism;Nature;LeisureActivities"}
let painting2 = {id=2;
                 name="Dogs Playing Poker";
                 tags="Modernism;LeisureActivities"}
let paintings = [painting0;painting1;painting2]

let artShow = {id=0;
               name="Art Extravaganza";
               expectedAttendance=1000;
               paintings=paintings}

I then needed a way of uniquely identifying the tags and counting how many paintings carry each one. Enter the goodness of piping and higher-order functions:

let tagSet =
    artShow.paintings
    |> Seq.map (fun p -> p.tags)
    |> Seq.collect (fun t -> t.Split(';'))
    |> Seq.groupBy (fun t -> t)
    |> Seq.map (fun (id, t) -> id, t |> Seq.length)

I then needed a way of assigning a number of people to each tag. Easy enough (this could have been part of the code block above, but I split it for illustrative purposes):

let visitorsPerTag = artShow.expectedAttendance / (tagSet |> Seq.length)
let tagSet' = tagSet |> Seq.map (fun (id, c) -> id, visitorsPerTag / c)

And then a function that calculates the number of expected visitors for an individual painting:

let tagModifier (painting: Painting) =
    painting.tags.Split(';')
    |> Seq.map (fun pt -> tagSet' |> Seq.find (fun (t, _) -> pt = t))
    |> Seq.sumBy (fun (_, c) -> c)

And running it against my show’s paintings gives me the expected values:

artShow.paintings |> Seq.map (fun p -> p, tagModifier p)
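One thing to watch in the code above: `/` on two ints truncates. It happens to be exact for this show (1000 / 4 = 250, then 250 / 2 = 125), but if three paintings shared a tag, 250 / 3 would give 83 and a visitor would quietly disappear. Here is a minimal sketch of the same tag-splitting idea in floats. The `forecast` helper and the list-of-tags shape are my own assumptions for illustration, not part of the original gist, and it assumes a painting never lists the same tag twice.

```fsharp
// Hypothetical helper (not from the original gist): same logic, float arithmetic.
let forecast (attendance: int) (paintings: (string * string list) list) =
    // How many paintings carry each tag
    let tagCounts =
        paintings
        |> List.collect snd
        |> List.countBy id
        |> Map.ofList
    // Every distinct tag gets an equal slice of the crowd
    let visitorsPerTag = float attendance / float tagCounts.Count
    // A painting collects its share of each of its tags
    paintings
    |> List.map (fun (name, tags) ->
        name, tags |> List.sumBy (fun t -> visitorsPerTag / float tagCounts.[t]))

let show =
    [ "Starry Night", ["Impressionism"; "Nature"]
      "Sunday Afternoon", ["Impressionism"; "Nature"; "LeisureActivities"]
      "Dogs Playing Poker", ["Modernism"; "LeisureActivities"] ]

// Same split as the Excel table: 250, 375 and 375
forecast 1000 show
|> List.iter (fun (name, visitors) -> printfn "%s: %.1f" name visitors)
```

Because nothing is truncated, the per-painting numbers always add back up to the expected attendance.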

So this is why I love F#. The REPL and the language helped me reason and solve the problem. You can see the gist here.

After note: I sent the same challenge to some C# devs I know, asking how they would reason about and then code the answer. No one took me up on it.

Filed under F#

System.AggregateException using Tweetinvi

May 26, 2015 2 Comments

Dear Future Jamie

If you are using Tweetinvi in a new project and you get a System.AggregateException,

and that exception contains a single inner exception of type System.IO.FileNotFoundException that reads “cannot load System.Net.Http.Primitives”,

install the Microsoft.Net.Http NuGet package in the calling project (in this case it was the unit test project).

Love

Current Jamie

PS You should really exercise more

Filed under F#

Global Azure Bootcamp Racing Game: More Analytics Using R and AzureML

May 19, 2015 1 Comment

Alan Smith, the creator and keeper of the Global Azure Bootcamp Racing Game, was kind enough to put the telemetry data from the races out on Azure Blob Storage. The data was already available as XML from Table Storage, but AzureML was choking on the format, so Alan turned it into CSV and put the files out here:

https://alanazuredemos.blob.core.windows.net/alan/TelemetryData0.csv
https://alanazuredemos.blob.core.windows.net/alan/TelemetryData1.csv
https://alanazuredemos.blob.core.windows.net/alan/TelemetryData2.csv
https://alanazuredemos.blob.core.windows.net/alan/PlayerLapTimes0.csv
https://alanazuredemos.blob.core.windows.net/alan/PlayerLapTimes1.csv
https://alanazuredemos.blob.core.windows.net/alan/PlayerLapTimes2.csv

Note that there are 3 races (race0, race1, and race2), each with 2 datasets. The TelemetryData is a reading for each car in the race every 10 ms or so, and the PlayerLapTimes is a summary of the demographics of the player as well as some final results.

I decided to do some unsupervised learning using Chapter 8 of Practical Data Science With R as my guide. I pulled down all 972,780 observations from the Race0 telemetry data in RStudio. It took a bit :-) I then ran the following script to do a cluster dendrogram. Alas, I killed the job after several minutes (actually, the job killed my machine and I got an out of memory exception).

summary(TelemetryData0)
pmatrix <- scale(TelemetryData0[,])
d <- dist(pmatrix, method="euclidean")
pfit <- hclust(d,method="ward")
plot(pfit)
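A back-of-the-envelope check on why this ran out of memory: dist computes every pairwise distance, which for n rows is n(n-1)/2 values. The sketch below does that arithmetic, assuming 8 bytes per stored distance (one double); the function names are mine, purely for illustration.

```fsharp
// Bytes needed to hold all pairwise distances for n observations,
// assuming each distance is stored as an 8-byte double.
let distBytes (n: int64) = n * (n - 1L) / 2L * 8L

let toGiB (bytes: int64) = float bytes / (1024.0 ** 3.0)

// Full Race0 telemetry: on the order of 3.8 TB
printfn "%d rows: %.1f GiB" 972780L (toGiB (distBytes 972780L))

// A 10,000-row sample: on the order of 400 MB
printfn "%d rows: %.2f GiB" 10000L (toGiB (distBytes 10000L))
```

So hierarchical clustering on the full dataset was never going to fit on a desktop, while a sample of ten thousand rows or so is comfortable.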

I then tried to narrow my search down to damage and speed:

damage <- TelemetryData0$Damage
speed <- TelemetryData0$Speed

plot(damage, speed, main="Damage and Speed",
     xlab="Damage", ylab="Speed", pch=20)

# x is damage and y is speed, to match the plot above
abline(lm(speed~damage), col="red") # regression line (y~x)
lines(lowess(damage,speed), col="blue") # lowess line (x,y)

(I added the red line manually)

So that is interesting. It looks like there is a slight downhill trend: the lower the speed, the more damage. So perhaps more speed does not automatically mean more damage to the car. Anyone who drives in San Francisco can attest to that 🙂

I then went back and took a sample of the telemetry data

# take a random sample of 10,000 rows, then keep only the two columns of interest
telemetry <- TelemetryData0[sample(1:nrow(TelemetryData0),10000),]
telemetry <- telemetry[,c("Damage","Speed")]
summary(telemetry)
pmatrix <- scale(telemetry[,])
d <- dist(pmatrix, method="euclidean")
pfit <- hclust(d,method="ward")
plot(pfit)

And I got this:

And the fact that it is not showing me anything made me think of this clip:

In any event, I decided to try a similar analysis using AzureML to see if it could handle the roughly 973K records better than my desktop.

I fired up AzureML and added a data reader to the original file and then added some cleaning:

The problem is that these steps would take 10-12 minutes to complete. I decided to give up and bring a copy of the data locally via the “Save As Dataset” context menu. This sped things up significantly. I added in a k-means module for speed and damage and ran the model.

The first ten times or so I ran this, I got this:

After I added in the “Clean Missing Data” module before the normalization step, I got some results. Note that removing the entire row is what R does by default when cleaning the data on import, so I thought I would keep it matching. In any event, the results look like this:

So I am not sure what this shows, other than there is overlap of speed and damage and there seems to be a relationship.

So there are some other questions I want to answer, like:

1) After a player sustains some damage, do they have a generic response (like braking, turning right, etc.)?

2) Are there certain “lines” that winning players take going through individual curves?

3) Do you really have to avoid damage to win?

I plan to try and answer these questions and more in the coming weeks.

Filed under Azure ML, R

May 15, 2015 Leave a comment

Filed under Uncategorized

“Word Counts”: Using FSharp and HDInsight

May 12, 2015 1 Comment

I decided to learn a bit more about HDInsight, Microsoft’s implementation of Hadoop on Azure. I was surprised by the dearth of tutorials online (not even Pluralsight), with only this one seemingly having what I wanted. I started down the tutorial path and rewrote the map and reduce programs in F#.

Here is the original mapper code (in C#)

static void Main(string[] args)
{
    if (args.Length > 0)
    {
        Console.SetIn(new StreamReader(args[0]));
    }

    string line;
    string[] words;

    while ((line = Console.ReadLine()) != null)
    {
        words = line.Split(' ');

        foreach (string word in words)
            Console.WriteLine(word.ToLower());
    }
}

And here it is in F#

open System
open System.IO

[<EntryPoint>]
let main (argv: string[]) =
    if argv.Length > 0 then
        Console.SetIn(new StreamReader(argv.[0]))
    let mutable continueLooping = true
    while continueLooping do
        let line = Console.ReadLine()
        match line with
        | null ->
            // ReadLine returns null at end of input; a blank line is still
            // valid input, as in the C# version
            continueLooping <- false
        | _ ->
            line.Split(' ')
            |> Seq.iter (fun w -> Console.WriteLine(w.ToLower()))
    0

And here is the original reducer in C#

static void Main(string[] args)
{
    string word, lastWord = null;
    int count = 0;

    if (args.Length > 0)
    {
        Console.SetIn(new StreamReader(args[0]));
    }

    while ((word = Console.ReadLine()) != null)
    {
        if (word != lastWord)
        {
            if(lastWord != null)
                Console.WriteLine("{0}[{1}]", lastWord, count);

            count = 1;
            lastWord = word;
        }
        else
        {
            count += 1;
        }
    }
    Console.WriteLine(count);
}

and here it is in F#

open System
open System.IO

[<EntryPoint>]
let main (argv: string[]) =
    if argv.Length > 0 then
        Console.SetIn(new StreamReader(argv.[0]))
    let mutable continueLooping = true
    let mutable lastWord : string = null
    let mutable count = 0
    while continueLooping do
        let word = Console.ReadLine()
        match isNull word, word = lastWord, isNull lastWord with
        | true,_,_ ->
            // ReadLine returns null at end of input
            continueLooping <- false
        | false,true,_ ->
            count <- count + 1
        | false,false,true ->
            // first word seen
            count <- 1
            lastWord <- word
        | false,false,false ->
            // the word changed: emit the finished word, then start counting the new one
            Console.WriteLine("{0}[{1}]",lastWord,count)
            count <- 1
            lastWord <- word
    // mirrors the C# original, which writes only the count for the final word
    Console.WriteLine(count)
    0

The biggest difference is that the conditional if..thens of the imperative-style C# are replaced by pattern matching, which I feel makes the logic much more understandable. The use of the mutable keyword is a smell, but I am not sure how to loop over user input in a console app without it.
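For what it is worth, one possible way around mutable is to treat standard input as a lazy sequence that ends when ReadLine returns null. The sketch below is only that, a sketch: `readLines`, `mapWords` and `reduceWords` are names I made up, and the reducer swaps in Seq.countBy (which tallies every word in memory) instead of counting runs in sorted input the way the programs above do.

```fsharp
open System

// Lazily read standard input until ReadLine returns null (end of input).
let readLines () =
    Seq.initInfinite (fun _ -> Console.ReadLine())
    |> Seq.takeWhile (fun line -> not (isNull line))

// Mapper logic as a pure function: one lower-cased word per element.
let mapWords (lines: seq<string>) =
    lines
    |> Seq.collect (fun line -> line.Split(' '))
    |> Seq.map (fun w -> w.ToLower())

// Reducer logic as a pure function, using the same "word[count]" output format.
let reduceWords (words: seq<string>) =
    words
    |> Seq.countBy id
    |> Seq.map (fun (word, count) -> sprintf "%s[%d]" word count)

// In a console app the mapper body would then be:
//     readLines () |> mapWords |> Seq.iter (printfn "%s")

// Local check without a cluster: map, sort (as Hadoop would), then reduce.
[ "the quick fox"; "The lazy dog" ]
|> mapWords
|> Seq.sort
|> reduceWords
|> Seq.iter (printfn "%s")
```

Keeping the map and reduce logic in pure functions also means they can be exercised in the REPL with a couple of sample lines before anything gets pushed out to Hadoop.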

In any event, with the programs complete and pushed out to the Hadoop file system, I ran them via Azure PowerShell.

And looking at the output, nothing is coming down.

Drat. I then tried to run the C# program and nothing is coming down. I wonder if it is a problem with the original code or perhaps the data I am using? The tutorial does not include a link to a dataset that works with the programs so I am a bit out of luck. More investigation needed, as it were.

Filed under F#, Hadoop, HDInsight

Set For List Comparisons in F#

May 12, 2015 2 Comments

Dear Jamie Of The Future:

Next time you want to see if there are common elements in 2 different lists, use Set:

let tags0 = Set.ofList(["A";"B";"C"])
let tags1 = Set.ofList(["A";"D"])
let tags2 = Set.ofList(["A";"B"])
let tags3 = Set.ofList(["D"])

Set.intersect tags0 tags1
Set.intersect tags0 tags2
Set.intersect tags0 tags3
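And if all you want is a yes/no answer rather than the shared elements themselves, something along these lines should do. The `sharesAny` helper is a made-up name for illustration, not a built-in:

```fsharp
let tags0 = Set.ofList ["A"; "B"; "C"]
let tags1 = Set.ofList ["A"; "D"]
let tags3 = Set.ofList ["D"]

// Hypothetical helper: true when the two sets have at least one element in common
let sharesAny a b = not (Set.isEmpty (Set.intersect a b))

printfn "%A" (Set.intersect tags0 tags1)  // set ["A"]
printfn "%b" (sharesAny tags0 tags1)      // true
printfn "%b" (sharesAny tags0 tags3)      // false
```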

Love, Jamie of May 2015

PS. You really should exercise more…

Filed under F#

Using the XML Type Provider

May 5, 2015 1 Comment

Dear Future Jamie:

If you want to use the XML Type Provider to read an XML document from the web and you see something like this:

You need to add a reference to System.Xml.Linq. The easiest way is to do Add Reference in the Solution Explorer and then copy/paste the path from its property window into your script:

And then you should be cooking with gas:

Love,

Jamie of May 2015

PS: You really should exercise more…

Filed under F#
