June | 2015 | Jamie Dixon's Home

Sandcastle Help File Builder and FSharp

June 30, 2015 3 Comments

If you are going to write and release a professional-grade .NET assembly, there are some things that need to be considered: logging, exception handling, and documentation. For .NET components, Sandcastle Help File Builder is the go-to tool to generate documentation as either the old-school .chm file or as a web deploy.

Consider an assembly that contains a Customer record type, an interface for a Customer Repository, and two implementations (In-Memory and ADO.NET):

open System.Collections.Generic
open System.Data.SqlClient

type Customer = {id:int; firstName:string; lastName:string}

type ICusomerRepository =
    abstract member GetCustomer : int -> Customer
    abstract member InsertCustomer: Customer -> int
    abstract member DeleteCustomer: int -> unit

type InMemoryCustomerRepository () =
    let customers = [
            {id=1; firstName = "First"; lastName = "Customer"}
            {id=2; firstName = "Second"; lastName = "Customer"}
            {id=3; firstName = "Third"; lastName = "Customer"}]
    let customers' = new List<Customer>(customers)

    interface ICusomerRepository with
        member this.GetCustomer(id:int) =
            customers' |> Seq.find(fun c -> c.id = id)
        member this.InsertCustomer(customer: Customer) =
            // Next id is one past the highest id in use, so it never collides
            let nextId = (customers' |> Seq.map(fun c -> c.id) |> Seq.fold max 0) + 1
            let customer' = {customer with id=nextId}
            customers'.Add(customer')
            nextId
        member this.DeleteCustomer(id: int) =
            // Search the mutable list, not the immutable seed list
            let customer = customers' |> Seq.find(fun c -> c.id = id)
            customers'.Remove(customer) |> ignore

type SqlServerCustomerRepository (connectionString:string) =
    interface ICusomerRepository with
        member this.GetCustomer(id:int) =
            use connection = new SqlConnection(connectionString)
            // Parameterized to avoid building SQL by string concatenation
            let commandText = "Select * from customers where id = @id"
            use command = new SqlCommand(commandText, connection)
            command.Parameters.AddWithValue("@id", id) |> ignore
            connection.Open()
            use reader = command.ExecuteReader()
            if not (reader.Read()) then failwithf "Customer %i not found" id
            {id=reader.[0] :?> int
             firstName=reader.[1] :?> string
             lastName =reader.[2] :?> string}

        member this.InsertCustomer(customer: Customer) =
            use connection = new SqlConnection(connectionString)
            // Assumes id is an identity column, so only the two names are supplied
            let commandText = "Insert customers values (@firstName, @lastName); Select cast(scope_identity() as int)"
            use command = new SqlCommand(commandText, connection)
            command.Parameters.AddWithValue("@firstName", customer.firstName) |> ignore
            command.Parameters.AddWithValue("@lastName", customer.lastName) |> ignore
            connection.Open()
            // Return the generated id rather than the number of rows affected
            command.ExecuteScalar() :?> int

        member this.DeleteCustomer(id: int) =
            use connection = new SqlConnection(connectionString)
            let commandText = "Delete customers where id = @id"
            use command = new SqlCommand(commandText, connection)
            command.Parameters.AddWithValue("@id", id) |> ignore
            connection.Open()
            command.ExecuteNonQuery() |> ignore

To auto-generate XML code comments, you need to mark “XML documentation file” on the Build page of project properties:

With the .XML file created during the build, you can then fire up Sandcastle and point it at the .XML file.

With that, you can get some nice component documents based on your XML Code Comments. Since I have not put any into my project yet, there is nothing in the docs.

So therein lies the rub. I started entering XML comments (bare minimum) like so:

/// <summary>
/// Interface for Customer Repository implementations.
/// </summary>
type ICusomerRepository =
    /// <summary>
    /// Get a single validated customer.
    /// </summary>
    ///<param name="param0">The customer Id</param>
    ///<returns>A validated Customer.</returns>
    abstract member GetCustomer : int -> Customer
    /// <summary>
    /// Insert a single validated customer.
    /// </summary>
    ///<param name="param0">A validated customer.</param>
    ///<returns>The Id of the customer, generated by the repository.</returns>
    abstract member InsertCustomer: Customer -> int
    /// <summary>
    /// Deletes a single customer from the repository.
    /// </summary>
    ///<param name="param0">The customer Id</param>
    abstract member DeleteCustomer: int -> unit

And you can see what happens. The code base goes from 5 lines of readable code to 21 lines of clutter to make the help file.

One of the tenets of good code is that it is clean, so we use SOLID principles, run FxCop, and the like. Another tenet of good code is that it is uncluttered, so we use FSharp, use ROP instead of structured exception handling, and avoid boilerplate and templating. The problem is that we still can’t get away from clutter if we want to have good documentation. Option A is to just drop documentation, a laudable but unrealistic goal, especially in a corporate environment. Option B I am less sure about: I am wondering whether I could create a separate file in the project just for the code comments. That way the actual code is uncluttered and you can work with it undistracted and the XML still gets generated…
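One partial mitigation, offered as a sketch and not as the answer to Option B: the F# compiler treats a `///` comment that contains no XML tags as the summary, and parameters in an abstract member signature can be given names. Assuming a one-line summary per member is enough for the help file, the interface could be documented with one comment line per member. The parameter names `id` and `customer` are my own choice for illustration.

```fsharp
type Customer = {id:int; firstName:string; lastName:string}

/// Interface for Customer Repository implementations.
type ICusomerRepository =
    /// Get a single validated customer by its id.
    abstract member GetCustomer : id:int -> Customer
    /// Insert a single validated customer and return the id generated by the repository.
    abstract member InsertCustomer : customer:Customer -> int
    /// Delete a single customer from the repository by its id.
    abstract member DeleteCustomer : id:int -> unit
```

The trade-off is that the separate param and returns entries disappear from the generated XML unless the full tags are added back for the members that need them.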

Filed under Coding Best Practices, F#

R for the .NET Developer

June 23, 2015 Leave a comment

I spent some time over the last week putting my ideas down for a new speaking topic: “R for the .NET Developer.” With Microsoft acquiring Revolution Analytics and making a concerted push into analytics tooling and platforms, it makes sense that .NET developers have some exposure to the most common language in the data science space – R.

I started the presentation using Prezi (thanks David Green) and set up the major points I wanted to cover:

· R Overview
· R Language Features
· R In Action
· R Lessons Learned

You can see the Prezi here.

I worked through and then borrowed from several different books:

this great YouTube clip

and this Pluralsight course

I then jumped into R Studio to work through some of the code ideas that the Prezi illustrates. The entire set of code is found on GitHub here, but I wanted to show a couple of the cooler things that I did.

First, I implemented the Automotive In R example from the Data Mining and Business Analytics book. This is pretty much a straight port of his exercise, with the exception that I convert some vectors to factors to demonstrate how/when to do it:

setwd("C:\\Git\\R4DotNet")

#y = x1 + x2 + x3 + E
#y is what you are trying to explain
#x1, x2, x3 are the variables that cause/influence y
#E is things that we are not measuring/ using for calculations

fuel.efficiency <- read.csv("C:/Git/R4DotNet/Data/FuelEfficiency.csv")
summary(fuel.efficiency)

#MPG = Miles per gallon
#GPM = Gallons per 100 miles
#WT = Weight of car in 1000 lbs
#DIS = Displacement in cubic inches
#NC = number of cylinders
#HP = Horsepower
#ACC = Acceleration in seconds from 0-60
#ET = Engine Type 0 = V, 1 = Straight

plot(GPM~WT,data=fuel.efficiency)
plot(GPM~DIS,data=fuel.efficiency)

fuel.efficiency$NC <- factor(fuel.efficiency$NC)
fuel.efficiency$ET <- factor(fuel.efficiency$ET)
summary(fuel.efficiency)

plot(GPM~NC,data=fuel.efficiency)

model <- lm(GPM~.,data=fuel.efficiency)
summary(model)

# Multiple R-squared:  0.9804 
# means that we can explain 98% of the GPM with the variables we have E = 2%
# That is pretty friggen good

# turning back to numeric so we can do cor across data frame
# go through as.character so we get the original values, not the factor level codes
fuel.efficiency$NC <- as.integer(as.character(fuel.efficiency$NC))
fuel.efficiency$ET <- as.integer(as.character(fuel.efficiency$ET))
cor(fuel.efficiency)

#DIS -> WT = 0.9507647

library(leaps)
x=fuel.efficiency[,3:7]
y=fuel.efficiency[,2]
out = summary(regsubsets(x,y,nbest=2,nvmax=ncol(x)))
# rsq is the R-squared column of the regsubsets summary
tab=cbind(out$which,out$rsq,out$adjr2,out$cp)
tab

#trade off between model size and model fit
#just weight is 

model2 = lm(GPM~WT,data=fuel.efficiency)
summary(model2)

Here are the plots (as continuous and as a factor):

Then, I implemented this K-Means from Azure ML to show the difference between the two implementations. The AzureML experiment is found here, and my code looks like this. Note that I did not do a regression:

# iris.data has no header row, so do not let read.csv consume the first record
flowers <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data", header=FALSE)
summary(flowers)

colnames(flowers) <- c("F1", "F2", "F3", "F4", "Label")
summary(flowers)

# sample 60% of the rows for training, keep the remaining 40% for testing
indexes = sample(1:nrow(flowers), size=0.6*nrow(flowers))
flowers.train <- flowers[indexes,]
flowers.test <- flowers[-indexes,]

fit <- kmeans(flowers.train[,1:4],5)
fit

plot(flowers.train[c("F1", "F2")], col=fit$cluster)
# one color per center, matching the cluster colors above
points(fit$centers[,c("F1", "F2")], col=1:nrow(fit$centers), pch=8, cex=2)

With a plot example like this:

So I think I am ready for the presentation. It is really true, the best way to learn about something is to teach it…

Filed under R

Parsing Wireshark Files Using F#

June 16, 2015 1 Comment

I went to the Research Triangle Analysts Meetup for network security where I was exposed to Wireshark for the first time. One of the problems with analyzing packets is that the data comes in a variety of structures, depending on the nature of what is being captured and what level of communication is being analyzed. I decided to learn a bit about network analysis using this book:

One of the examples was analyzing Twitter Direct Messages. The interesting thing is that the contents of DMs are sent in plain text, so that is a good word to the wise.

I was thinking about how to best analyze the sample packets for the DM and I immediately thought of using F# Type Providers. You can export the data from Wireshark in a variety of formats. I chose XML for no particular reason.

After exporting the data to the file system and bringing the data in via the TP, I then wrote a quick script to see how fast I could get to the message sent. Turns out pretty quick:

open System.IO
open FSharp.Data

[<Literal>]
let uri = @"C:\Users\jamie\Desktop\ChickenSoftware.PacketAnalysis.Solution\Data\twitter_dm"

type Context = XmlProvider<uri>
let data = Context.Load(uri)

let protoes = data.Packets |> Seq.collect(fun p -> p.Protoes)
let fields = protoes |> Seq.collect(fun p -> p.Fields)
let content = fields |> Seq.filter(fun f -> f.Name = Some "urlencoded-form")
let values = content |> Seq.map(fun c -> c.Showname)
let values' = values |> Seq.filter(fun v -> v.Value.Contains("text"))
values'

So F# makes it very easy to consume the data and traverse it. I am curious how easy it will be to start applying machine learning to these files using F#. That is up next…
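For readers without the type provider at hand, the same traversal can be sketched with System.Xml.Linq. This is only an illustration: the inline XML is a made-up fragment shaped like a Wireshark packet/proto/field export, not the actual twitter_dm capture, and the helper names `xn` and `attr` are hypothetical.

```fsharp
open System.Xml.Linq

// Made-up sample shaped like an exported capture; not real packet data
let samplePdml = """
<pdml>
  <packet>
    <proto name="http">
      <field name="http.request.method" showname="Request Method: POST" />
    </proto>
    <proto name="urlencoded-form">
      <field name="urlencoded-form" showname="Form item: &quot;text&quot; = &quot;hello&quot;" />
      <field name="urlencoded-form" showname="Form item: &quot;user&quot; = &quot;someone&quot;" />
    </proto>
  </packet>
</pdml>"""

let xn name = XName.Get name

// Returns the attribute value, or None when the attribute is missing
let attr name (e: XElement) =
    match e.Attribute(xn name) with
    | null -> None
    | a -> Some a.Value

// Descendants walks every field element, including nested ones
let messages =
    XDocument.Parse(samplePdml.Trim()).Descendants(xn "field")
    |> Seq.filter (fun f -> attr "name" f = Some "urlencoded-form")
    |> Seq.choose (attr "showname")
    |> Seq.filter (fun s -> s.Contains("text"))
    |> Seq.toList
```

Returning an option from `attr` plays the same role as the `Some "urlencoded-form"` comparison in the type-provider version: a field without the attribute is simply skipped.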

Filed under F#, Network Analysis

Geocoding Using Texas A&M Service and F#

June 9, 2015 1 Comment

Geocoding is the technique of taking an address and turning it into a geocoordinate (latitude and longitude). There are plenty of geocoding services out there (notably Google and Bing) but I decided to try a lesser-known (though apparently just as good) service from Texas A&M found here.

It took about 30 seconds to create an account; the only downer is that they only give you 2,500 calls (and it is not clear if that is per day/month/forever). In any event, the documentation was easy enough to work through and since they seem to offer the data in a variety of formats, I picked the json type provider. Also, their documentation starts off with POST examples, which are a bit harder to manage using a TP, but they do support GETs with the parameters in the query string.

I fired up Visual Studio and started a new FSharp project. The first thing I did was pull down their sample json result which I saved as a local file to the project folder (I also put a sample XML in there in case the json provider didn’t work out. Ironically, I could not get the XmlProvider working but the Json one worked like a champ):

I then added in some code to make the request. A majority of the code is creating the query string:

#r "../packages/FSharp.Data.2.2.2/lib/net40/FSharp.Data.dll"

open System
open System.IO
open System.Text
open FSharp.Data

// Verbatim string so the backslashes in the path are taken literally
[<Literal>]
let sample = @"C:\Users\Dixon\Desktop\SampleApp_CSharp\ChickenSoftware.Geolocation.Solution\Data\TAMUHttpGet.json"

type Context = JsonProvider<sample>

let streetAddress = "904 Strathorn Drive"
let city = "Cary"
let state = "NC"
let zip = "27519"
let apiKey = "XXXXXXX"

let stringBuilder = new StringBuilder()
stringBuilder.Append("https://geoservices.tamu.edu/Services/Geocode/WebService/GeocoderWebServiceHttpNonParsed_V04_01.aspx") |> ignore
stringBuilder.Append("?streetAddress=") |> ignore
// Escape values that can contain spaces before putting them in the query string
stringBuilder.Append(Uri.EscapeDataString(streetAddress)) |> ignore
stringBuilder.Append("&city=") |> ignore
stringBuilder.Append(Uri.EscapeDataString(city)) |> ignore
stringBuilder.Append("&state=") |> ignore
stringBuilder.Append(state) |> ignore
stringBuilder.Append("&zip=") |> ignore
stringBuilder.Append(zip) |> ignore
stringBuilder.Append("&apiKey=") |> ignore
stringBuilder.Append(apiKey) |> ignore
stringBuilder.Append("&version=4.01") |> ignore
stringBuilder.Append("&format=json") |> ignore

let searchUri = stringBuilder.ToString()
let searchResult = Context.Load(searchUri)

let firstResult = searchResult.OutputGeocodes |> Seq.head
firstResult.OutputGeocode.Latitude
firstResult.OutputGeocode.Longitude
firstResult.OutputGeocode.MatchScore

And sure enough: data that is correct

FSharp made it stupid simple to consume this service and the only real gotchas I found were in the documentation itself:

1) The MatchScore is a decimal but the json sample has it as “100” so it was inferred as an int. I replaced the value with “98.4023668639053” to force the correct type.

2) The documentation’s formats are listed as this

But since they had samples in json below, I just added in

stringBuilder.Append("&format=json") |> ignore

and it worked fine.
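Since most of the script is query-string plumbing, that part could also be pulled into a small helper. The following is a minimal sketch under my own assumptions: `buildUri` is a hypothetical name, and the parameter names and endpoint are simply copied from the script above. It needs only the System namespace and does not call the service.

```fsharp
open System

/// Builds a GET URL from a base address and (name, value) pairs,
/// percent-encoding every value.
let buildUri (baseUri: string) (parameters: (string * string) list) =
    let query =
        parameters
        |> List.map (fun (name, value) -> name + "=" + Uri.EscapeDataString(value))
        |> String.concat "&"
    baseUri + "?" + query

let searchUri =
    buildUri
        "https://geoservices.tamu.edu/Services/Geocode/WebService/GeocoderWebServiceHttpNonParsed_V04_01.aspx"
        [ "streetAddress", "904 Strathorn Drive"
          "city", "Cary"
          "state", "NC"
          "zip", "27519"
          "apiKey", "XXXXXXX" // placeholder, as in the script above
          "version", "4.01"
          "format", "json" ]
```

Keeping the parameters in a list makes it easy to add or drop one, and escaping every value means an address with spaces or an ampersand cannot break the URL.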

You can see the gist here.

Filed under F#

//Build Word Count!

June 3, 2015 2 Comments

I started working with HadoopFs last week to see if I could get a better understanding of how to write FSharp mappers. Since everyone uses word counts when doing a “Hello World” using Hadoop, I thought I would also.

I decided to compare Satya’s //Build keynote from 2014 and 2015 to see if there was any shift in his focus between last year and this. Isaac Abraham managed to reduce the 20+ lines of catastrophic C# code in the Azure HDInsight tutorial into 2 lines of F# code:

static void Main(string[] args)
{
    if (args.Length > 0)
    {
        Console.SetIn(new StreamReader(args[0]));
    }

    string line;
    string[] words;

    while ((line = Console.ReadLine()) != null)
    {
        words = line.Split(' ');

        foreach (string word in words)
            Console.WriteLine(word.ToLower());
    }
}

let result = testString.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries) |> Seq.countBy id
result

I added the data files to my solution and then added a way to locate those files via a relative path.

open System
open System.IO

let baseDirectory = __SOURCE_DIRECTORY__
let baseDirectory' = Directory.GetParent(baseDirectory)
// Verbatim string so the backslash is taken literally
let filePath = @"Data\Build_Keynote2014.txt"
let fullPath = Path.Combine(baseDirectory'.FullName, filePath)
let buildKeynote =  File.ReadAllText(fullPath)

I then ran the mapper that Isaac created and got what I expected

buildKeynote.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries) 
    |> Seq.countBy id
    |> Seq.sortBy(fun (w,c) -> c)
    |> Seq.toList
    |> List.rev

Interestingly, the 1st word that really jumps out is “Windows” at 26 times.
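To see what the pipeline above produces without the keynote files, here is a minimal sketch on a made-up sentence. The function name `wordCounts` and the sample text are mine, not part of HadoopFs or of either keynote.

```fsharp
open System

/// Counts the words in a text, most frequent first.
let wordCounts (text: string) =
    text.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries)
    |> Seq.countBy id
    // Sorting on the negated count puts the largest counts first
    |> Seq.sortBy (fun (_, count) -> -count)
    |> Seq.toList

// Made-up sample text, not taken from either keynote
let sample = "the quick fox and the lazy dog and the cat"
let counts = wordCounts sample
```

Sorting on the negated count replaces the sort-then-reverse step. Note that the split is case-sensitive and only on spaces, so “Windows” and “windows,” would be counted as different words.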

I then loaded in the 2015 Build keynote and ran the same function

// Verbatim string so the backslash is taken literally
let filePath' = @"Data\Build_Keynote2015.txt"
let fullPath' = Path.Combine(baseDirectory'.FullName, filePath')
let buildKeynote' =  File.ReadAllText(fullPath')

buildKeynote'.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries) 
    |> Seq.countBy id
    |> Seq.sortBy(fun (w,c) -> c)
    |> Seq.toList
    |> List.rev

And the 1st interesting word is “Platform” at 9 mentions. “Windows” fell to 2 mentions.

result |> Seq.filter(fun (w,c) -> w = "Windows")

And just because I couldn’t resist

result |> Seq.filter(fun (w,c) -> w = "F#")
result |> Seq.filter(fun (w,c) -> w = "C#")

So I am feeling pretty good about HadoopFs and will now start trying to implement it on my instance of Azure this weekend.

Filed under F#, Hadoop
