Sandcastle Help File Builder and FSharp

If you are going to write and release a professional-grade  .NET assembly, there are some things that need to be considered: logging, exception handling, and documentation.  For .NET components, Sandcastle Help File Builder is the go-to tool to generate documentation as either the old-school .chm file or as a web deploy. 

Consider an assembly that contains a Customer record type, an interface for a Customer Repository, and two implementations (In-Memory and ADO.NET)

1 type Customer = {id:int; firstName:string; lastName:string} 2 3 type ICusomerRepository = 4 abstract member GetCustomer : int -> Customer 5 abstract member InsertCustomer: Customer -> int 6 abstract member DeleteCustomer: int -> unit 7 8 type InMemoryCustomerRepository ()= 9 let customers = [ 10 {id=1; firstName = "First"; lastName = "Customer"} 11 {id=2; firstName = "Second"; lastName = "Customer"} 12 {id=3; firstName = "Third"; lastName = "Customer"}] 13 let customers' = new List<Customer>(customers) 14 15 interface ICusomerRepository with 16 member this.GetCustomer(id:int) = 17 customers' |> Seq.find(fun c -> c.id = id) 18 member this.InsertCustomer(customer: Customer) = 19 let nextId = customers'.Count 20 let customer' = {customer with id=nextId} 21 customers'.Add(customer') 22 nextId 23 member this.DeleteCustomer(id: int) = 24 let customer = customers |> Seq.find(fun c -> c.id = id) 25 customers'.Remove(customer) |> ignore 26 27 type SqlServerCustomerRepository (connectionString:string) = 28 interface ICusomerRepository with 29 member this.GetCustomer(id:int) = 30 use connection = new SqlConnection(connectionString) 31 let commandText = "Select * from customers where id = " + id.ToString() 32 use command = new SqlCommand(commandText, connection) 33 connection.Open() 34 use reader = command.ExecuteReader() 35 reader.Read() |> ignore 36 {id=reader.[0] :?> int; 37 firstName=reader.[1] :?> string; 38 lastName =reader.[2] :?> string} 39 40 member this.InsertCustomer(customer: Customer) = 41 use connection = new SqlConnection(connectionString) 42 let commandText = new StringBuilder() 43 commandText.Append("Insert customers values") |> ignore 44 commandText.Append(customer.firstName) |> ignore 45 commandText.Append(",") |> ignore 46 commandText.Append(customer.lastName) |> ignore 47 use command = new SqlCommand(commandText.ToString(), connection) 48 connection.Open() 49 command.ExecuteNonQuery() 50 51 member this.DeleteCustomer(id: int) = 52 use connection = new SqlConnection(connectionString) 53 let commandText = "Delete customers where id = " + id.ToString() 54 use command = new SqlCommand(commandText, connection) 55 connection.Open() 56 command.ExecuteNonQuery() |> ignore 57

To auto-generate XML code comments, you need to mark “XML documentation file” on the Build page of project properties:

image

With the .XML file created during the build, you can then fire up Sandcastle to point to the .XML file

image

With that, you can get some nice component documents based on your XML Code Comments.  Since I have not put any into my project yet, there is nothing in the docs.

image

So therein lies the rub.  I started entering XML comments (bare minimum) like so:

1 /// <summary> 2 /// Interface for Customer Repository implementations. 3 /// </summary> 4 type ICusomerRepository = 5 /// <summary> 6 /// Get a single validated customer. 7 /// </summary> 8 ///<param name="param0">The customer Id</param> 9 ///<returns>A validated Customer.</returns> 10 abstract member GetCustomer : int -> Customer 11 /// <summary> 12 /// Insert a single validated customer. 13 /// </summary> 14 ///<param name="param0">A validated customer.</param> 15 ///<returns>The Id of the customer, generated by the respository.</returns> 16 abstract member InsertCustomer: Customer -> int 17 /// <summary> 18 /// Deletes a single customer from the respository. 19 /// </summary> 20 ///<param name="param0">The customer Id</param> 21 abstract member DeleteCustomer: int -> unit

image

And you can see what happens.  The code base goes from 5 lines of readable code to 21 lines of clutter to make the help file.

One of the tenants of good code is that it is clean –> so we use SOLID principles, run FxCop, and the like.  Another tenant of good code is that it is uncluttered –> so we use FSharp, use ROP instead of structured exception handling, and avoid boilerplates and templating.  The problem is that we still can’t get away from clutter if we want to have good documentation.  Option A is to just drop documentation, a laudable but unrealistic goal, especially in a corporate environment.  Option B I am not sure on.  I am wondering if I create a separate file in the project just for the code comments.  That way the actual code is uncluttered and you can work with it undistracted and the XML still gets generated…

 

R for the .NET Developer

I spent some time over the last week putting my ideas down for a new speaking topic: “R for the .NET Developer.” With Microsoft acquiring Revolution Analytics and making a concerted push into analytics tooling and platforms, it makes sense that .NET developers have some exposure to the most common language in the data science space – R.

I started the presentation using Prezi (thanks David Green) and set up the major points I wanted to cover:

  • · R Overview
  • · R Language Features
  • · R In Action
  • · R Lessons Learned

You can see the Prezi here.

I worked through and then borrowed from several different books:

clip_image002clip_image004clip_image006clip_image008

clip_image010image image image image

this great you tube clip

clip_image012

and this Pluralsight course

clip_image014

I then jumped into R Studio to work though some of the code ideas that the Prezi illustrates. The entire set of code is found here on Github here but I wanted to show a couple of the cooler things that I did.

First, I implemented the Automotive In R from Data Mining and Business Analytics Book. This is pretty much a straight port of his exercise, with the exception is that I convert some vectors to factors to demonstrate who/when to do it:

1 setwd("C:\\Git\\R4DotNet") 2 3 #y = x1 + x2 + x3 + E 4 #y is what you are trying explain 5 #x1, x2, x3 are the variables that cause/influence y 6 #E is things that we are not measuring/ using for calculations 7 8 fuel.efficiency <- read.csv("C:/Git/R4DotNet/Data/FuelEfficiency.csv") 9 summary(fuel.efficiency) 10 11 #MPG = Miles per gallon 12 #GPM = Gallons per 100 miles 13 #WT = Weight of car in 1000 lbs 14 #DIS = Displacment in cubic inches 15 #NC = number of cylinders 16 #HP = Horsepower 17 #ACC = Acceleration in seconds from 0-60 18 #ET = Engine Type 0 = V, 1 = Straight 19 20 plot(GPM~WT,data=fuel.efficiency) 21 plot(GPM~DIS,data=fuel.efficiency) 22 23 fuel.efficiency$NC <- factor(fuel.efficiency$NC) 24 fuel.efficiency$ET <- factor(fuel.efficiency$ET) 25 summary(fuel.efficiency) 26 27 plot(GPM~NC,data=fuel.efficiency) 28 29 model <- lm(GPM~.,data=fuel.efficiency) 30 summary(model) 31 32 # Multiple R-squared: 0.9804 33 # means that we can explain 98% of the GPM with the variables we have E = 2% 34 # That is pretty friggen good 35 36 # turning back to numeric so we can do cor accross data frame 37 fuel.efficiency$NC <- as.integer(fuel.efficiency$NC) 38 fuel.efficiency$ET <- as.integer(fuel.efficiency$ET) 39 cor(fuel.efficiency) 40 41 #DIS -> WT = 0.9507647 42 43 library(leaps) 44 x=fuel.efficiency[,3:7] 45 y=fuel.efficiency[,2] 46 out = summary(regsubsets(x,y,nbest=2,nvmax=ncol(x))) 47 tab=cbind(out$which,out$req,out$adjr2,out$cp) 48 tab 49 50 #trade off between model size and model fit 51 #just weight is 52 53 model2 = lm(GPM~WT,data=fuel.efficiency) 54 summary(model2)

Here are the plots (as continuous and as a factor):

image image

Then, I implemented this K-Means from Azure ML to show the difference between the two implementations. The AzureML experiment is found here.  And my code looks like this. Note that I did not do a regression

1 flowers <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data") 2 summary(flowers) 3 4 colnames(flowers) <- c("F1", "F2", "F3", "F4", "Label") 5 summary(flowers) 6 7 8 indexes = sample(1:nrow(flowers), size=0.6*nrow(flowers)) 9 flowers.train <- flowers[-indexes,] 10 flowers.test <- flowers[indexes,] 11 12 fit <- kmeans(flowers.train[,1:4],5) 13 fit 14 15 plot(flowers.train[c("F1", "F2")], col=fit$cluster) 16 points(fit$centers[,c("F1", "F2")], col=1:3, pch=8, cex=2)

With a plot example like this:

image

So I think I am ready for the presentation.  It is really true, the best way to learn about something is to teach it…

Parsing Wireshark Files Using F#

I went to the Research Triangle Analysts Meetup for network security where I was exposed to Wireshark for the1st time.  One of the problems with analyzing packets is that the data comes in a variety of structures, depending on the nature of what is being captured and what level of commutation is being analyzed.  I decided to learn a bit about network analysis using this book:

image

One of the examples was analyzing Twitter Direct Messages.  The interesting thing is that the contents of DMs are sent in plain text, so that is a good word to the wise.

image

I was thinking about how to best analyze the sample packets for the DM and I immediately thought of using F# Type Providers.  You can export the data from Wireshark in a variety of formats, I chose XML for no particular reason

image

After exporting the data to the file system and bring the data in via the TP, I then wrote a quick script to see how fast I could get to the message sent.  Turns out pretty quick:

1 open System.IO 2 open FSharp.Data 3 4 [<Literal>] 5 let uri = @"C:\Users\jamie\Desktop\ChickenSoftware.PacketAnalysis.Solution\Data\twitter_dm" 6 7 type Context = XmlProvider<uri> 8 let data = Context.Load(uri) 9 10 let protoes = data.Packets |> Seq.collect(fun p -> p.Protoes) 11 let fields = protoes |> Seq.collect(fun p -> p.Fields) 12 let content = fields |> Seq.filter(fun f -> f.Name = Some "urlencoded-form") 13 let values = content |> Seq.map(fun c -> c.Showname) 14 let values' = values |> Seq.filter(fun v -> v.Value.Contains("text")) 15 values'

image

So F# makes is very easy to consume the data and traverse it.  I am curious how easy it will be to start applying machine learning to these files using F#.  That is up next…

Geocoding Using Texas A&M Service and F#

Geocoding is the technique of taking an address and turning it in to a geocordinate (latitude and longitude).  There are plenty of geocoding services out there (notably Google and Bing) but I decided to try a lesser know (though apparently just as good) service from Texas A&M found here.

It took about 30 seconds to create an account with the only downer is that they only give you 2,500 calls (and it is not clear if that is per day/month/forever).  In any event, the documentation was easy enough to work though and since they seem to offer the data in a variety of formats, I picked the json type provider.  Also, their documentation starts off with POST examples, which a bit harder to mange using a TP, but they do support GETs with the parameters in the query string.

I fired up Visual Studio and started a new FSharp project.  The first thing I did was pull down their sample json result which I saved as a local file to the project folder (I also put a sample XML in there in case the json provider didn’t work out.  Ironically, I would not get the XmlProvider working but the Json one worked like a champ):

image

image

I then added in some code to make the request.  A majority of the code is creating the query string:

1 #r "../packages/FSharp.Data.2.2.2/lib/net40/FSharp.Data.dll" 2 3 open System.IO 4 open System.Text 5 open FSharp.Data 6 7 [<Literal>] 8 let sample = "C:\Users\Dixon\Desktop\SampleApp_CSharp\ChickenSoftware.Geolocation.Solution\Data\TAMUHttpGet.json" 9 10 type Context = JsonProvider<sample> 11 12 let streetAddress = "904 Strathorn Drive" 13 let city = "Cary" 14 let state = "NC" 15 let zip = "27519" 16 let apiKey = "XXXXXXX" 17 18 let stringBuilder = new StringBuilder() 19 stringBuilder.Append("https://geoservices.tamu.edu/Services/Geocode/WebService/GeocoderWebServiceHttpNonParsed_V04_01.aspx") |> ignore 20 stringBuilder.Append("?streetAddress=") |> ignore 21 stringBuilder.Append(streetAddress) |> ignore 22 stringBuilder.Append("&city=") |> ignore 23 stringBuilder.Append(city) |> ignore 24 stringBuilder.Append("&state=") |> ignore 25 stringBuilder.Append(state) |> ignore 26 stringBuilder.Append("&zip=") |> ignore 27 stringBuilder.Append(zip) |> ignore 28 stringBuilder.Append("&apiKey=") |> ignore 29 stringBuilder.Append(apiKey) |> ignore 30 stringBuilder.Append("&version=4.01") |> ignore 31 stringBuilder.Append("&format=json") |> ignore 32 33 let searchUri = stringBuilder.ToString() 34 let searchResult = Context.Load(searchUri) 35 36 let firstResult = searchResult.OutputGeocodes |> Seq.head 37 firstResult.OutputGeocode.Latitude 38 firstResult.OutputGeocode.Longitude 39 firstResult.OutputGeocode.MatchScore 40 41 42 43 44

And sure enough: data that is correct

image

FSharp made it stupid simple to consume this service and the only real gotchas I found were in the documentation itself:

1) The MatchScore is a decimal but the json sample has it as “100” so it was inferred as an int.  I replaced the value as “98.4023668639053” to force the correct type

2) The documentation’s formats are listed as this

image

But since they had samples in json below, I just added in

1 stringBuilder.Append("&format=json") |> ignore

and it worked fine.

You can see the gist here.

//Build Word Count!

I started working with HadoopFs last week to see if I could get a better understanding of how to write FSharp mappers.  Since everyone uses word counts when doing a “Hello World” using hadoop, I thought I would also.

image

I decided to compare Satya’s //Build keynote from 2014 and 2015 to see if there was any shift in his focus between last year and this.  Isaac Abraham managed to reduce the 20+ lines of catastrophic C# code in the Azure HDInsight tutorial into 2 lines of F# code

C#

1 static void Main(string[] args) 2 { 3 if (args.Length > 0) 4 { 5 Console.SetIn(new StreamReader(args[0])); 6 } 7 8 string line; 9 string[] words; 10 11 while ((line = Console.ReadLine()) != null) 12 { 13 words = line.Split(' '); 14 15 foreach (string word in words) 16 Console.WriteLine(word.ToLower()); 17 } 18 }

F#

1 let result = testString.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries) |> Seq.countBy id 2 result 3

I added the data files to my solution and then added way to locate those files via a relative path.

1 let baseDirectory = __SOURCE_DIRECTORY__ 2 let baseDirectory' = Directory.GetParent(baseDirectory) 3 let filePath = "Data\Build_Keynote2014.txt" 4 let fullPath = Path.Combine(baseDirectory'.FullName, filePath) 5 let buildKeynote = File.ReadAllText(fullPath)

I then ran the mapper that Isaac created and got what I expected

1 buildKeynote.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries) 2 |> Seq.countBy id 3 |> Seq.sortBy(fun (w,c) -> c) 4 |> Seq.toList 5 |> List.rev

Capture

Interestingly, the 1st word that really jumps out is “Windows” at 26 times.

I then loaded in the 2015 Build keynote and ran the same function

1 let filePath' = "Data\Build_Keynote2015.txt" 2 let fullPath' = Path.Combine(baseDirectory'.FullName, filePath') 3 let buildKeynote' = File.ReadAllText(fullPath') 4 5 buildKeynote'.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries) 6 |> Seq.countBy id 7 |> Seq.sortBy(fun (w,c) -> c) 8 |> Seq.toList 9 |> List.rev

Capture2

And the 1st interesting word is “Platform” at 9 mentions.  “Windows” fell to 2 mentions.

1 result |> Seq.filter(fun (w,c) -> w = "Windows")

image

And just because I couldn’t resist

1 result |> Seq.filter(fun (w,c) -> w = "F#") 2 result |> Seq.filter(fun (w,c) -> w = "C#") 3

image

So I am feeling pretty good about HadoopFs and will now start trying to implement it on my instance of Azure this weekend.