Functional Bioinformatics Algorithms: Part 2

Pressing on with more bioinformatic algorithms implemented in a functional style, the next algorithm found in Bioinformatics Algorithms by Compeau and Pevzner is to find the most frequent pattern in a string of text.

I started writing in the imperative style from the book like so (the length of the substring to be found is called the “k-mer” so it gets the parameter name “k”)

type Count = {Text:string; PatternCount:int}
let frequentWords (text:string) (k:int) =
    let patternCount (text:string) (pattern:string) =
        |> Seq.windowed pattern.Length 
        |> c -> new string(c))
        |> Seq.filter(fun s -> s = pattern)
        |> Seq.length

    let counts = new List<Count>()
    for i = 0 to text.Length-k do
        let pattern = text.Substring(i,k)
        let patternCount = patternCount text pattern
        let count = {Text=text; PatternCount=patternCount}
        counts |> Seq.orderByDesc(fun c -> c.PatternCount)
        let maxCount = counts|> Seq.head
    let frequentPatterns = new List<Count>()
    for i = 0 to counts.length
        if count.[i].PatternCount = maxCount then

But I gave up because, well, the code is ridiculous. I went back to the original pattern count algorithms written in F# and then added in a block to find the most frequent patterns:

let frequentWords (text:string) (k:int) =
    let patternCounts =
        |> Seq.windowed k
        |> c -> new string(c))
        |> Seq.countBy(fun s -> s)
        |> Seq.sortByDescending(fun (s,c) -> c)
    let maxCount = patternCounts |> Seq.head |> snd
        |> Seq.filter(fun (s,c) -> c = maxCount)
        |> (s,c) -> s)

The VS Code linter was not happy with my Seq.countBy implementation… but it works. I think the code is explanatory:

  1. window the string for the length of k

2. do a countyBy on the resulting substrings

3. sort it by descending, find the top substring amount

4. filter the substring list by that top substring count.

The last map returns just the pattern and leaves off the frequency, which I think is a mistake but is how the book implements it. Here is an example of the frequentWords function in action:

let getRandomNuclotide () =
    let dictionary = ["A";"C";"G";"T"]
    let random = new Random()

let getRandomSequence (length:int) =
    let nuclotides = [ for i in 0 .. length -> getRandomNuclotide() ]
    String.Join("", nuclotides)

let largerText = getRandomSequence 1000000

let currentFrequentWords = frequentWords largerText 9

I didn’t set the seed value for generating the largerText string so the results will be different each time.

Gist is here

Functional Bioinformatics Algorithms

I have been looking at bioinformatics much more seriously recently by taking Algorithms for DNA Sequencing by Ben Langmead on Cousera and working though Bioinformatics Algorithms by Compeau and Pevzner. 

I noticed in both cases that the code samples are very much imperative focused: lots of loops, lots of mutable variables, lots of mess. I endeavored to re-write the code in a more functional style using immutable variables and pipelining functions

Consider the code for Pattern Count, a function that counts how often a pattern appears in a larger string. For example, the pattern “AAA” appears in the larger string “AAATTTAAA” twice. If the larger string was “AAAA”, the pattern “AAA” is also twice since AAA appears in index 0-2 and index 1-3.

Here is the code that appears in the book:

let patternCount (text:string) (pattern:string) =
    let mutable count = 0
    for i = 0 to (text.Length - pattern.Length) do
        let subString = text.Substring(i,pattern.Length)
        if subString = pattern then
            count <- count + 1

Contrast that to a more functional style:

let patternCount (text:string) (pattern:string) =
    |> Seq.windowed pattern.Length 
    |> c -> new string(c))
    |> Seq.filter(fun s -> s = pattern)
    |> Seq.length

There are three main benefits of the functional style:

  1. The code is much more readable. Each step of the transformation is explicit as a single line of the pipeline. There is almost a one to one match between the text from the book and the code written. The only exception is the because the windowed function outputs as a char array and we need to transform it back into a string.
  2. The code is auditable. The pipeline can be stopped at each step and output reviewed for correctness.
  3. The code is reproducible. Because each of the steps uses immutable values, pattern count will produce the same result for a given input regardless of processor, OS, or other external factors.

In practice, the pattern count can be used like this:

let pattern = "ACTAT"
let counts = patternCount text pattern

val counts : int = 2

In terms of performance, I added in a function to make a sequence of ten million nucleotides and then searched it:

let getRandomNuclotide () =
    let dictionary = ["A";"C";"G";"T"]
    let random = new Random()

let getRandomSequence (length:int) =
    let nuclotides = [ for i in 0 .. length -> getRandomNuclotide() ]
    String.Join("", nuclotides)

let largerText = getRandomSequence 10000000
let counts = patternCount largerText pattern

Real: 00:00:00.814, CPU: 00:00:00.866, GC gen0: 173, gen1: 1, gen2: 0
val counts : int = 9816

It ran in about one second on my macbook using only 1 processor. If I wanted to make it faster and run this for the four billion nucleotides found in human DNA, I would use the Parallel Seq library, which is a single letter change to the code. That would be a post for another time…

The gist is here

Web Crawling Using F#

(This post is part of the FSharp Advent Calendar Series. Thanks Sergy Thion for organizing it)

Recently, I had the need to get articles from some United States government websites. You would think in 2019 that these sites might have apis and you would think wrong. In each case, I needed to crawl the site’s HTML and then extract the information. I started doing this with Python and its beautiful soup library but I can into the fundamental problem that getting the html was much harder than parsing the site. To illustrate, consider this website

I need to go through all 8 pages of the grid and download the .pdfs that are associated with the “View Report” link. The challenge in this particular site is that they didn’t do any url parameters so there is no way to go through the grid via the uri. Looking at the page source, they are using ASP.NET and in typical enterprise-derpy manner, named their table “GridView1”

The way to get to the next page is to press on the “Next” link defined like this:

They over-achieved in the bloated View State for a simple page category though.



So as bad as this site is, F# made getting the data a snap. I fired up Visual Studio and created a new .NET Core F# project. I added a script file. I realized that the button-press to get to the next page was going to be a pain to program, so I decided to use the .NET framework WebBrowser class. It’s nice because it has all of the apis I needed for the traversal and I didn’t have to make the control visible.

My first function was to get the uris from the grid – easy enough using the HtmlDocument and HtmlElement classes:

let getPdfUris (document:HtmlDocument) =

    let collection = document.GetElementsByTagName(“a”)


    |> Seq.cast

    |> Seq.filter(fun e -> e.OuterText = “View Report”)

    |> e -> e.GetAttribute(“href”))

Note the key word I used to filter was “View Report”, so at least the web designer stayed consistent there.

Next, I used basically the same logic to find the Next button in the DOM. Note that I am using the TryFind function so if the button is not found, a None is returned:

let getNextButton (document:HtmlDocument) =

    let collection = document.GetElementsByTagName(“a”)


    |> Seq.cast

    |> Seq.tryFind(fun e -> e.InnerText = “Next”)

So the next function was my giggle moment for this project. To “press” that button to invoke the javascript to go to the next page of the grid, I ised the InvokeMember method of the HtmlClass

let invokeNextButton (nextButton: HtmlElement) =

    nextButton.InvokeMember(“Click”) |> ignore

    printfn “Next Button Invoked”

Yup, that works! I was worried that I was going to have to screw around with the javascript or, worse, that beast called View State. Fortunately, that InvokeMember method worked fine. Another reason why I love the .NET framework.

So with these three functions set up, I created a method to be called each time the document is refreshed

let handlePage (browser:WebBrowser) (totalUris:List) =

    let document = browser.Document

    let uris = getPdfUris document


    let nextButton = getNextButton document

    match nextButton with

    | Some b ->

        invokeNextButton b

    | None -> ()

My C#-only friends spend waaaay to much time worrying about the last page and having no button and how to code it. I used the option type – is there a Next button? Press it and do work. No Button? Do nothing.

I put in a function to save the .pdf to my local file system

let downloadPdf (uri:string) =

    let client = new WebClient();

    let targetFilePath = @”C:\Temp\” + Guid.NewGuid().ToString() + “.pdf”;


And now I can write this all together:

let browser = new WebBrowser()

let uris = new List()

browser.DocumentCompleted.Add(fun _ -> handlePage browser uris)

let uri =;


printf “Links Done”

uris |> Seq.iter(fun uri -> downloadPdf uri)

printf “Downloads Done”

So I new up the browser, send handlePage to the DocumentCompleted event handler. Every time the Next button is pressed, the document loads and the DocumentCompleted event fires, and the .pdfs are downloaded and the next button is pressed. Until the last page, when there is no button to press.

And it worked like a champ:


Gist is found here 

The Counted Part 3: Law Enforcement Officers Killed In Line Of Duty

As a follow up to this post, I decided to look on the other side of the gun –> police officers killed in the line of duty.  Fortunately, the FBI collects this data here.  It looks like the FBI is a bit behind on their summary reports:


So taking the 2013 data as the closest data point to The Counted 2015 data, it took a couple of minutes to download the excel spreadsheet and format it as a useable .csv:

image  to image

After importing in the data in R studio, I did a quick summary on the data frame.  The most striking thing out of the gate is how few Officers are killed.  There were 27 in 2013, compared to over 500 people killed by police officers in the 1st half of 2015:

1 officers.killed <- read.csv("./Data/table_1_leos_fk_region_geographic_division_and_state_2013.csv") 2 sum(officers.killed$OfficersKilled) 3

I then added in the state population to do a similar ratio and map:

1 officers.killed.2 <- merge(x=officers.killed, 2 y=state.population.3, 3 by.x="StateName", 4 by.y="NAME") 5 6 officers.killed.2$AdjustedPopulation <- officers.killed.2$POPESTIMATE2014/10000 7 officers.killed.2$KilledRatio <- officers.killed.2$OfficersKilled/officers.killed.2$AdjustedPopulation 8 officers.killed.2$AdjKilledRatio <- officers.killed.2$KilledRatio * 10 9 officers.killed.2$StateName <- tolower(officers.killed.2$StateName) 10 11 choropleth.3 <- merge(x=all.states, 12 y=officers.killed.2, 13 sort = FALSE, 14 by.x = "region", 15 by.y = "StateName", 16 all.x=TRUE) 17 choropleth.3 <- choropleth.3[order(choropleth.3$order), ] 18 summary(choropleth.3) 19 20 qplot(long, lat, data = choropleth.3, group = group, fill = AdjKilledRatio, 21 geom = "polygon") 22


So Louisiana and West Virginia seem to have the highest number of officers killed per capita.  I am not surprised, being that I had no expectations about states that would have higher and lower numbers.  It seems likely a case of “gee-wiz” data.

Since there is so few instances, I decided to forgo any more analysis on police killed and instead combined this data with the people who were killed by police:

1 the.counted.state.5 <- merge(x=the.counted.state.4, 2 y=officers.killed.2, 3 by.x="StateName", 4 by.y="StateName") 5 6 names(the.counted.state.5)[names(the.counted.state.5)=="AdjKilledRatio.x"] <- "NonPoliceKillRatio" 7 names(the.counted.state.5)[names(the.counted.state.5)=="AdjKilledRatio.y"] <- "PoliceKillRatio" 8 9 the.counted.state.6 <- data.frame(the.counted.state.5$NonPoliceKillRatio, 10 the.counted.state.5$PoliceKillRatio, 11 log(the.counted.state.5$NonPoliceKillRatio), 12 log(the.counted.state.5$PoliceKillRatio)) 13 14 colnames(the.counted.state.6) <- c("NonPoliceKilledRatio","PoliceKilledRatio","LoggedNonPoliceKilledRatio","LoggedPoliceKilledRatio") 15 16 plot(the.counted.state.6) 17

and certainly the log helps out and there seems to be a relationship between states that have police killed and people being killed by police (my hand-drawn red lines added):


With that in mind, I created a couple of  linear models

1 non.police <- the.counted.state.6$LoggedNonPoliceKilledRatio 2 police <- the.counted.state.6$LoggedPoliceKilledRatio 3 police[police==-Inf] <- NA 4 5 model <- lm( non.police ~ police ) 6 summary(model) 7 8 model.2 <- lm( police ~ non.police) 9 summary(model.2) 10

Since there are only 2 variables, the adjusted R square is the same for x~y and y~x.



The interesting thing is the model has to account that many states had 0 police fatalities but had at least 1 person killed by the police.  The next interesting thing is the value of the coefficient: in starts where there was at least 1 police fatality and 1 person killed by the police, every police fatality increases the number of people killed by police .96 –> and this .96 is the log of the ratio of population.  So it shows that the police are better at killing then getting killed, which makes sense.

The full gist is found here.

Analytics in the Microsoft Stack

Disclaimer:  I really don’t know what I am talking about

I received an email from a coworker/friend yesterday with this in the body:

So, I have a friend who works for a major supermarket chain. In IT, they are straight out of the year 2000. They have tons and tons of data in SQL Server and I think Oracle. The industrial engineers (who do all of the planning) ask the IT group to run queries throughout the day, which takes hours to run. They use Excel for most of their processing. On the weekends, they run reporting queries which take hours and hours to run – all to get just basic information.

This got my wheels spinning about how I would approach the problem with the analytics toolset that I know is available.  The supermarket chain has a couple of problems

  • Lots of data that takes too long to munge through
  • The planners are dependent on IT group for processing the data

I would expect the official Microsoft answer is that they should implement Sql Server Analytics with Power BI.  I would assume if the group threw enough resources at this solution, it would work.  I then thought of a couple of alternative paths:

The first thing that comes to mind is using HDInsight (Microsoft’s Hadoop product)  on Azure.  That way the queries can run in a distributed manner and they can provision machines as they need them -> and when they are not running their queries, they can de-allocate the machines.

The second thought is using AzureML to do their model generation.  However, depending on the size of the datasets, AzureML may not be able to scale.  I have only used Azure ML on smaller datasets.

The third thought was using R?  I don’t think R is the best answer here.  Everything I know about R is that it is designed for data exploration and analysis of datasets that comfortably fit into the local machine’s memory.  Performance on R is horrible and scaling R is a real challenge. 

What about F#?  So this might be a good answer.  If you use the Hive Type Provider, you can get the benefits of HDInsight to do the processing and then have the goodness of the language syntax and REPL for data exploration.  Also, the group could look at MBrace for some kick-butt distributed processing that can scale on Azure. Finally, if they don come up with some kind of insight that lends itself for building analytics or models into an app, you can take the code out of the script file and stick it into a compliable assembly all within Visual Studio. 

What about Python?  No idea, I don’t enough about it

What about Matlab, SAS, etc..  No idea.  I stopped using those tools when R showed up.

What about Watson?  No idea.  I think I will have a better idea once I go to this.

F# Record Types with Entity Framework Code-Last

So based on the experience with code-first, I decided to look at using EF code-last (OK, database first).   I considered three different possibilities

  1. 1) Use AutoMapper
  2. 2) Use Reflection
  3. 3) Hand-Roll everything


If you are not familiar, Automapper is a library to allow you to,well, map types. The first thing I did was to create a database schema like this:

1 use FamilyDomain 2 3 CREATE TABLE Family 4 ( 5 Id int NOT NULL IDENTITY(1,1) PRIMARY KEY, 6 LastName varchar(255) NOT NULL 7 ) 8 9 CREATE TABLE Parent 10 ( 11 Id int NOT NULL IDENTITY(1,1) PRIMARY KEY, 12 FamilyId int NOT NULL, 13 FirstName varchar(255) NOT NULL 14 ) 15 16 CREATE TABLE Child 17 ( 18 Id int NOT NULL IDENTITY(1,1) PRIMARY KEY, 19 FamilyId int NOT NULL, 20 FirstName varchar(255) NOT NULL, 21 Gender varchar(10) NOT NULL, 22 Grade int NOT NULL 23 ) 24 25 CREATE TABLE Pet 26 ( 27 Id int NOT NULL IDENTITY(1,1) PRIMARY KEY, 28 ChildId int NOT NULL, 29 GivenName varchar(255) NOT NULL 30 ) 31 32 CREATE TABLE HomeAddress 33 ( 34 Id int NOT NULL IDENTITY(1,1) PRIMARY KEY, 35 FamilyId int NOT NULL, 36 StateCode varchar(2) NOT NULL, 37 County varchar(255) NOT NULL, 38 City varchar(255) NOT NULL 39 ) 40 41 ALTER TABLE Parent 42 ADD CONSTRAINT fk_Parent_Family 43 FOREIGN KEY (FamilyId) 44 REFERENCES Family(Id) 45 46 ALTER TABLE HomeAddress 47 ADD CONSTRAINT fk_HomeAddress_Family 48 FOREIGN KEY (FamilyId) 49 REFERENCES Family(Id) 50 51 ALTER TABLE Child 52 ADD CONSTRAINT fk_Child_Family 53 FOREIGN KEY (FamilyId) 54 REFERENCES Family(Id) 55 56 ALTER TABLE Pet 57 ADD CONSTRAINT fk_Pet_Child 58 FOREIGN KEY (ChildId) 59 REFERENCES Child(Id) 60 61 62 INSERT Family VALUES 63 ('Andersen') 64 65 INSERT Parent VALUES 66 (1,'Thomas'), 67 (1,'Mary Kay') 68 69 INSERT Child VALUES 70 (1,'Henriette Thaulow','Female',5) 71 72 INSERT Pet VALUES 73 (1,'Fluffy') 74 75 INSERT HomeAddress VALUES 76 (1,'WA','King','Seattle') 77

I then  installed automapper and entity framework type provider to a FSharp project.

1 #r @"../packages/AutoMapper.3.3.0/lib/net40/AutoMapper.dll" 2 #r "FSharp.Data.TypeProviders.dll" 3 #r "System.Data.Entity.dll" 4 5 open Microsoft.FSharp.Data.TypeProviders 6 open System.Data.Entity 7 open AutoMapper 8 9 //Entity Framework Types via Type Provider 10 let connectionString = @"Server=.;Initial Catalog=FamilyDomain;Integrated Security=SSPI;MultipleActiveResultSets=true" 11 type EntityConnection = SqlEntityConnection<ConnectionString="Server=.;Initial Catalog=FamilyDomain;Integrated Security=SSPI;MultipleActiveResultSets=true",Pluralize=true> 12

I then created some local FSharp record types the reflect the domain:

1 type Pet = {Id:int; GivenName:string} 2 type Child = {Id:int; FirstName:string; Gender:string; Grade:int; Pets: Pet list} 3 type Address = {Id:int; State:string; County:string; City:string} 4 type Parent = {Id:int; FirstName:string} 5 type Family = {Id:int; Parents:Parent list; Children: Child list; Address:Address}

So then I was ready to start mapping.  I started with a basic GET to a single type:

1 //AutoMapper setup 2 Mapper.CreateMap<EntityConnection.ServiceTypes.HomeAddress, Address>() 3 4 //Get one from the database 5 let context = EntityConnection.GetDataContext() 6 let addressQuery = query {for address in context.HomeAddresses do select address} 7 let address = Seq.head addressQuery 8 9 //map database to record type 10 let address' = Mapper.Map<Address>(address) 11

And I got a fail:

Source value:

SqlEntityConnection1.HomeAddress —> System.ArgumentException: Type needs to have a constructor with 0 args or only optional args

Parameter name: type

So I added [<CLIMutable>] to the record types like so

1 [<CLIMutable>] 2 type Pet = {Id:int; GivenName:string} 3 [<CLIMutable>] 4 type Child = {Id:int; FirstName:string; Gender:string; Grade:int; Pets: Pet list} 5 [<CLIMutable>] 6 type Address = {Id:int; State:string; County:string; City:string} 7 [<CLIMutable>] 8 type Parent = {Id:int; FirstName:string} 9 [<CLIMutable>] 10 type Family = {Id:int; Parents:Parent list; Children: Child list; Address:Address} 11

And I get the expected results


With one thing kinda interesting.  The State is null because it is defined as “StateCode” on the server and “State” in the domain.  Autopmapper is customizable to allow field name differences so that was a small issue.  Feeling confident, I went ahead and created maps to all of the domain types and pulled down a complex type from the database

1 //AutoMapper setup 2 Mapper.CreateMap<EntityConnection.ServiceTypes.Pet, Pet>() 3 Mapper.CreateMap<EntityConnection.ServiceTypes.Child, Child>() 4 Mapper.CreateMap<EntityConnection.ServiceTypes.HomeAddress, Address>() 5 Mapper.CreateMap<EntityConnection.ServiceTypes.Parent, Parent>() 6 Mapper.CreateMap<EntityConnection.ServiceTypes.Family, Family>() 7 8 //Get Family from the database 9 let context = EntityConnection.GetDataContext() 10 let familyQuery = query {for family in context.Families do select family} 11 let family = Seq.head familyQuery 12 13 //map database to record type 14 let family' = Mapper.Map<Family>(family)

When I attempted to map it, I got a pretty ugly exception

Source value:


   at AutoMapper.MappingEngine.AutoMapper.IMappingEngineRunner.Map(ResolutionContext context)

So the problem is that automapper is not picking up on the foreign keys, which means I have to write the associations by hand.  Ugh!  I then tried to auto map to F# choice types like this:

1 type Gender = Male | Female

No dice.


I quickly spun up another project that uses System.Reflection to map the types.

1 #r "System.Data.Entity.dll" 2 #r "FSharp.Data.TypeProviders.dll" 3 4 open System.Reflection 5 open System.Data.Entity 6 open Microsoft.FSharp.Data.TypeProviders 7 8 let connectionString = "Server=.;Database=FamilyDomain;Trusted_Connection=True;" 9 10 type entityConnection = SqlEntityConnection<ConnectionString = "Server=.;Database=FamilyDomain;Trusted_Connection=True;"> 11 12 let context = entityConnection.GetDataContext() 13 14 //Local Idomatic Types 15 [<CLIMutable>] 16 type Pet = {Id:int; ChildId:int; GivenName:string} 17 [<CLIMutable>] 18 type Child = {Id:int; FirstName:string; Gender:string; Grade:int; Pets: Pet list} 19 [<CLIMutable>] 20 type Address = {Id:int; State:string; County:string; City:string} 21 [<CLIMutable>] 22 type Parent = {Id:int; FirstName:string} 23 [<CLIMutable>] 24 type Family = {Id:int; LastName:string; Parents:Parent list; Children: Child list; Address:Address} 25 26 //Reflection 27 let AssignMatchingPropertyValues sourceObject targetObject = 28 let sourceType = sourceObject.GetType() 29 let targetType = targetObject.GetType() 30 let sourcePropertyInfos = sourceType.GetProperties(BindingFlags.Public ||| BindingFlags.Instance) 31 sourcePropertyInfos 32 |> spi -> spi, targetObject.GetType().GetProperty(spi.Name)) 33 |> Seq.iter(fun (spi,tpi) -> tpi.SetValue(targetObject, spi.GetValue(sourceObject,null),null)) 34 targetObject 35 36 37 let newEfPet = entityConnection.ServiceTypes.Pet() 38 let newPet = {Id=0;ChildId=1;GivenName="Duke"} 39 40 AssignMatchingPropertyValues newPet newEfPet 41 42 context.DataContext.AddObject("Pet",newEfPet) 43 context.DataContext.SaveChanges()

Sure enough, reflection does what it is supposed to do:


The problem quickly becomes that by using reflection, I have to hand roll all of the relations.  I might as well use Automapper (though apparently reflection is much faster than Automapper, even on a per-call basis).

Another problem with using reflection is that the field names in the database need to match the domain naming exactly.  Finally, like automapper, there is not out of the box way to map choice types

Hand Roll

On my last stop of the entity framework code-last hit parade, I looked at what it would take to roll my own mappings.  This has the greatest amount of yak shaving because I would have to spin up mapping from the domain and to the domain.  The nice thing is that with that kind of detail, naming mismatches can be handled and the nested hierarchy and choice types are accounted for.  I first started with a basic script that handled the gettting and setting as well as nested types:

1 #r "System.Data.Entity.dll" 2 #r "FSharp.Data.TypeProviders.dll" 3 4 open System.Linq 5 open System.Data.Entity 6 open Microsoft.FSharp.Data.TypeProviders 7 8 let connectionString = "Server=.;Database=FamilyDomain;Trusted_Connection=True;" 9 type entity = SqlEntityConnection<ConnectionString = "Server=.;Database=FamilyDomain;Trusted_Connection=True;"> 10 let context = entity.GetDataContext() 11 12 type Pet = {Id:int; ChildId: int; GivenName:string} 13 type Child = {Id:int; FirstName:string; Gender:string; Grade:int; Pets: Pet list} 14 type Address = {Id:int; State:string; County:string; City:string} 15 type Parent = {Id:int; FirstName:string} 16 type Family = {Id:int; LastName:string; Parents:Parent list; Children: Child list; Address:Address} 17 18 let MapPet(efPet: entity.ServiceTypes.Pet) = 19 {Id=efPet.Id; ChildId=efPet.ChildId; GivenName=efPet.GivenName} 20 21 let MapChild(efChild: entity.ServiceTypes.Child) = 22 let pets = efChild.Pet |> p -> MapPet(p)) 23 |> Seq.toList 24 {Id=efChild.Id; FirstName=efChild.FirstName; 25 Gender=efChild.Gender;Grade=efChild.Grade;Pets=pets} 26 27 let GetPet(id: int)= 28 let efPet = context.Pet.FirstOrDefault(fun p -> p.Id = id) 29 MapPet(efPet) 30 31 let GetChild(id: int)= 32 let efChild = context.Child.FirstOrDefault(fun c -> c.Id = id) 33 MapChild(efChild) 34 35 let myPet = GetPet(1) 36 37 let myChild = GetChild(1) 38

Of all of the implementations, the hand-rolled actually made the most sense to me.  it was clean and, most importantly, it worked.


I then swapped out a Choice type for gender (was a string)

1 type Gender = Male | Female 2 type Pet = {Id:int; ChildId: int; GivenName:string} 3 type Child = {Id:int; FirstName:string; Gender:Gender; Grade:int; Pets: Pet list} 4 type Address = {Id:int; State:string; County:string; City:string} 5 type Parent = {Id:int; FirstName:string} 6 type Family = {Id:int; LastName:string; Parents:Parent list; Children: Child list; Address:Address} 7

And then added the choice type mapping and then updated child mapping

1 let MapGender(efGender) = 2 match efGender with 3 | "Male" -> Male 4 | _ -> Female 5 6 let MapChild(efChild: entity.ServiceTypes.Child) = 7 let pets = efChild.Pet |> p -> MapPet(p)) 8 |> Seq.toList 9 {Id=efChild.Id; FirstName=efChild.FirstName; 10 Gender=MapGender(efChild.Gender); 11 Grade=efChild.Grade;Pets=pets} 12

Sure enough, it worked like a champ


And finally, I tested the add on both the happy path and an expected exception.

1 let SavePet(pet: Pet)= 2 let efPet = entity.ServiceTypes.Pet() 3 efPet.ChildId <- pet.ChildId 4 efPet.GivenName <- pet.GivenName 5 context.DataContext.AddObject("Pet",efPet) 6 context.DataContext.SaveChanges() 7 8 let newPet = {Id=0;ChildId=1;GivenName="Lucky Sue"} 9 SavePet(newPet) 10 11 let failurePet = {Id=0;ChildId=0;GivenName="Should Fail"} 12 SavePet(failurePet)

  Both worked as expected.  Here is the exception case where there is not a child to be associated to a pet:

System.Data.UpdateException: An error occurred while updating the entries. See the inner exception for details. —> System.Data.SqlClient.SqlException: The INSERT statement conflicted with the FOREIGN KEY constraint "fk_Pet_Child". The conflict occurred in database "FamilyDomain", table "dbo.Child", column ‘Id’.

The statement has been terminated.

   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)

So of all three ways, hand-rolling worked the best for me.

F# Record Types With Entity Framework Code-First

I was spinning up a data layer in a new FSharp project and I thought I would take EF Code-first out for a test drive.  I have use EF-CF in a couple of C# projects so I am familiar with the premise (and the promise) of code-first.  The FSharp project uses record types, nested record types, and choice types exclusively so I I thought of attaching each of these types for code first in turn.  The first article that I ran across was this one, which seemed like a good start.  I went ahead a created a family record type like so, matching the example verbatim except I swapped out the class implementation with a record type:


1 #r "../packages/EntityFramework.6.1.2/lib/net45/EntityFramework.dll" 2 3 open System.Collections.Generic 4 open System.ComponentModel.DataAnnotations 5 open System.Data.Entity 6 7 type Family = {Id:int; LastName:string; IsRegistered:bool} 8 9 type CLFamily() = 10 inherit DbContext() 11 [<DefaultValue>] 12 val mutable m_families: DbSet<Family> 13 member public this.Families with get() = this.m_families 14 and set v = this.m_families <- v 15 16 let db = new CLFamily() 17 let family = {Id=0;LastName="New Family"; IsRegistered=true} 18 db.Families.Add(family) |> ignore 19 db.SaveChanges() |> ignore 20

But I ran into this:



So I added the Key attribute to the Record type


So I hit up stack overflow with this question and sure enough, I forgot to add a reference to that assembly.  Once I added it, then it compiled.  I then ran the script and I got the following error message:

1 <add name="CLFamily" 2 connectionString="Server=.;Database=FamilyDomain;Trusted_Connection=True;" 3 providerName="System.Data.SqlClient"/> 4



Ugh!  It was still hitting the default connection string.    I went ahead and adjusted my script to account for the connection string and I swapped out the backing values with CLIMutable:

1 #r "../packages/EntityFramework.6.1.2/lib/net45/EntityFramework.dll" 2 #r "C:/Program Files (x86)/Reference Assemblies/Microsoft/Framework/.NETFramework/v4.5.1/System.ComponentModel.DataAnnotations.dll" 3 4 open System.Collections.Generic 5 open System.ComponentModel.DataAnnotations 6 open System.Data.Entity 7 8 [<CLIMutable>] 9 type Family = {[<Key>]Id:int; LastName:string; IsRegistered:bool;} 10 11 12 type FamilyContext() = 13 inherit DbContext() 14 [<DefaultValue>] val mutable families: DbSet<Family> 15 member this.Families with get() = this.families and set f = this.families <- f 16 17 let context = new FamilyContext() 18 let connectionString = "Server=.;Database=FamilyDomain;Trusted_Connection=True;" 19 context.Database.Connection.ConnectionString <- connectionString 20 let family = {Id=0; LastName="Test"; IsRegistered=true} 21 context.Families.Add(family) |> ignore 22 context.SaveChanges() |> ignore

And sure enough, the table is created in the database and the record is persisted:

image image

And the cool thing is that even though this is a record type, the Id does adjust to the identity value given by the database.

With that out of the way, I went to tackle nested types.  I added a Child class and a list of children to the family class. 

1 [<CLIMutable>] 2 type Child = {[<Key>]Id:int; FamilyId: int; FirstName:string; Gender:string; Grade:int} 3 4 [<CLIMutable>] 5 type Family = {[<Key>]Id:int; LastName:string; IsRegistered:bool; Children:Child list} 6 7 type FamilyContext() = 8 inherit DbContext() 9 [<DefaultValue>] val mutable families: DbSet<Family> 10 member this.Families with get() = this.families and set f = this.families <- f 11 [<DefaultValue>] val mutable children: DbSet<Child> 12 member this.Chidlren with get() = this.children and set c = this.children <- c 13 14 let context = new FamilyContext() 15 let connectionString = "Server=.;Database=FamilyDomain;Trusted_Connection=True;" 16 context.Database.Connection.ConnectionString <- connectionString 17 let children = [{Id=0; FamilyId=0; FirstName="Test"; Gender="Male"; Grade=5}] 18 let family = {Id=0; LastName="Test"; IsRegistered=true; Children=children } 19 context.Families.Add(family) |> ignore 20 context.SaveChanges() |> ignore

Everything compiled and  ran, but the Children table was not added to the database –> though the new record was added.

image image image

Going back to stack overflow, it looks like EF Code First will not auto-update the schema unless you add some more glue code.  Ugh.  At that point, I might as well give up on code-first if all it brings is not having to write sql scripts…

Implementing (Parts Of) ASP.NET Identity Using F#

I started working though this and this article for implementing security in a new Web Api 2 site I am spinning up.  Everything is working out OK – not great but better than prior ASP.NET implementations I have done.  I do think the ASP.NET team has made security better and I am really excited about token based security.  My biggest gripe is that there is still too much magic going on and it is still hard to introduce non-out of the box implementations.  For example, the sample code that you can add via Nuget has a placeholder for en email provider.  The article has the code for a specific implementation of a company called SendGrid (the sample is here and the email is here).
The CSharp implementation looks like this (I did add some constructor injection b/c I am opposed to touching the Configuration (or any part of the file system for that matter) outside of Main on sanity grounds):
1 public class SendGridEmailProvider: IIdentityMessageService 2 { 3 String _mailAccount = String.Empty; 4 String _mailPassword = String.Empty; 5 String _fromAddress = String.Empty; 6 7 public SendGridEmailProvider(String mailAccount, String mailPassword, String fromAddress) 8 { 9 _mailAccount = mailAccount; 10 _mailPassword = mailPassword; 11 _fromAddress = fromAddress 12 } 13 14 public System.Threading.Tasks.Task SendAsync(IdentityMessage message) 15 { 16 var sendGridMessage = new SendGridMessage(); 17 sendGridMessage.From = new MailAddress(_fromAddress); 18 List<String> recipients = new List<string>(); 19 recipients.Add(message.Destination); 20 sendGridMessage.AddTo(recipients); 21 sendGridMessage.Subject = message.Subject; 22 sendGridMessage.Html = message.Body; 23 sendGridMessage.Text = message.Body; 24 25 var credentials = new NetworkCredential(_mailAccount, _mailPassword); 26 var transportWeb = new Web(credentials); 27 if (transportWeb != null) 28 return transportWeb.DeliverAsync(sendGridMessage); 29 else 30 return Task.FromResult(0); 31 } 32 }

I then decided to implement a SMS Text Provider along the same lines.  The 1st company I came to was CDyne but when I went to their sample code, they are still using a SOAP-based service and the last thing I wanted to do was to clutter up my project with all of the WSDL and files that you needs when consuming a service like that.
I then thought, this is stupid.  I might was well use FSharp type providers to do the implementation.  Less Code, less files, less clutter, more goodness.  I first swapped out the Email to a FSharp implementation:
1 type SendGridEmailService(account:string, password:string, fromAddress:string) = 2 interface IIdentityMessageService with 3 member this.SendAsync(identityMessage) = 4 let sendGridMessage = new SendGridMessage() 5 sendGridMessage.From = new MailAddress(fromAddress) |> ignore 6 let recipients = new List<string>() 7 recipients.Add(identityMessage.Destination) 8 sendGridMessage.AddTo(recipients) 9 sendGridMessage.Subject <- identityMessage.Subject 10 sendGridMessage.Html <- identityMessage.Body 11 sendGridMessage.Text <- identityMessage.Body 12 13 let credentials = new NetworkCredential(account, password) 14 let transportWeb = new Web(credentials) 15 match transportWeb with 16 | null -> Task.FromResult(0):> Task 17 | _ -> transportWeb.DeliverAsync(sendGridMessage)

I then did a SMS text provider using type providers.
1 type cDyneService = Microsoft.FSharp.Data.TypeProviders.WsdlService<""> 2 3 type CDyneSMSService(licenseKey:Guid) = 4 interface IIdentityMessageService with 5 member this.SendAsync(identityMessage) = 6 let cDyneClient = cDyneService.Getsms2SOAPbasicHttpBinding 7 let client = cDyneService.Getsms2SOAPbasicHttpBinding() 8 match client with 9 | null -> Task.FromResult(0):> Task 10 | _ -> client.SimpleSMSsendAsync(identityMessage.Destination,identityMessage.Body,licenseKey):> Task

Compared to a CSharp implementation, this is joyous.  Less noise, more signal.  And thanks to Lee on Stack Overflow for helping with the upcast of Task…

Consuming Azure ML web api endpoint from an array

Last week, I blogged about creating an Azure ML experiment, publishing it as a web service, and then consuming it from F#.  I then wanted to consume the web service using an array – passing in several values and seeing the results.  I created added on to my existing F #script with the following code

1 let input1 = new Dictionary<string,string>() 2 input1.Add("Zip Code","27519") 3 input1.Add("Race","W") 4 input1.Add("Party","UNA") 5 input1.Add("Gender","M") 6 input1.Add("Age","45") 7 input1.Add("Voted Ind","1") 8 9 let input2 = new Dictionary<string,string>() 10 input2.Add("Zip Code","27519") 11 input2.Add("Race","W") 12 input2.Add("Party","D") 13 input2.Add("Gender","F") 14 input2.Add("Age","47") 15 input2.Add("Voted Ind","1") 16 17 let inputs = new List<Dictionary<string,string>>() 18 inputs.Add(input1) 19 inputs.Add(input2) 20 21 inputs 22 |> i -> invokeService(i)) 23 |> Async.Parallel 24 |> Async.RunSynchronously 25

And sure enough, I can run the model using multiple inputs: