Trigrams and F#

Rob Seder wrote a great post of trigrams last week.  He then asked me how the same functionality would be implemented in F# – specifically dropping the for..each.  Challenge accepted!.

The first thing I did was hit Stack Overflow to see if there is a built in function to parse a string by groups and  I had a answer within minutes for exactly what I was looking for (thanks MattNewport).

So to match Rob’s BuildTrigram function, I wrote this:

  1. type TrigramBuilder() =
  2.     member this.BuildTrigrams(inputString: string) =
  3.         inputString
  4.             |> Seq.windowed 3
  5.             |> Seq.map(fun a -> System.String a)
  6.             |> Seq.toArray

And I had a covering unit test already created:

  1. [TestMethod]
  2. public void GetTrigrams_ReturnsExpectedValue()
  3. {
  4.     var builder = new TrigramBuilder();
  5.     String inputString = "ABCDEFG";
  6.  
  7.     String[] expected = new String[] { "ABC", "BCD", "CDE", "DEF", "EFG" };
  8.     String[] actual = builder.BuildTrigrams(inputString);
  9.  
  10.     CollectionAssert.AreEqual(expected, actual);
  11. }

I then Implemented a function that matches his double loops (can’t tell the function name from the code snippet on the blog post):

  1. member this.GetMatchPercent(baseString: string, compareString: string) =
  2.     let trigrams = this.BuildTrigrams(compareString)
  3.     let matchCount = trigrams
  4.                         |> Seq.map(fun t -> match baseString.Contains(t) with
  5.                                                 | true -> 1
  6.                                                 | false -> 0)
  7.                         |> Seq.sum
  8.     let totalCount = trigrams.Length
  9.     float matchCount/float totalCount

And throwing in some covering unit tests:

  1. public void GetMatchPercentageOfExactMatch_ReturnsExpectedValue()
  2. {
  3.     var builder = new TrigramBuilder();
  4.     
  5.     String baseString = "ABCDEF";
  6.     String compareString = "ABCDEF";
  7.  
  8.     double expected = 1.0;
  9.     double actual = builder.GetMatchPercent(baseString, compareString);
  10.  
  11.     Assert.AreEqual(expected, actual);
  12. }
  13.  
  14. [TestMethod]
  15. public void GetMatchPercentageOf50PercentMatch_ReturnsExpectedValue()
  16. {
  17.     var builder = new TrigramBuilder();
  18.  
  19.     String baseString = "ABCD";
  20.     String compareString = "ABCDEF";
  21.  
  22.     double expected = 0.5;
  23.     double actual = builder.GetMatchPercent(baseString, compareString);
  24.  
  25.     Assert.AreEqual(expected, actual);
  26. }

Sure enough, green across the board:

image

 

2 Responses to Trigrams and F#

  1. Pingback: F# Weekly #8, 2014 | Sergey Tihon's Blog

  2. Grant Crofton says:

    Nice. I haven’t come across Trigrams before. Perfect kind of problem for F#, I’d say!

    Here’s a handy trick for calculating the accuracy with a couple less lines:

    trigrams
    |> Seq.averageBy (fun t -> match baseString.Contains(t) with
    | true -> 1.0
    | false -> 0.0)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: