Trigrams and F#
February 18, 2014
Rob Seder wrote a great post on trigrams last week. He then asked me how the same functionality would be implemented in F#, specifically without the for..each loops. Challenge accepted!
The first thing I did was hit Stack Overflow to see if there was a built-in function to split a string into overlapping groups of characters, and within minutes I had an answer with exactly what I was looking for (thanks MattNewport).
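To make the windowing step concrete, here is a small F# script sketch (not from the original post; the variable names are illustrative) showing what `Seq.windowed` yields when applied to a string. Because a string is a `seq<char>`, each window comes back as a `char[]`, which is why a conversion back to `string` is needed to get the trigrams themselves.

```fsharp
// Seq.windowed n returns every run of n consecutive elements as an array.
// A string is a seq<char>, so each window here is a char[].
let windows = "ABCDE" |> Seq.windowed 3 |> Seq.toList
// windows = [ [|'A'; 'B'; 'C'|]; [|'B'; 'C'; 'D'|]; [|'C'; 'D'; 'E'|] ]

// Turn each char[] back into a string to get the trigrams.
let asStrings = windows |> List.map (fun (chars: char[]) -> System.String(chars))
// asStrings = [ "ABC"; "BCD"; "CDE" ]

// A string shorter than the window size yields no windows at all.
let tooShort = "AB" |> Seq.windowed 3 |> Seq.toList
// tooShort = []
```

In general a string of length n (n ≥ 3) produces n - 2 windows, which is why "ABCDEFG" in the unit test yields five trigrams.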
So to match Rob’s BuildTrigram function, I wrote this:
```fsharp
type TrigramBuilder() =
    member this.BuildTrigrams(inputString: string) =
        inputString
        |> Seq.windowed 3                       // overlapping char[] windows of length 3
        |> Seq.map (fun a -> System.String(a))  // char[] -> string
        |> Seq.toArray
```
And I already had a covering unit test, written in C# against the F# type:
```csharp
[TestMethod]
public void GetTrigrams_ReturnsExpectedValue()
{
    var builder = new TrigramBuilder();
    String inputString = "ABCDEFG";
    String[] expected = new String[] { "ABC", "BCD", "CDE", "DEF", "EFG" };
    String[] actual = builder.BuildTrigrams(inputString);
    CollectionAssert.AreEqual(expected, actual);
}
```
I then implemented a function that matches his double loop (I can't tell the function name from the code snippet in his blog post):
```fsharp
    member this.GetMatchPercent(baseString: string, compareString: string) =
        let trigrams = this.BuildTrigrams(compareString)
        // 1 for each trigram of compareString found in baseString, 0 otherwise
        let matchCount =
            trigrams
            |> Seq.map (fun t ->
                match baseString.Contains(t) with
                | true -> 1
                | false -> 0)
            |> Seq.sum
        let totalCount = trigrams.Length
        float matchCount / float totalCount
```
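One edge case worth noting: if compareString has fewer than three characters, BuildTrigrams returns an empty array, so GetMatchPercent computes `float 0 / float 0`, which is NaN in .NET. The sketch below is a hypothetical standalone variant (the names `buildTrigrams` and `matchPercent` are mine, not from either post) that assumes a score of 0.0 is the desired result in that case. It also folds the map-and-sum into a single `Array.sumBy`.

```fsharp
// Hypothetical let-bound equivalent of TrigramBuilder.BuildTrigrams.
let buildTrigrams (input: string) : string[] =
    input
    |> Seq.windowed 3
    |> Seq.map (fun (chars: char[]) -> System.String(chars))
    |> Seq.toArray

// Hypothetical guarded version of GetMatchPercent.
// Assumption: a compare string with no trigrams scores 0.0 instead of NaN.
let matchPercent (baseString: string) (compareString: string) : float =
    let trigrams = buildTrigrams compareString
    if trigrams.Length = 0 then
        0.0
    else
        // Count 1 for each trigram found in baseString, 0 otherwise.
        let matchCount =
            trigrams |> Array.sumBy (fun t -> if baseString.Contains(t) then 1 else 0)
        float matchCount / float trigrams.Length
```

Whether 0.0, NaN, or an exception is the right answer for an empty compare string is a design decision; the guard just makes that choice explicit.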
And threw in some covering unit tests:
```csharp
[TestMethod]
public void GetMatchPercentageOfExactMatch_ReturnsExpectedValue()
{
    var builder = new TrigramBuilder();
    String baseString = "ABCDEF";
    String compareString = "ABCDEF";
    double expected = 1.0;
    double actual = builder.GetMatchPercent(baseString, compareString);
    Assert.AreEqual(expected, actual);
}

[TestMethod]
public void GetMatchPercentageOf50PercentMatch_ReturnsExpectedValue()
{
    var builder = new TrigramBuilder();
    String baseString = "ABCD";
    String compareString = "ABCDEF";
    double expected = 0.5;
    double actual = builder.GetMatchPercent(baseString, compareString);
    Assert.AreEqual(expected, actual);
}
```
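The expected value in the second test follows directly from the ratio GetMatchPercent computes. Writing $t_1, \dots, t_n$ for the trigrams of the compare string $c$ (in order, duplicates kept, as in the array) and $b$ for the base string:

```latex
\text{match}(b, c) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left[\, t_i \text{ occurs in } b \,\right],
\qquad n = |c| - 2 \quad (|c| \ge 3)

% c = "ABCDEF", b = "ABCD": trigrams ABC, BCD, CDE, DEF, so n = 4
\text{match}(\texttt{ABCD}, \texttt{ABCDEF}) = \frac{1 + 1 + 0 + 0}{4} = 0.5
```

Only ABC and BCD appear in "ABCD", so two of the four trigrams match.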
Sure enough, green across the board: all three tests pass.
A reader commented:
Nice. I haven't come across trigrams before. Perfect kind of problem for F#, I'd say!
Here's a handy trick for calculating the accuracy with a couple fewer lines:
```fsharp
trigrams
|> Seq.averageBy (fun t ->
    match baseString.Contains(t) with
    | true -> 1.0
    | false -> 0.0)
```
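For completeness, here is the commenter's `Seq.averageBy` idea as a self-contained script sketch (the function names are hypothetical). Averaging 1.0/0.0 scores gives the same ratio as matchCount / totalCount, since the mean of indicator values is the fraction of hits. One behavioral difference: `Seq.averageBy` raises an `ArgumentException` on an empty sequence, so a compare string shorter than three characters throws rather than producing NaN.

```fsharp
open System

// Hypothetical let-bound equivalent of TrigramBuilder.BuildTrigrams.
let buildTrigrams (input: string) : string[] =
    input
    |> Seq.windowed 3
    |> Seq.map (fun (chars: char[]) -> String(chars))
    |> Seq.toArray

// Score each trigram 1.0 if it occurs in baseString, else 0.0,
// and take the mean; that mean is the match ratio.
// Note: throws ArgumentException when compareString has no trigrams.
let matchPercentAverage (baseString: string) (compareString: string) : float =
    buildTrigrams compareString
    |> Seq.averageBy (fun t -> if baseString.Contains(t) then 1.0 else 0.0)
```

An `if ... then ... else` on the Boolean reads a little more directly than matching on `true`/`false`, but the two forms behave identically.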