Association Rule Learning Via F# (Part 2)
May 20, 2014 1 Comment
Continuing on the path of re-writing the association rule learning code found in this month’s MSDN, I started with the next function on list:
MakeAntecedent
Here is the original code C#
- public static int[] MakeAntecedent(int[] itemSet, int[] comb)
- {
- // if item-set = (1 3 4 6 8) and combination = (0 2)
- // then antecedent = (1 4)
- int[] result = new int[comb.Length];
- for (int i = 0; i < comb.Length; ++i)
- {
- int idx = comb[i];
- result[i] = itemSet[idx];
- }
- return result;
- }
and the F# code:
- static member MakeAntecedent(itemSet:int[] , comb:int[]) =
- comb |> Array.map(fun x -> itemSet.[x])
It is much easier to figure out what is going on via the F# code. The function takes in 2 arrays. Array #1 has values, Array #2 has the index of array #1 that is needed. Using the Array.Map, I return an array where the index number is swapped out to the actual value. The unit tests run green:
- [TestMethod]
- public void MakeAntecedentCSUsingExample_ReturnsExpectedValue()
- {
- int[] itemSet = new int[5] { 1, 3, 4, 6, 8 };
- int[] combo = new int[2] { 0, 2 };
- int[] expected = new int[2] { 1, 4 };
- var actual = CS.AssociationRuleProgram.MakeAntecedent(itemSet, combo);
- Assert.AreEqual(expected.Length, actual.Length);
- Assert.AreEqual(expected[0], actual[0]);
- Assert.AreEqual(expected[1], actual[1]);
- }
- [TestMethod]
- public void MakeAntecedentFSUsingExample_ReturnsExpectedValue()
- {
- int[] itemSet = new int[5] { 1, 3, 4, 6, 8 };
- int[] combo = new int[2] { 0, 2 };
- int[] expected = new int[2] { 1, 4 };
- var actual = FS.AssociationRuleProgram.MakeAntecedent(itemSet, combo);
- Assert.AreEqual(expected.Length, actual.Length);
- Assert.AreEqual(expected[0], actual[0]);
- Assert.AreEqual(expected[1], actual[1]);
- }
MakeConsequent
Here is the original C# code:
- public static int[] MakeConsequent(int[] itemSet, int[] comb)
- {
- // if item-set = (1 3 4 6 8) and combination = (0 2)
- // then consequent = (3 6 8)
- int[] result = new int[itemSet.Length – comb.Length];
- int j = 0; // ptr into combination
- int p = 0; // ptr into result
- for (int i = 0; i < itemSet.Length; ++i)
- {
- if (j < comb.Length && i == comb[j]) // we are at an antecedent
- ++j; // so continue
- else
- result[p++] = itemSet[i]; // at a consequent so add it
- }
- return result;
- }
Here is the F# Code:
- static member MakeConsequent(itemSet:int[] , comb:int[])=
- let isNotInComb x = not(Array.exists(fun elem -> elem = x) comb)
- itemSet
- |> Array.mapi(fun indexer value -> value,indexer )
- |> Array.filter(fun (value,indexer) -> isNotInComb indexer)
- |> Array.map(fun x -> fst x)
Again, it is easier to look at the F# code to figure out what is going on. In this case, we have to take all of the items in the first array that are not in the second array. The trick is that the second array does not contain values to be checked, rather the index position. If you add the Antecedent and the Consequent, you have the total original array.
This code code me a bit of time to figure out because I kept trying to use the out of the box Array features (including slicing) for F# when it hit me that it would be much easier to create a tuple from the original array –> the value and the index. I would then look up the index in the second array and confirm it is not there and then filter the ones that are not there. The map function at the end removes the index part of the tuple because it is not needed anymore.
Sure enough, my unit tests ran green:
- [TestMethod]
- public void MakeConsequentCSUsingExample_ReturnsExpectedValue()
- {
- int[] itemSet = new int[5] { 1, 3, 4, 6, 8 };
- int[] combo = new int[2] { 0, 2 };
- int[] expected = new int[3] { 3, 6, 8 };
- var actual = CS.AssociationRuleProgram.MakeConsequent(itemSet, combo);
- Assert.AreEqual(expected.Length, actual.Length);
- Assert.AreEqual(expected[0], actual[0]);
- Assert.AreEqual(expected[1], actual[1]);
- Assert.AreEqual(expected[2], actual[2]);
- }
- [TestMethod]
- public void MakeConsequentFSUsingExample_ReturnsExpectedValue()
- {
- int[] itemSet = new int[5] { 1, 3, 4, 6, 8 };
- int[] combo = new int[2] { 0, 2 };
- int[] expected = new int[3] { 3, 6, 8 };
- var actual = FS.AssociationRuleProgram.MakeConsequent(itemSet, combo);
- Assert.AreEqual(expected.Length, actual.Length);
- Assert.AreEqual(expected[0], actual[0]);
- Assert.AreEqual(expected[1], actual[1]);
- Assert.AreEqual(expected[2], actual[2]);
- }
IndexOf
I then decided to tackle the remaining three functions in reverse because they depend on each other (CountInTrans –> IsSubSetOf –> IndexOf). IndexOf did not have any code comments of example cases, but the C# code is clear
- public static int IndexOf(int[] array, int item, int startIdx)
- {
- for (int i = startIdx; i < array.Length; ++i)
- {
- if (i > item) return -1; // i is past where the target could possibly be
- if (array[i] == item) return i;
- }
- return -1;
- }
What is even clearer is the F# code that does the same thing (yes, I am happy that FindIndex returns a –1 when not found and so did McCaffey):
- static member IndexOf(array:int[] , item:int, startIdx:int) =
- Array.FindIndex(array, fun x -> x=item)
And I built some unit tests that run green that I think reflect McCaffey’s intent:
- [TestMethod]
- public void IndexOfCSUsingExample_ReturnsExpectedValue()
- {
- int[] itemSet = new int[4] { 0, 1, 4, 5 };
- Int32 item = 1;
- Int32 startIndx = 1;
- int expected = 1;
- int actual = CS.AssociationRuleProgram.IndexOf(itemSet, item, startIndx);
- Assert.AreEqual(expected, actual);
- }
- public void IndexOfFSUsingExample_ReturnsExpectedValue()
- {
- int[] itemSet = new int[4] { 0, 1, 4, 5 };
- Int32 item = 1;
- Int32 startIndx = 1;
- int expected = 1;
- int actual = FS.AssociationRuleProgram.IndexOf(itemSet, item, startIndx);
- Assert.AreEqual(expected, actual);
- }
IsSubsetOf
In the C# implementation, IndexOf is called to keep track of where the search is currently pointed.
- public static bool IsSubsetOf(int[] itemSet, int[] trans)
- {
- // 'trans' is an ordered transaction like [0 1 4 5 8]
- int foundIdx = -1;
- for (int j = 0; j < itemSet.Length; ++j)
- {
- foundIdx = IndexOf(trans, itemSet[j], foundIdx + 1);
- if (foundIdx == -1) return false;
- }
- return true;
- }
In The F# one, that is not needed:
- static member IsSubsetOf(itemSet:int[] , trans:int[]) =
- let isInTrans x = (Array.exists(fun elem -> elem = x) trans)
- let filteredItemSet = itemSet
- |> Array.map(fun value -> value, isInTrans value)
- |> Array.filter(fun (value, trans) -> trans = false)
- if filteredItemSet.Length = 0 then true
- else false
CountInTrans
Here is the original C# code uses the IsSubsetOf function.
- public static int CountInTrans(int[] itemSet, List<int[]> trans, Dictionary<int[], int> countDict)
- {
- //number of times itemSet occurs in transactions, using a lookup dict
- if (countDict.ContainsKey(itemSet) == true)
- return countDict[itemSet]; // use already computed count
- int ct = 0;
- for (int i = 0; i < trans.Count; ++i)
- if (IsSubsetOf(itemSet, trans[i]) == true)
- ++ct;
- countDict.Add(itemSet, ct);
- return ct;
- }
And here is the F# Code that also uses that subfunction
- static member CountInTrans(itemSet: int[], trans: List<int[]>, countDict: Dictionary<int[], int>) =
- let trans' = trans |> Seq.map(fun value -> value, AssociationRuleProgram.IsSubsetOf (itemSet,value))
- trans' |> Seq.filter(fun item -> snd item = true)
- |> Seq.length
GetHighConfRules
With the subfunctions created and running green, I then tackled the point of the exercise –> GetHighConfRules. The C# implementation is pretty verbose and there are lots of things happening:
- public static List<Rule> GetHighConfRules(List<int[]> freqItemSets, List<int[]> trans, double minConfidencePct)
- {
- // generate candidate rules from freqItemSets, save rules that meet min confidence against transactions
- List<Rule> result = new List<Rule>();
- Dictionary<int[], int> itemSetCountDict = new Dictionary<int[], int>(); // count of item sets
- for (int i = 0; i < freqItemSets.Count; ++i) // each freq item-set generates multiple candidate rules
- {
- int[] currItemSet = freqItemSets[i]; // for clarity only
- int ctItemSet = CountInTrans(currItemSet, trans, itemSetCountDict); // needed for each candidate rule
- for (int len = 1; len <= currItemSet.Length – 1; ++len) // antecedent len = 1, 2, 3, . .
- {
- int[] c = NewCombination(len); // a mathematical combination
- while (c != null) // each combination makes a candidate rule
- {
- int[] ante = MakeAntecedent(currItemSet, c);
- int[] cons = MakeConsequent(currItemSet, c); // could defer this until known if needed
- int ctAntecendent = CountInTrans(ante, trans, itemSetCountDict); // use lookup if possible
- double confidence = (ctItemSet * 1.0) / ctAntecendent;
- if (confidence >= minConfidencePct) // we have a winner!
- {
- Rule r = new Rule(ante, cons, confidence);
- result.Add(r); // if freq item-sets are distinct, no dup rules ever created
- }
- c = NextCombination(c, currItemSet.Length);
- } // while each combination
- } // len each possible antecedent for curr item-set
- } // i each freq item-set
- return result;
- } // GetHighConfRules
In the F# code, It decided to work inside out and get the rule for 1 itemset. I think the code reads pretty clear where each step is laid out
- static member GetHighConfRules(freqItemSets:List<int[]>, trans:List<int[]>, minConfidencePct:float) =
- let returnValue = new List<Rule>()
- freqItemSets
- |> Seq.map (fun i -> i, AssociationRuleProgram.CountInTrans'(i,trans))
- |> Seq.filter(fun (i,c) -> (float)c > minConfidencePct)
- |> Seq.map(fun (i,mcp) -> i,mcp,AssociationRuleProgram.MakeAntecedent(i, trans.[0]))
- |> Seq.map(fun (i,mcp,a) -> i,mcp, a, AssociationRuleProgram.MakeConsequent(i, trans.[0]))
- |> Seq.iter(fun (i,mcp,a,c) -> returnValue.Add(new Rule(a,c,mcp)))
- returnValue
I then attempted to put this block into a larger block (trans.[0]) but then I realized that I was going about this the wrong way. Instead of using the C# code as a my base line, I need to approach the problem from a functional viewpoint. That will be the subject of my blog next week…
Pingback: F# Weekly #21, 2014 | Sergey Tihon's Blog