Association Rule Learning Via F# (Part 2)

Continuing on the path of re-writing the association rule learning code found in this month’s MSDN, I started with the next function on list:

MakeAntecedent

Here is the original code C#

  1. public static int[] MakeAntecedent(int[] itemSet, int[] comb)
  2. {
  3.   // if item-set = (1 3 4 6 8) and combination = (0 2)
  4.   // then antecedent = (1 4)
  5.   int[] result = new int[comb.Length];
  6.   for (int i = 0; i < comb.Length; ++i)
  7.   {
  8.     int idx = comb[i];
  9.     result[i] = itemSet[idx];
  10.   }
  11.   return result;
  12. }

 

and the F# code:

  1. static member MakeAntecedent(itemSet:int[] , comb:int[]) =
  2.     comb |> Array.map(fun x -> itemSet.[x])

 

It is much easier to figure out what is going on via the F# code.  The function takes in 2 arrays.  Array #1 has values, Array #2 has the index of array #1 that is needed.  Using the Array.Map, I return an array where the index number is swapped out to the actual value.  The unit tests run green:

  1. [TestMethod]
  2. public void MakeAntecedentCSUsingExample_ReturnsExpectedValue()
  3. {
  4.     int[] itemSet = new int[5] { 1, 3, 4, 6, 8 };
  5.     int[] combo = new int[2] { 0, 2 };
  6.     int[] expected = new int[2] { 1, 4 };
  7.     var actual = CS.AssociationRuleProgram.MakeAntecedent(itemSet, combo);
  8.     Assert.AreEqual(expected.Length, actual.Length);
  9.     Assert.AreEqual(expected[0], actual[0]);
  10.     Assert.AreEqual(expected[1], actual[1]);
  11. }
  12.  
  13. [TestMethod]
  14. public void MakeAntecedentFSUsingExample_ReturnsExpectedValue()
  15. {
  16.     int[] itemSet = new int[5] { 1, 3, 4, 6, 8 };
  17.     int[] combo = new int[2] { 0, 2 };
  18.     int[] expected = new int[2] { 1, 4 };
  19.     var actual = FS.AssociationRuleProgram.MakeAntecedent(itemSet, combo);
  20.     Assert.AreEqual(expected.Length, actual.Length);
  21.     Assert.AreEqual(expected[0], actual[0]);
  22.     Assert.AreEqual(expected[1], actual[1]);
  23. }

 

MakeConsequent

Here is the original C# code:

  1. public static int[] MakeConsequent(int[] itemSet, int[] comb)
  2. {
  3.   // if item-set = (1 3 4 6 8) and combination = (0 2)
  4.   // then consequent = (3 6 8)
  5.   int[] result = new int[itemSet.Length – comb.Length];
  6.   int j = 0; // ptr into combination
  7.   int p = 0; // ptr into result
  8.   for (int i = 0; i < itemSet.Length; ++i)
  9.   {
  10.     if (j < comb.Length && i == comb[j]) // we are at an antecedent
  11.       ++j; // so continue
  12.     else
  13.       result[p++] = itemSet[i]; // at a consequent so add it
  14.   }
  15.   return result;
  16. }

 

Here is the F# Code:

  1. static member MakeConsequent(itemSet:int[] , comb:int[])=   
  2.     let isNotInComb x = not(Array.exists(fun elem -> elem = x) comb)
  3.     itemSet
  4.         |> Array.mapi(fun indexer value -> value,indexer )
  5.         |> Array.filter(fun (value,indexer) -> isNotInComb indexer)
  6.         |> Array.map(fun x -> fst x)

 

Again, it is easier to look at the F# code to figure out what is going on.  In this case, we have to take all of the items in the first array that are not in the second array.  The trick is that the second array does not contain values to be checked, rather the index position.  If you add the Antecedent and the Consequent, you have the total original array.

This code code me a bit of time to figure out because I kept trying to use the out of the box Array features (including slicing) for F# when it hit me that it would be much easier to create a tuple from the original array –> the value and the index.  I would then look up the index in the second array and confirm it is not there and then filter the ones that are not there.  The map function at the end removes the index part of the tuple because it is not needed anymore.

Sure enough, my unit tests ran green:

  1. [TestMethod]
  2. public void MakeConsequentCSUsingExample_ReturnsExpectedValue()
  3. {
  4.     int[] itemSet = new int[5] { 1, 3, 4, 6, 8 };
  5.     int[] combo = new int[2] { 0, 2 };
  6.     int[] expected = new int[3] { 3, 6, 8 };
  7.     var actual = CS.AssociationRuleProgram.MakeConsequent(itemSet, combo);
  8.     Assert.AreEqual(expected.Length, actual.Length);
  9.     Assert.AreEqual(expected[0], actual[0]);
  10.     Assert.AreEqual(expected[1], actual[1]);
  11.     Assert.AreEqual(expected[2], actual[2]);
  12. }
  13.  
  14. [TestMethod]
  15. public void MakeConsequentFSUsingExample_ReturnsExpectedValue()
  16. {
  17.     int[] itemSet = new int[5] { 1, 3, 4, 6, 8 };
  18.     int[] combo = new int[2] { 0, 2 };
  19.     int[] expected = new int[3] { 3, 6, 8 };
  20.     var actual = FS.AssociationRuleProgram.MakeConsequent(itemSet, combo);
  21.     Assert.AreEqual(expected.Length, actual.Length);
  22.     Assert.AreEqual(expected[0], actual[0]);
  23.     Assert.AreEqual(expected[1], actual[1]);
  24.     Assert.AreEqual(expected[2], actual[2]);
  25. }

 

IndexOf

I then decided to tackle the remaining three functions in reverse because they depend on each other (CountInTrans –> IsSubSetOf –> IndexOf).  IndexOf did not have any code comments of example cases, but the C# code is clear

  1. public static int IndexOf(int[] array, int item, int startIdx)
  2. {
  3.   for (int i = startIdx; i < array.Length; ++i)
  4.   {
  5.     if (i > item) return -1; // i is past where the target could possibly be
  6.     if (array[i] == item) return i;
  7.   }
  8.   return -1;
  9. }

 

What is even clearer is the F# code that does the same thing (yes, I am happy that FindIndex returns a –1 when not found and so did McCaffey):

  1. static member IndexOf(array:int[] , item:int, startIdx:int) =
  2.     Array.FindIndex(array, fun x -> x=item)

 

And I built some unit tests that run green that I think reflect McCaffey’s intent:

  1. [TestMethod]
  2. public void IndexOfCSUsingExample_ReturnsExpectedValue()
  3. {
  4.     int[] itemSet = new int[4] { 0, 1, 4, 5 };
  5.     Int32 item = 1;
  6.     Int32 startIndx = 1;
  7.  
  8.     int expected = 1;
  9.     int actual = CS.AssociationRuleProgram.IndexOf(itemSet, item, startIndx);
  10.  
  11.     Assert.AreEqual(expected, actual);
  12. }
  13. public void IndexOfFSUsingExample_ReturnsExpectedValue()
  14. {
  15.     int[] itemSet = new int[4] { 0, 1, 4, 5 };
  16.     Int32 item = 1;
  17.     Int32 startIndx = 1;
  18.  
  19.     int expected = 1;
  20.     int actual = FS.AssociationRuleProgram.IndexOf(itemSet, item, startIndx);
  21.  
  22.     Assert.AreEqual(expected, actual);
  23. }

 

IsSubsetOf

In the C# implementation, IndexOf is called to keep track of where the search is currently pointed. 

  1. public static bool IsSubsetOf(int[] itemSet, int[] trans)
  2. {
  3.   // 'trans' is an ordered transaction like [0 1 4 5 8]
  4.   int foundIdx = -1;
  5.   for (int j = 0; j < itemSet.Length; ++j)
  6.   {
  7.     foundIdx = IndexOf(trans, itemSet[j], foundIdx + 1);
  8.     if (foundIdx == -1) return false;
  9.   }
  10.   return true;
  11. }

In The F# one, that is not needed:

  1. static member IsSubsetOf(itemSet:int[] , trans:int[]) =
  2.     let isInTrans x = (Array.exists(fun elem -> elem = x) trans)
  3.     let filteredItemSet = itemSet
  4.                             |> Array.map(fun value -> value, isInTrans value)
  5.                             |> Array.filter(fun (value, trans) -> trans = false)
  6.     if filteredItemSet.Length = 0 then true
  7.         else false

CountInTrans

Here is the original C# code uses the IsSubsetOf function.

  1. public static int CountInTrans(int[] itemSet, List<int[]> trans, Dictionary<int[], int> countDict)
  2. {
  3.    //number of times itemSet occurs in transactions, using a lookup dict
  4.  
  5.     if (countDict.ContainsKey(itemSet) == true)
  6.     return countDict[itemSet]; // use already computed count
  7.  
  8.   int ct = 0;
  9.   for (int i = 0; i < trans.Count; ++i)
  10.     if (IsSubsetOf(itemSet, trans[i]) == true)
  11.       ++ct;
  12.   countDict.Add(itemSet, ct);
  13.   return ct;
  14. }

And here is the F# Code that also uses that subfunction

  1. static member CountInTrans(itemSet: int[], trans: List<int[]>, countDict: Dictionary<int[], int>) =
  2.     let trans' = trans |> Seq.map(fun value -> value, AssociationRuleProgram.IsSubsetOf (itemSet,value))
  3.     trans' |> Seq.filter(fun item -> snd item = true)
  4.            |> Seq.length

 

 GetHighConfRules

With the subfunctions created and running green, I then tackled the point of the exercise –> GetHighConfRules.  The C# implementation is pretty verbose and there are lots of things happening: 

  1.     public static List<Rule> GetHighConfRules(List<int[]> freqItemSets, List<int[]> trans, double minConfidencePct)
  2.     {
  3.       // generate candidate rules from freqItemSets, save rules that meet min confidence against transactions
  4.       List<Rule> result = new List<Rule>();
  5.  
  6.       Dictionary<int[], int> itemSetCountDict = new Dictionary<int[], int>(); // count of item sets
  7.  
  8.       for (int i = 0; i < freqItemSets.Count; ++i) // each freq item-set generates multiple candidate rules
  9.       {
  10.         int[] currItemSet = freqItemSets[i]; // for clarity only
  11.         int ctItemSet = CountInTrans(currItemSet, trans, itemSetCountDict); // needed for each candidate rule
  12.         for (int len = 1; len <= currItemSet.Length – 1; ++len) // antecedent len = 1, 2, 3, . .
  13.         {
  14.           int[] c = NewCombination(len); // a mathematical combination
  15.  
  16.           while (c != null) // each combination makes a candidate rule
  17.           {
  18.             int[] ante = MakeAntecedent(currItemSet, c);
  19.             int[] cons = MakeConsequent(currItemSet, c); // could defer this until known if needed
  20.           
  21.             int ctAntecendent = CountInTrans(ante, trans, itemSetCountDict); // use lookup if possible
  22.             double confidence = (ctItemSet * 1.0) / ctAntecendent;
  23.  
  24.             if (confidence >= minConfidencePct) // we have a winner!
  25.             {
  26.               Rule r = new Rule(ante, cons, confidence);
  27.               result.Add(r); // if freq item-sets are distinct, no dup rules ever created
  28.             }
  29.             c = NextCombination(c, currItemSet.Length);
  30.           } // while each combination
  31.         } // len each possible antecedent for curr item-set
  32.       } // i each freq item-set
  33.  
  34.       return result;
  35.     } // GetHighConfRules

In the F# code, It decided to work inside out and get the rule for 1 itemset.  I think the code reads pretty clear where each step is laid out

  1. static member GetHighConfRules(freqItemSets:List<int[]>, trans:List<int[]>,  minConfidencePct:float) =
  2.     let returnValue = new List<Rule>()
  3.     freqItemSets
  4.         |> Seq.map (fun i -> i, AssociationRuleProgram.CountInTrans'(i,trans))
  5.         |> Seq.filter(fun (i,c) -> (float)c > minConfidencePct)
  6.         |> Seq.map(fun (i,mcp) -> i,mcp,AssociationRuleProgram.MakeAntecedent(i, trans.[0]))
  7.         |> Seq.map(fun (i,mcp,a) -> i,mcp, a, AssociationRuleProgram.MakeConsequent(i, trans.[0]))
  8.         |> Seq.iter(fun (i,mcp,a,c) -> returnValue.Add(new Rule(a,c,mcp)))
  9.     returnValue

I then attempted to put this block into a larger block (trans.[0]) but then I realized that I was going about this the wrong way.  Instead of using the C# code as a my base line, I need to approach the problem from a functional viewpoint.  That will be the subject of my blog next week…

 

 

 

 

 

 

 

 

 

 

 

 

One Response to Association Rule Learning Via F# (Part 2)

  1. Pingback: F# Weekly #21, 2014 | Sergey Tihon's Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: