← Association Rule Learning Via F# (Part 1)

Association Rule Learning Via F# (Part 2)

May 20, 2014 1 Comment

Continuing on the path of re-writing the association rule learning code found in this month’s MSDN, I started with the next function on list:

MakeAntecedent

Here is the original code C#

public static int[] MakeAntecedent(int[] itemSet, int[] comb)
{
  // if item-set = (1 3 4 6 8) and combination = (0 2) 
  // then antecedent = (1 4)
  int[] result = new int[comb.Length];
  for (int i = 0; i < comb.Length; ++i)
  {
    int idx = comb[i];
    result[i] = itemSet[idx];
  }
  return result;
}

and the F# code:

static member MakeAntecedent(itemSet:int[] , comb:int[]) =
    comb |> Array.map(fun x -> itemSet.[x])

It is much easier to figure out what is going on via the F# code. The function takes in 2 arrays. Array #1 has values, Array #2 has the index of array #1 that is needed. Using the Array.Map, I return an array where the index number is swapped out to the actual value. The unit tests run green:

[TestMethod]
public void MakeAntecedentCSUsingExample_ReturnsExpectedValue()
{
    int[] itemSet = new int[5] { 1, 3, 4, 6, 8 };
    int[] combo = new int[2] { 0, 2 };
    int[] expected = new int[2] { 1, 4 };
    var actual = CS.AssociationRuleProgram.MakeAntecedent(itemSet, combo);
    Assert.AreEqual(expected.Length, actual.Length);
    Assert.AreEqual(expected[0], actual[0]);
    Assert.AreEqual(expected[1], actual[1]);
}
 
[TestMethod]
public void MakeAntecedentFSUsingExample_ReturnsExpectedValue()
{
    int[] itemSet = new int[5] { 1, 3, 4, 6, 8 };
    int[] combo = new int[2] { 0, 2 };
    int[] expected = new int[2] { 1, 4 };
    var actual = FS.AssociationRuleProgram.MakeAntecedent(itemSet, combo);
    Assert.AreEqual(expected.Length, actual.Length);
    Assert.AreEqual(expected[0], actual[0]);
    Assert.AreEqual(expected[1], actual[1]);
}

MakeConsequent

Here is the original C# code:

public static int[] MakeConsequent(int[] itemSet, int[] comb)
{
  // if item-set = (1 3 4 6 8) and combination = (0 2) 
  // then consequent = (3 6 8)
  int[] result = new int[itemSet.Length – comb.Length];
  int j = 0; // ptr into combination
  int p = 0; // ptr into result
  for (int i = 0; i < itemSet.Length; ++i)
  {
    if (j < comb.Length && i == comb[j]) // we are at an antecedent
      ++j; // so continue
    else
      result[p++] = itemSet[i]; // at a consequent so add it
  }
  return result;
}

Here is the F# Code:

static member MakeConsequent(itemSet:int[] , comb:int[])=   
    let isNotInComb x = not(Array.exists(fun elem -> elem = x) comb)
    itemSet 
        |> Array.mapi(fun indexer value -> value,indexer )
        |> Array.filter(fun (value,indexer) -> isNotInComb indexer)
        |> Array.map(fun x -> fst x)

Again, it is easier to look at the F# code to figure out what is going on. In this case, we have to take all of the items in the first array that are not in the second array. The trick is that the second array does not contain values to be checked, rather the index position. If you add the Antecedent and the Consequent, you have the total original array.

This code code me a bit of time to figure out because I kept trying to use the out of the box Array features (including slicing) for F# when it hit me that it would be much easier to create a tuple from the original array –> the value and the index. I would then look up the index in the second array and confirm it is not there and then filter the ones that are not there. The map function at the end removes the index part of the tuple because it is not needed anymore.

Sure enough, my unit tests ran green:

[TestMethod]
public void MakeConsequentCSUsingExample_ReturnsExpectedValue()
{
    int[] itemSet = new int[5] { 1, 3, 4, 6, 8 };
    int[] combo = new int[2] { 0, 2 };
    int[] expected = new int[3] { 3, 6, 8 };
    var actual = CS.AssociationRuleProgram.MakeConsequent(itemSet, combo);
    Assert.AreEqual(expected.Length, actual.Length);
    Assert.AreEqual(expected[0], actual[0]);
    Assert.AreEqual(expected[1], actual[1]);
    Assert.AreEqual(expected[2], actual[2]);
}
 
[TestMethod]
public void MakeConsequentFSUsingExample_ReturnsExpectedValue()
{
    int[] itemSet = new int[5] { 1, 3, 4, 6, 8 };
    int[] combo = new int[2] { 0, 2 };
    int[] expected = new int[3] { 3, 6, 8 };
    var actual = FS.AssociationRuleProgram.MakeConsequent(itemSet, combo);
    Assert.AreEqual(expected.Length, actual.Length);
    Assert.AreEqual(expected[0], actual[0]);
    Assert.AreEqual(expected[1], actual[1]);
    Assert.AreEqual(expected[2], actual[2]);
}

IndexOf

I then decided to tackle the remaining three functions in reverse because they depend on each other (CountInTrans –> IsSubSetOf –> IndexOf). IndexOf did not have any code comments of example cases, but the C# code is clear

public static int IndexOf(int[] array, int item, int startIdx)
{
  for (int i = startIdx; i < array.Length; ++i)
  {
    if (i > item) return -1; // i is past where the target could possibly be
    if (array[i] == item) return i;
  }
  return -1;
}

What is even clearer is the F# code that does the same thing (yes, I am happy that FindIndex returns a –1 when not found and so did McCaffey):

static member IndexOf(array:int[] , item:int, startIdx:int) =
    Array.FindIndex(array, fun x -> x=item)

And I built some unit tests that run green that I think reflect McCaffey’s intent:

[TestMethod]
public void IndexOfCSUsingExample_ReturnsExpectedValue()
{
    int[] itemSet = new int[4] { 0, 1, 4, 5 };
    Int32 item = 1;
    Int32 startIndx = 1;
 
    int expected = 1;
    int actual = CS.AssociationRuleProgram.IndexOf(itemSet, item, startIndx);
 
    Assert.AreEqual(expected, actual);
}
public void IndexOfFSUsingExample_ReturnsExpectedValue()
{
    int[] itemSet = new int[4] { 0, 1, 4, 5 };
    Int32 item = 1;
    Int32 startIndx = 1;
 
    int expected = 1;
    int actual = FS.AssociationRuleProgram.IndexOf(itemSet, item, startIndx);
 
    Assert.AreEqual(expected, actual);
}

IsSubsetOf

In the C# implementation, IndexOf is called to keep track of where the search is currently pointed.

public static bool IsSubsetOf(int[] itemSet, int[] trans)
{
  // 'trans' is an ordered transaction like [0 1 4 5 8]
  int foundIdx = -1;
  for (int j = 0; j < itemSet.Length; ++j)
  {
    foundIdx = IndexOf(trans, itemSet[j], foundIdx + 1);
    if (foundIdx == -1) return false;
  }
  return true;
}

In The F# one, that is not needed:

static member IsSubsetOf(itemSet:int[] , trans:int[]) =
    let isInTrans x = (Array.exists(fun elem -> elem = x) trans)
    let filteredItemSet = itemSet
                            |> Array.map(fun value -> value, isInTrans value)
                            |> Array.filter(fun (value, trans) -> trans = false)
    if filteredItemSet.Length = 0 then true
        else false

CountInTrans

Here is the original C# code uses the IsSubsetOf function.

public static int CountInTrans(int[] itemSet, List<int[]> trans, Dictionary<int[], int> countDict)
{
   //number of times itemSet occurs in transactions, using a lookup dict
 
    if (countDict.ContainsKey(itemSet) == true)
    return countDict[itemSet]; // use already computed count
 
  int ct = 0;
  for (int i = 0; i < trans.Count; ++i)
    if (IsSubsetOf(itemSet, trans[i]) == true)
      ++ct;
  countDict.Add(itemSet, ct);
  return ct;
}

And here is the F# Code that also uses that subfunction

static member CountInTrans(itemSet: int[], trans: List<int[]>, countDict: Dictionary<int[], int>) =
    let trans' = trans |> Seq.map(fun value -> value, AssociationRuleProgram.IsSubsetOf (itemSet,value))
    trans' |> Seq.filter(fun item -> snd item = true)
           |> Seq.length

GetHighConfRules

With the subfunctions created and running green, I then tackled the point of the exercise –> GetHighConfRules. The C# implementation is pretty verbose and there are lots of things happening:

    public static List<Rule> GetHighConfRules(List<int[]> freqItemSets, List<int[]> trans, double minConfidencePct)
    {
      // generate candidate rules from freqItemSets, save rules that meet min confidence against transactions
      List<Rule> result = new List<Rule>();
 
      Dictionary<int[], int> itemSetCountDict = new Dictionary<int[], int>(); // count of item sets
 
      for (int i = 0; i < freqItemSets.Count; ++i) // each freq item-set generates multiple candidate rules
      {
        int[] currItemSet = freqItemSets[i]; // for clarity only
        int ctItemSet = CountInTrans(currItemSet, trans, itemSetCountDict); // needed for each candidate rule
 
        for (int len = 1; len <= currItemSet.Length – 1; ++len) // antecedent len = 1, 2, 3, . .
        {
          int[] c = NewCombination(len); // a mathematical combination
 
          while (c != null) // each combination makes a candidate rule
          {
            int[] ante = MakeAntecedent(currItemSet, c);
            int[] cons = MakeConsequent(currItemSet, c); // could defer this until known if needed
          
            int ctAntecendent = CountInTrans(ante, trans, itemSetCountDict); // use lookup if possible 
            double confidence = (ctItemSet * 1.0) / ctAntecendent;
 
            if (confidence >= minConfidencePct) // we have a winner!
            {
              Rule r = new Rule(ante, cons, confidence); 
              result.Add(r); // if freq item-sets are distinct, no dup rules ever created
            }
            c = NextCombination(c, currItemSet.Length);
          } // while each combination
        } // len each possible antecedent for curr item-set
      } // i each freq item-set
 
      return result;
    } // GetHighConfRules

In the F# code, It decided to work inside out and get the rule for 1 itemset. I think the code reads pretty clear where each step is laid out

static member GetHighConfRules(freqItemSets:List<int[]>, trans:List<int[]>,  minConfidencePct:float) =
    let returnValue = new List<Rule>()
    freqItemSets 
        |> Seq.map (fun i -> i, AssociationRuleProgram.CountInTrans'(i,trans))
        |> Seq.filter(fun (i,c) -> (float)c > minConfidencePct)
        |> Seq.map(fun (i,mcp) -> i,mcp,AssociationRuleProgram.MakeAntecedent(i, trans.[0]))
        |> Seq.map(fun (i,mcp,a) -> i,mcp, a, AssociationRuleProgram.MakeConsequent(i, trans.[0]))
        |> Seq.iter(fun (i,mcp,a,c) -> returnValue.Add(new Rule(a,c,mcp)))
    returnValue

I then attempted to put this block into a larger block (trans.[0]) but then I realized that I was going about this the wrong way. Instead of using the C# code as a my base line, I need to approach the problem from a functional viewpoint. That will be the subject of my blog next week…

Filed under Analytics, F#

One Response to Association Rule Learning Via F# (Part 2)

Pingback: F# Weekly #21, 2014 | Sergey Tihon's Blog

Jamie Dixon's Home

Association Rule Learning Via F# (Part 2)

One Response to Association Rule Learning Via F# (Part 2)

Leave a comment Cancel reply

Categories

Recent Posts

Archives

Blogroll

Meta

Jamie Dixon's Home

Association Rule Learning Via F# (Part 2)

Share this:

Related

One Response to Association Rule Learning Via F# (Part 2)

Leave a comment Cancel reply

Categories

Recent Posts

Archives

Blogroll

Meta