
Apriori Algorithm and F# Using Elevator Inspection Data

March 18, 2014

Now that I have the elevator dataset in a workable state, I wanted to see what I could learn from the data. I was reading Machine Learning In Action, and the author suggested the Apriori algorithm as a way to quantify associations among data points. I read both Harrington’s code and Wikipedia’s description and found both to be impenetrable: the former because the code was unreadable and the latter because the mathematical formulas depended on a level of algebra that I don’t have.

Fortunately, I found a C# project on Codeproject that had both an excellent example/introduction and C# code. I used the examples on the website to formulate my F# implementation.

The first thing I did was create a class that matched the first grid in the example:

namespace ChickenSoftware.ElevatorChicken.Analysis
 
open System.Collections.Generic
 
type Transaction = {TID: string; Items: List<string> }
 
type Apriori(database: List<Transaction>, support: float, confidence: float) = 
    member this.Database = database
    member this.Support = support
    member this.Confidence = confidence

Note that because F# values are immutable by default, the properties are read-only. I then created a unit test project to make sure the constructor works without throwing an exception. The data matches the example:

public AprioriTests()
{
    var database = new List<Transaction>();
    database.Add(new Transaction("100", new List<string>() { "A", "C", "D" }));
    database.Add(new Transaction("200", new List<string>() { "B", "C", "E" }));
    database.Add(new Transaction("300", new List<string>() { "A", "B", "C", "E" }));
    database.Add(new Transaction("400", new List<string>() { "B", "E" }));
 
    _apriori = new Apriori(database, .5, .80);
 
}
 
[TestMethod]
public void ConstructorUsingValidArguments_ReturnsExpected()
{
    Assert.IsNotNull(_apriori);
}
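For reference, the two thresholds passed to the constructor (support and confidence) can be worked out by hand on the four test transactions. The sketch below is only an illustration: the names transactions, supportOf and confidenceOf are hypothetical and do not appear in the project, and it uses F# sets instead of List<string> to keep things short.

```fsharp
// Hypothetical helpers for illustration; the data mirrors the four test transactions.
let transactions =
    [ set [ "A"; "C"; "D" ]
      set [ "B"; "C"; "E" ]
      set [ "A"; "B"; "C"; "E" ]
      set [ "B"; "E" ] ]

// Support: the fraction of transactions that contain every item in the itemset.
let supportOf (itemset: Set<string>) =
    let hits =
        transactions
        |> List.filter (fun t -> Set.isSubset itemset t)
        |> List.length
    float hits / float (List.length transactions)

// Confidence of the rule lhs => rhs: support(lhs union rhs) / support(lhs).
let confidenceOf (lhs: Set<string>) (rhs: Set<string>) =
    supportOf (Set.union lhs rhs) / supportOf lhs

printfn "%.2f" (supportOf (set [ "B"; "E" ]))             // 0.75 (3 of 4 transactions)
printfn "%.2f" (confidenceOf (set [ "B" ]) (set [ "E" ])) // 1.00 (every B also has E)
```

With a support threshold of .5, an itemset has to appear in at least two of the four transactions to survive.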

I then needed a function to count up all of the items in the itemsets. I refused to use loops, so I first started with Seq.fold, but I was having zero luck because I was trying to fold a Seq of List. I then started experimenting with other functions and found Seq.collect, which was perfect. So I created a function like this:

member this.GetC1() =
    database
 
member this.GetL1() =
    let numberOfTransactions = this.GetC1().Count
 
    this.GetC1()
        |> Seq.collect(fun d -> d.Items)
        |> Seq.countBy(fun i -> i)
        |> Seq.map(fun (t,i) -> t, i, float i/ float numberOfTransactions)
        |> Seq.filter(fun (t,i,p) -> p >= support)
        |> Seq.map(fun (t,i,p) -> t,i)
        |> Seq.sort
        |> Seq.toList

Note that numberOfTransactions is the number of transactions in the database, not the number of individual items in each transaction's Items list. And the results match the example:
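To make the L1 step concrete, here is a minimal standalone sketch of the same pipeline run over the four test transactions. It assumes plain F# tuples and lists instead of the Transaction record, and the names database, minSupport and l1 are only for illustration.

```fsharp
// Standalone illustration; (TID, items) tuples stand in for the Transaction record.
let database =
    [ "100", [ "A"; "C"; "D" ]
      "200", [ "B"; "C"; "E" ]
      "300", [ "A"; "B"; "C"; "E" ]
      "400", [ "B"; "E" ] ]

let minSupport = 0.5
let numberOfTransactions = List.length database

let l1 =
    database
    |> Seq.collect snd     // flatten every transaction's items into one sequence
    |> Seq.countBy id      // (item, number of occurrences)
    |> Seq.filter (fun (_, n) -> float n / float numberOfTransactions >= minSupport)
    |> Seq.sort
    |> Seq.toList

// l1 evaluates to [("A", 2); ("B", 3); ("C", 3); ("E", 3)]
// "D" appears in only one of four transactions (0.25), so it is dropped.
```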

So this is great. My next stop was to build a list of pair combinations of the remaining values.

The trick is that it is not a Cartesian join of the original sets: only the surviving sets are needed. My first attempt looked like:

let C1 = database
 
let L1 = C1
        |> Seq.map(fun t -> t.Items)
        |> Seq.collect(fun i -> i)
        |> Seq.countBy(fun i -> i)
        |> Seq.map(fun (t,i) -> t, i, float i/ float numberOftransactions)
        |> Seq.filter(fun (t,i,p) -> p >= support)
        |> Seq.toArray
let C2A = L1 
            |> Seq.map(fun (x,y,z) -> x)
            |> Seq.toArray
let C2B = L1 
            |> Seq.map(fun (x,y,z) -> x)
            |> Seq.toArray
let C2 = C2A |> Seq.collect(fun x -> C2B |> Seq.map(fun y -> x+y))
C2   

With the output like this:

I was running out of Saturday morning, so I went over to Stack Overflow and got a couple of responses. I was on the right track with the concat, but I hadn’t thought about a filter, which would prune my list. With this in mind, I copied Mark’s code and got what I was looking for:

member this.GetC2() =
    let l1Itemset = this.GetL1() 
                    |> Seq.map(fun (i,s) -> i)
 
    let itemset = 
        l1Itemset
            |> Seq.map(fun x -> l1Itemset |> Seq.map(fun y -> (x,y)))
            |> Seq.concat
            |> Seq.filter(fun (x,y) -> x < y)
            |> Seq.sort
            |> Seq.toList         
    
    let listContainsItem(l:List<string>, a,b) =
            l.Contains(a) && l.Contains(b)
    
    let someFunctionINeedToRename(l1:List<string>, l2)=
            l2 |> Seq.map(fun (x,y) -> listContainsItem(l1,x,y))
 
    let itemsetMatches = this.GetC1()
                            |> Seq.map(fun t -> t.Items)
                            |> Seq.map(fun i -> someFunctionINeedToRename(i,itemset))
 
    let itemSupport = itemsetMatches
                            |> Seq.map(Seq.map(fun i -> if i then 1 else 0))
                            |> Seq.reduce(Seq.map2(+))
 
    itemSupport
        |> Seq.zip(itemset)
        |> Seq.toList

So now I have C2 filling correctly:
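As a cross-check, the C2 step can be sketched in isolation on the four test transactions. This is not the code above: it assumes F# sets for the transactions, takes the L1 items A, B, C and E as given, and counts each pair directly rather than reducing rows of booleans. All names are hypothetical.

```fsharp
// Standalone illustration of candidate pairs and their counts.
let transactions =
    [ set [ "A"; "C"; "D" ]
      set [ "B"; "C"; "E" ]
      set [ "A"; "B"; "C"; "E" ]
      set [ "B"; "E" ] ]

let l1Items = [ "A"; "B"; "C"; "E" ]

// Cartesian product of L1 with itself, pruned with x < y so that
// ("A", "B") is kept while ("B", "A") and ("A", "A") are dropped.
let candidatePairs =
    [ for x in l1Items do
        for y in l1Items do
            if x < y then
                yield (x, y) ]

// For each pair, count the transactions that contain both items.
let c2 =
    candidatePairs
    |> List.map (fun (x, y) ->
        let count =
            transactions
            |> List.filter (fun t -> Set.contains x t && Set.contains y t)
            |> List.length
        (x, y), count)

// c2 evaluates to:
// [(("A","B"),1); (("A","C"),2); (("A","E"),1); (("B","C"),2); (("B","E"),3); (("C","E"),2)]
```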

Taking the results, I needed to get L2.

That was much simpler than getting C2. Here is the code:

member this.GetL2() = 
    let numberOfTransactions = this.GetC1().Count
    
    this.GetC2()
            |> Seq.map(fun (i,n) -> i,n,float n/float numberOfTransactions)
            |> Seq.filter(fun (i,n,p) -> p >= support)
            |> Seq.map(fun (t,i,p) -> t,i)
            |> Seq.sort
            |> Seq.toList    

And when I run it – it matches this example exactly:
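For the four test transactions the L2 filter works out as follows. The pair counts in this sketch were tallied by hand from those transactions, and the names c2 and l2 are only for illustration.

```fsharp
// Pair counts tallied by hand from the four test transactions.
let c2 =
    [ ("A", "B"), 1
      ("A", "C"), 2
      ("A", "E"), 1
      ("B", "C"), 2
      ("B", "E"), 3
      ("C", "E"), 2 ]

let numberOfTransactions = 4
let minSupport = 0.5

// Keep only the pairs whose support (count / transactions) meets the threshold.
let l2 =
    c2
    |> List.filter (fun (_, n) -> float n / float numberOfTransactions >= minSupport)

// l2 evaluates to [(("A","C"),2); (("B","C"),2); (("B","E"),3); (("C","E"),2)]
```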

Finally, I added C3 and L3. This code is identical to the C2/L2 code with one exception: it maps a triple rather than a pair. The C2 code maps like this:

let itemset = 
    l1Itemset
        |> Seq.map(fun x -> l1Itemset |> Seq.map(fun y -> (x,y)))
        |> Seq.concat
        |> Seq.filter(fun (x,y) -> x < y)
        |> Seq.sort
        |> Seq.toList     

and the C3 code looks like this (took me 15 minutes to figure out line 3 below):

let itemset = 
    l2Itemset
        |> Seq.map(fun x -> l2Itemset |> Seq.map(fun y-> l2Itemset |> Seq.map(fun z->(fst x,fst y,snd z))))
        |> Seq.concat
        |> Seq.collect(fun d -> d)
        |> Seq.filter(fun (x,y,z) -> x < y && y < z)
        |> Seq.distinct
        |> Seq.sort
        |> Seq.toList    

With the C3 and L3 matching the example also:
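Here is a standalone sketch of the C3/L3 step on the four test transactions. It assumes l2Itemset holds just the surviving pairs (the counts already stripped off) and uses Seq.collect for both levels of flattening; the names are hypothetical.

```fsharp
let transactions =
    [ set [ "A"; "C"; "D" ]
      set [ "B"; "C"; "E" ]
      set [ "A"; "B"; "C"; "E" ]
      set [ "B"; "E" ] ]

// The pairs that survived the L2 step for these transactions.
let l2Itemset = [ ("A", "C"); ("B", "C"); ("B", "E"); ("C", "E") ]

// Build triples from the first items of two pairs and the second item of a third,
// then keep only strictly ordered, distinct triples.
let c3Itemset =
    l2Itemset
    |> Seq.collect (fun x ->
        l2Itemset
        |> Seq.collect (fun y ->
            l2Itemset |> Seq.map (fun z -> fst x, fst y, snd z)))
    |> Seq.filter (fun (x, y, z) -> x < y && y < z)
    |> Seq.distinct
    |> Seq.sort
    |> Seq.toList
// c3Itemset: [("A","B","C"); ("A","B","E"); ("A","C","E"); ("B","C","E")]

// Count each triple, then keep those meeting a support of 0.5.
let l3 =
    c3Itemset
    |> List.map (fun (x, y, z) ->
        let triple = set [ x; y; z ]
        (x, y, z), transactions |> List.filter (Set.isSubset triple) |> List.length)
    |> List.filter (fun (_, n) -> float n / float (List.length transactions) >= 0.5)
// l3: [(("B","C","E"), 2)]
```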

I was now ready to feed the elevator data into the analysis. I think I am getting better at F# because I did the mapping, filtering, and transformation of the data from the server without looking at any other material, and it took only 15 minutes.

type public ElevatorBuilder() = 
    let connectionString = ConfigurationManager.ConnectionStrings.["localData2"].ConnectionString;
 
    member public this.GetElevatorTransactions() =
        let transactions = this.GetElevators() 
                              |> Seq.map(fun e ->this.ConvertElevatorToTransaction(e))
        let transactionsList = new System.Collections.Generic.List<Transaction>(transactions)
        transactionsList
 
    member public this.ConvertElevatorToTransaction(i: string, t:string, c:string, s:string) =
        let items = new System.Collections.Generic.List<String>()
        items.Add(t)
        items.Add(c)
        items.Add(s)
        let transaction = {TID=i; Items=items}
        transaction
 
    member public this.GetElevators () =
        SqlConnection.GetDataContext(connectionString).ElevatorData201402
            |> Seq.map(fun e -> e.ID, e.EquipType,e.Capacity,e.Speed)
            |> Seq.filter(fun (i,et,c,s) -> not(String.IsNullOrEmpty(et)))
            |> Seq.filter(fun (i,et,c,s) -> c.HasValue)
            |> Seq.filter(fun (i,et,c,s) -> s.HasValue)
            |> Seq.map(fun (i,t,c,s) -> i, this.CatagorizeEquipmentType(t),c,s)
            |> Seq.map(fun (i,t,c,s) -> i,t,this.CatagorizeCapacity(c.Value),s)
            |> Seq.map(fun (i,t,c,s) -> i,t,c,this.CatagorizeSpeed(s.Value))
            |> Seq.map(fun (i,t,c,s) -> i.ToString(),t,c,s)

The longest part was aggregating the free-form text of the Equipment Type field (here is a partial snippet; you get the idea…):

member public this.CatagorizeEquipmentType(et: string) =
    match et.Trim() with 
        | "OTIS" -> "OTIS"
        | "OTIS (1-2)" -> "OTIS"
        | "OTIS (2-1)" -> "OTIS"
        | "OTIS hydro" -> "OTIS"
        | "OTIS, HYD" -> "OTIS"
        // Patterns carry no trailing spaces because et.Trim() has already removed them.
        | "OTIS/ ASHEVILLE" -> "OTIS"
        | "OTIS/ MOUNTAIN" -> "OTIS"
        | "OTIS/#1" -> "OTIS"
        | "OTIS/#19" -> "OTIS"

Assigning categories for speed and capacity was a snap using F#:

member public this.CatagorizeCapacity(c: int) =
    let lowerBound = (c/25 * 25) + 1
    let upperBound = lowerBound + 24
    lowerBound.ToString() + "-" + upperBound.ToString()        
 
member public this.CatagorizeSpeed(s: int) =
    let lowerBound = (s/50 * 50) + 1
    let upperBound = lowerBound + 49
    lowerBound.ToString() + "-" + upperBound.ToString()    
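To see what labels this arithmetic produces, here is the same calculation as a standalone function with the bucket width as a parameter. The names bucketLabel and bucketLabelInclusive are hypothetical. One thing the sketch brings out: a value that is an exact multiple of the width gets the label of the bucket above it (2500 becomes "2501-2525"). The labels are still consistent categories for the analysis, but if the label should contain the value, subtracting one before dividing is one way to do it for positive values.

```fsharp
// Same arithmetic as CatagorizeCapacity/CatagorizeSpeed, width passed in.
let bucketLabel (width: int) (value: int) =
    let lowerBound = (value / width * width) + 1
    let upperBound = lowerBound + width - 1
    sprintf "%d-%d" lowerBound upperBound

// Hypothetical variant: for value >= 1 the label always contains the value.
let bucketLabelInclusive (width: int) (value: int) =
    let lowerBound = ((value - 1) / width * width) + 1
    let upperBound = lowerBound + width - 1
    sprintf "%d-%d" lowerBound upperBound

let a = bucketLabel 25 2510           // "2501-2525"
let b = bucketLabel 25 2500           // "2501-2525" (exact multiple, labeled one bucket up)
let c = bucketLabel 50 125            // "101-150"
let d = bucketLabelInclusive 25 2500  // "2476-2500"
```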

With this in hand, I created a Console app that takes the 27K records and pushes them through the Apriori algorithm:

private static void RunElevatorAnalysis()
{
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    ElevatorBuilder builder = new ElevatorBuilder();
    var transactions = builder.GetElevatorTransactions();
    stopwatch.Stop();
    Console.WriteLine("Building " + transactions.Count + " transactions took: " + stopwatch.Elapsed.TotalSeconds);
    var apriori = new Apriori(transactions, .1, .75);
    var c2 = apriori.GetC2();
    stopwatch.Reset();
    stopwatch.Start();
    var l1 = apriori.GetL1();
    Console.WriteLine("Getting L1 took: " + stopwatch.Elapsed.TotalSeconds);
    var l2 = apriori.GetL2();
    Console.WriteLine("Getting L2 took: " + stopwatch.Elapsed.TotalSeconds);
    var l3 = apriori.GetL3();
    Console.WriteLine("Getting L3 took: " + stopwatch.Elapsed.TotalSeconds);
    stopwatch.Stop();
    Console.WriteLine("--L1");
    foreach (var t in l1)
    {
        Console.WriteLine(t.Item1 + ":" + t.Item2);
    }
    Console.WriteLine("--L2");
    foreach (var t in l2)
    {
        Console.WriteLine(t.Item1 + ":" + t.Item2);
    }
    Console.WriteLine("--L3");
    foreach (var t in l3)
    {
        Console.WriteLine(t.Item1 + ":" + t.Item2);
    }
}

I then made an offering to the F# Gods and hit F5:

Doh! The gods were not pleased. I then went back to my initial filtering function, added a Seq.take(25000), and got these results:

So there are a couple of things to draw from this exercise.

1) The Apriori algorithm is the wrong technique for this dataset. I had to bring the support way down (10%) to get any readings at all. Also, there is too much dispersion in the values. This kind of algorithm works much better when each transaction holds a variable number of items drawn from a small set of values, rather than a fixed number of items drawn from a large set of values.

2) Even so, how cool is this? Compare the files needed to make the C#/OO version work versus the F# version:

And the total LOC is 539 for C# versus 120 for F#, and the F# can be optimized further with a better way to build and search the itemsets. Hard-coding each level was a hack I did to get things working and to give me an understanding of how the Apriori algorithm works. I bet this can be consolidated to well under 75 lines without sacrificing readability.

3) I think the StackOverflow exception is because I am doing a Cartesian join and then paring the result. Using one of the other techniques suggested on SO will give much better results.
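The two follow-ups mentioned above (consolidating the hard-coded levels and avoiding the exception) can be sketched together. Everything here is an assumption rather than the project's code: the names countSupport, nextCandidates and apriori are hypothetical, and transactions are modeled as F# sets. The sketch still joins frequent itemsets pairwise, so it is not one of the alternative techniques from Stack Overflow, but it counts support eagerly. That matters if part of the problem is Seq.reduce(Seq.map2(+)), which wraps one lazy sequence per transaction around the previous result; with 27K transactions that nesting gets very deep when it is finally enumerated. This is only a guess at a contributing cause, not something verified against the elevator data.

```fsharp
// Hypothetical names throughout; a sketch under the assumptions stated above.

// Count how many transactions contain every item of each candidate.
// List.filter and List.length are eager, so no chain of nested lazy sequences builds up.
let countSupport (transactions: Set<string> list) (candidates: Set<string> list) =
    candidates
    |> List.map (fun c ->
        c, transactions |> List.filter (Set.isSubset c) |> List.length)

// Join step: union pairs of frequent k-itemsets and keep the unions of size k + 1.
let nextCandidates (frequent: Set<string> list) =
    match frequent with
    | [] -> []
    | first :: _ ->
        let k = Set.count first
        [ for a in frequent do
            for b in frequent do
                let u = Set.union a b
                if a < b && Set.count u = k + 1 then
                    yield u ]
        |> List.distinct

// Returns one list of (itemset, count) per level: L1, L2, L3, ...
let apriori (minSupport: float) (transactions: Set<string> list) =
    let minCount = minSupport * float (List.length transactions)
    let keepFrequent candidates =
        countSupport transactions candidates
        |> List.filter (fun (_, n) -> float n >= minCount)
    let singles =
        transactions |> Set.unionMany |> Set.toList |> List.map Set.singleton
    let rec loop level acc =
        match level with
        | [] -> List.rev acc
        | _ ->
            let next = level |> List.map fst |> nextCandidates |> keepFrequent
            loop next (level :: acc)
    loop (keepFrequent singles) []

let levels =
    apriori 0.5
        [ set [ "A"; "C"; "D" ]
          set [ "B"; "C"; "E" ]
          set [ "A"; "B"; "C"; "E" ]
          set [ "B"; "E" ] ]
// levels has three entries; the last one is [(set ["B"; "C"; "E"], 2)]
```

The loop stops as soon as a level comes back empty, so the number of levels is driven by the data rather than by hand-written GetL1/GetL2/GetL3 members.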

In any event, what a fun project! I can’t wait to optimize this and perhaps throw a different algorithm at the dataset in the coming weeks.

Filed under Analytics, F#, Open Data


Jamie Dixon's Home
