Neural Networks

I picked up James McCaffrey’s Neural Networks Using C# a couple of weeks ago and decided to see if I could rewrite the code in F#. Unfortunately, the source code is not available (as far as I could tell), so I wrote some C# and then some F# to see if I could get functional equivalence.

My first stop was chapter one. I decided to get the F# code working for McCaffrey's sample data first and then refactor it into a more general program that would work with the inputs and values of different datasets. My final upgrade will be to use Deedle instead of any other data structure. But first things first: I want to get the examples working, so I fired up a script file and opened my REPL.

McCaffrey defines a sample dataset like this:

    string[] sourceData = new string[] {
        "Sex Age Locale Income Politics",
        "==============================================",
        "Male 25 Rural 63,000.00 Conservative",
        "Female 36 Suburban 55,000.00 Liberal",
        "Male 40 Urban 74,000.00 Moderate",
        "Female 23 Rural 28,000.00 Liberal" };

He then creates a parser that turns these string values into a double[][]. I just created the dataset as a list of tuples:

    let chapter1TestData = [("Male",25.,"Rural",63000.00,"Conservative");
                            ("Female",36.,"Suburban",55000.00,"Liberal");
                            ("Male",40.,"Urban",74000.00,"Moderate");
                            ("Female",23.,"Rural",28000.00,"Liberal")]
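If you would rather build the tuples from the raw strings than retype them, a small parser is enough. This is only a sketch: parseRow is a name I made up, and it assumes each row has exactly five space-separated fields with a comma as the thousands separator in the income, as in the sample above.

```fsharp
open System

// Hypothetical helper (not from the book): turn one data row such as
// "Male 25 Rural 63,000.00 Conservative" into the tuple shape used above.
let parseRow (line: string) =
    match line.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries) with
    | [| sex; age; locale; income; politics |] ->
        // Strip the thousands separator before converting to float.
        sex, float age, locale, float (income.Replace(",", "")), politics
    | _ -> failwithf "Unexpected row format: %s" line

// Usage: the data rows only (header and separator lines left out).
let parsedRows =
    [ "Male 25 Rural 63,000.00 Conservative"
      "Female 23 Rural 28,000.00 Liberal" ]
    |> List.map parseRow
```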

 

I did try an implementation using a record type, but for reasons that show up below I am using tuples. With the equivalent data loaded into the REPL, I tackled the first supporting function: MinMax. Here is the C# code that McCaffrey wrote:

    static void MinMaxNormal(double[][] data, int column)
    {
        int j = column;
        double min = data[0][j];
        double max = data[0][j];
        for (int i = 0; i < data.Length; ++i)
        {
            if (data[i][j] < min) min = data[i][j];
            if (data[i][j] > max) max = data[i][j];
        }
        double range = max - min;
        if (range == 0.0) // ugly
        {
            for (int i = 0; i < data.Length; ++i)
                data[i][j] = 0.5;
            return;
        }
        for (int i = 0; i < data.Length; ++i)
            data[i][j] = (data[i][j] - min) / range;
    }

and here is the equivalent F# code.

    let minMax (fullSet, i) =
        let min = fullSet |> Seq.min
        let max = fullSet |> Seq.max
        (i-min)/(max-min)

 

Note that McCaffrey does not have any unit tests, but when I ran the dummy data through the F# implementation, the results matched his screenshots, so that will work well enough. If you ever need a reason to use F#, consider those two code samples. Granted, McCaffrey’s code is more abstract because it can work on any column in a double array, but my counterpoint is that the function is really doing too much, and it is trivial in F# to pick a given column. Is there any doubt what the F# code is doing? Is there any certainty about what the C# code is doing?
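One thing the C# version does that my F# function does not is guard against a zero range: when every value in the column is the same, it writes 0.5 instead of dividing by zero. Here is a minimal sketch of how that could look in F#. The name minMaxColumn is mine, not McCaffrey's, and it normalizes a whole column at once instead of one value at a time.

```fsharp
// Hypothetical helper: min-max normalize a whole column of values.
// Mirrors the C# behaviour of returning 0.5 for every item when the
// range is zero. Throws on an empty input, because List.min does.
let minMaxColumn (values: seq<float>) : float list =
    let cached = List.ofSeq values
    let lo = List.min cached
    let hi = List.max cached
    let range = hi - lo
    if range = 0.0 then
        cached |> List.map (fun _ -> 0.5)
    else
        cached |> List.map (fun v -> (v - lo) / range)

// Usage with the ages from the sample data.
let normalizedAges = minMaxColumn [ 25.; 36.; 40.; 23. ]
```

The smallest age maps to 0.0 and the largest to 1.0, and a constant column maps to 0.5 throughout.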

In any event, moving along to the next functions, McCaffrey created two functions that do all of the encoding of the string values to appropriate numeric ones. Depending on whether the value is an X value (independent) or a Y value (dependent), there is a different encoding scheme: effects encoding for the X values and dummy encoding for the Y values.

    static string EffectsEncoding(int index, int N)
    {
        // If N = 3 and index = 0 -> 1,0.
        // If N = 3 and index = 1 -> 0,1.
        // If N = 3 and index = 2 -> -1,-1.
        if (N == 2) // Special case.
        {
            if (index == 0) return "-1";
            else if (index == 1) return "1";
        }
        int[] values = new int[N - 1];
        if (index == N - 1) // Last item is all -1s.
        {
            for (int i = 0; i < values.Length; ++i) values[i] = -1;
        }
        else
        {
            values[index] = 1; // 0 values are already there.
        }
        string s = values[0].ToString();
        for (int i = 1; i < values.Length; ++i) s += "," + values[i];
        return s;
    }

    static string DummyEncoding(int index, int N)
    {
        int[] values = new int[N];
        values[index] = 1;
        string s = values[0].ToString();
        for (int i = 1; i < values.Length; ++i) s += "," + values[i];
        return s;
    }

In my F# project, I decided to use domain-specific encoding. I plan to refactor this to something more abstract.

    //Transform Sex
    let testData' =
        chapter1TestData
        |> Seq.map (fun (s,a,l,i,p) ->
            match s with
            | "Male" -> -1.0,a,l,i,p
            | "Female" -> 1.0,a,l,i,p
            | _ -> failwith "Invalid sex")

    //Normalize Age
    let testData'' =
        let fullSet = testData' |> Seq.map (fun (s,a,l,i,p) -> a)
        testData' |> Seq.map (fun (s,a,l,i,p) -> s,minMax(fullSet,a),l,i,p)

    //Transform Locale
    let testData''' =
        testData''
        |> Seq.map (fun (s,a,l,i,p) ->
            match l with
            | "Rural" -> s,a,1.,0.,i,p
            | "Suburban" -> s,a,0.,1.,i,p
            | "Urban" -> s,a,-1.,-1.,i,p
            | _ -> failwith "Invalid locale")

    //Transform and Normalize Income
    let testData'''' =
        let fullSet = testData''' |> Seq.map (fun (s,a,l0,l1,i,p) -> i)
        testData''' |> Seq.map (fun (s,a,l0,l1,i,p) -> s,a,l0,l1,minMax(fullSet,i),p)

    //Transform Politics
    let testData''''' =
        testData''''
        |> Seq.map (fun (s,a,l0,l1,i,p) ->
            match p with
            | "Conservative" -> s,a,l0,l1,i,1.,0.,0.
            | "Liberal" -> s,a,l0,l1,i,0.,1.,0.
            | "Moderate" -> s,a,l0,l1,i,0.,0.,1.
            | _ -> failwith "Invalid politics")
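For the more abstract version, one possible shape is to port McCaffrey's two encoding functions directly and look up the index of each label in a list of categories. This is only a sketch. The names effectsEncoding, dummyEncoding and encodeWith are mine, the functions return float lists instead of comma-separated strings, and the index check is an addition that the C# does not have.

```fsharp
// Hypothetical ports of the two C# encoders, returning float lists.
let effectsEncoding (index: int) (n: int) : float list =
    if index < 0 || index >= n then invalidArg "index" "index out of range"
    if n = 2 then
        // Special case: a single value, -1 or 1.
        [ (if index = 0 then -1.0 else 1.0) ]
    elif index = n - 1 then
        // Last item is all -1s.
        List.replicate (n - 1) (-1.0)
    else
        List.init (n - 1) (fun i -> if i = index then 1.0 else 0.0)

let dummyEncoding (index: int) (n: int) : float list =
    if index < 0 || index >= n then invalidArg "index" "index out of range"
    List.init n (fun i -> if i = index then 1.0 else 0.0)

// Hypothetical helper: encode a label by its position in a category list.
let encodeWith encoder (categories: string list) (value: string) =
    match List.tryFindIndex ((=) value) categories with
    | Some index -> encoder index (List.length categories)
    | None -> failwithf "Invalid value: %s" value

// Usage with the categories from the sample data.
let urban = encodeWith effectsEncoding [ "Rural"; "Suburban"; "Urban" ] "Urban"
let liberal = encodeWith dummyEncoding [ "Conservative"; "Liberal"; "Moderate" ] "Liberal"
```

The catch is that lists of different lengths no longer fit into a fixed-size tuple, so the row type would have to change as well.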

When I execute the script, I get this:

[image: output of the F# script]

which is the same as McCaffrey’s:

[image: McCaffrey’s output]

Note that he used Gaussian normalization on column 2, whereas I used min-max normalization, following his advice in the book.
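For comparison, Gaussian normalization replaces each value with its distance from the column mean, measured in standard deviations. The sketch below makes two assumptions of mine: it uses the population standard deviation (dividing by n), which may not be what the book does, and it returns 0.0 for a constant column. The name gaussianNormalize is mine too.

```fsharp
// Hypothetical helper: z-score (Gaussian) normalization of one column.
// Assumes the population standard deviation; the book may divide by n - 1.
// Throws on an empty input, because List.average does.
let gaussianNormalize (values: seq<float>) : float list =
    let cached = List.ofSeq values
    let mean = List.average cached
    let variance = cached |> List.averageBy (fun v -> (v - mean) ** 2.0)
    let sd = sqrt variance
    if sd = 0.0 then
        // Assumption: map a constant column to 0.0 rather than divide by zero.
        cached |> List.map (fun _ -> 0.0)
    else
        cached |> List.map (fun v -> (v - mean) / sd)

// Usage with the ages from the sample data.
let gaussianAges = gaussianNormalize [ 25.; 36.; 40.; 23. ]
```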

 

 
