Neural Networks
July 15, 2014
I picked up James McCaffrey’s Neural Networks Using C# a couple of weeks ago and decided to see if I could rewrite the code in F#. Unfortunately, the source code is not available (as far as I could tell), so I wrote the C# first and then the F#, to see if I could get functional equivalence.
My first stop was chapter one. I decided to first get the F# code working for the sample data that McCaffrey provides, and then refactor it into a more general program that works with the inputs and values of other datasets. My final upgrade will be to use Deedle instead of any other data structure. But first things first: I wanted to get the examples working, so I fired up a script file and opened my REPL.
McCaffrey defines a sample dataset like this:
- string[] sourceData = new string[] { "Sex Age Locale Income Politics",
- "==============================================",
- "Male 25 Rural 63,000.00 Conservative",
- "Female 36 Suburban 55,000.00 Liberal", "Male 40 Urban 74,000.00 Moderate",
- "Female 23 Rural 28,000.00 Liberal" };
He then writes a parser that turns these string values into a double[][]. I simply created the dataset as an F# list of tuples:
- let chapter1TestData = [("Male",25.,"Rural",63000.00,"Conservative");
-                         ("Female",36.,"Suburban",55000.00,"Liberal");
-                         ("Male",40.,"Urban",74000.00,"Moderate");
-                         ("Female",23.,"Rural",28000.00,"Liberal")]
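McCaffrey’s string rows can also be parsed straight into the same tuple shape. This is a minimal sketch, assuming the space-separated layout shown above and a comma thousands separator in the income field; sourceRows and parseRow are hypothetical names, not from the book.

```fsharp
open System
open System.Globalization

// McCaffrey's sample rows, without the header and "====" separator lines
let sourceRows =
    [ "Male 25 Rural 63,000.00 Conservative"
      "Female 36 Suburban 55,000.00 Liberal"
      "Male 40 Urban 74,000.00 Moderate"
      "Female 23 Rural 28,000.00 Liberal" ]

// Hypothetical helper: split a row on spaces and convert the numeric fields.
// Income has a thousands separator, so it is parsed with NumberStyles.Number
// under the invariant culture (comma = group separator, period = decimal).
let parseRow (row: string) =
    match row.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries) with
    | [| sex; age; locale; income; politics |] ->
        (sex,
         Double.Parse(age, CultureInfo.InvariantCulture),
         locale,
         Double.Parse(income, NumberStyles.Number, CultureInfo.InvariantCulture),
         politics)
    | _ -> failwithf "Unexpected row format: %s" row

// Same shape as chapter1TestData: string * float * string * float * string
let parsedData = sourceRows |> List.map parseRow
```

Run over sourceRows, parseRow yields the same four tuples as chapter1TestData.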
I did try an implementation using a record type, but for reasons that will become apparent below, I am using tuples. With the equivalent data loaded into the REPL, I tackled the first supporting function: MinMax. Here is the C# code that McCaffrey wrote:
- static void MinMaxNormal(double[][] data, int column)
- {
-     int j = column;
-     double min = data[0][j];
-     double max = data[0][j];
-     for (int i = 0; i < data.Length; ++i)
-     {
-         if (data[i][j] < min) min = data[i][j];
-         if (data[i][j] > max) max = data[i][j];
-     }
-     double range = max - min;
-     if (range == 0.0) // ugly
-     {
-         for (int i = 0; i < data.Length; ++i)
-             data[i][j] = 0.5;
-         return;
-     }
-     for (int i = 0; i < data.Length; ++i)
-         data[i][j] = (data[i][j] - min) / range;
- }
and here is the equivalent F# code.
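- let minMax (fullSet: seq<float>, i: float) =
-     let min = fullSet |> Seq.min
-     let max = fullSet |> Seq.max
-     // Like McCaffrey's C#, map a constant column to 0.5 instead of dividing by zero
-     if max = min then 0.5 else (i - min) / (max - min)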
Note that McCaffrey does not provide any unit tests, but when I ran the dummy data through the F# implementation, the results matched his screenshots, so that will work well enough. If you ever need a reason to use F#, compare those two code samples. Granted, McCaffrey’s code is more general because it works on any column of the double[][], but my counterpoint is that the function is doing too much (finding the range, handling the zero-range case, and overwriting the column in place) and that picking out a given column is trivial in F#. Is there any doubt what the F# code is doing? Is there any certainty about what the C# code is doing?
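To illustrate how little it takes to pick a column in F#, here is a minimal sketch, assuming rows shaped like chapter1TestData. normalizeBy is a hypothetical helper, not McCaffrey’s code or the post’s minMax; the column to normalize is chosen by passing a projection lambda.

```fsharp
// Hypothetical helper (not from the book): min-max normalize whichever
// column the projection function picks out of each row.
let normalizeBy (project: 'row -> float) (rows: 'row list) =
    let column = rows |> List.map project
    let lo = List.min column
    let hi = List.max column
    // Mirror McCaffrey's C#: a constant column maps to 0.5
    if hi = lo then column |> List.map (fun _ -> 0.5)
    else column |> List.map (fun x -> (x - lo) / (hi - lo))

let rows =
    [ ("Male", 25., "Rural", 63000.00, "Conservative")
      ("Female", 36., "Suburban", 55000.00, "Liberal")
      ("Male", 40., "Urban", 74000.00, "Moderate")
      ("Female", 23., "Rural", 28000.00, "Liberal") ]

// Picking a column is just a lambda over the tuple
let normalizedAges = rows |> normalizeBy (fun (_, age, _, _, _) -> age)
let normalizedIncomes = rows |> normalizeBy (fun (_, _, _, income, _) -> income)
```

Because the minimum and maximum are computed once per column, this variant also avoids re-scanning the sequence for every row, which the per-value minMax does.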
Moving on to the next functions: McCaffrey wrote two functions that handle all of the encoding of string values into appropriate numeric ones. Which scheme applies depends on whether the value is an X value (independent) or a Y value (dependent): effects encoding for X values and dummy encoding for Y values.
- static string EffectsEncoding(int index, int N)
- {
-     // If N = 3 and index = 0 -> 1,0.
-     // If N = 3 and index = 1 -> 0,1.
-     // If N = 3 and index = 2 -> -1,-1.
-     if (N == 2)
-     {
-         // Special case.
-         if (index == 0) return "-1"; else if (index == 1) return "1";
-     }
-     int[] values = new int[N - 1];
-     if (index == N - 1)
-     {
-         // Last item is all -1s.
-         for (int i = 0; i < values.Length; ++i) values[i] = -1;
-     }
-     else
-     {
-         values[index] = 1;
-         // 0 values are already there.
-     }
-     string s = values[0].ToString();
-     for (int i = 1; i < values.Length; ++i) s += "," + values[i];
-     return s;
- }
- static string DummyEncoding(int index, int N)
- {
-     int[] values = new int[N];
-     values[index] = 1;
-     string s = values[0].ToString();
-     for (int i = 1; i < values.Length; ++i) s += "," + values[i];
-     return s;
- }
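For the general case, a minimal F# sketch of the two C# functions might look like this. Returning a float list instead of a comma-joined string is my assumption about the more useful shape on the F# side, and the names effectsEncoding and dummyEncoding are illustrative.

```fsharp
// Effects (1-of-(N-1)) encoding: category `index` out of `n` categories.
// N = 2 is a special case that yields a single -1 or +1, as in the C#.
let effectsEncoding (index: int) (n: int) : float list =
    if n = 2 then
        [ (if index = 0 then -1.0 else 1.0) ]
    elif index = n - 1 then
        // The last category is encoded as all -1s
        List.replicate (n - 1) (-1.0)
    else
        // A single 1.0 at position `index`, zeros elsewhere
        List.init (n - 1) (fun i -> if i = index then 1.0 else 0.0)

// Dummy (1-of-N) encoding: a single 1.0 at position `index`.
let dummyEncoding (index: int) (n: int) : float list =
    List.init n (fun i -> if i = index then 1.0 else 0.0)
```

With the categories taken in the order used in the post, these reproduce the hand-written codes in the script: effectsEncoding with n = 2 gives Male -1 and Female 1, effectsEncoding with n = 3 gives Rural 1,0, Suburban 0,1 and Urban -1,-1, and dummyEncoding with n = 3 gives the three Politics codes.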
In my F# project, I decided to use domain-specific encoding. I plan to refactor this into something more abstract.
- //Transform Sex
- let testData' =
-     chapter1TestData
-     |> Seq.map (fun (s,a,l,i,p) ->
-         match s with
-         | "Male" -> -1.0,a,l,i,p
-         | "Female" -> 1.0,a,l,i,p
-         | _ -> failwith "Invalid sex")
- //Normalize Age
- let testData'' =
-     let fullSet = testData' |> Seq.map (fun (s,a,l,i,p) -> a)
-     testData' |> Seq.map (fun (s,a,l,i,p) -> s,minMax(fullSet,a),l,i,p)
- //Transform Locale
- let testData''' =
-     testData''
-     |> Seq.map (fun (s,a,l,i,p) ->
-         match l with
-         | "Rural" -> s,a,1.,0.,i,p
-         | "Suburban" -> s,a,0.,1.,i,p
-         | "Urban" -> s,a,-1.,-1.,i,p
-         | _ -> failwith "Invalid locale")
- //Transform and Normalize Income
- let testData'''' =
-     let fullSet = testData''' |> Seq.map (fun (s,a,l0,l1,i,p) -> i)
-     testData''' |> Seq.map (fun (s,a,l0,l1,i,p) -> s,a,l0,l1,minMax(fullSet,i),p)
- //Transform Politics
- let testData''''' =
-     testData''''
-     |> Seq.map (fun (s,a,l0,l1,i,p) ->
-         match p with
-         | "Conservative" -> s,a,l0,l1,i,1.,0.,0.
-         | "Liberal" -> s,a,l0,l1,i,0.,1.,0.
-         | "Moderate" -> s,a,l0,l1,i,0.,0.,1.
-         | _ -> failwith "Invalid politics")
When I execute the script, the output is the same as McCaffrey’s.
Note that he used Gaussian normalization on column 2, whereas I used min-max normalization there, based on his advice in the book.
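For comparison, Gaussian normalization usually means rescaling each value to (x - mean) / standard deviation. The sketch below assumes the population standard deviation and adds a zero-variance guard; neither detail is taken from the book, and gaussNormal is a hypothetical name.

```fsharp
// Hypothetical helper: Gaussian (z-score) normalization of one column.
// Assumption: population standard deviation (divide by n, not n - 1).
let gaussNormal (column: float list) =
    let n = float (List.length column)
    let mean = List.average column
    let sd = sqrt ((column |> List.sumBy (fun x -> (x - mean) ** 2.0)) / n)
    // Our own guard: a constant column has sd = 0, so map it to 0.0
    if sd = 0.0 then column |> List.map (fun _ -> 0.0)
    else column |> List.map (fun x -> (x - mean) / sd)

// The chapter one ages: mean 31, population sd = sqrt 51.5
let gaussAges = gaussNormal [ 25.0; 36.0; 40.0; 23.0 ]
```

Unlike min-max scaling, the results are not confined to [0, 1]: here the youngest person (23) ends up at roughly -1.11 and the oldest (40) at roughly 1.25.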