The Big Ten and F#

I was talking to fellow TRINUGer David Green about football schools a couple of weeks ago.  He went to Northwestern and I went to Michigan, and we were discussing the relative merits of universities doing well in football.  Assuming Goro was counting: on one hand, it is great to have a sport that can bring in tons of money to fund the school's non-football sports and activities; on the second hand, it keeps alumni interested in their school; on the third hand, it can give locals a source of pride in the school; and on the last hand, it can take the focus away from the other parts of the academic institution.

I then was talking to a professor at Ohio State University – she cares absolutely zero about the football team.  I made the comment that the smartest kids in Ohio don’t go to OSU.  They will go and root for their gladiators on Saturday but when it comes down to their academic and subsequent professional success, they look elsewhere.  She agreed.

Putting those two conversations together put OSU and MSU’s continued success in the Big Ten in context, along with the inevitable bellyaching that those teams get the short end of the stick when compared to the SEC.  For example, OSU and MSU would both have gone undefeated in the Ivy League in 2013.  Does that mean they should be considered in the same conversation as Alabama and Auburn for the national championship?  I think the biggest problem that OSU and MSU have is that they are in the Big Ten, which historically has been about geography, academic success, and athletic competition (in that order).

Looking at the Big Ten schools, I pulled their most recent academic ranking from US News and World Report and their BCS ranking.  I then went over to MathIsFun to get the recipe for correlation:

image
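Written out, the recipe is (as far as I can tell) the standard Pearson correlation formula.  The symbols a and b match the deviation sequences in the F# code; n is just my label for the number of data points:

```latex
r = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2 \, \sum_{i=1}^{n} b_i^2}},
\qquad a_i = x_i - \bar{x}, \quad b_i = y_i - \bar{y}
```

A value near 1 means the two series rise together, a value near -1 means one rises as the other falls, and a value near 0 means no linear relationship.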

I then went over to Visual Studio and created a solution like so:

image

Learning from my last project, I created my unit test first to verify that the calculation is correct:

    [TestMethod]
    public void FindCorrelationUsingStandardInput_ReturnsExpectedValue()
    {
        // Sample temperature and sales data
        Double[] temperatures = new Double[12] { 14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2 };
        Double[] sales = new Double[12] { 215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408 };

        Double expected = .9575;
        Double actual = Calculations.Correlation(temperatures, sales);
        // Exact comparison against the reference value
        Assert.AreEqual(expected, actual);
    }

I then hopped over to my working code and started coding:

    open System.Collections.Generic

    type Calculations() =
        static member Correlation(x:IEnumerable<double>, y:IEnumerable<double>) =
            // Mean of each series
            let meanX = Seq.average x
            let meanY = Seq.average y

            // Deviation of each value from its mean
            let a = Seq.map(fun x -> x-meanX) x
            let b = Seq.map(fun y -> y-meanY) y

            // Pairwise products of the deviations
            let ab = Seq.zip a b
            let abProduct = Seq.map(fun (a,b) -> a * b) ab

            // Squared deviations
            let aSquare = Seq.map(fun a -> a * a) a
            let bSquare = Seq.map(fun b -> b * b) b

            let abSum = Seq.sum abProduct
            let aSquareSum = Seq.sum aSquare
            let bSquareSum = Seq.sum bSquare

            let sums = aSquareSum * bSquareSum
            let squareRootOfSums = sqrt(sums)

            // Correlation: sum of products over the root of the product of squared sums
            abSum/squareRootOfSums

What I noticed is that those intermediate variables make the code much wordier than it needs to be.  A mathematician might think that the code is too verbose, but a developer might appreciate that each step is laid out.  In fact, I would argue that a better component design would be to break each step out into its own function that can be tested independently (and perhaps reused by other functions):

    [TestMethod]
    public void GetMeanUsingStandardInputReturnsExpectedValue()
    {
        Double[] temperatures = new Double[12] { 14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2 };
        Double expected = 18.675;
        Double actual = Calculations.Mean(temperatures);
        Assert.AreEqual(expected, actual);
    }

    [TestMethod]
    public void GetBothMeansProductUsingStandardInputReturnsExpectedValue()
    {
        Double[] temperatures = new Double[12] { 14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2 };
        Double[] sales = new Double[12] { 215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408 };

        Double expected = 5325;
        // The product of the deviations needs both series
        Double actual = Calculations.MeanProduct(temperatures, sales);
        Assert.AreEqual(expected, actual);
    }

    [TestMethod]
    public void GetMeanSquareUsingStandardInputReturnsExpectedValue()
    {
        Double[] temperatures = new Double[12] { 14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2 };
        Double[] sales = new Double[12] { 215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408 };

        Double expected = 177;
        Double actual = Calculations.MeanSquared(temperatures);
        Assert.AreEqual(expected, actual);
    }

I’ll leave that implementation for another day as it is already getting late.  In any event, I ran the correlation unit test and I got red (pink, really):

image

The spreadsheet rounded its result and my calculation does not, so the exact comparison failed.  I adjusted the unit test to round the calculated value to four decimal places:

    [TestMethod]
    public void FindCorrelationUsingStandardInput_ReturnsExpectedValue()
    {
        Double[] temperatures = new Double[12] { 14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2 };
        Double[] sales = new Double[12] { 215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408 };

        Double correlation = Calculations.Correlation(temperatures, sales);
        Double expected = .9575;

        // Round to the same four decimal places as the reference value
        Double actual = Math.Round(correlation, 4);
        Assert.AreEqual(expected, actual);
    }
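To see where the mismatch came from, here is the arithmetic for the sample data, worked by hand from the correlation recipe.  The intermediate values are rounded for display, so treat the trailing digits as approximate:

```latex
\bar{x} = \frac{224.1}{12} = 18.675, \qquad \bar{y} = \frac{4829}{12} \approx 402.4167

\sum a_i b_i \approx 5325.025, \qquad \sum a_i^2 \approx 176.9825, \qquad \sum b_i^2 \approx 174754.9167

r \approx \frac{5325.025}{\sqrt{176.9825 \times 174754.9167}} \approx \frac{5325.025}{5561.345} \approx 0.95751
```

The unrounded result carries extra digits past .9575, which is why an exact comparison against .9575 fails and rounding to four decimal places passes.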

And now I am green:

image

So going back to the original question, I took the current Big Ten schools and put their academic rankings and football rankings side by side:

image

 

I then made a revised Big Ten with a much higher academic ranking, made up of schools that play in a power football conference but still maintain high academics.

image

Note that I left Penn State out of both of these lists because they have a NaN for their football ranking, but they certainly have a high enough academic score to be part of the revised Big Ten.

I then put those values through the correlation function via a console UI:

    static void Main(string[] args)
    {
        Console.WriteLine("Start");

        Double[] academicRanking = new Double[12] { 12,28,41,41,52,62,68,69,73,73,75,101 };
        Double[] footballRanking = new Double[12] { 65,41,82,19,7,61,105,36,4,34,63,37 };

        Double originalCorrelation = Calculations.Correlation(academicRanking, footballRanking);
        Console.WriteLine("Original BigTen Correlation {0}", originalCorrelation);

        academicRanking = new Double[10] { 7,12,17,18,23,23,28,30,41,41 };
        footballRanking = new Double[10] { 24, 65, 32, 26, 94, 84, 41, 58, 82, 19 };
        Double revisedCorrelation = Calculations.Correlation(academicRanking, footballRanking);
        Console.WriteLine("Revised BigTen Correlation {0}", revisedCorrelation);

        Console.WriteLine("End");
        Console.ReadKey();
    }

I get:

image

Just looking at the data seems to support this.  There is a negative correlation between academics and football success in the current Big Ten: the higher the academics, the lower the football ranking, and vice versa.  In the revised Big Ten, there is a positive correlation of the same magnitude: higher academics and higher (relative) football rankings.  Put another way, the new Big Ten has a much stronger academic ranking and pretty much the same football ranking.

Looking at a map, this new conference is like a doughnut with Ohio, West Virginia, and Kentucky in the middle.  Perhaps they can have a football championship sponsored by Krispy Kreme?  In any event, OSU and MSU are much closer academically and football-wise to the Alabamas and Auburns than to the Northwesterns and Michigans of the world.  In terms of geographic proximity, Columbus, Ohio is closer to Tuscaloosa, AL than to Lincoln, NE.  So perhaps the OSU and MSU fans would be better served in a conference that is more aligned with their universities’ priorities?  If they went undefeated, or even finished with one loss, they would still be in the national championship discussion.

 

 

 
