Signature Project: Slope of the Signature (Part 1)
July 17, 2012
Continuing the signature project, I wanted to calculate the slope of each signature so that I could normalize the different slopes before determining the match % of the signatures’ scatterplots. To that end, I needed to figure out the “center” line of each signature’s scatterplot. Remembering the “least squares” method from biostatistics 20 years ago, I figured that I needed to plot my best “guess” of the line, with 50% of the points above the line and 50% below. I could then create the line based on this Y coordinate and the far left and far right X coordinates. Note that I could then also figure out the dispersion of the points away from this line. Perhaps that could tell me if someone signed the second signature more messily and was perhaps under duress. Leaving that analysis for another day, I just wanted to figure out the center line.
Following TDD, I created a test:
    [TestMethod()]
    [DeploymentItem("Tff.Signature.Comparison.dll")]
    public void GetCenterYPoint_TwoPoints_ReturnsAverage_Test()
    {
        ScatterplotComparisonFactory_Accessor target = new ScatterplotComparisonFactory_Accessor();
        List<Point> points = new List<Point>();
        points.Add(new Point(0, 0));
        points.Add(new Point(0, 2));

        int expected = 1;
        int actual = target.GetCenterYPoint(points);
        Assert.AreEqual(expected, actual);
    }
I then wrote the GetCenterYPoint method (note the _Accessor syntax in the test above, which is how the test reaches private methods):
    private Int32 GetCenterYPoint(List<Point> points)
    {
        int totalYValue = 0;
        foreach (Point point in points)
        {
            totalYValue += point.Y;
        }
        // Integer division: the average is truncated to a whole number.
        return (Int32)(totalYValue / points.Count);
    }
The test ran green so I tackled the Left and Right X Points:
    [TestMethod()]
    [DeploymentItem("Tff.Signature.Comparison.dll")]
    public void GetLeftMostXPoint_TwoPoints_ReturnsLowest_Test()
    {
        ScatterplotComparisonFactory_Accessor target = new ScatterplotComparisonFactory_Accessor();
        List<Point> points = new List<Point>();
        points.Add(new Point(1, 0));
        points.Add(new Point(0, 2));

        int expected = 0;
        int actual = target.GetLeftMostXPoint(points);
        Assert.AreEqual(expected, actual);
    }

    [TestMethod()]
    [DeploymentItem("Tff.Signature.Comparison.dll")]
    public void GetRightMostXPoint_TwoPoints_ReturnsHighest_Test()
    {
        ScatterplotComparisonFactory_Accessor target = new ScatterplotComparisonFactory_Accessor();
        List<Point> points = new List<Point>();
        points.Add(new Point(0, 0));
        points.Add(new Point(1, 2));

        int expected = 1;
        int actual = target.GetRightMostXPoint(points);
        Assert.AreEqual(expected, actual);
    }
The methods are pretty much what you expect:
    private Int32 GetLeftMostXPoint(List<Point> points)
    {
        int mostLeftXValue = points[0].X;
        foreach (Point point in points)
        {
            if (point.X < mostLeftXValue)
            {
                mostLeftXValue = point.X;
            }
        }
        return mostLeftXValue;
    }

    private Int32 GetRightMostXPoint(List<Point> points)
    {
        int mostRightXValue = points[0].X;
        foreach (Point point in points)
        {
            if (point.X > mostRightXValue)
            {
                mostRightXValue = point.X;
            }
        }
        return mostRightXValue;
    }
Those also ran green, so I created the CenterLineOfPoints tests. The first: a horizontal line should return the same line:
    [TestMethod()]
    [DeploymentItem("Tff.Signature.Comparison.dll")]
    public void GetCenterLineOfPoints_HorizontalLine_ReturnsSame_Test()
    {
        ScatterplotComparisonFactory_Accessor target = new ScatterplotComparisonFactory_Accessor();
        List<Point> points = new List<Point>();
        points.Add(new Point(0, 0));
        points.Add(new Point(4, 0));

        Line expected = new Line(0, 0, 4, 0);
        Line actual = target.GetCenterLineOfPoints(points);
        Assert.AreEqual(expected, actual);
    }
I wrote my method:
    private Line GetCenterLineOfPoints(List<Point> points)
    {
        Point leftPoint = new Point();
        leftPoint.X = GetLeftMostXPoint(points);
        leftPoint.Y = GetCenterYPoint(points);

        Point rightPoint = new Point();
        rightPoint.X = GetRightMostXPoint(points);
        rightPoint.Y = leftPoint.Y;

        return new Line(leftPoint, rightPoint);
    }
It ran green, so I ran the next test: a vertical line (two points stacked on the same X) should collapse to the single point at the average Y:
    [TestMethod()]
    [DeploymentItem("Tff.Signature.Comparison.dll")]
    public void GetCenterLineOfPoints_VerticalLine_ReturnsSamePointInMiddle_Test()
    {
        ScatterplotComparisonFactory_Accessor target = new ScatterplotComparisonFactory_Accessor();
        List<Point> points = new List<Point>();
        points.Add(new Point(0, 0));
        points.Add(new Point(0, 4));

        Line expected = new Line(0, 2, 0, 2);
        Line actual = target.GetCenterLineOfPoints(points);
        Assert.AreEqual(expected, actual);
    }
That also ran green. (Note that some TDDers put both of those tests in the same test method. I don’t like to do that, even if it means red/green/refactor crosses several tests.) In any event, I then added the final test to confirm the awesomeness of my calculation: a line at a 45-degree angle should return the same line, with a slope of 1.
    [TestMethod()]
    [DeploymentItem("Tff.Signature.Comparison.dll")]
    public void GetCenterLineOfPoints_45degreeSlopeLine_ReturnsAverage_Test()
    {
        ScatterplotComparisonFactory_Accessor target = new ScatterplotComparisonFactory_Accessor();
        List<Point> points = new List<Point>();
        points.Add(new Point(0, 0));
        points.Add(new Point(4, 4));

        Line expected = new Line(0, 0, 4, 4);
        Line actual = target.GetCenterLineOfPoints(points);
        Assert.AreEqual(expected, actual);
    }
And I ran it and got RED! Crap!
I was doing something wrong in my calculation, and it was pretty obvious: the Y value for that center line is NOT the same at both end points. If it were, the slope would always be 0.0. What I needed to do was channel my inner Gauss and use his least squares formula. I then binged around and found a great step-by-step article on finding the center line.
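For reference, the textbook closed form for a least squares line y = mx + b through n points is sketched below. The symbols m, b, n, and the bars for the means are my notation, not necessarily the article's:

```latex
m = \frac{\sum_{i=1}^{n} x_i y_i \;-\; \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}
         {\sum_{i=1}^{n} x_i^{2} \;-\; \dfrac{\left(\sum_{i=1}^{n} x_i\right)^{2}}{n}},
\qquad
b = \bar{y} - m\,\bar{x}
```

The four sums and the point count are all the code has to accumulate. Note that the denominator is zero whenever every point shares the same X, which is exactly the vertical line case.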
Forgetting TDD for a second, I attempted to implement the formula using C#. I came up with this:
    private Line GetCenterLineViaSumOfLeastSquares(List<Point> points)
    {
        Int32 sumOfXValue = 0;
        Int32 sumOfYValue = 0;
        Int32 sumOfXValueSquared = 0;
        Int32 sumOfXValueMultipledByYValue = 0;
        Int32 numberOfPoints = 0;

        foreach (Point point in points)
        {
            sumOfXValue += point.X;
            sumOfYValue += point.Y;
            sumOfXValueSquared += point.X ^ 2;
            sumOfXValueMultipledByYValue += point.X * point.Y;
            numberOfPoints ++;
        }

        Double xMean = sumOfXValue / numberOfPoints;
        Double yMean = sumOfYValue / numberOfPoints;
        Double numerator = sumOfXValueMultipledByYValue - ((sumOfXValue * sumOfYValue) / numberOfPoints);
        Double denomiator = sumOfXValueSquared - ((sumOfXValue ^ 2)/numberOfPoints);
        Double slope = numerator / denomiator;
        // The intercept uses the mean of X, not the sum of X.
        Double yIntercept = yMean - (slope) * xMean;

        Point startPoint = new Point(0, (Int32)yIntercept);
        // End point not worked out yet, so it is a placeholder.
        Point endPoint = new Point(0, 0);
        return new Line(startPoint, endPoint);
    }
Note that I didn’t yet know how to calculate the X-axis intercept, so I left the end point as 0,0. Some of you might ask why I did this:
Double xMean = sumOfXValue / numberOfPoints;
When I could have done this?
Double xMean = sumOfXValue / points.Count;
In a word: readability. Using explanatory variables does not waste any meaningful processing time, and the code is more readable because it matches the text of the mathematical formula.
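One caveat applies to either form of that line: sumOfXValue and numberOfPoints (or points.Count) are both Int32, so C# performs integer division first and only then widens the already-truncated result to Double. A minimal sketch of the difference follows; the class and the helper names TruncatedMean and ExactMean are hypothetical, made up for illustration, and are not part of the project:

```csharp
using System;

public static class IntegerDivisionDemo
{
    // Mirrors "Double xMean = sumOfXValue / numberOfPoints;" with Int32 operands:
    // the division is done in integer arithmetic, then converted to Double.
    public static Double TruncatedMean(Int32 sumOfXValue, Int32 numberOfPoints)
    {
        Double xMean = sumOfXValue / numberOfPoints;
        return xMean;
    }

    // Casting one operand to Double first forces floating-point division.
    public static Double ExactMean(Int32 sumOfXValue, Int32 numberOfPoints)
    {
        Double xMean = (Double)sumOfXValue / numberOfPoints;
        return xMean;
    }

    public static void Main()
    {
        // With a sum of 7 over 2 points, the first is 3.0 and the second is 3.5.
        Console.WriteLine(TruncatedMean(7, 2) == 3.0);
        Console.WriteLine(ExactMean(7, 2) == 3.5);
    }
}
```

For the slope itself this matters whenever the sums are not evenly divisible by the number of points, which is the normal case for real signature data.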
In any event, I then realized that I don’t need a line to represent the middle line – I just needed the slope. And lookie right there – I have a variable called… slope. I changed the method to return the slope of the middle line:
    private Double GetSlopeOfScatterplot(List<Point> points)
    {
        //Bunch of code
        return numerator / denomiator;
    }
Jumping back to TDD, I realized I needed a bunch of tests to verify that my slope calculation is correct. I deleted all of my prior work (thank goodness for source control), and here is what I came up with:
    [TestMethod()]
    [DeploymentItem("Tff.Signature.Comparison.dll")]
    public void GetSlopeOfScatterplot_HorizontalLine_ReturnsZero_Test()
    {
        ScatterplotComparisonFactory_Accessor target = new ScatterplotComparisonFactory_Accessor();
        List<Point> points = new List<Point>();
        points.Add(new Point(0, 2));
        points.Add(new Point(1, 2));
        points.Add(new Point(3, 2));
        points.Add(new Point(4, 2));

        Double expected = 0.0;
        Double actual = target.GetSlopeOfScatterplot(points);
        Assert.AreEqual(expected, actual);
    }

    [TestMethod()]
    [ExpectedException(typeof(DivideByZeroException))]
    [DeploymentItem("Tff.Signature.Comparison.dll")]
    public void GetSlopeOfScatterplot_VerticalLine_ReturnsUndefined_Test()
    {
        ScatterplotComparisonFactory_Accessor target = new ScatterplotComparisonFactory_Accessor();
        List<Point> points = new List<Point>();
        points.Add(new Point(2, 0));
        points.Add(new Point(2, 1));
        points.Add(new Point(2, 2));
        points.Add(new Point(2, 3));

        Double expected = 0.0;
        Double actual = target.GetSlopeOfScatterplot(points);
        Assert.AreEqual(expected, actual);
    }

    [TestMethod()]
    [DeploymentItem("Tff.Signature.Comparison.dll")]
    public void GetSlopeOfScatterplot_45DegreeLine_ReturnsOne_Test()
    {
        ScatterplotComparisonFactory_Accessor target = new ScatterplotComparisonFactory_Accessor();
        List<Point> points = new List<Point>();
        points.Add(new Point(0, 0));
        points.Add(new Point(1, 1));
        points.Add(new Point(2, 2));
        points.Add(new Point(3, 3));

        Double expected = 1.0;
        Double actual = target.GetSlopeOfScatterplot(points);
        Assert.AreEqual(expected, actual);
    }
The thing is, the vertical line test was coming back as 0 when it should have thrown a DivideByZeroException. Crap X 2!
I then went back and looked at my code and I realized I made a rookie mistake!
sumOfXValueSquared += (point.X ^ 2);
is wrong and
sumOfXValueSquared += (point.X * point.X);
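is right. (In C#, ^ is the bitwise XOR operator, not exponentiation, so the (sumOfXValue ^ 2) in the denominator needs the same fix.) I then got this result:
NaN. Wahoo!!!! I thought I would get a DivideByZeroException, but I was wrong: floating-point division by zero does not throw in C#. I then used the Double.IsNaN method in that unit test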
    [TestMethod()]
    [DeploymentItem("Tff.Signature.Comparison.dll")]
    public void GetSlopeOfScatterplot_VerticalLine_ReturnsUndefined_Test()
    {
        ScatterplotComparisonFactory_Accessor target = new ScatterplotComparisonFactory_Accessor();
        List<Point> points = new List<Point>();
        points.Add(new Point(2, 0));
        points.Add(new Point(2, 1));
        points.Add(new Point(2, 2));
        points.Add(new Point(2, 3));

        Boolean expected = true;
        Boolean actual = Double.IsNaN(target.GetSlopeOfScatterplot(points));
        Assert.AreEqual(expected, actual);
    }
and I was “green to go” <trademark pending>
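The NaN surprise comes down to how C# treats division by zero for different numeric types: Double follows IEEE 754 and yields NaN or an infinity, while Int32 division throws. Here is a small sketch of that difference; the class and helper names are hypothetical and exist only for this illustration:

```csharp
using System;

public static class DivideByZeroDemo
{
    // Floating-point division never throws; 0.0 / 0.0 is NaN and 1.0 / 0.0 is +Infinity.
    public static Double DivideAsDouble(Double numerator, Double denominator)
    {
        return numerator / denominator;
    }

    // Integer division by zero throws DivideByZeroException at run time.
    public static bool IntegerDivisionThrows(Int32 numerator, Int32 denominator)
    {
        try
        {
            Int32 quotient = numerator / denominator;
            return quotient != quotient; // never true; only reached when no exception occurs
        }
        catch (DivideByZeroException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Double.IsNaN(DivideAsDouble(0.0, 0.0)));              // True
        Console.WriteLine(Double.IsPositiveInfinity(DivideAsDouble(1.0, 0.0))); // True
        Console.WriteLine(IntegerDivisionThrows(1, 0));                         // True
    }
}
```

Because the numerator and denominator in the slope method are Doubles by the time they are divided, the vertical line case lands in the first branch (0.0 / 0.0) rather than throwing.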
Now I can compare 2 signatures and analyze the slope between them.
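Pulling the fixes from this post together, here is one way the finished slope method might look. This is a sketch and not the project's actual code: it uses a value tuple as a stand-in for the Point type so that it compiles on its own, the method is made public and static for the same reason, and the sums are accumulated as Doubles (my assumption) to sidestep both the ^ mistake and integer truncation.

```csharp
using System;
using System.Collections.Generic;

public static class SlopeSketch
{
    // Least squares slope of a scatterplot. Returns NaN when every point
    // shares the same X (a vertical line), because that is 0.0 / 0.0.
    public static Double GetSlopeOfScatterplot(List<(int X, int Y)> points)
    {
        Double sumOfXValue = 0;
        Double sumOfYValue = 0;
        Double sumOfXValueSquared = 0;
        Double sumOfXValueMultipliedByYValue = 0;
        Double numberOfPoints = points.Count;

        foreach (var point in points)
        {
            sumOfXValue += point.X;
            sumOfYValue += point.Y;
            // Multiply to square; ^ would be bitwise XOR.
            sumOfXValueSquared += (Double)point.X * point.X;
            sumOfXValueMultipliedByYValue += (Double)point.X * point.Y;
        }

        Double numerator = sumOfXValueMultipliedByYValue
            - (sumOfXValue * sumOfYValue / numberOfPoints);
        Double denominator = sumOfXValueSquared
            - (sumOfXValue * sumOfXValue / numberOfPoints);
        return numerator / denominator;
    }

    public static void Main()
    {
        var diagonal = new List<(int X, int Y)> { (0, 0), (1, 1), (2, 2), (3, 3) };
        var vertical = new List<(int X, int Y)> { (2, 0), (2, 1), (2, 2), (2, 3) };

        Console.WriteLine(GetSlopeOfScatterplot(diagonal) == 1.0);         // True
        Console.WriteLine(Double.IsNaN(GetSlopeOfScatterplot(vertical)));  // True
    }
}
```

The three cases from the unit tests above (horizontal, vertical, and 45 degrees) give 0, NaN, and 1 respectively with this version.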