Parsing Wake County Tax Site With F#

Based on the response to my last post on Wake County school scores, I decided to look at each school’s revenue base. Instead of looking at free and reduced lunch as a correlating factor for school scores, I wanted to look at the aggregate home valuations of each school’s population.

To do that, I thought of the Wake County Tax Department’s web site, where you can look up an address and see the tax value of the property. Although they don’t have an API, their site’s search result page has a predictable URI, so by plugging in a 7-digit integer, I could theoretically look at all of the tax records for the county. Also, the HTML of the result page is standardized, so parsing it should be fairly straightforward.
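As a rough sketch of that idea, the snippet below builds one URI per account number. The base address is a made-up placeholder (the real search URI is not reproduced here), and zero-padding the number to seven characters is an assumption about how the site expects it; `accountId`, `buildUri`, and `urisFor` are names introduced only for illustration.

```fsharp
// Hypothetical base address: a placeholder, not the real Wake County URI.
let baseUri = "https://example.invalid/account?id="

// Format an account number as a 7-character string (zero-padding is an assumption).
let accountId (n: int) = sprintf "%07d" n

// Build the result-page URI for one account number.
let buildUri (n: int) = baseUri + accountId n

// Lazily enumerate the URIs for a range of account numbers.
let urisFor (first: int) (last: int) = seq { for n in first .. last -> buildUri n }

urisFor 1 3 |> Seq.iter (printfn "%s")
```

Keeping the sequence lazy means the full range of seven-digit numbers never has to be held in memory at once.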

So I fired up Visual Studio and opened the F# REPL. The first thing I did was bring in the HTML type provider and point it at a saved sample page to define the type.

    #r "../packages/FSharp.Data.2.1.1/lib/net40/FSharp.Data.dll"
    open FSharp.Data
    type context = HtmlProvider<"../data/RealEstateSample.html">

I could then pull down the DOM for a page and find all of its <table> elements:

    let uri = ""
    let body = context.Load(uri).Html.Body()
    let tables = body.Descendants("TABLE") |> Seq.toList
    tables |> Seq.length


So there are 14 tables on the page.  After some manual inspection, the table that holds the address information is table number 7:

    let addressTable = tables.[7]


My first thought was to parse the raw text of the table for keywords that I could search on:

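    // taxTable is the table that holds the assessed value (tables.[11], used again further down)
    let taxTable = tables.[11]
    let baseText = taxTable.ToString()
    // Skip ahead to the label, then to the first dollar sign after it
    let marker = baseText.IndexOf("Total Value Assessed")
    let remainingText = baseText.Substring(marker)
    let marker' = remainingText.IndexOf("$")
    let remainingText' = remainingText.Substring(marker')
    // The value ends where the next tag begins
    let marker'' = remainingText'.IndexOf("<")
    let finalText = remainingText'.Substring(0, marker'')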
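The same marker-hunting idea can be wrapped in a small function that reports failure instead of throwing when a marker is missing. This is only a sketch: `tryExtractDollarValue` is a hypothetical helper, and the sample fragment is made up rather than taken from the real page.

```fsharp
open System

// Return the text from the first '$' after `label` up to the next '<',
// or None if any of the three markers is missing.
let tryExtractDollarValue (label: string) (html: string) : string option =
    let labelAt = html.IndexOf(label, StringComparison.Ordinal)
    if labelAt < 0 then None
    else
        let dollarAt = html.IndexOf('$', labelAt)
        if dollarAt < 0 then None
        else
            let tagAt = html.IndexOf('<', dollarAt)
            if tagAt < 0 then None
            else Some (html.Substring(dollarAt, tagAt - dollarAt))

// Made-up fragment for illustration; the real markup is more involved.
let sample = "<td><font>Total Value Assessed</font></td><td><font>$123,456</font></td>"

printfn "%A" (tryExtractDollarValue "Total Value Assessed" sample)
```

Returning an option makes the fragility of text scanning explicit: any page whose layout differs simply yields None.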

I then thought, “Jamie, you are being stupid.” Since the DOM is structured consistently, I can just use the type provider and search on tags:

    let addressTable = tables.[7]
    let fonts = addressTable.Descendants("font") |> Seq.toList
    let addressOne = fonts.[1].InnerText()
    let addressTwo = fonts.[2].InnerText()
    let addressThree = fonts.[3].InnerText()

and sure enough


And then going to table number 11, I can get the assessed value:

    let taxTable = tables.[11]
    let fonts' = taxTable.Descendants("font") |> Seq.toList
    let assessedValue = fonts'.[3].InnerText()
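The assessed value comes back as text, so aggregating valuations later will need a number. Here is a minimal sketch of that conversion, assuming the text looks like a dollar sign followed by comma-grouped digits; `tryParseDollars` is a hypothetical helper and the sample input is invented.

```fsharp
open System
open System.Globalization

// Try to turn text such as "$123,456" into a decimal; None when it does not parse.
let tryParseDollars (text: string) : decimal option =
    let cleaned = text.Trim().Replace("$", "").Replace(",", "")
    match Decimal.TryParse(cleaned, NumberStyles.Number, CultureInfo.InvariantCulture) with
    | true, value -> Some value
    | _ -> None

// Invented sample value, not a real assessment.
printfn "%A" (tryParseDollars " $123,456 ")
```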

and how cool is this?


So with the data elements in place, I need a way of saving the data. Fortunately, FSharp.Data also includes JSON support (the JsonValue type that sits alongside the JSON type provider), so I could do this:

    let valuation = JsonValue.Record [|
        "addressOne", JsonValue.String addressOne
        "addressTwo", JsonValue.String addressTwo
        "addressThree", JsonValue.String addressThree
        "assessedValue", JsonValue.String assessedValue |]

    open System.IO
    File.AppendAllText(@"C:\Data\dataTest.json", valuation.ToString())

And in the file:


So now I have the pieces to make requests to the Wake County site and put the values into a JSON file. I decided to push the data to the file after each request so that if something fails partway through the run, I would not lose everything. Here is the gist, and here are the results:
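The append-after-each-request pattern might look something like the sketch below. It is not the code from the gist: `appendEach` and `fakeFetch` are hypothetical names, the fetch step is a stand-in that fails on purpose for one ID, and the output goes to a temp file rather than the path used earlier.

```fsharp
open System
open System.IO

// Run `fetch` for each id and append its result to `path` right away.
// A failing id is logged and skipped, so earlier results stay on disk.
let appendEach (path: string) (fetch: int -> string) (ids: seq<int>) : int =
    let mutable saved = 0
    for id in ids do
        try
            let line = fetch id
            File.AppendAllText(path, line + Environment.NewLine)
            saved <- saved + 1
        with ex ->
            eprintfn "id %d failed: %s" id ex.Message
    saved

// Hypothetical stand-in for the request-and-parse step; it fails on id 2 on purpose.
let fakeFetch (id: int) =
    if id = 2 then failwith "simulated network error"
    else sprintf "{ \"id\": %d }" id

let outputPath = Path.Combine(Path.GetTempPath(), "valuationsSketch.json")
File.Delete outputPath   // start from a clean file for this demo
let savedCount = appendEach outputPath fakeFetch [1; 2; 3]
printfn "saved %d records to %s" savedCount outputPath
```

Writing one record per line also means a partially completed run still leaves a file that can be read line by line.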


I then decided to see how long it would take to download the first 1,000 integer IDs.

    #time
    [1..1000] |> Seq.iter (fun id -> doValuation id)

and with Fiddler running:


It took about 5 minutes for 1,000 IDs, so extrapolating to the maximum possible ID (9,999,999), it would take about 833 hours, or roughly 35 days.
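The extrapolation can be checked with a few lines of arithmetic, assuming the rate of 5 minutes per 1,000 IDs holds for the whole range:

```fsharp
// Figures taken from the timing run described above.
let minutesPerBatch = 5.0
let idsPerBatch = 1000.0
let maxId = 9999999.0

let totalMinutes = maxId / idsPerBatch * minutesPerBatch
let totalHours = totalMinutes / 60.0
let totalDays = totalHours / 24.0

printfn "about %.0f hours (%.1f days)" totalHours totalDays
```

At that rate the full range works out to roughly 833 hours, or a little under 35 days of sequential requests.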


Two thoughts come to mind for the next step:

1) Use MBrace with some VMs on Azure to do the requests in parallel

2) Do a binary search to see the actual upper number for Wake County.
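The second idea could be sketched as follows. It assumes account numbers are contiguous, meaning every ID up to some cutoff returns a record and none above it does, which may not hold for the real site. `findUpperBound` and `fakeExists` are hypothetical names; a real version would replace `fakeExists` with a request that checks whether a page contains a record.

```fsharp
// Find the largest id in [low, high] for which `exists` is true,
// assuming every id up to some cutoff exists and none above it does.
let findUpperBound (exists: int -> bool) (low: int) (high: int) : int option =
    let rec loop lo hi best =
        if lo > hi then best
        else
            let mid = lo + (hi - lo) / 2
            if exists mid then loop (mid + 1) hi (Some mid)
            else loop lo (mid - 1) best
    loop low high None

// Hypothetical stand-in for "does this account page return a record?"
let fakeExists (id: int) = id <= 412345

printfn "%A" (findUpperBound fakeExists 1 9999999)
```

Across the seven-digit range this needs only about two dozen probes, so even at the request rate measured above it would finish quickly.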

Tune in next week to see if that works.
