Parsing Microsoft MVP Pages Part 2

The final piece of the Terminator App (V1) is to associate MVP names with the pictures I uploaded to Sky Biometry, using the MVPId as the link. I already blogged about how to parse the MVP search page and get the photos for Sky Biometry, and this was a similar task. The key for each photo is the MVPId. When a person's photo is sent to Sky Biometry, the response contains the photo used for the match and that person's Id. Ideally, we would also see the person's name.

The first step was to parse the MVP list the same way I did before:

```fsharp
open System
open System.IO
open System.Net

// Download one page of the US MVP search results; pn is the page number.
let getPageContents (pageNumber: int) =
    let uri = Uri("http://mvp.microsoft.com/en-us/search-mvp.aspx?lo=United+States&sl=0&browse=False&sc=s&ps=36&pn=" + string pageNumber)
    let request = WebRequest.Create(uri)
    request.Method <- "GET"
    // 'use' disposes the response and the reader once the page has been read.
    use response = request.GetResponse()
    use reader = new StreamReader(response.GetResponseStream())
    reader.ReadToEnd()
```

Next, once the page is loaded, I needed a way to parse out the name. I used anchor tags like `<a href="/en-us/mvp/Jamie%20Dixon-5000814">` to identify MVPs, and then layered in a regex like this:

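```fsharp
open System.Text.RegularExpressions

// Pull (name, profile URI, MVPId) tuples out of one page of search results.
// Profile links look like /en-us/mvp/Jamie%20Dixon-5000814.
let getMVPInfoFromPageContents (pageContents: string) =
    let pattern = "(us\\/mvp\\/)([A-Z])(.+?)(-)(\\d+)"
    Regex.Matches(pageContents, pattern)
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Value)                         // "us/mvp/Jamie%20Dixon-5000814"
    |> Seq.map (fun s -> s.Split('-'))
    |> Seq.map (fun a -> a.[0], a.[1])                    // ("us/mvp/Jamie%20Dixon", "5000814")
    |> Seq.map (fun (n, i) -> n.Substring(7), n, i)       // strip the 7-character "us/mvp/" prefix
    |> Seq.map (fun (n, ln, i) -> n.Replace("%20", " "), ln, i)
    |> Seq.map (fun (n, ln, i) -> n, "mvp.microsoft.com/en-" + ln + "-" + i, i)
    |> Seq.distinctBy (fun (n, _, _) -> n)                // drop repeated links to the same MVP
```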
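To see what each stage of that pipeline produces, here is a standalone sketch that applies the same pattern and the same string operations to the single sample link quoted above. The names `sample` and `parseMatch` are illustrative only; they are not part of the app.

```fsharp
open System.Text.RegularExpressions

// Same pattern as getMVPInfoFromPageContents: "us/mvp/", a capital letter,
// the rest of the URL-encoded name, a hyphen, then the numeric MVPId.
let pattern = "(us\\/mvp\\/)([A-Z])(.+?)(-)(\\d+)"

// Illustrative sample taken from the MVP search page markup.
let sample = "<a href=\"/en-us/mvp/Jamie%20Dixon-5000814\">"

// Repeat the pipeline's split / substring / replace steps on one match.
let parseMatch (value: string) =
    let parts = value.Split('-')                 // ["us/mvp/Jamie%20Dixon"; "5000814"]
    let linkName, mvpId = parts.[0], parts.[1]
    let name = linkName.Substring(7).Replace("%20", " ")
    let uri = "mvp.microsoft.com/en-" + linkName + "-" + mvpId
    name, uri, mvpId

let result =
    Regex.Matches(sample, pattern)
    |> Seq.cast<Match>
    |> Seq.map (fun m -> parseMatch m.Value)
    |> Seq.head
// result = ("Jamie Dixon", "mvp.microsoft.com/en-us/mvp/Jamie%20Dixon-5000814", "5000814")
```

Because the matched text is split on `-`, a name that itself contains a hyphen would put part of the name where the Id is expected, so treat hyphenated names as a known limitation of this approach.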

This is a great site for building regexes.

With the parsing in place, I combined the results from every page and saved them to disk:

```fsharp
let getMVPInfos (pageNumber: int) =
    let pageContents = getPageContents pageNumber
    getMVPInfoFromPageContents pageContents

// The US search results spanned 17 pages at the time.
let pageList = [1..17]
let mvpInfos = pageList |> Seq.collect getMVPInfos

// One "name,uri,id" row per MVP; no header row is written.
let outFile = new StreamWriter(@"c:\data\mvpList.csv")
mvpInfos |> Seq.iter (fun (n, uri, i) -> outFile.WriteLine(sprintf "%s,%s,%s" n uri i))
outFile.Flush()
outFile.Close()
```
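The writer joins the three fields with `sprintf "%s,%s,%s"`, which yields a valid CSV row only as long as no field contains a comma or a double quote. If that ever became a concern, a small quoting helper along RFC 4180 lines could stand in for the `sprintf`. This is a minimal sketch; `csvField` and `toCsvLine` are hypothetical names, not part of the original project.

```fsharp
// Hypothetical helper: wrap a field in double quotes only when it contains
// a comma, quote or line break, doubling any embedded quotes (RFC 4180 style).
let csvField (field: string) =
    if field.IndexOfAny([| ','; '"'; '\n'; '\r' |]) >= 0 then
        "\"" + field.Replace("\"", "\"\"") + "\""
    else
        field

// Build one row of mvpList.csv from a (name, uri, id) tuple.
let toCsvLine (name: string, uri: string, mvpId: string) =
    [ name; uri; mvpId ] |> List.map csvField |> String.concat ","

let plain = toCsvLine ("Jamie Dixon", "mvp.microsoft.com/en-us/mvp/Jamie%20Dixon-5000814", "5000814")
// plain = "Jamie Dixon,mvp.microsoft.com/en-us/mvp/Jamie%20Dixon-5000814,5000814"

let withComma = toCsvLine ("Smith, Jr.", "uri", "1")   // hypothetical name with a comma
// withComma = "\"Smith, Jr.\",uri,1"
```

Plain names pass through unchanged, so for the current data the output would be identical to what the `sprintf` version writes.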

With that in place, the Terminator can use the F# CSV type provider to load the list (and also find Esther Lee, the one non-MVP the Terminator is scanning for):

```fsharp
namespace ChickenSoftware.Terminator.Core

open System
open FSharp.Data

// mvpList.csv has no header row, so the provider must not treat the first MVP
// as column names. The columns are then Column1 (name), Column2 (profile URI)
// and Column3 (MVPId, inferred as int).
type nameMappingContext = CsvProvider<"C:/data/mvpList.csv", HasHeaders = false>

// MVPInfo (id, full name, profile URI, photo URI) is defined elsewhere in the project.
type LocalFileSystemMvpProvider () =
    member this.GetMVPInfo (mvpId: int) =
        if mvpId = 1 then
            // Esther Lee is not an MVP, so she is hard-coded with Id 1.
            new MVPInfo(1, "Esther Lee", "NA", "https://pbs.twimg.com/profile_images/2487129558/3DSC_0379.jpg")
        else
            let nameList = nameMappingContext.Load("C:/data/mvpList.csv")
            match nameList.Rows |> Seq.tryFind (fun r -> r.Column3 = mvpId) with
            | Some r ->
                new MVPInfo(r.Column3, r.Column1, r.Column2,
                            "http://mvp.microsoft.com/private/en-us/PublicProfile/Photo/" + string r.Column3)
            | None ->
                // Sentinel returned when the Id is not in the list.
                new MVPInfo(-1, "None", "None", "None")
```
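As written, `GetMVPInfo` loads the CSV and scans its rows on every lookup. Since the file only changes when it is regenerated, one alternative is to read it once into a `Map` keyed by MVPId. Below is a minimal sketch in plain F# without the type provider, assuming the headerless, unquoted `name,uri,id` rows written earlier. `MvpRecord`, `buildLookup` and `tryGetMvp` are hypothetical names; `MvpRecord` only mirrors the four values passed to `MVPInfo`.

```fsharp
open System

// Hypothetical stand-in for the project's MVPInfo so the sketch compiles alone.
type MvpRecord =
    { MvpId: int
      FullName: string
      ProfileUri: string
      PhotoUri: string }

let photoBase = "http://mvp.microsoft.com/private/en-us/PublicProfile/Photo/"

// Build the lookup table once: MVPId -> record. Rows that do not have exactly
// three fields, or whose Id is not an integer, are skipped.
let buildLookup (lines: seq<string>) : Map<int, MvpRecord> =
    lines
    |> Seq.map (fun line -> line.Split(','))
    |> Seq.filter (fun parts -> parts.Length = 3)
    |> Seq.choose (fun parts ->
        match Int32.TryParse(parts.[2]) with
        | true, parsedId ->
            let record =
                { MvpId = parsedId
                  FullName = parts.[0]
                  ProfileUri = parts.[1]
                  PhotoUri = photoBase + parts.[2] }
            Some (parsedId, record)
        | _ -> None)
    |> Map.ofSeq

// Each lookup is then a Map search instead of a reload plus a full scan.
let tryGetMvp (lookup: Map<int, MvpRecord>) (mvpId: int) = Map.tryFind mvpId lookup

// In the app the lines could come from IO.File.ReadLines @"c:\data\mvpList.csv".
let lookup =
    buildLookup [ "Jamie Dixon,mvp.microsoft.com/en-us/mvp/Jamie%20Dixon-5000814,5000814" ]
```

The Esther Lee special case and the "None" fallback would stay in `GetMVPInfo`; only the row search would change.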

The app can then compare the two photos and display the person's name:

```csharp
LocalFileSystemMvpProvider mvpProvider = new LocalFileSystemMvpProvider();
var mvpInfo = mvpProvider.GetMVPInfo(mvpId);

compareImage.Source = new BitmapImage(new Uri(mvpInfo.PhotoUri));
facialRecognitionTextBox.Text = mvpInfo.FullName + " identified with a " + matchValue.Confidence + "% confidence.";
```

And it (kinda) works


and kinda not


One Response to Parsing Microsoft MVP Pages Part 2

  1. Pingback: F# Weekly #44, 2014 | Sergey Tihon's Blog
