Parsing Microsoft MVP Pages and Uploading Photos to Sky Biometry
October 21, 2014
As part of the Terminator project that I am bringing to the MVP Summit, I wanted to load all of the MVP photographs into Sky Biometry and, if a person matches a photo with high confidence, terminate them. I asked my Microsoft contact if I could get all of the MVP photos to load into the app and they politely told me no.
Not being one who takes no lightly, I decided to see if I could load the photos from the MVP website. Each MVP has a profile photo like here and all of the MVPs are listed here with their MVP IDs specified. So if I can get the Id from the search page and then create a Uri to the photo, I can then load it into Sky Biometry.
I first created a new FSharp project and fired up a script window. I created a function that gets the entire contents of a page with the only variable being the index number of the pagination.
    open System
    open System.IO
    open System.Net

    // Download the HTML of one page of the MVP search results
    let getPageContents(pageNumber:int) =
        let uri = new Uri("http://mvp.microsoft.com/en-us/search-mvp.aspx?lo=United+States&sl=0&browse=False&sc=s&ps=36&pn=" + pageNumber.ToString())
        let request = WebRequest.Create(uri)
        request.Method <- "GET"
        // 'use' disposes the response, stream and reader when the function returns
        use response = request.GetResponse()
        use stream = response.GetResponseStream()
        use reader = new StreamReader(stream)
        reader.ReadToEnd()
I then parsed the page for all instances of the MVPId. Fortunately, I found this post that helped me understand how the pattern match works in .NET. Note that the regex for the tag mvpid=123456 is “mvpid=\d+”
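    open System.Text.RegularExpressions

    // Find every "mvpid=<digits>" in the page and keep only the digits
    let getMVPIdsFromPageContents(pageContents:string) =
        let pattern = @"mvpid=\d+"   // verbatim string, so the backslash reaches the regex engine unchanged
        let matchCollection = Regex.Matches(pageContents, pattern)
        matchCollection
        |> Seq.cast
        |> Seq.map(fun (m:Match) -> m.Value)
        |> Seq.map(fun s -> s.Split('='))
        |> Seq.map(fun a -> a.[1])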
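To make the pattern concrete, here is a minimal self-contained sketch that runs the same kind of regex against a small hand-written HTML fragment. The fragment and the name `extractMvpIds` are made up for illustration, and the real search page markup may differ. Unlike the function above, this variant reads the digits from a capture group instead of splitting on '=', and it removes duplicates in case a page links the same profile more than once.

```fsharp
open System.Text.RegularExpressions

// Hypothetical helper: return the digits of every "mvpid=<digits>" occurrence, without duplicates.
let extractMvpIds (html: string) =
    Regex.Matches(html, @"mvpid=(\d+)")
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Groups.[1].Value)   // group 1 is the (\d+) part
    |> Seq.distinct
    |> List.ofSeq

// Made-up fragment, not copied from the real site
let sample = """<a href="profile.aspx?mvpid=123456">A</a> <a href="profile.aspx?mvpid=7890">B</a> <a href="profile.aspx?mvpid=123456">A</a>"""

printfn "%A" (extractMvpIds sample)   // prints ["123456"; "7890"]
```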
With that out of the way, I could get a Seq of all MVP IDs (at least from America) by collecting the results of each page together:
    let getGetMVPIds(pageNumber: int) =
        let pageContents = getPageContents(pageNumber)
        getMVPIdsFromPageContents pageContents

    let pageList = [1..17]
    let mvpIds =
        pageList
        |> Seq.collect(fun i -> getGetMVPIds(i))

So far so good:
I could then create a function that generates the MVP photo Uri:
    let getMvpImageUri(mvpId: int) =
        new Uri("http://mvp.microsoft.com/private/en-us/PublicProfile/Photo/" + mvpId.ToString())
With that out of the way, it was time to point the photos to Sky Biometry for facial detection and tagging. I used the code found in this post with a couple of changes: one to account for the possibility that no face is found in the photo (hence the option type), and one to handle bad things happening (like too big of a photo).
    open System.Text
    open FSharp.Data

    // skyBiometryUri, skyBiometryApiKey and skyBiometryApiSecret are string values defined elsewhere in the script
    type skybiometryFaceDetection = JsonProvider<@".\SkyBiometryImageJson\FaceDetection.json">
    type skybiometryAddTags = JsonProvider<@".\SkyBiometryImageJson\AddTags.json">
    type skybiometryFaceTraining = JsonProvider<@".\SkyBiometryImageJson\FaceTraining.json">

    // Returns Some tag id for the first face found in the photo, or None
    let detectFace (imageUri:string) =
        let stringBuilder = new StringBuilder()
        stringBuilder.Append(skyBiometryUri) |> ignore
        stringBuilder.Append("/fc/faces/detect.json?urls=") |> ignore
        stringBuilder.Append(imageUri) |> ignore
        stringBuilder.Append("&api_key=") |> ignore
        stringBuilder.Append(skyBiometryApiKey) |> ignore
        stringBuilder.Append("&api_secret=") |> ignore
        stringBuilder.Append(skyBiometryApiSecret) |> ignore
        try
            let faceDetection = skybiometryFaceDetection.Load(stringBuilder.ToString())
            if faceDetection.Photos.[0].Tags.Length > 0 then
                Some faceDetection.Photos.[0].Tags.[0].Tid
            else
                None
        with _ -> None   // any failure (e.g. the photo is too big) counts as "no face"
I then added the other two functions, one to save the tag and one to train on it:
    // Attach the detected tag id (tid) to a user id (uid)
    let saveTag(uid:string, tid:string)=
        let stringBuilder = new StringBuilder()
        stringBuilder.Append(skyBiometryUri) |> ignore
        stringBuilder.Append("/fc/tags/save.json?uid=") |> ignore
        stringBuilder.Append(uid) |> ignore
        stringBuilder.Append("&tids=") |> ignore
        stringBuilder.Append(tid) |> ignore
        stringBuilder.Append("&api_key=") |> ignore
        stringBuilder.Append(skyBiometryApiKey) |> ignore
        stringBuilder.Append("&api_secret=") |> ignore
        stringBuilder.Append(skyBiometryApiSecret) |> ignore
        let tags = skybiometryAddTags.Load(stringBuilder.ToString())
        tags.Status

    // Train the recognizer on the tags saved for a user id
    let trainFace(uid:string)=
        let stringBuilder = new StringBuilder()
        stringBuilder.Append(skyBiometryUri) |> ignore
        stringBuilder.Append("/fc/faces/train.json?uids=") |> ignore
        stringBuilder.Append(uid) |> ignore
        stringBuilder.Append("&api_key=") |> ignore
        stringBuilder.Append(skyBiometryApiKey) |> ignore
        stringBuilder.Append("&api_secret=") |> ignore
        stringBuilder.Append(skyBiometryApiSecret) |> ignore
        let training = skybiometryFaceTraining.Load(stringBuilder.ToString())
        training.Status
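All three functions build their request URL the same way, by appending fixed fragments and values to a StringBuilder. As a possible alternative, here is a small sketch of a helper that assembles the same style of URL from key/value pairs and escapes each value with `Uri.EscapeDataString`, which matters when a value (such as a photo Uri) contains characters like `?` or `&`. The name `buildUrl`, the `example.invalid` addresses and the `KEY`/`SECRET` values are placeholders for illustration only; the path and parameter names are the ones used in the post.

```fsharp
open System

// Hypothetical helper: join key/value pairs into a query string, escaping each value.
let buildUrl (baseUri: string) (path: string) (parameters: (string * string) list) =
    let query =
        parameters
        |> List.map (fun (k, v) -> k + "=" + Uri.EscapeDataString(v))
        |> String.concat "&"
    baseUri + path + "?" + query

// Placeholder values, not real credentials or a confirmed service address
let url =
    buildUrl "http://example.invalid" "/fc/faces/detect.json"
        [ "urls", "http://example.invalid/photo/1?size=large"
          "api_key", "KEY"
          "api_secret", "SECRET" ]

printfn "%s" url
```

Whether the service expects the `urls` value to be escaped this way is an assumption; the original code passes it through unescaped.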
Upon reflection, this would have been a perfect place for Scott Wlaschin’s railway oriented programming (ROP), but I just created a covering function:
    // Detect a face, then tag it and train on it; returns a status string
    let saveToSkyBiometry(mvpId:string, imageUri:string) =
        let tid = detectFace(imageUri)
        match tid with
        | Some x ->
            saveTag(mvpId + "@terminatorChicken", x) |> ignore
            trainFace(mvpId + "@terminatorChicken")
        | None -> "Failure"

    // Pair each MVP ID with the Uri of its photo
    let results =
        mvpIds
        |> Seq.map(fun mvpId -> mvpId, getMvpImageUri(Int32.Parse(mvpId)))
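For comparison, a railway-oriented version of the covering function might look roughly like the sketch below. It is not the post's code: the three `...Stub` functions are hypothetical stand-ins that make no network calls, their success and failure rules are invented for the example, and it relies on the `Result` type and `Result.bind`, which require F# 4.1 or later. The point is only the shape: each step runs if the previous one succeeded, and the first error skips the rest.

```fsharp
// Hypothetical stubs standing in for detectFace, saveTag and trainFace (no network calls).
let detectFaceStub (imageUri: string) : Result<string, string> =
    if imageUri.EndsWith(".jpg") then Ok "tid-1" else Error "No face detected"

let saveTagStub (uid: string) (tid: string) : Result<string, string> =
    if tid <> "" then Ok uid else Error "Tag not saved"

let trainFaceStub (uid: string) : Result<string, string> =
    Ok ("trained " + uid)

// Chain the steps; an Error from any step short-circuits the remaining ones.
let saveToSkyBiometryRop (mvpId: string) (imageUri: string) =
    let uid = mvpId + "@terminatorChicken"
    detectFaceStub imageUri
    |> Result.bind (saveTagStub uid)
    |> Result.bind trainFaceStub

printfn "%A" (saveToSkyBiometryRop "123" "photo.jpg")   // Ok "trained 123@terminatorChicken"
printfn "%A" (saveToSkyBiometryRop "123" "photo.txt")   // Error "No face detected"
```

Compared with returning the string "Failure", the error case here says which step failed, and a failed saveTag would no longer be silently ignored.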
I then used Seq.map to send all of the photos in order, but I quickly ran into this:
So I changed the Seq.map to a for loop so I could throttle the requests:
    open System
    open System.Threading

    // Submit one photo per minute
    for (mvpId,uri) in results do
        let result = saveToSkyBiometry(mvpId, uri.ToString())
        printfn "%s" result
        Thread.Sleep(TimeSpan.FromMinutes(1.))
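The same throttling idea can be wrapped in a reusable function. The sketch below is one way to do it, with a hypothetical helper named `throttledMap`; it differs slightly from the loop above in that it sleeps between items rather than after every item, and it collects the results instead of printing them. The 10 millisecond delay and the sample IDs are only there so the example runs quickly.

```fsharp
open System
open System.Threading

// Hypothetical helper: apply `action` to each item, pausing `delay` between consecutive calls.
let throttledMap (delay: TimeSpan) (action: 'a -> 'b) (items: seq<'a>) : 'b list =
    let arr = Array.ofSeq items
    [ for i in 0 .. arr.Length - 1 do
        if i > 0 then Thread.Sleep(delay)   // no pause before the first item
        yield action arr.[i] ]

// Stand-in action: the real one would call the Sky Biometry functions
let lengths =
    throttledMap (TimeSpan.FromMilliseconds 10.) (fun (id: string) -> id.Length) [ "123456"; "7890" ]

printfn "%A" lengths   // prints [6; 4]
```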
And sure enough
And you can see the load every hour
You can see the full code here.
Cool, the Sky Biometry service looks fun!
Next time you’re making the calls, try Http.fs – implement your getPageContents in one line, and make the URI construction a bit nicer too! ;-D. FSharp.Data has something similar.
(I guess you know about at least one of those and didn’t want any dependencies in your blog code, but I hate to miss a chance to plug my thing..)