Predicting Physician Gender Using AzureML and F#
December 2, 2014
I am working with a couple of friends on a two-week hackathon whose main subject is health care provider quality. One of the datasets we are using is the national registry of physician information found here. One of the team members loaded it into Azure SQL Server and it is a dog: it is about 1 GB of data and takes a couple of minutes to scan the entire dataset. I decided to take a small slice of the data (Connecticut physicians) and do some analysis on it.
My first step was to bring the data into AzureML via the Data Reader module.
Note that it took about 3 minutes to bring the data down. I then saved this data as a local dataset to use in my experiments.
I then fired up another experiment using the dataset as the base. I first dragged in a Project Columns module to keep only the columns I was interested in.
I then pulled in a Missing Values Scrubber module to drop any row with a missing value.
Next, I brought in a Metadata Editor module to change all of the fields to the Categorical data type.
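For readers who want to see what the Project Columns and Missing Values Scrubber steps do to each row, here is a minimal local sketch in F#. It assumes each row is a Map from column name to string value, and it assumes the kept columns are the four feature names used by the scoring script later in the post; `keptColumns`, `tryProjectComplete`, and `prepare` are hypothetical helpers, not AzureML APIs.

```fsharp
open System

// Assumption: these are the columns kept by Project Columns (they match the scoring script's inputs).
let keptColumns = [ "Gender"; "MedicalSchoolName"; "GraduationYear"; "PrimarySpecialty" ]

// Project Columns + "remove entire row" scrubbing in one pass:
// returns Some row restricted to keptColumns, or None if any kept value is absent or blank.
let tryProjectComplete (row: Map<string, string>) : Map<string, string> option =
    let values = keptColumns |> List.map (fun col -> col, Map.tryFind col row)
    let isPresent (v: string option) =
        v |> Option.exists (fun s -> not (String.IsNullOrWhiteSpace s))
    if values |> List.forall (snd >> isPresent) then
        Some (values |> List.map (fun (col, v) -> col, Option.get v) |> Map.ofList)
    else
        None

// Apply to every row; rows with any missing kept value are dropped.
let prepare (rows: Map<string, string> list) : Map<string, string> list =
    rows |> List.choose tryProjectComplete
```

Doing both steps in one pass keeps the sketch short; in the experiment they are separate modules, so columns you project away can never cause a row to be dropped, and the sketch preserves that behavior.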
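Marking a field as Categorical means each distinct value is treated as its own level with no ordering, so GraduationYear "1995" and "1996" are as unrelated as two medical school names. The sketch below illustrates that idea by mapping each distinct string to an integer level; `categorize` is a hypothetical helper, and this is not how AzureML stores categorical columns internally.

```fsharp
// Map each distinct value of a column to an integer level, then encode the column.
let categorize (values: string list) : Map<string, int> * int list =
    // Sorted distinct values become the levels 0, 1, 2, ...
    let levels = values |> List.distinct |> List.sort
    let index = levels |> List.mapi (fun i v -> v, i) |> Map.ofList
    // Replace every value with its level number.
    index, values |> List.map (fun v -> index.[v])
```

For example, `categorize [ "1995"; "2001"; "1995" ]` yields the codes `[ 0; 1; 0 ]`.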
With the data ready to go, I created a 70/30 (train/test) split of the data and added a Multiclass Decision Forest model with Gender as the dependent variable.
I then added a Score Model module and fed in the 30% test split. Finally, I added an Evaluate Model module.
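A 70/30 split just means roughly 70% of the rows go to training and the rest are held back for scoring. Here is a minimal sketch of a randomized, reproducible split, assuming a shuffle with a fixed seed; `trainTestSplit` is a hypothetical helper and only approximates what the AzureML Split module does.

```fsharp
// Shuffle with a seeded RNG, then put the first `fraction` of rows in the training set.
let trainTestSplit (fraction: float) (seed: int) (rows: 'T list) : 'T list * 'T list =
    let rng = System.Random(seed)
    let shuffled =
        rows
        |> List.map (fun r -> rng.Next(), r) // one random key per row
        |> List.sortBy fst                   // stable sort on the random keys
        |> List.map snd
    let nTrain = int (round (fraction * float (List.length rows)))
    List.splitAt nTrain shuffled
```

Fixing the seed means rerunning the experiment trains and scores on the same rows, which makes model comparisons fair.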
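Evaluate Model compares the scored label with the actual Gender for every row in the test split. As a rough local analogue, this sketch computes overall accuracy and per-class recall from (actual, predicted) pairs. Reading per-class figures such as the 85% and 62% quoted in the results as per-class recall from the confusion matrix is an assumption; `perClassRecall` and `overallAccuracy` are hypothetical helpers.

```fsharp
// Per-class recall: of the rows whose actual label is `cls`, the fraction predicted as `cls`.
let perClassRecall (pairs: (string * string) list) : Map<string, float> =
    pairs
    |> List.groupBy fst // group by actual label
    |> List.map (fun (actual, rows) ->
        let correct = rows |> List.filter (fun (a, p) -> a = p) |> List.length
        actual, float correct / float (List.length rows))
    |> Map.ofList

// Overall accuracy: fraction of all rows where prediction equals the actual label.
let overallAccuracy (pairs: (string * string) list) : float =
    let correct = pairs |> List.filter (fun (a, p) -> a = p) |> List.length
    float correct / float (List.length pairs)
```

Per-class numbers matter here because a model can look accurate overall while doing much worse on the less common class.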
The results were interesting, if not surprising.
Basically, if I know your graduation year (a rough proxy for age), your specialty, and your medical school, I can correctly identify you as a man 85% of the time. Encouragingly, I can do the same for a woman only 62% of the time. I then published the experiment as a web service and wrote a quick F# script to consume it:
```fsharp
// Assembly references for HttpClient and the PostAsJsonAsync extension (paths from the author's machine)
#r @"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.5\System.Net.Http.dll"
#r @"..\packages\Microsoft.AspNet.WebApi.Client.5.2.2\lib\net45\System.Net.Http.Formatting.dll"

open System
open System.Net.Http
open System.Net.Http.Headers
open System.Net.Http.Formatting
open System.Collections.Generic

// Request body shape sent to the scoring endpoint
type scoreData = {FeatureVector:Dictionary<string,string>;GlobalParameters:Dictionary<string,string>}
type scoreRequest = {Id:string; Instance:scoreData}

let invokeService () = async {
    let apiKey = "" // the published web service's API key goes here
    let uri = "https://ussouthcentral.services.azureml.net/workspaces/19a2e623b6a944a3a7f07c74b31c3b6d/services/6c4bbb43456e4d7e8a9196f2899f717d/score"
    use client = new HttpClient()
    client.DefaultRequestHeaders.Authorization <- new AuthenticationHeaderValue("Bearer",apiKey)
    client.BaseAddress <- new Uri(uri)

    // One physician to score; Gender is the value being predicted, so "U" is passed as a placeholder
    let input = new Dictionary<string,string>()
    input.Add("Gender","U")
    input.Add("MedicalSchoolName","OTHER")
    input.Add("GraduationYear","1995")
    input.Add("PrimarySpecialty","INTERNAL MEDICINE")

    let instance = {FeatureVector=input; GlobalParameters=new Dictionary<string,string>()}
    let scoreRequest = {Id="score00001";Instance=instance}

    // POST the request as JSON to the base address and read the raw response body
    let! response = client.PostAsJsonAsync("",scoreRequest) |> Async.AwaitTask
    let! result = response.Content.ReadAsStringAsync() |> Async.AwaitTask

    if response.IsSuccessStatusCode then
        printfn "%s" result
    else
        printfn "FAILED: %s" result
}

invokeService() |> Async.RunSynchronously
```
And now I have a way of predicting gender:
U,OTHER,1995,INTERNAL MEDICINE,0.651031798112075,0.348968201887925,0,F
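The returned row appears to echo the four input values, followed by one scored probability per Gender level and then the scored label. The order of the probability columns is an assumption, although the largest value (about 0.65) is the first one and the scored label is F, so for this profile the model leans female. A minimal parsing sketch, with the hypothetical `ScoredRow` type and `parseScoredRow` helper:

```fsharp
open System
open System.Globalization

type ScoredRow =
    { Inputs: string list        // the four feature values that were sent
      Probabilities: float list  // per-class scored probabilities (column order assumed)
      ScoredLabel: string }      // the predicted Gender

// Assumes no field contains a comma (true for the row above, not guaranteed for every school name).
let parseScoredRow (line: string) : ScoredRow =
    let fields = line.Split(',') |> Array.toList
    let rest = fields |> List.skip 4 // everything after the four inputs
    let probs =
        rest
        |> List.take (List.length rest - 1) // all but the final label column
        |> List.map (fun s -> Double.Parse(s, CultureInfo.InvariantCulture))
    { Inputs = List.take 4 fields; Probabilities = probs; ScoredLabel = List.last rest }
```

For example, `parseScoredRow "U,OTHER,1995,INTERNAL MEDICINE,0.651031798112075,0.348968201887925,0,F"` returns a row whose `ScoredLabel` is `"F"`.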