Predicting Physician Gender Using AzureML and F#

I am working with a couple of friends in a two-week hackathon where the main subject is health care provider quality. One of the datasets that we are using is the national registry of physician information found here. One of the team members loaded it into Azure SQL Server and it is a dog: it is about 1 GB of data and takes a couple of minutes to scan the entire dataset. I decided to take a small slice of the data (Connecticut physicians) and do some analysis on it.

My first step was to bring the data into AzureML via the Data Reader module.


Note that it took about 3 minutes to bring the data down.  I then saved this data as a local dataset to do my experiments:


I then fired up another experiment using the dataset as the base. I first dragged in a Project Columns module to grab only the columns I was interested in.


I then pulled in a Missing Values Scrubber module, set to drop any row with a missing value.
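For readers who want to see the same cleanup outside the designer, here is a minimal local sketch of the project-and-scrub steps in F#. The `Physician` record and the sample rows are hypothetical; only the four column names come from the scoring script later in this post. It is an illustration of the idea, not what the AzureML modules run internally.

```fsharp
// Hypothetical record holding only the four projected columns.
type Physician =
    { Gender: string
      MedicalSchoolName: string
      GraduationYear: string
      PrimarySpecialty: string }

// Same idea as the scrubber's "drop row" behavior:
// keep a row only when every field is non-blank.
let isComplete (p: Physician) =
    [ p.Gender; p.MedicalSchoolName; p.GraduationYear; p.PrimarySpecialty ]
    |> List.forall (fun v -> not (System.String.IsNullOrWhiteSpace v))

let scrub (rows: Physician list) = rows |> List.filter isComplete

// Made-up rows for illustration only, not taken from the registry.
let sample =
    [ { Gender = "F"; MedicalSchoolName = "OTHER"
        GraduationYear = "1995"; PrimarySpecialty = "INTERNAL MEDICINE" }
      { Gender = "M"; MedicalSchoolName = ""
        GraduationYear = "1988"; PrimarySpecialty = "CARDIOLOGY" } ]

let cleaned = scrub sample
printfn "%d of %d rows kept" cleaned.Length sample.Length
```

With the two sample rows above, the second one has a blank school name, so this prints `1 of 2 rows kept`.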


I then brought in a Metadata Editor module to change all of the fields to categorical data types.


With the data ready to go, I created a 70/30 (train/test) split of the data and added a Multiclass Decision Forest model with Gender as the dependent variable.
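A 70/30 split is easy to reproduce locally. The sketch below is one way to do it in F#, assuming a seeded random shuffle; `splitData` is a hypothetical helper, and the split module in AzureML may randomize differently.

```fsharp
// Shuffle with a fixed seed, then cut the list at the requested fraction.
let splitData (fraction: float) (seed: int) (rows: 'a list) =
    let rng = System.Random(seed)
    let shuffled =
        rows
        |> List.map (fun r -> rng.Next(), r) // pair each row with a random key
        |> List.sortBy fst                   // ordering by the key shuffles the rows
        |> List.map snd
    let cut = int (System.Math.Round(fraction * float rows.Length))
    List.splitAt cut shuffled

// Ten placeholder rows: 7 go to training, 3 to testing.
let train, test = splitData 0.7 42 [ 1 .. 10 ]
printfn "train=%d test=%d" train.Length test.Length
```

Fixing the seed keeps the split repeatable between runs, which makes it easier to compare models against the same test rows.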


I then added a Score Model module and fed in the 30% test split. Finally, I added an Evaluate Model module.
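The per-class numbers that come out of an evaluation like this can be read as "of the physicians who really are in class X, how many did the model label X?" A minimal sketch of that calculation in F# follows. `recallFor` is a hypothetical helper and the scored pairs are made up; this is my reading of the metric, not a description of what Evaluate Model computes internally.

```fsharp
// Per-class recall: of the rows whose actual label is `label`,
// the fraction whose scored label matches it.
let recallFor (label: string) (pairs: (string * string) list) =
    let actual = pairs |> List.filter (fun (a, _) -> a = label)
    let hits = actual |> List.filter (fun (a, p) -> a = p)
    float hits.Length / float actual.Length

// Made-up (actual, scored) pairs for illustration only.
let scored =
    [ "M", "M"; "M", "M"; "M", "F"; "M", "M"
      "F", "F"; "F", "M" ]

printfn "M: %.2f  F: %.2f" (recallFor "M" scored) (recallFor "F" scored)
```

On the toy pairs this prints `M: 0.75  F: 0.50`: three of the four actual men and one of the two actual women were labeled correctly.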


And the results were interesting, if not surprising:


Basically, if I know your age, your specialty, and your medical school, I can predict that you are a man 85% of the time. Encouragingly, I can only do it 62% of the time for a woman. I then published the experiment and created a quick script to consume it:

```fsharp
#r @"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.5\System.Net.Http.dll"
#r @"..\packages\Microsoft.AspNet.WebApi.Client.5.2.2\lib\net45\System.Net.Http.Formatting.dll"

open System
open System.Net.Http
open System.Net.Http.Headers
open System.Net.Http.Formatting
open System.Collections.Generic

// Shape of the JSON request body sent to the published scoring service
type scoreData = {FeatureVector:Dictionary<string,string>;GlobalParameters:Dictionary<string,string>}
type scoreRequest = {Id:string; Instance:scoreData}

let invokeService () = async {
    // Left blank here; fill in the API key and request URI of your own published service
    let apiKey = ""
    let uri = ""
    use client = new HttpClient()
    client.DefaultRequestHeaders.Authorization <- new AuthenticationHeaderValue("Bearer",apiKey)
    client.BaseAddress <- new Uri(uri)

    // Gender is the column being predicted, so it is sent as "U"
    let input = new Dictionary<string,string>()
    input.Add("Gender","U")
    input.Add("MedicalSchoolName","OTHER")
    input.Add("GraduationYear","1995")
    input.Add("PrimarySpecialty","INTERNAL MEDICINE")

    let instance = {FeatureVector=input; GlobalParameters=new Dictionary<string,string>()}
    let scoreRequest = {Id="score00001";Instance=instance}

    // Post the request as JSON and read the response body as a string
    let! response = client.PostAsJsonAsync("",scoreRequest) |> Async.AwaitTask
    let! result = response.Content.ReadAsStringAsync() |> Async.AwaitTask

    if response.IsSuccessStatusCode then
        printfn "%s" result
    else
        printfn "FAILED: %s" result
    response |> ignore
}

invokeService() |> Async.RunSynchronously
```

And I have a way of predicting genders:

U,OTHER,1995,INTERNAL MEDICINE,0.651031798112075,0.348968201887925,0,F
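To use that result in code rather than just print it, the line can be pulled apart. The sketch below assumes the response text is the comma-separated line shown above, laid out as the four echoed inputs, then one scored probability per class, then the scored label. That layout is my reading of the output, not something the service documents here.

```fsharp
open System.Globalization

let line = "U,OTHER,1995,INTERNAL MEDICINE,0.651031798112075,0.348968201887925,0,F"
let fields = line.Split(',')

// Assumed layout: 4 echoed inputs, per-class probabilities, scored label last.
let inputs = fields.[0 .. 3]
let probabilities =
    fields.[4 .. fields.Length - 2]
    |> Array.map (fun s -> System.Double.Parse(s, CultureInfo.InvariantCulture))
let scoredLabel = fields.[fields.Length - 1]

printfn "label=%s, top probability=%.3f" scoredLabel (Array.max probabilities)
```

For the line above this prints `label=F, top probability=0.651`. Parsing with the invariant culture avoids surprises on machines where the decimal separator is a comma.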

