Using DocumentDB With F#

DocumentDB is Microsoft’s NoSQL offering on Azure. I have limited experience with NoSQL databases in general, so I thought it would be a good way to try out NoSQL on a real project using F#. The first thing I noticed is that you can’t get to DocumentDB from the “old” Azure portal; you have to spin it up in the new one:

 Capture

Once I created my DocumentDB instance, I went to the getting started guide and found the code samples for the basic tasks you would expect to see in any database product. The getting started guide does not make it an explicit step, but you need to spin up a new F# project in Visual Studio and then use NuGet to get the latest SDK.

Once the NuGet package was installed, I went to a script and added the references:

    #r "../packages/Microsoft.Azure.Documents.Client.0.9.1-preview/lib/net40/Microsoft.Azure.Documents.Client.dll"
    #r "../packages/Newtonsoft.Json.4.5.11/lib/net40/Newtonsoft.Json.dll"

    open System
    open Microsoft.Azure.Documents
    open Microsoft.Azure.Documents.Client
    open Microsoft.Azure.Documents.Linq

And I was good to go. The first thing the walkthrough does is create a database:

 

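    // endpointUrl and authKey hold the account's URI and key; they are defined elsewhere in the script
    let client = new DocumentClient(new Uri(endpointUrl), authKey)
    let database = new Database()
    database.Id <- "FamilyRegistry"
    let requestOptions = new RequestOptions()
    // .Result blocks until the asynchronous call completes
    let response = client.CreateDatabaseAsync(database,requestOptions).Result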

image

Interestingly, that new database does not show up in the Azure portal until you do a post back

image  image

which really surprised me; I figured the new portal would use SignalR. In any event, with the database created, I went to create a collection, which seems roughly analogous to a table in the RDBMS world:

 

    let documentCollection = new DocumentCollection()
    documentCollection.Id <- "FamilyCollection"
    client.CreateDocumentCollectionAsync(database.CollectionsLink,documentCollection,requestOptions)

Unfortunately, I got an oh-so-helpful null reference exception:

System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.Azure.Documents.Database.get_CollectionsLink()
   at <StartupCode$FSI_0007>.$FSI_0007.main@() in C:\Users\Dixon\Desktop\ChickenSoftware.DocumentDb.Solution\ChickenSoftware.DocumentDb\Script.fsx:line 23
Stopped due to error

So, the CollectionsLink has to be populated, which raises the question “what the hell is a collections link?” My first thought was to assign it a value:

image

But no dice. I then started dotting into the class and found that there is no response.CollectionsLink, but there is a response.Resource.CollectionsLink.

And sure enough, this did it. I deleted the database on the Azure portal and re-ran the create database, this time capturing the collectionsLink, and now I could create a collection:

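    let documentCollection = new DocumentCollection()
    documentCollection.Id <- "FamilyCollection"
    // Capture the response: its Resource.DocumentsLink is needed later to insert and query documents
    let documentCollection' = client.CreateDocumentCollectionAsync(response.Resource.CollectionsLink,documentCollection).Result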

image

So now it is time to insert some data.  I went back to the walk-through, created some data structures, and attempted to insert them into the database:

    type Parent = {firstName:string}
    type Pet = {givenName:string}
    type Child = {firstName:string; gender:string; grade: int; pets:Pet list}
    type Address = {state:string; county:string; city:string}
    type family = {id:string; lastName:string; parents: Parent list; children: Child list; address: Address; isRegistered:bool}

    let andersenFamily = {id="AndersenFamily"; lastName="Andersen";
                          parents=[{firstName="Thomas"};{firstName="Mary Kay"}];
                          children=[{firstName="Henriette Thaulow";gender="female";
                                     grade=5;pets=[{givenName="Fluffy"}]}];
                          address={state = "WA"; county = "King"; city = "Seattle"};
                          isRegistered = true}

    client.CreateDocumentAsync(documentCollection'.Resource.DocumentsLink, andersenFamily)

And it worked fine.  Note I still needed the documentsLink

image

And finally, pulling the data out required both some SQL and the documents link:

    let queryString = "SELECT * FROM Families f WHERE f.id = \"AndersenFamily\""

    let families = client.CreateDocumentQuery(documentCollection'.Resource.DocumentsLink,queryString)
    families |> Seq.iter(fun f -> printfn "read %A from SQL" f)

Gives us what we want

image

And if I only want one part of the results, I thought to use Seq.map and cast the results:

    let families = client.CreateDocumentQuery(documentCollection'.Resource.DocumentsLink,queryString)
    families |> Seq.map(fun f -> f :?> family)
             |> Seq.iter(fun f -> printfn "read %A from SQL" f.lastName)

But I am getting an exception, so I need to think about this more

System.InvalidCastException: Unable to cast object of type 'Microsoft.Azure.Documents.QueryResult' to type 'family'.
   at Microsoft.FSharp.Core.LanguagePrimitives.IntrinsicFunctions.UnboxGeneric[T](Object source)
   at Microsoft.FSharp.Collections.IEnumerator.map@107.DoMoveNext(b& )
   at Microsoft.FSharp.Collections.IEnumerator.MapEnumerator`1.System-Collections-IEnumerator-MoveNext()
   at Microsoft.FSharp.Collections.SeqModule.Iterate[T](FSharpFunc`2 action, IEnumerable`1 source)
   at <StartupCode$FSI_0010>.$FSI_0010.main@()
Stopped due to error
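A downcast with :?> only succeeds when the runtime object really is of the target type, which is why a QueryResult cannot be cast to family. Here is a minimal sketch that reproduces the failure with stand-in types and shows a type-test pattern that returns an option instead of throwing. QueryResultStandIn, Family and tryAsFamily are hypothetical names for illustration, not part of the SDK, and this does not convert a document into a record; that would still need some form of deserialization.

```fsharp
open System

type Family = { id: string; lastName: string }

// Hypothetical stand-in for an SDK result type that is unrelated to Family.
type QueryResultStandIn(json: string) =
    member this.Json = json

// Returns Some family only when the boxed value really is a Family.
let tryAsFamily (o: obj) =
    match o with
    | :? Family as f -> Some f
    | _ -> None

let results : obj list =
    [ box { id = "AndersenFamily"; lastName = "Andersen" }
      box (QueryResultStandIn "{\"id\":\"AndersenFamily\"}") ]

// Keeps only the values that are Family records; no exception is raised.
let families = results |> List.choose tryAsFamily

// The direct downcast throws for the stand-in, as it did in the script.
let castFails =
    try
        (results.[1] :?> Family) |> ignore
        false
    with :? InvalidCastException -> true
```

The pattern match trades the exception for a None, which makes it easier to see which results are not what you expected.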

 

In any event, one thing profoundly vexed me: “if I don’t have a document link to an existing database, how do I get documents out of the database?”  I started Googling around a bit and found this helpful post on Stack Overflow.

It makes some sense, then, to use queries to traverse databases and collections, especially because they use LINQ. I fired up a new script, put the Stack Overflow code in,

    // Where, ToArray and FirstOrDefault are LINQ extension methods
    open System.Linq

    let client = new DocumentClient(new Uri(endpointUrl), authKey)
    let database = client.CreateDatabaseQuery().Where(fun db -> db.Id = "FamilyRegistry" ).ToArray().FirstOrDefault()
    printfn "%s" database.SelfLink

and wammo blamo:

image

I then went back to Stack Overflow to see if there was a more idiomatic way to interact with the documents, and Panagiotis Kanavos was kind enough to answer my question here. Of the different possibilities offered, I settled on this style:

    let database = client.CreateDatabaseQuery() |> Seq.filter(fun db -> db.Id = "FamilyRegistry")
                                                |> Seq.head
    printfn "%s" database.SelfLink

And it works like a champ.
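One thing to keep in mind with this style: Seq.head throws if the filter finds nothing, for example when the database id is misspelled. Here is a small sketch of the same lookup using Seq.tryFind, which returns an option instead. DatabaseStandIn and the SelfLink values are made up so the snippet runs without an Azure account; they are not the SDK's types or real links.

```fsharp
// Hypothetical stand-in for the SDK's Database type; only the members used here.
type DatabaseStandIn = { Id: string; SelfLink: string }

// Made-up data in place of client.CreateDatabaseQuery().
let databases =
    [ { Id = "FamilyRegistry"; SelfLink = "dbs/abc==/" }
      { Id = "OtherRegistry"; SelfLink = "dbs/xyz==/" } ]

// Seq.tryFind stops at the first match and returns None when nothing matches,
// whereas Seq.filter |> Seq.head throws on an empty result.
let tryFindDatabase id (dbs: seq<DatabaseStandIn>) =
    dbs |> Seq.tryFind (fun db -> db.Id = id)

match tryFindDatabase "FamilyRegistry" databases with
| Some db -> printfn "%s" db.SelfLink
| None -> printfn "database not found"
```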

You can find the gist here

Moving Files Between Azure Blob Storage Using F#

Dear Future Jamie:

In case you forget (again) about how to move files from one container to another on Azure Blob Storage, here is the code:

    //http://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-blobs/#configure-access

    #r "../packages/WindowsAzure.Storage.4.3.0/lib/net40/Microsoft.WindowsAzure.Storage.dll"

    open Microsoft.WindowsAzure.Storage
    open Microsoft.WindowsAzure.Storage.Auth
    open Microsoft.WindowsAzure.Storage.Blob
    open System.IO

    let connectionString = "yourConnectionStringHere"
    let storageAccount = CloudStorageAccount.Parse(connectionString)
    let blobClient = storageAccount.CreateCloudBlobClient()

    let sourceContainer = blobClient.GetContainerReference("source")
    let targetContainer = blobClient.GetContainerReference("target")

    // Downloads the whole blob into memory, then uploads it under the same name
    let copyBlob (sourceBlob:CloudBlockBlob) =
        // FetchAttributes populates Properties.Length
        sourceBlob.FetchAttributes()
        let blobName = sourceBlob.Name
        let arrayLength = int sourceBlob.Properties.Length
        let byteArray : byte[] = Array.zeroCreate arrayLength
        sourceBlob.DownloadToByteArray(byteArray,0) |> ignore

        let targetBlob = targetContainer.GetBlockBlobReference(blobName)
        targetBlob.UploadFromByteArray(byteArray,0,arrayLength)

    let sourceBlobs = sourceContainer.ListBlobs()
    sourceBlobs |> Seq.map(fun b -> b :?> CloudBlockBlob)
                |> Seq.iter(fun b -> copyBlob b)

    // Count the blobs in the target container to confirm the copy
    let result = targetContainer.ListBlobs()
    Seq.length result

Love,

Jamie from Dec 2014

PS: You really should exercise more….

Introduction to (part of) IBM Watson

Recently, I joined the IBM Watson beta program (you can join too here) to see what it had to offer. It looks like IBM is using the “Watson” word to cover a broad array of analytical and machine learning capabilities. One area where Watson is used is statistical analysis that requires no knowledge of programming or statistics. For example, I went into their portal and uploaded a new dataset that I just got from the Town of Cary regarding traffic stops:

image image

image

I then hit the “New Exploration” button just to see what would happen and voila, I have graphs!

image 

image

image

 

So this is kind of interesting: they seem to use both model sweeping and parameter sweeping and then use natural language questions to explore the dataset. This is quite impressive, as it allows someone who knows nothing about statistics to ask questions and get answers. I am not sure if there is a way to drill down into the models to tweak the questions, nor does there look to be a way to consume the results. Instead, it looks like a management dashboard. So it is a bit like when you view the results of a dataset, but they have taken it to the nth degree.

I then went back and hit the “Create a Prediction” button

image

I picked a random y variable (“disposition”) with the default values and voila, graphs:

image

Interestingly, it does some sweeping and it picked up that the PrimaryKey is correlated with date – which would make sense since the date is part of the PK value 🙂

image

In any event, I think this is a cool entry into the machine learning space from IBM. They really have done a good job in making data science accessible. Now, if they could put their weight behind “Open Data” so that there are lots of really cool datasets available to analyze, they would really position themselves well in an emerging market. I can’t wait to dig in even more with Watson…

Using IBM’s Watson With F#

 

I think everyone is aware of IBM’s Watson from its appearance on Jeopardy. Apparently, IBM has made the Watson API available for developers if you sign up here. Well, there goes my Sunday morning! I signed up and, one email confirmation later, I was in.
IBM has tied Watson to something called “Bluemix”, which looks to be a full-service suite of applications from deployment to hosting. When I looked at the API documentation here, I decided to use the language translation service as a good “hello world” project. Looking at the API help page, I was hoping just to make a request and get a response with an auth token in the header, like every other API in the world. However, the documentation really leads you down a path of installing the Watson Explorer on your local machine, creating a Bluemix project, etc.
Fortunately, the documentation has some pointers to other projects where people have made their own apps. I used this one as a model and set up Fiddler like so:

image

The authorization token is the username and password separated by a colon, encoded as Base64.
Sure enough, a 200
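For reference, here is a minimal sketch of building that token in F#. The helper name basicToken is mine, and the credentials are the example pair from RFC 7617, not anything tied to the Watson service.

```fsharp
open System
open System.Text

// Builds the value that follows "Basic " in the Authorization header:
// "userName:password" converted to bytes and then Base64-encoded.
let basicToken (userName: string) (password: string) =
    let bytes = Encoding.UTF8.GetBytes(userName + ":" + password)
    Convert.ToBase64String(bytes)

// Example credentials taken from RFC 7617.
let token = basicToken "Aladdin" "open sesame"
printfn "Basic %s" token   // prints "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
```

Base64 is an encoding, not encryption, so the credentials are only as private as the connection they travel over.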

image

Setting it up in F# was a snap:

    #r @"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.5\System.Net.Http.dll"
    #r @"..\packages\Microsoft.AspNet.WebApi.Client.5.2.2\lib\net45\System.Net.Http.Formatting.dll"

    open System
    open System.Net.Http
    open System.Net.Http.Headers
    open System.Net.Http.Formatting
    open System.Collections.Generic

    let serviceName = "machine_translation"
    let baseUrl = "http://wex-mt.mybluemix.net/resources/translate"
    let userName = "yourNameHere@aol.com"
    let password = "yourPasswordHere"
    // Basic authentication expects "userName:password" encoded as Base64
    let authKey = Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(userName + ":" + password))

    let client = new HttpClient()
    client.DefaultRequestHeaders.Authorization <- new AuthenticationHeaderValue("Basic",authKey)

    // The text to translate and the service id go up as form fields
    let input = new Dictionary<string,string>()
    input.Add("text","This is a test")
    input.Add("sid","mt-enus-eses")
    let content = new FormUrlEncodedContent(input)

    let result = client.PostAsync(baseUrl,content).Result
    let resultContent = result.Content.ReadAsStringAsync().Result

And sure enough

image

 

You can see the gist here
So with that simple request under my belt, I decided to look at the API that everyone is talking about, the question/answer API. I fired up Fiddler again and took a look at the docs. After some tweaking of the URI, I got a successful request/response:

image

image

The answers to an empty question were kind of interesting, if not head-scratching:

image

So passing in a question:

image

image

So we are cooking with gas. Back into FSI:

    #r @"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.5\System.Net.Http.dll"
    #r @"..\packages\Microsoft.AspNet.WebApi.Client.5.2.2\lib\net45\System.Net.Http.Formatting.dll"

    open System
    open System.Net.Http
    open System.Net.Http.Headers
    open System.Net.Http.Formatting
    open System.Collections.Generic

    let baseUrl = "http://wex-qa.mybluemix.net/resources/question"
    let userName = "yourName@aol.com"
    let password = "yourCreds"
    // Basic authentication expects "userName:password" encoded as Base64
    let authKey = Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(userName + ":" + password))

    let client = new HttpClient()
    client.DefaultRequestHeaders.Authorization <- new AuthenticationHeaderValue("Basic",authKey)

    let input = new Dictionary<string,string>()
    input.Add("question","what time is it")
    let content = new FormUrlEncodedContent(input)

    let result = client.PostAsync(baseUrl,content).Result
    let resultContent = result.Content.ReadAsStringAsync().Result

With the result like so

image

And since it is JSON coming back, why not use the type provider?

    // JsonProvider comes from the FSharp.Data package, which must also be referenced with #r
    open FSharp.Data

    let client = new HttpClient()
    client.DefaultRequestHeaders.Authorization <- new AuthenticationHeaderValue("Basic",authKey)

    let input = new Dictionary<string,string>()
    input.Add("question","How can I quit smoking")
    let content = new FormUrlEncodedContent(input)

    let result = client.PostAsync(baseUrl,content).Result
    let resultContent = result.Content.ReadAsStringAsync().Result

    // The sample file is a saved response; the provider infers the types from it
    type qaResponse = JsonProvider<".\QAResponseJson.json">
    let qaAnswer = qaResponse.Parse(resultContent)

    qaAnswer.Question.Answers
    |> Seq.ofArray
    |> Seq.iter(fun a -> printfn "(%s)" a.Text)

Here is Watson’s response:

image

You can see the gist here

Predicting Physician Gender Using AzureML and F#

I am working with a couple of friends in a two-week hackathon where the main subject is health care provider quality. One of the datasets that we are using is the national registry of physician information found here. One of the team members loaded it into Azure SQL Server and it is a dog. It is about 1 GB of data and takes a couple of minutes to scan the entire dataset. I decided to take a small slice of the data (Connecticut physicians) and do some analysis on it.

My first step was to bring the data into AzureML via the Data Reader

image

Note that it took about 3 minutes to bring the data down.  I then saved this data as a local dataset to do my experiments:

image

I then fired up another experiment using the dataset as the base.  I first dragged in a Project Column module to only grab the columns I was interested in

image image

I then pulled in a Missing Values Scrubber module where I would drop any row where there was a value missing

image image

I then brought in a Metadata Editor module to change all of the fields to Categorical data types

image image

With the data ready to go, I created a 70/30 (train/test) split of the data and added a Multiclass Decision Forest model with Gender as the Dependent variable

image image
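For anyone who wants to see the idea behind the 70/30 split outside of the designer, here is a rough sketch in F#. It is only an illustration of a seeded shuffle-and-cut, not how the AzureML split module is implemented; splitData and its parameters are names I made up.

```fsharp
open System

// Shuffles the rows with a seeded Random (so the split is repeatable)
// and cuts the shuffled list at the requested training fraction.
let splitData (trainFraction: float) (seed: int) (rows: 'a list) =
    let rng = Random(seed)
    let shuffled =
        rows
        |> List.map (fun row -> rng.Next(), row)   // pair each row with a random key
        |> List.sortBy fst                         // sorting by the key shuffles the rows
        |> List.map snd
    let trainCount = int (Math.Round(float (List.length shuffled) * trainFraction))
    List.splitAt trainCount shuffled

// Ten made-up rows: seven go to training and three to testing.
let trainRows, testRows = splitData 0.7 42 [1 .. 10]
printfn "train: %d rows, test: %d rows" (List.length trainRows) (List.length testRows)
```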

I then added a Score Model module and fed in the 30%.  I finally added an Evaluate Model module

 image

And the results were interesting, if not surprising:

image

Basically, if I know your age, your specialty, and your medical school, I can predict if you are a man 85% of the time. Encouragingly, I can only do it 62% of the time for a woman. I then published the experiment and created a quick script to consume the data:

    #r @"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.5\System.Net.Http.dll"
    #r @"..\packages\Microsoft.AspNet.WebApi.Client.5.2.2\lib\net45\System.Net.Http.Formatting.dll"

    open System
    open System.Net.Http
    open System.Net.Http.Headers
    open System.Net.Http.Formatting
    open System.Collections.Generic

    type scoreData = {FeatureVector:Dictionary<string,string>;GlobalParameters:Dictionary<string,string>}
    type scoreRequest = {Id:string; Instance:scoreData}

    let invokeService () = async {
        let apiKey = ""
        let uri = "https://ussouthcentral.services.azureml.net/workspaces/19a2e623b6a944a3a7f07c74b31c3b6d/services/6c4bbb43456e4d7e8a9196f2899f717d/score"
        use client = new HttpClient()
        client.DefaultRequestHeaders.Authorization <- new AuthenticationHeaderValue("Bearer",apiKey)
        client.BaseAddress <- new Uri(uri)

        let input = new Dictionary<string,string>()
        input.Add("Gender","U")
        input.Add("MedicalSchoolName","OTHER")
        input.Add("GraduationYear","1995")
        input.Add("PrimarySpecialty","INTERNAL MEDICINE")

        let instance = {FeatureVector=input; GlobalParameters=new Dictionary<string,string>()}
        let scoreRequest = {Id="score00001";Instance=instance}

        let! response = client.PostAsJsonAsync("",scoreRequest) |> Async.AwaitTask
        let! result = response.Content.ReadAsStringAsync() |> Async.AwaitTask

        if response.IsSuccessStatusCode then
            printfn "%s" result
        else
            printfn "FAILED: %s" result
        response |> ignore
    }

    invokeService() |> Async.RunSynchronously

And I have a way of predicting genders:

U,OTHER,1995,INTERNAL MEDICINE,0.651031798112075,0.348968201887925,0,F
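If you want to work with that line rather than just print it, here is a small sketch that pulls it apart. The column meanings are my assumptions from looking at the output: the first four values echo the inputs in the order they were sent, the next two look like per-class probabilities (they sum to 1), and the last value looks like the predicted label. ScoredRow and parseScoredRow are hypothetical names, and the simple split would break on values that contain commas.

```fsharp
open System
open System.Globalization

type ScoredRow =
    { Inputs: string list          // the four feature values echoed back
      Probabilities: float list    // assumed to be the per-class probabilities
      ScoredLabel: string }        // assumed to be the predicted gender

let parseScoredRow (line: string) =
    let fields = line.Split(',') |> List.ofArray
    { Inputs = fields |> List.take 4
      Probabilities =
        fields
        |> List.skip 4
        |> List.take 2
        |> List.map (fun s -> Double.Parse(s, CultureInfo.InvariantCulture))
      ScoredLabel = List.last fields }

let row = parseScoredRow "U,OTHER,1995,INTERNAL MEDICINE,0.651031798112075,0.348968201887925,0,F"
printfn "%s (highest probability %.3f)" row.ScoredLabel (List.max row.Probabilities)
```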