Big (Random) Generator
February 1, 2011
I needed to create a random value generator for working on the Parallel Extension Labs that I blogged about here. The class that the lab uses is pretty straightforward:
public class Employee
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Address { get; set; }
    public DateTime HireDate { get; set; }

    public override string ToString()
    {
        return string.Format("Employee{0}: {1} {2} Hired:{3}", Id, FirstName, LastName, HireDate.ToShortDateString());
    }
}
(I added the ToString() as a convenience.) I decided to create a WCF Service to provide the data, primarily because I haven’t worked with WCF in 4.0 at all.
So, I created a WCF Service Application, added it to source control with my typical three-branch strategy, and then published it to my provider. Everything deployed correctly, so I dug into the actual service.
The Service returns native .NET types (Strings, DateTimes, and Guids) as well as Person and Employee classes.
Each value needs to be random yet close enough to be plausible. I started with phone numbers:
public List<string> GetPhoneNumbers(int numberOfPhoneNumbers)
{
    List<string> phoneNumbers = new List<string>();
    System.Random random = new System.Random();
    for (int i = 0; i < numberOfPhoneNumbers; i++)
    {
        // Random.Next's upper bound is exclusive, so 1000 and 10000 are needed to include 999 and 9999.
        int areaCodeNumber = random.Next(100, 1000);
        int prefixNumber = random.Next(100, 1000);
        int suffixNumber = random.Next(1000, 10000);
        phoneNumbers.Add(String.Format("{0}-{1}-{2}",
            areaCodeNumber, prefixNumber, suffixNumber));
    }
    return phoneNumbers;
}
And for the singular:
public string GetPhoneNumber()
{
    return GetPhoneNumbers(1).First();
}
I used this Collection/Singular pattern throughout the service. In addition, I implemented the singular consistently: create the plural and then take the first.
I then added some Unit Tests for each of my methods:
[TestMethod()]
public void GetPhoneNumberTest()
{
    string notExpected = string.Empty;
    string actual = randomFactory.GetPhoneNumber();
    Assert.AreNotEqual(notExpected, actual);
}

[TestMethod()]
public void GetPhoneNumbersTest()
{
    int expected = 3;
    int actual = randomFactory.GetPhoneNumbers(3).Count;
    Assert.AreEqual(expected, actual);
}
This pattern of testing was also applied consistently across all of the methods.
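The tests above only check that something came back. As a rough sketch of a slightly stricter check, a small helper could verify the NNN-NNN-NNNN shape. The class and method names here are hypothetical and not part of the service:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the original service.
public static class PhoneNumberFormat
{
    // Three digits, three digits, four digits, separated by hyphens.
    // \z anchors at the very end of the string (no trailing newline allowed).
    private static readonly Regex Pattern =
        new Regex(@"^[0-9]{3}-[0-9]{3}-[0-9]{4}\z");

    // Returns true when the value has the NNN-NNN-NNNN shape.
    public static bool IsPlausible(string phoneNumber)
    {
        return phoneNumber != null && Pattern.IsMatch(phoneNumber);
    }
}
```

In a unit test, this could be used as Assert.IsTrue(PhoneNumberFormat.IsPlausible(randomFactory.GetPhoneNumber())).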
Once I had the easy methods done (Get Phone Number, Get Dates, etc.), I tackled the methods that required external data. To generate random names, I started with the US Census, where I downloaded the first and last names into an MS Access database. I then turned around and put the data into a SQL Server database on WinHost. (BTW: I ran into this problem; it took me 30 minutes to figure out.) Once the data was in the database, I could fire up EF:
The data is composed of actual names, the frequency with which each name appears in America, the cumulative frequency up to and including that name, and its popularity rank:
(OT: my daughter wrote this:
Who knew?)
Anyway, I then created a method that pulls the records from the database whose cumulative frequency is below the requested prevalence and then returns a given number of those records at random:
public List<string> GetLastNames(int numberOfNames, int pervalence)
{
    // The two-argument constructor takes the parameter name first, then the message.
    if (pervalence > 100 || pervalence < 0)
        throw new ArgumentOutOfRangeException("pervalence", "'Pervalence' needs to be between 0 and 100.");
    List<string> lastNames = new List<string>();
    var context = new Tff.Random.tffEntities();
    List<Census_LastName> lastNameQuery = (from lastName in context.Census_LastName
                                           where lastName.CumlFrequency < pervalence
                                           select lastName).ToList<Census_LastName>();
    System.Random random = new System.Random();
    for (int i = 0; i < numberOfNames; i++)
    {
        // Next(n) returns 0..n-1, so every row (including the last) can be selected.
        int randomIndex = random.Next(lastNameQuery.Count);
        Census_LastName selectedLastName = lastNameQuery[randomIndex];
        lastNames.Add(selectedLastName.LastName);
    }
    return lastNames;
}
I am not happy with this implementation. I will add parallelism later to speed up the processing, and I might implement a .Random() extension method for LINQ. In any event, the data came back and my unit tests passed. I then implemented similar methods for the male and female first names.
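For what it is worth, the extension method idea could look something like the sketch below. The name TakeRandom, its signature, and the choice to pick with replacement (so duplicates are possible, as in the loop above) are all assumptions for illustration:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical extension method; the name and signature are illustrative only.
public static class RandomExtensions
{
    // Returns 'count' items picked uniformly at random, with replacement.
    public static List<T> TakeRandom<T>(this IList<T> source, int count, Random random)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (random == null) throw new ArgumentNullException("random");
        if (count < 0)
            throw new ArgumentOutOfRangeException("count", "Count cannot be negative.");
        if (source.Count == 0 && count > 0)
            throw new InvalidOperationException("Cannot pick items from an empty list.");

        var result = new List<T>(count);
        for (int i = 0; i < count; i++)
        {
            // Next(n) returns a value in 0..n-1, so every element can be chosen.
            result.Add(source[random.Next(source.Count)]);
        }
        return result;
    }
}
```

With something like this in place, the selection loop in GetLastNames would shrink to a call such as lastNameQuery.TakeRandom(numberOfNames, random), followed by a projection to the LastName property.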
With the names out of the way, I needed to figure out street addresses. I first thought about using Google’s reverse geocoding API and throwing in random GPS coordinates like this:
string uri = @"http://maps.googleapis.com/maps/api/geocode/xml?latlng=40.714224,-73.961452&sensor=false";
WebRequest request = WebRequest.Create(uri);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream dataStream = response.GetResponseStream();
StreamReader reader = new StreamReader(dataStream);
string responseFromServer = reader.ReadToEnd();
Console.WriteLine(responseFromServer);
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.LoadXml(responseFromServer);
XmlNodeList xmlNodeList = xmlDocument.GetElementsByTagName("formatted_address");
string address = xmlNodeList[0].InnerText;
reader.Close();
dataStream.Close();
response.Close();
The problem is that I don’t know exact coordinates, so I would have to keep generating random ones until I got a hit, which means I would limit my search to a major metro area (doing this in a low-density state would mean many, many requests to find an actual street address). Also, I would run the risk of actually using a real address. Finally, Google limits the number of requests per day, so I would be throttled, especially with a shotgun approach.
Instead, I went back to the census and found a data table with many (but not all) zip codes, cities, and states. I then realized all I had to do was create a fake street number (easy enough), a fake street name using the last name table, and a random zip code. Voilà: a plausible yet random address.
Here is the EF Class:
And here is the code (split across 3 functions):
public List<string> GetStreetAddresses(int numberOfAddresses)
{
    List<string> streetAddresses = new List<string>();
    List<string> streetNames = GetLastNames(numberOfAddresses, 100);
    List<string> streetSuffixs = GetRandomStreetSuffixs(numberOfAddresses);
    List<string> zipCodes = GetZipCodes(numberOfAddresses);
    string streetNumber = string.Empty;
    System.Random random = new System.Random();
    for (int i = 0; i < numberOfAddresses; i++)
    {
        streetNumber = random.Next(10, 999).ToString();
        streetAddresses.Add(String.Format("{0} {1} {2} {3}", streetNumber, streetNames[i], streetSuffixs[i], zipCodes[i]));
    }
    return streetAddresses;
}
And:
private List<string> GetZipCodes(int numberOfZipCodes)
{
    List<string> zipCodes = new List<string>();
    var context = new Tff.Random.tffEntities();
    List<Census_ZipCode> zipCodeQuery = (from zipCode in context.Census_ZipCode
                                         select zipCode).ToList<Census_ZipCode>();
    System.Random random = new System.Random();
    for (int i = 0; i < numberOfZipCodes; i++)
    {
        // Next(n) returns 0..n-1, so the last row is no longer excluded.
        int randomIndex = random.Next(zipCodeQuery.Count);
        Census_ZipCode selectZipCode = zipCodeQuery[randomIndex];
        zipCodes.Add(String.Format("{0}, {1} {2}", selectZipCode.City, selectZipCode.StateAbbreviation, selectZipCode.ZipCode));
    }
    return zipCodes;
}
Finally:
private List<string> GetRandomStreetSuffixs(int numberOfSuffixes)
{
    List<String> suffixes = new List<string>();
    List<string> returnValue = new List<string>();
    suffixes.Add("STREET");
    suffixes.Add("ROAD");
    suffixes.Add("DRIVE");
    suffixes.Add("WAY");
    suffixes.Add("CIRCLE");
    System.Random random = new System.Random();
    for (int i = 0; i < numberOfSuffixes; i++)
    {
        // Next(n) returns 0..n-1, so "CIRCLE" (the last entry) can also be picked.
        int randomIndex = random.Next(suffixes.Count);
        returnValue.Add(suffixes[randomIndex]);
    }
    return returnValue;
}
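One thing to watch with the three functions above: each one creates its own System.Random. On the .NET Framework, the parameterless constructor uses a time-dependent seed, so instances created in quick succession can end up producing identical sequences. A minimal sketch of one way around that, assuming the helpers can be changed to accept the generator as a parameter (the class name, the method names, and the reshaped signatures below are hypothetical, and the database-backed parts are left out):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; not the service's actual code.
public static class SharedRandomSketch
{
    // Variant of the suffix helper that receives the generator from its caller.
    public static List<string> GetRandomStreetSuffixes(int numberOfSuffixes, Random random)
    {
        string[] suffixes = { "STREET", "ROAD", "DRIVE", "WAY", "CIRCLE" };
        var result = new List<string>(numberOfSuffixes);
        for (int i = 0; i < numberOfSuffixes; i++)
        {
            result.Add(suffixes[random.Next(suffixes.Length)]);
        }
        return result;
    }

    // Street numbers drawn from the same generator as the suffixes.
    public static List<string> GetStreetNumbers(int count, Random random)
    {
        var result = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            result.Add(random.Next(10, 999).ToString());
        }
        return result;
    }

    // Creates a single Random per call and hands it to every helper.
    public static List<string> GetStreetLines(int count)
    {
        var random = new Random();
        List<string> numbers = GetStreetNumbers(count, random);
        List<string> suffixes = GetRandomStreetSuffixes(count, random);
        var lines = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            lines.Add(String.Format("{0} {1}", numbers[i], suffixes[i]));
        }
        return lines;
    }
}
```

This only removes the correlation between helpers within a single call. System.Random is not thread-safe, so sharing one instance across concurrent service requests would need extra care.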
Now, when you hit the service, you can get a plausible yet totally fake dataset of people and employees:
Random.RandomFactoryClient client = new Random.RandomFactoryClient();
List<Random.Employee> employees = client.GetEmployees(20, 50, Random.Gender.Both, 10);
for (int i = 0; i < employees.Count; i++)
{
    Add(new Employee
    {
        Id = i,
        FirstName = employees[i].FirstName,
        LastName = employees[i].LastName,
        HireDate = employees[i].HireDate,
        Address = employees[i].StreetAddress
    });
}
Spit out to the Console:
In case you want to use the service, you can find it here.
VERY IMPORTANT: I set the return values to be of type List<T>. I know this is considered bad practice from an interoperability standpoint. If you are using VS2010 and you want to consume the service, make sure you do this when you attach to the reference:
Results may vary.