Adventures in LINQ, System.IO.FileInfo, and Unit Testing
February 26, 2010
I am working with another developer on a team website. The shared server that we use has recently been running out of disk space and we don’t know why. I whipped up a quick console program to help figure out what is going on. My first thought was to create a static List of FileInfo in my program and then populate it via a recursive method. I coded it like so:
static List<FileInfo> fileInfos = new List<FileInfo>();

private static void SearchDirectory(string directoryName)
{
    DirectoryInfo directoryInfo = new DirectoryInfo(directoryName);
    foreach (FileInfo fileInfo in directoryInfo.GetFiles())
    {
        fileInfos.Add(fileInfo);
    }
    foreach (DirectoryInfo subDirectoryInfo in directoryInfo.GetDirectories())
    {
        SearchDirectory(subDirectoryInfo.FullName);
    }
}
And then I sorted the fileInfos object via some LINQ:
private static void SortDirectoryInfo()
{
    var fileSort = (from fi in fileInfos
                    orderby fi.Length descending
                    select fi);
    int i = 0;
    foreach (FileInfo fileInfo in fileSort)
    {
        if (i < 40)
        {
            Console.WriteLine(fileInfo.Name + ":" + fileInfo.Length.ToString());
        }
        i++;
    }
}
This actually worked (and the culprit was the TFS server consuming too much disk space) but it is obviously not good code and does not take advantage of the power of C# 3.0.
My first attempt to clean up the code was to write a unit test for both methods. I changed the scope of the methods to public and then tried to generate the unit test. I then ran into the problem with brittle code: the SortDirectoryInfo method depends on an external object (the static fileInfos list) that may or may not exist. Note that I did not even check to see if it exists in the method. So I rewrote the method to take a List of FileInfo as a parameter. Doing so meant that I could drop the global fileInfos object and declare it in the Main method instead. I also had to change SearchDirectory to take in that fileInfos object.
public static void SortDirectoryInfo(List<FileInfo> fileInfos)
{
    // …Implementation
}
I then generated some Unit Tests and quickly ran into the problem of combining Unit Tests with Integration Tests. Without an adequate Mocking framework, what is a good test? I went with the least common denominator principle – have a directory that I know has, at least, 1 file. The test looks like this:
[TestMethod()]
public void SearchDirectoryInProgramFilesTestReturnsAtLeastOneFile()
{
    string directoryName = @"C:\Program Files";
    List<FileInfo> fileInfos = new List<FileInfo>();
    Program.SearchDirectory(directoryName, fileInfos);
    int notExpected = 0;
    int actual = fileInfos.Count;
    Assert.AreNotEqual(notExpected, actual);
}
This seems like a good 1st pass. I have no idea if the data coming back is right (or if the subdirectory search is working), so I probably need tests for those conditions, such as:
· SearchDirectoryInNoFileBaseButFilesInSubdirectoryReturnsAtLeastOneFile
· SearchDirectoryInNoFileBaseAndNoFileSubdirectoryReturnsNoFiles
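Those two cases could probably be covered without touching C:\Program Files at all, by building a small throwaway directory tree under the temp folder. What follows is only a sketch under some assumptions: the class name, CreateTree, and CountFiles are hypothetical helpers, the SearchDirectory here is a stand-in for the recursive two-parameter version described above (included so the block compiles on its own), and the result is returned as a plain count so the sketch does not depend on a test framework.

```csharp
using System.Collections.Generic;
using System.IO;

public static class SearchDirectorySketch
{
    // Stand-in for the recursive, two-parameter SearchDirectory described above.
    public static void SearchDirectory(string directoryName, List<FileInfo> fileInfos)
    {
        DirectoryInfo directoryInfo = new DirectoryInfo(directoryName);
        fileInfos.AddRange(directoryInfo.GetFiles());
        foreach (DirectoryInfo subDirectoryInfo in directoryInfo.GetDirectories())
        {
            SearchDirectory(subDirectoryInfo.FullName, fileInfos);
        }
    }

    // Hypothetical helper: creates a base directory that holds no files itself,
    // plus one subdirectory that optionally holds a single small file.
    public static string CreateTree(bool fileInSubdirectory)
    {
        string baseDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        string subDir = Path.Combine(baseDir, "sub");
        Directory.CreateDirectory(subDir); // also creates baseDir
        if (fileInSubdirectory)
        {
            File.WriteAllText(Path.Combine(subDir, "a.txt"), "hello");
        }
        return baseDir;
    }

    // Hypothetical helper: builds the tree, searches it, and always cleans up.
    public static int CountFiles(bool fileInSubdirectory)
    {
        string baseDir = CreateTree(fileInSubdirectory);
        try
        {
            List<FileInfo> fileInfos = new List<FileInfo>();
            SearchDirectory(baseDir, fileInfos);
            return fileInfos.Count;
        }
        finally
        {
            Directory.Delete(baseDir, true);
        }
    }
}
```

In an MSTest project, the two test methods named above would then reduce to something like Assert.AreNotEqual(0, SearchDirectorySketch.CountFiles(true)) and Assert.AreEqual(0, SearchDirectorySketch.CountFiles(false)).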
With this test under my belt (it passed), I went about cleaning up my code. I first wanted to remove the recursion because GetFiles() has an overload that does exactly that, and I would rather have Microsoft do something than me. I ripped out the code in the function and replaced it with this:
public static void SearchDirectory(string directoryName, List<FileInfo> fileInfos)
{
    DirectoryInfo directoryInfo = new DirectoryInfo(directoryName);
    foreach (FileInfo fileInfo in directoryInfo.GetFiles("*.*",
        SearchOption.AllDirectories))
    {
        fileInfos.Add(fileInfo);
    }
}
Good news: the test still passed.
I then wanted to get rid of that magic number variable in the SortDirectoryInfo function. I changed the LINQ to this:
var fileSort = (from fi in fileInfos
                orderby fi.Length descending
                select fi).Take(20);
And my tests still passed (tell me again why I wrote code before writing unit tests?).
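For comparison, the same query can also be written with the LINQ extension methods directly, and the limit can be passed in rather than written into the query. This is a sketch, not the code from the program: the class name, the LargestFiles method, and its count parameter are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileSortSketch
{
    // Returns up to `count` files, largest first.
    // Equivalent to: (from fi in fileInfos orderby fi.Length descending select fi).Take(count)
    public static IEnumerable<FileInfo> LargestFiles(IEnumerable<FileInfo> fileInfos, int count)
    {
        return fileInfos
            .OrderByDescending(fi => fi.Length)
            .Take(count);
    }
}
```

Calling FileSortSketch.LargestFiles(fileInfos, 20) should give the same sequence as the query above, and a test can ask for a much smaller number so it only needs a handful of files.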
Feeling good, I then took aim at the fact that the SearchDirectory and SortDirectoryInfo functions were separate. I know that Martin et al. think that functions should only do 1 thing, but in this case I can use LINQ to directly access the files without that intermediate variable and function, which makes the code more maintainable. I rewrote the function to be this:
public static void SearchDirectory(string directoryName)
{
    DirectoryInfo directoryInfo = new DirectoryInfo(directoryName);
    var fileSort = (from fi in directoryInfo.GetFiles("*.*", SearchOption.AllDirectories)
                    orderby fi.Length descending
                    select fi).Take(20);
    foreach (FileInfo fileInfo in fileSort)
    {
        Console.WriteLine(fileInfo.Name + ":" + fileInfo.Length.ToString());
    }
}
And then I realized that my code was no longer testable. I looked at the function and surmised that it is doing two things, just not the two things I originally thought. It is calculating some data AND printing it to the console. To make my code more testable, I separated those two functions:
public static IEnumerable<FileInfo> SearchDirectory(string directoryName)
{
    DirectoryInfo directoryInfo = new DirectoryInfo(directoryName);
    IEnumerable<FileInfo> fileSort = (from fi in directoryInfo.GetFiles("*.*",
                                          SearchOption.AllDirectories)
                                      orderby fi.Length descending
                                      select fi).Take(20);
    return fileSort;
}
And
private static void PrintResults(IEnumerable<FileInfo> fileInfos)
{
    foreach (FileInfo fileInfo in fileInfos)
    {
        Console.WriteLine(fileInfo.Name + ":" + fileInfo.Length.ToString());
    }
}
With the calling code as so:
static void Main(string[] args)
{
    Console.WriteLine("---Start---");
    IEnumerable<FileInfo> fileInfos = SearchDirectory(@"C:\Program Files");
    PrintResults(fileInfos);
    Console.WriteLine("----End----");
    Console.ReadLine();
}
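A possible next step, which the program above does not take, would be to make the printing half testable as well by handing it a TextWriter instead of writing to the console directly. The sketch below assumes that change; the class name and the extra writer parameter are hypothetical additions.

```csharp
using System.Collections.Generic;
using System.IO;

public static class PrintSketch
{
    // Same output format as PrintResults, but written to whatever writer is passed in.
    public static void PrintResults(IEnumerable<FileInfo> fileInfos, TextWriter writer)
    {
        foreach (FileInfo fileInfo in fileInfos)
        {
            writer.WriteLine(fileInfo.Name + ":" + fileInfo.Length);
        }
    }
}
```

Main would call PrintSketch.PrintResults(fileInfos, Console.Out), while a test could pass a StringWriter and compare its contents against the expected lines.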
Note that I changed the declared type of fileSort from var (me being lazy) to the actual type, which in this case is IEnumerable of FileInfo. The last challenge was thinking of good unit tests for the return value, IEnumerable<FileInfo>. I used a foreach loop with a counting variable, but I wonder if there is a different pattern that I could use to make a cleaner test.
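One candidate for that cleaner pattern is to let LINQ do the counting and comparing. The sketch below assumes the properties worth checking are "between one and 20 results" and "sorted largest first"; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ResultChecks
{
    // True when the sequence has at least one item and no more than `max`.
    public static bool HasBetweenOneAnd(IEnumerable<FileInfo> files, int max)
    {
        int count = files.Count();
        return count >= 1 && count <= max;
    }

    // True when the file sizes are already in largest-first order.
    public static bool IsSortedBySizeDescending(IEnumerable<FileInfo> files)
    {
        List<long> sizes = files.Select(fi => fi.Length).ToList();
        return sizes.SequenceEqual(sizes.OrderByDescending(size => size));
    }
}
```

A test body could then be a pair of one-liners, for example Assert.IsTrue(ResultChecks.HasBetweenOneAnd(results, 20)) and Assert.IsTrue(ResultChecks.IsSortedBySizeDescending(results)), with no loop or counter in the test itself.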