codethinked (kōdthĭngked) adj. To be consumed by or obsessed with code.

Why asp.net MVC is so awesome.

Because all you have to do is put it in the title of your blog post and you'll be on the front page of dotnetkicks no matter how bad your post is!

Update: Apparently I struck a chord with a few people who decided that I had in some way personally offended them with this post. Let me just say that I do think that asp.net MVC is a good thing (and certainly a step forward), and I was merely pointing out the obsession with asp.net MVC on dotnetkicks. Oh, and I had to turn on comment moderation because someone decided to keep trying to post XSS attacks in my comments. Probably the same person who tagged this story on dotnetkicks with ThisSiteSux0rz. Some people need to just put the torch down and step away. :-)

Seeing the Future in ParallelFX

One of the neat features of ParallelFX is tasks. They are simply wrappers that let you easily start parallel operations. To create a task, all you have to do is this:

  Task task = Task.Create(n => LongRunningMethod(test));

The "n" parameter that you see is just the state that you can pass to the task. They provide an overload for you to pass this if you would want to, like this (oh and C# should have true optional parameters, I don't care what Anders thinks, overloads are a messy solution </rant>):

  Object state = new Object();
  Task task = Task.Create(n => LongRunningMethod(test), state);

Tasks, once created, start running immediately, assuming that a thread is available for them to run on. (ParallelFX defaults the number of threads to the number of processors on your box.) You can hook up an event that lets the Task notify your application when it has finished, or you can poll its "IsCompleted" property to wait for it to finish.

  task.Completed += TaskCompleted;
 
  while (!task.IsCompleted)
  {
    Thread.Sleep(1000);
  }

So, what happens when you need to fire off a Task and then get a resulting value from the Task but you don't know when the Task will finish? Well, this is when you use a Future. A Future (which surprisingly implements the Future pattern) is simply a Task that is wrapped with a result value property that waits for the Task to be completed.

  Future<string> future = 
    Future.Create<string>(() => LongRunningMethod(test));
 
  string result = future.Value;

In this code we fire off "LongRunningMethod" on a separate thread and then ask for its value. If the thread is not done when we request the return value, then the call to Value will wait for the thread to finish before continuing.
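For readers on a current framework: the Future type from the CTP did not keep that name. In the Task Parallel Library that shipped with .NET 4, the same role is played by Task<T> and its Result property. The following is only a minimal sketch under that assumption; FutureSketch is a made-up class name and the body of LongRunningMethod is a hypothetical stand-in, since the post never shows it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FutureSketch
{
    // Hypothetical stand-in for the post's LongRunningMethod.
    public static string LongRunningMethod(string input)
    {
        Thread.Sleep(100); // simulate slow work
        return input.ToUpperInvariant();
    }

    public static string Run(string input)
    {
        // Task<string> plays the role of Future<string>;
        // Task.Run queues the work on the thread pool.
        Task<string> future = Task.Run(() => LongRunningMethod(input));

        // Result blocks until the task has finished, much like Future.Value.
        return future.Result;
    }
}
```

Calling FutureSketch.Run("test") blocks the caller for roughly as long as the method takes, just as reading Value does in the code above.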

While playing with this code I realized that the Future class provided no overload for passing state into a Future. So I promptly went to the MSDN forums and asked if they were planning on implementing this. I did get a response and they said that they are certainly considering it. I sure hope so! That would make something like this possible:

  string[] names = { "Richmond, VA", "Washington, DC", "Boston, MA",
                "Los Angeles, CA", "Las Vegas, NV", "Seattle, WA" };
 
  List<string> results = new List<string>();
  List<Future<string>> futures = new List<Future<string>>();
  foreach (string name in names)
  {      
    futures.Add(
      Future.Create<string>(n => LongRunningMethod((string)n), name));
  }
 
  foreach (Future<string> future in futures)
  {
    results.Add(future.Value);
  }
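The shipped Task Parallel Library (.NET 4 and later) does offer a state-passing overload, Task.Factory.StartNew<TResult>(Func<object, TResult>, object), so the wished-for loop can be sketched roughly like this. StateSketch and the body of LongRunningMethod are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class StateSketch
{
    // Hypothetical stand-in for the post's LongRunningMethod.
    public static string LongRunningMethod(string name)
    {
        return "Result for " + name;
    }

    public static List<string> RunAll(string[] names)
    {
        var futures = new List<Task<string>>();
        foreach (string name in names)
        {
            // The second argument is handed to the lambda as its "state" parameter.
            futures.Add(Task.Factory.StartNew<string>(
                state => LongRunningMethod((string)state), name));
        }

        var results = new List<string>();
        foreach (Task<string> future in futures)
        {
            // Reading Result in list order keeps results aligned with names.
            results.Add(future.Result);
        }
        return results;
    }
}
```

Because the results are collected in the order the tasks were created, results[i] corresponds to names[i] even if the tasks finish out of order.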

Of course you can also accomplish this task with Parallel Linq (another part of ParallelFX), but that is for another post! One last thing to add is that if you need to tell ParallelFX how many threads to use, you can accomplish it like this:

  TaskManagerPolicy policy = new TaskManagerPolicy(0, 10);
  TaskManager manager = new TaskManager(policy);
 
  Future<string> future =
    Future.Create<string>(() => LongRunningMethod("value"), manager);

That code would tell ParallelFX to use a minimum of 0 threads with an ideal thread count of 10.
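The TaskManager types above belong to the CTP. In the Task Parallel Library that shipped later, the nearest equivalent appears to be ParallelOptions.MaxDegreeOfParallelism, which caps how many operations run at once rather than naming an ideal thread count. A small sketch under that assumption follows; InvokeSketch and the simulated work are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class InvokeSketch
{
    // Runs "count" simulated jobs and returns how many completed.
    // count must be at least 1, because MaxDegreeOfParallelism rejects 0.
    public static int RunAll(int count)
    {
        int completed = 0;
        var actions = new Action[count];
        for (int i = 0; i < count; i++)
        {
            actions[i] = () =>
            {
                Thread.Sleep(50); // stand-in for slow work
                Interlocked.Increment(ref completed);
            };
        }

        // Upper bound on concurrency; the scheduler may still use fewer threads.
        var options = new ParallelOptions { MaxDegreeOfParallelism = count };
        Parallel.Invoke(options, actions);
        return completed;
    }
}
```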

Well, this post is pretty short, but Tasks and Futures really are that simple. Parallel programming has never been easier!

To close or not to close, that is the question.

I saw someone (identity withheld) today on Twitter besmirching Microsoft for making some classes in the .net framework internal. Some people (like Steve Harman) will say that you should make most everything public and virtual, and you know what, I pretty much agree with that (and I certainly respect Steve Harman as a developer, I love his blog). Unless you are dealing with nuclear reactors or pacemakers I again agree with Steve’s quote that you should be “giving sharp tools to sharp people” or as Guido van Rossum put it "we are all adults." In other words, don’t tie someone’s hands just because you think they might choke themselves with them. The problem is that there is a difference between tying someone’s hands and forcing them to think before they expose public methods on their classes.

On a publicly exposed class I see no problems at all with making almost all methods protected and virtual as long as you understand that someone somewhere is going to blow your class up in all sorts of spectacular ways. As I said in this post we all write crappy code and so it is likely that this person will one day be you. But again, I see no problem with giving developers powerful tools with which they can wreak havoc, but why would you want to litter your object with public methods that may not be needed? Why not allow people who want to inherit from your class to override your methods and even provide public methods to wrap these methods, without polluting the public interface of the class that they are inheriting from? Setting all methods public not only binds the framework developer into an interface, it also makes the objects harder to use, because now we have to wade through a ton of public methods trying to find the ones that we want to call. And most likely there are a lot of methods that you won't want people to call at all. I would argue that if your classes don't contain lots of methods that you don't expect anyone to call, then you likely haven't broken up your code very well.

But so far all we have talked about is methods, so why don’t I think that all of the internal classes in the .net framework should be public? Well, because these sharp tools we are given allow us to wreak havoc and in most cases we want to be able to wreak the same havoc across multiple versions of the framework. Let me reiterate that I really don't like the idea of tying the hands of the developer, but I am all for making the developer think before they do something. I shouldn't keep the developer from doing something dangerous, but I should make them think before they do it.

One of these internal classes that I came across recently was ExpressionVisitor. This class is used extensively in the Linq namespace for walking expression trees and I needed similar functionality in my code for when I was building my Linq To SimpleDB app. At first I thought that it was a bit ridiculous that they didn’t expose this class for my use, but in reality it is an implementation detail. If they exposed this class for my use, then I would expect in the next version of the .net framework that this class would be there and would have relatively few (if any) breaking changes.

And therein lies the problem. In a public framework there are certain expectations that must be met for stability and maintainability. If Microsoft had exposed this class and then later on decided that the current implementation wasn’t sufficient and that they needed to fundamentally change the way in which the expression trees are traversed then they would have two choices.

1) Change the class and throw caution to the wind. This will cause anyone who has implemented this class to also change their implementation.

2) Leave the class in the framework and then add a new class that will be used in the future. I dub this class BetterExpressionVisitor.

So, what is the best choice? I guess that all depends on what is important to you. In my opinion, neither is a good choice. Choice number 1 leaves developers who used your class high and dry. How much does everyone complain when Microsoft drops support for a feature in an application? A lot. How much do people complain when they change the way the start menu looks? A lot. How much will people complain if they modify a class that thousands of developers are actively using? Well, I think you get the pattern here. Choice number 1 will wreak havoc with lots of people and produce lots of whiny forum posts.

So, what about choice 2? Well, choice 2 is almost as bad in my opinion. Who wants a framework littered with obsolete code and cruft from past lives? If you want that, then by all means go use Java or Perl. (Oooooooooooh snap) But seriously, Microsoft completely invented a feature in .net 3.5 (extension methods) solely so that they would not have to heavily alter existing classes. Was *that* a good decision? I’ll leave that for you to decide, but in my opinion it was important for them to maintain compatibility between versions. And not just for this version, but if they added five thousand public methods on the ICollection interface then they would have to carry this through for version after version. I think that backward compatibility can be just as important, if not more so, in dynamic languages where people have a tendency to write code that relies on internal details of classes (see monkey patching).

So there you have it, I honestly think that most internal classes are not meant for public consumption and should not be exposed just because someone somewhere *might* want to use them. If they want them that badly then they should just use Reflector and steal (er, borrow) them. That way Microsoft can version their framework however they want and you won’t have to worry about anything breaking in the future because the class is now part of your codebase. Now, I know that this won’t work in all cases, but for crap’s sake people, can’t we just all get along? So, let me know what you think: should Microsoft make all framework classes public and just try to keep as much as possible backward compatible, or should they keep as much of their implementation internal so that developers won’t have dependencies on code that could change in the future?

I'm soooo late to the game (twitter)

Ummm, so this is almost embarrassing, but I have just recently started using Twitter. Yeah, I know, old skool. I exude lameness. In my own defense I tried it out a long time ago, but I thought that whole web interface was lame, so I never kept up with it. But now there are about 1.2 million desktop clients for it, and so I downloaded Witty and now I am using it. (Oh, and Witty crashes for me, a lot.) So, if you think that there is some possibility that I might say something smart or insightful, feel free to follow me. So, go here to visit my twitter page.

Just Do it! Parallel.Do in ParallelFX

The other night while having the geek dinner I was speaking with a colleague who said they had read about ParallelFX on my blog, but weren't really sure what use they had for it in their environment. Well, it isn't always about high performance computing and matrix multiplication, sometimes it is about something as simple as sorting strings or making web service calls. So, you know what, I decided to sort some strings. Well, there is an implementation of quicksort on the array class, but there is no parallel implementation of it. So, I first decided that I needed to implement a parallel quicksort, which is actually quite easy. Since quicksort is a recursive algorithm that forks, it is almost as if it is built for being parallelized.

If you don't remember exactly how the quicksort algorithm works, here is a partial listing that shows the main QuickSort method. As you can see we start with the partition method, which picks a value and pushes all values above that value to the right, and then pushes everything below it to the left. Then we just recursively call QuickSort for both sides and continue on getting smaller and smaller until everything is sorted. All we have to do is make these two recursive QuickSort calls in parallel and the algorithm just works. How easy is that?

[Image: partial listing of the QuickSort method]

So, how would we do that with ParallelFX? Well, my first thought was to use Tasks, which are objects that can be created through ParallelFX that allow you to run Action delegates. For example, if I had a method that I needed to call which was named "DoSomething" I could run it on a separate thread like this:

  var task = System.Threading.Tasks.Task.Create(() => DoSomething());

But then I would have had to fire off two tasks and then write code to wait until they finished, but it turns out that ParallelFX already provides an easy way to do this. It is called "Parallel.Do". Using "Parallel.Do" the above code would look like this:

  System.Threading.Parallel.Do(() => DoSomething());

But the best part comes in when you do this:

  System.Threading.Parallel.Do(() => DoSomething(), 
                            () => DoSomethingElse());

This allows us to pass as many actions as we like into this method, and it doesn't return until they are all done. Exactly what we needed. So the quicksort above will now look like this (I also made it generic):

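public void QuickSort<T>(T[] list, int left, int right)
    where T : IComparable<T>
{
  // Base case: ranges of zero or one item are already sorted.
  if (left >= right) return;

  // After partitioning, the pivot sits at its final position.
  int partitionIndex = partition(list, left, right);
 
  // Sort both halves in parallel; Parallel.Do returns when both are done.
  Parallel.Do(
    () => QuickSort(list, left, partitionIndex - 1),
    () => QuickSort(list, partitionIndex + 1, right));
}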
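Parallel.Do was later renamed; in the shipped .NET 4 library the equivalent call is Parallel.Invoke. Below is a self-contained sketch of the same idea using that API. It is not the author's code: the Lomuto partition, the ParallelSort class name, and the sequential cutoff of 2048 items are all assumptions chosen to keep the example short.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelSort
{
    // Assumed cutoff: below this size the recursion stays on one thread,
    // since spawning parallel work for tiny ranges costs more than it saves.
    private const int SequentialThreshold = 2048;

    public static void QuickSort<T>(T[] list) where T : IComparable<T>
    {
        QuickSort(list, 0, list.Length - 1);
    }

    public static void QuickSort<T>(T[] list, int left, int right)
        where T : IComparable<T>
    {
        if (left >= right) return; // zero or one item: already sorted

        int p = Partition(list, left, right);

        if (right - left < SequentialThreshold)
        {
            QuickSort(list, left, p - 1);
            QuickSort(list, p + 1, right);
        }
        else
        {
            // Invoke returns only after both halves have been sorted.
            Parallel.Invoke(
                () => QuickSort(list, left, p - 1),
                () => QuickSort(list, p + 1, right));
        }
    }

    // Lomuto partition using the last element as the pivot (a simplification;
    // the post's own partition method uses a median-of-three pivot).
    private static int Partition<T>(T[] list, int left, int right)
        where T : IComparable<T>
    {
        T pivot = list[right];
        int store = left;
        for (int i = left; i < right; i++)
        {
            if (list[i].CompareTo(pivot) < 0)
            {
                T tmp = list[i]; list[i] = list[store]; list[store] = tmp;
                store++;
            }
        }
        T last = list[store]; list[store] = list[right]; list[right] = last;
        return store;
    }
}
```

The two halves never overlap, so the parallel branches can write into the same array without locking.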

Parallel.Do also tries to reuse current threads, so (by default) you'll never use up more threads than the number of processors in your system. So, what do the numbers look like when sorting arrays of strings? Well, I created code that would randomly generate strings from 10 to 30 characters in length, and then I populated 6 different arrays with 1000 of them. I then ran the single threaded QuickSort three times and the ParallelQuicksort three times. With 1000 items, here is the number of milliseconds that the sorts took.

[Chart: sort times in milliseconds, 1000 items]

So, obviously with 1000 items the differences are negligible. So, let's bump this up to 10,000 items.

[Chart: sort times in milliseconds, 10,000 items]

At 10,000 items we are already starting to see some significant differences. So, let's bump it up one more time to 100,000 items.

[Chart: sort times in milliseconds, 100,000 items]

So, there you have it. You can see a pretty good performance gain from this algorithm, but with the QuickSort algorithm it is extremely important that you pick a good pivot value so that you split up your data set as evenly as possible when you first start off. The algorithm that I am using uses the "median of 3" method of picking a pivot value. It takes the first item, the last item, and the middle item in the array, and then picks the median of those three values. If you are dealing with large sets of data, and considering the fact that we are running this in parallel, it may make sense to spend even more time trying to find a good pivot point.
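The post does not show its pivot code, so the following is only one possible way to write a median-of-three selection. PivotSketch and MedianOfThreeIndex are hypothetical names; the method returns the index (first, middle, or last) whose value is the median of the three.

```csharp
using System;

public static class PivotSketch
{
    // Returns the index among left, middle and right that holds the median value.
    public static int MedianOfThreeIndex<T>(T[] list, int left, int right)
        where T : IComparable<T>
    {
        int mid = left + (right - left) / 2; // avoids overflow on huge ranges
        T a = list[left], b = list[mid], c = list[right];

        if (a.CompareTo(b) < 0)
        {
            if (b.CompareTo(c) < 0) return mid;        // a < b < c
            return a.CompareTo(c) < 0 ? right : left;  // c is between, else a is
        }
        else
        {
            if (a.CompareTo(c) < 0) return left;       // b <= a < c
            return b.CompareTo(c) < 0 ? right : mid;   // c is between, else b is
        }
    }
}
```

A partition routine would typically swap the element at the returned index into the pivot position before partitioning, which guards against the worst case on already sorted input.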

So, now that I have shown you that you can sort a bunch of strings faster (or integers, or dates, etc.), how about making long running web service calls? If I have 10 web service calls that I need to make, and I only have two processors in my box, then ParallelFX will only use 2 threads, right? That would be more efficient than making the calls one at a time, but nowhere near as efficient as it could be, since these calls spend most of their time waiting. And yes, that is true, but that is just the default. We can go a step further and tell ParallelFX how many threads to use.

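  // Wrap each call in a lambda so the array holds delegates, not call results.
  Action[] actions = {   () => CallWebservice1(),
                        () => CallWebservice2(),
                        () => CallWebservice3(),
                        () => CallWebservice4(),
                        () => CallWebservice5()};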
 
  //1 for MinThreads, array length for IdealThreads
  var policy = new TaskManagerPolicy(1, actions.Length);
  TaskManager manager = new TaskManager(policy);
  Parallel.Do(actions, manager, TaskCreationOptions.None);

Now this is a bit of a contrived example, because you have to know ahead of time what all of your calls are going to be. You cannot easily pass information into any of these calls and I'll show you why. Let's say we have some code like this, which passes in a list of cities:

  string[] names = { "Richmond, VA", "Washington, DC", "Boston, MA",
                "Los Angeles, CA", "Las Vegas, NV", "Seattle, WA" };
 
  Action[] actions = new Action[names.Length];
  float[] results = new float[names.Length];
 
  for (int i = 0; i < names.Length; i++)
  {
      actions[i] = () => CallWebService(names[i]);
  }
 
  //1 for MinThreads, array length for IdealThreads
  var policy = new TaskManagerPolicy(1, names.Length);
  TaskManager manager = new TaskManager(policy);
  Parallel.Do(actions, manager, TaskCreationOptions.None);

So, you can see that we have "CallWebService" and we are passing in "names[i]". When "Parallel.Do" is actually called we end up with an IndexOutOfRangeException! Why is that? Well, you can see that we are putting our call to "CallWebService" inside of a parameterless lambda. This creates a closure which binds to the surrounding local variables themselves rather than to the values they held at that moment, and this includes the array index variable. So, by the time we call "Parallel.Do" the loop has finished and "i" is equal to names.Length, which is one past the last valid index of our array. For performing an operation like this you are going to be better off using "AsParallel" with "ForAll" like this:

  names.AsParallel(names.Length).ForAll(name => CallWebService(name));

What is happening here is that we are using "AsParallel" to get an IParallelEnumerable, and we are passing in names.Length to the "DegreeOfParallelism" parameter. This tells our enumerable to use the same number of threads as there are items in our array. If we need to get results back from this, then we can call "Select" instead of "ForAll". We also need to maintain order so that our results can be correlated with our data. (This may or may not be important to you.)

  var results = names.AsParallel
    (ParallelQueryOptions.PreserveOrdering, names.Length)
    .Select(name => CallWebService(name));

So, here you can see that we are passing the "PreserveOrdering" parameter as well as the number of threads, and then we call Select passing in our array item and the results are returned to our "results" variable as another IParallelEnumerable.
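In the PLINQ that shipped with .NET 4, these options moved from AsParallel arguments to separate operators: AsOrdered for ordering and WithDegreeOfParallelism for the thread count. A rough equivalent of the query above might look like the sketch below, where PlinqSketch and the body of CallWebService are hypothetical stand-ins.

```csharp
using System;
using System.Linq;

public static class PlinqSketch
{
    // Hypothetical stand-in for the post's CallWebService.
    public static string CallWebService(string name)
    {
        return "Forecast for " + name;
    }

    public static string[] CallAll(string[] names)
    {
        // WithDegreeOfParallelism rejects values below 1, so guard empty input.
        int degree = Math.Max(1, names.Length);

        return names
            .AsParallel()
            .AsOrdered()                      // keep results in source order
            .WithDegreeOfParallelism(degree)  // upper bound on concurrent work
            .Select(name => CallWebService(name))
            .ToArray();
    }
}
```

WithDegreeOfParallelism also has an upper limit that depends on the framework version, so a very large array would need the value clamped further.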

So, now you can see how you would use ParallelFX to operate on an array of values, but this could easily be used for anything that supports IEnumerable. You have also seen an example of passing data to a web service, but you could also use this for long running database calls or any other long running process. Hopefully you have found this interesting and it helps you out.
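One loose end: the loop that threw the IndexOutOfRangeException can also be repaired directly, by copying the loop counter into a variable declared inside the loop body so that each lambda captures its own copy. The sketch below uses Parallel.Invoke from the shipped library in place of Parallel.Do; ClosureSketch and the body of CallWebService are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class ClosureSketch
{
    // Hypothetical stand-in for the post's CallWebService.
    public static string CallWebService(string name)
    {
        return "Forecast for " + name;
    }

    public static string[] CallAll(string[] names)
    {
        var actions = new Action[names.Length];
        var results = new string[names.Length];

        for (int i = 0; i < names.Length; i++)
        {
            // A new "index" variable exists for every iteration, so each
            // lambda sees the value i had when that lambda was created.
            int index = i;
            actions[index] = () => results[index] = CallWebService(names[index]);
        }

        Parallel.Invoke(actions);
        return results;
    }
}
```

Each action writes to its own slot in the results array, so no locking is needed and the results line up with the input names.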