codethinked (kōdthĭngked) adj. To be consumed by or obsessed with code.

Immutability and tail recursion

I was in Amanda Laucher's F# talk yesterday and I asked about whether F# was able to tell if a function was pure or semi-pure. (A pure function is simply a function with no side effects, and a semi-pure function is one that only modifies locals) I was just a bit curious if F# was able to tell if there were any immutable variables that had been introduced into method, or if any mutable variables had been changed during a method. Well, I got my answer and then I made a statement that I probably definitely should have shut my mouth on. I essentially asked if introducing mutable variables would cause the F# compiler to not do its tail recursion optimization.

Yeah, I don't know what I smoked that morning. So, if you know all about tail calls and tail recursion optimization, then feel free to stop reading now. Otherwise, you can continue on to learn just how dumb of a statement that was. First of all let me just say that I have actually found a few blog posts where people have made the same mistake, and that makes me feel at least a *bit* better. I guess it is the fact that tail recursion optimization is usually associated with functional languages, since they mostly do looping with recursion, you would run into a lot of problems with even relatively small loops if you couldn't find a way to remove the recursion. But that doesn't really excuse me for not thinking about my question before I asked it.

So, anyways, we all know what recursion is. It is just when a method calls itself (below is direct recursion, there is also indirect recursion. I can't find a good link for a definition, so suffice to say that indirect recursion is when method A calls method B which in turn calls back to method A. More than two methods can be involved, but they don't have to be.)

static void increment(int n)
{            
    increment(n + 1);
}

Now, that method is infinitely recursive because we have no base case, but it is recursive nonetheless.

Some of us probably know what a tail-call is, it is when we make a call to a method at the very end of another method:

static void Method1()
{
    int i = 1 + 2;
    Method2();
}
 
static void Method2()
{
    Console.WriteLine("hello");
}

There are some optimizations that can be made when we are doing a tail call, mainly due to the fact that we can reuse the stack frame of the method we are calling from, since it is no longer needed. (Since there are no instructions after our method call).

Well, there is a little thing called tail recursion as well. And tail recursion is when your recursive call is a tail call. Our "increment" method above is actually tail recursive, since we are calling increment at the end of the method and there is nothing after it. But this method would be considered tail recursive as well:

static void increment(int n)
{            
    if (n > 500000000)
    {
        Console.WriteLine(n);
    }
    else
    {
        increment(n + 1);    
    }    
}

And you may even be surprised to know that this is considered tail recursive, since after increment nothing else will ever be called.

static void increment(int n)
{                
    if (n > 500000000)
    {
        Console.WriteLine(n);
    }
    else if (n > 1000000)
    {
        increment(n + 2);
    }
    else
    {
        increment(n + 1);    
    }    
}

To make it more clear, look at this in IL:

image

Now you can see that after each call to "increment" we are going to return. This way it is easy for the compiler to tell this is the last instruction we are going to execute. Now, the beauty of tail recursion, is that since the recursion happens at the end of our method, we can essentially eliminate the recursion by using a loop. This is why (as Amanda showed us in her F# presentation) the F# compiler will turn our original increment method into something that looks like this:

static void increment(int n)
{                
    while (true)
    {
        n++;
    }    
}

And when you run it in C# our original "increment" method will blow up with a StackOverflowException, right? Well, kinda. So, lets build our app and run it under Debug mode. Well, we quickly get this:

StackOverflowException

But, like a good programmer, we know that running things in Debug mode often disables many compiler optimizations. Sooooo, we change our build type to "Release" and try it again. Suddenly our app is now running, happily pegging one of my CPU's...

image

What is that you say? On your machine you still got a StackOverflowException? Well, maybe you should get a new machine, I think yours is broken! Ha ha, just kidding! There is a mystery afoot. What happened on my release build, that you are not seeing on your machine? Or maybe you are seeing it? Well, it happens that I am running Vista 64-bit, and if you are seeing this app run fine as well, then you too are running on 64-bit hardware. (In order to prove this to yourself if you are running on a 64-bit machine, you can switch your target CPU to x86, then run in release mode and the StackOverflowException will appear quite quickly!!)

As a side note, after I ran this and realized that I was getting tail recursion optimization, I set about to find out why. Since I did not write the .net runtime (although it would be cool if I did) I had to rely on info from a few different places, but a great one was this post by Jomo Fisher. A lot of the info below this point was gleaned from his post!

But why should that make any difference? Well, let us take a look at our compiled IL:

64-bit:

image

32-bit:

image

Hmmm, they look the same... I thought you said that the 64-bit release would optimize our tail recursion? Well, you have to remember that in managed code there is more than one chance for optimization. Optimizations can occur at build time when the IL is generated (hence what F# is doing) and optimizations can also be done by the JITer (Just In Time compiler) when the code is turned from IL into native machine code (which the 64-bit JITer is doing for us).

In order for us to tell that this is actually happening, lets add some code at the end of our method in order to stop the JITer from optimizing our tail recursion and see if we can get our StackOverflowException. So we add this change:

static void increment(int n)
{                    
    increment(n + 1);
    int i = 5;
    i = i + 2;
}

I had to add the second line because just putting in "int i = 5;" the compiler was smart enough to optimize it out because it wasn't being used. Tricky compiler. So, now that we have made this change our application now blows up with a StackOverflowException even when running under the 64-bit JITer!

So there you go, you can now see that tail recursion is completely useless in C#! Since the optimization is done by the JITer and not by the compiler, the code above will run on 64-bit machines, but will crash quickly on 32-bit machines. These are the stuff that nightmares are made of. Anyways, the F# compiler obviously had to do these optimizations for itself since tail recursion is much more prevalent in functional languages and they can't assume that you'll always be running under a 64-bit OS. Although you SHOULD be!

And with that, we are back to my original point for this article. Immutable types and tail recursion optimization have nothing to do with each other. There are some other optimizations that can be done when a method is pure (such as automatic parallelization), but nothing to do with tail recursion. Well, at least nothing that I could find, so please prove me wrong! Hope you enjoyed this post, and hopefully you aren't laughing at me too hard right now. :-)

Richmond Code Camp 2008 Tomorrow

I am giving a talk at the Richmond Code Camp 2008 tomorrow, and I am really excited/nervous. Oh, and other people are giving some presentations as well. My friend Kevin Hazzard is giving two presentations. One on calling WCF services with no proxy (called "Look Ma! No Proxy!") and a Silverlight 101 presentation. I will definitely be attending the "No Proxy" session, but I'm going to have to skip the Silverlight session for Amanda Laucher's F# session. Sorry Kevin! I am extremely interested in F#, but I have not had a chance to dig my teeth in.

So, I have to admit that this is the first presentation that I have ever given. So, if you are in my session, please go easy on me. I don't get too nervous in front of people, but it is easy for me to say this right now. I am giving my presentation on DI (Dependency Injection) and I decided to give my examples in Spring.net and Ninject. I must give a shout out to Nate Kohari, who has helped walked me through Ninject (his brain-child) and who did an amazing job creating it.

So, if you have wondered at all what happened to my blog posts, then if you are near Richmond you can come tomorrow and see the fruits of my post-less-ness. And if you aren't able to attend tomorrow, you can bet that I will turn my presentation into a series considering the insane amount of time that I have put into it. Well, I hope you can attend, and if not, then I hope you enjoy the future blog posts!

Digging into the Linq "let" keyword

In my last post I talked abou the Linq "let" keyword and showed you how it could be used to simplify linq queries. Well, this post was thrown up pretty quick and I didn't get to research the topic in the way that I normally do, and I quickly got called out by a few readers who pointed out that there are some implications with using the "let" keyword that may not be initially apparent.

In the last post I used a query that looked like this:

var names = (from p in nameList
            let vowels = new List<string> { "A", "E", "I", "O", "U" }
            let startsWithVowel = vowels.Any(v => p.ToUpper().StartsWith(v))
            let endsWithVowel = vowels.Any(v => p.ToUpper().EndsWith(v))
            let fourCharactersLong = p.Length == 4
            let fiveCharactersLong = p.Length == 5
            where
            (startsWithVowel || endsWithVowel) &&
            (fourCharactersLong || fiveCharactersLong)
            select p).ToList();

In this case the "let" keyword greatly simplifies what would have been an otherwise very ugly "where" clause. So, I started thinking, how can I show what happens when this query is turned into an Expression object? I first started off by using the Expression visualizer that ships in the samples with VS2008 (click for full size):

Query Expression

That might give you a hint as to what is happening, if you are familiar with compiler generated anonymous types, but otherwise that just looks like a bunch of gibberish. You have to open up the app in reflector to really see what is going on with these anonymous types. What this is essentially telling us is exactly what "Skup" in my last post was trying to tell me. That each time you use the "let" keyword it is just generating an anonymous type which wraps the variable assigned in the let statement along with the item being queried.

After thinking a bit I realized that just rewriting this query with explicit Linq methods would probably be the best way to explain this. There is no "let" query method, you have to accomplish all of this through select statements.

var names2 = nameList
    .Select(a => new { a, vowels = new List<string> { "A", "E", "I", "O", "U" } })
    .Select(b => new { b, startsWithVowel = b.vowels.Any(v => b.a.ToUpper().StartsWith(v)) })
    .Select(c => new { c, endsWithVowel = c.b.vowels.Any(v => c.b.a.ToUpper().EndsWith(v)) })
    .Select(d => new { d, fourCharactersLong = d.c.b.a.Length == 4 })
    .Select(e => new { e, fiveCharactersLong = e.d.c.b.a.Length == 5 })
    .Where(w => (w.e.d.c.startsWithVowel || w.e.d.endsWithVowel) 
              && (w.e.fourCharactersLong || w.fiveCharactersLong))
    .Select(s => s.e.d.c.b.a);

So, as you can see, the "let" keyword cleans this up quite a bit. But you can also see what is actually happening. You can also see why Frans said in my last post that when you are running this against Linq to Sql it is going to generate a sub-query for each "let" statement. I hope that this helps a bit when trying to decide whether or not to use the Linq "let" statement.

The Linq "let" keyword

I've been busy working on some presentations that I have coming up, and so I haven't had too much time to update, but I was messing around with Linq and I found myself using the "let" keyword quite a bit to make some of my queries more readable. So I decided to do a small write-up so that you can see the joy of using the "let" keyword.

Lets say we have a small set of data that looks like this:

var nameList = new List<string>
                   {
                       "Matt",
                       "Adam",
                       "John",
                       "Peter",
                       "Owen",
                       "Steve",
                       "Richard",
                       "Chris"
                   };

And we have a method where we need to return a list of names that start or end with vowels and are either 4 or 5 characters long. So, our inexperienced Linq developer quickly codes up a query that looks like this (yes, I know the "ToUpper" is ugly, but I'm not changing it now :) ):

var vowels = new List<string> {"A", "E", "I", "O", "U"};
var names = (from p in nameList                                                  
            where
            (vowels.Any(v => p.ToUpper().StartsWith(v)) 
            || vowels.Any(v => p.ToUpper().EndsWith(v))) &&
            (p.Length == 4 
            || p.Length == 5)
            select p).ToList();

First we have to define our List of vowels, since we are using it twice. Then we code up our query just like we would if we were writing a sql statement. The only problem is that by just looking at this query it is pretty hard to tell what it is doing. But what if we had a way to break up the query so that we could make its purpose more clear? Well, thankfully we do!

var names = (from p in nameList
            let vowels = new List<string> { "A", "E", "I", "O", "U" }
            let startsWithVowel = vowels.Any(v => p.ToUpper().StartsWith(v))
            let endsWithVowel = vowels.Any(v => p.ToUpper().EndsWith(v))
            let fourCharactersLong = p.Length == 4
            let fiveCharactersLong = p.Length == 5
            where
            (startsWithVowel || endsWithVowel) &&
            (fourCharactersLong || fiveCharactersLong)
            select p).ToList();

Now, doesn't that look better? We define four intermediate variables that hold our vowels and booleans for our tests, then in the where clause we just check our boolean values. The result is a where clause that is very readable. There are tons more uses for "let", but I hope that you start using it in your queries to make them more readable.

Rob Conery's MVC Storefront Application

In case you have not seen, Rob Conery is currently working on a screencast series about building a simple e-commerce app using the MVC framework. He has received a bit of harsh criticism about his use of TDD (especially on twitter) from a few outspoken people in the community, and so I wanted to throw this post up to show a little solidarity. He clearly stated he was not an expert, and now after harsh criticism he has reached out for help with the series. Rob dealt with this in a calm and professional manner, which is something that I'm not even sure I could have done after a few of the comments that I saw. So, if you haven't seen it yet, go check it out, and go let Rob know that you support his efforts and appreciate the hard work he is doing.