IEnumerable ForEach extension method

List<T> has a method called ForEach that takes an Action<T> delegate, and I wanted one for IEnumerable. I also had someone ask about it in my previous post. It wasn't hard to write, but I figured I would throw it up here for future reference and also in case anyone needed help getting theirs working. If anyone notices anything I did that was dumb you can give me feedback as well. I believe I actually implemented something similar to this a while back, but anyways... without further ado...

public static void ForEach<T>(this IEnumerable<T> enumerable, Action<T> action)
{
    if (enumerable == null)
        throw new ArgumentNullException("enumerable");
 
    if (action == null)
        throw new ArgumentNullException("action");
 
    foreach (T item in enumerable)
    {
        action(item);
    }
}

Hope it helps.

The Linq SelectMany Operator

Most of you by now are familiar with LINQ, Microsoft's foray into crossing the code/data impedance mismatch. Most of what we see in Linq translates directly into our knowledge of SQL, since most LINQ queries use very similar semantics to SQL queries. There are certain operators though that don't look familiar because they either weren't present in our SQL lexicon or they were represented in a fundamentally different way. A little bit back I posted about one of these operators, the "let" operator, and how to use it effectively. I later followed it up with a post that dug a little bit deeper into the "let" operator so that you could get a peek at what was going on behind the scenes.

Today I introduce to you another one of these operators that is even a bit more foreign to those coming from a SQL background, it is the SelectMany operator. This operator doesn't have an explicit native syntax when dealing with LINQ queries, so we will first go over the operator using the LINQ extension methods and then we will show you how to achieve the same effect using the native C# LINQ syntax. Sound good? Good.

The SelectMany operator is described on MSDN as "Projects each element of a sequence to an IEnumerable<T> and flattens the resulting sequences into one sequence." Well, that is actually a pretty good description, but until you see it work, it is hard to visualize it. Well, at least it was for me!

In order to first give an example of this operator we will start off with the same list of names that I used in my previous post about the "let" operator and we will select the list with both the "Select" and the SelectMany operator to see what happens.

var nameList = new List<string>
                   {
                       "Matt",
                       "Adam",
                       "John",
                       "Peter",
                       "Owen",
                       "Steve",
                       "Richard",
                       "Chris"
                   };
 
var names1 = nameList.Where(n => n.Length == 4)
    .Select(n => n).ToList();
 
names1.ForEach(n => Console.WriteLine(n));
 
var names2 = nameList.Where(n => n.Length == 4)
    .SelectMany(n => n).ToList();                            
 
names2.ForEach(n => Console.WriteLine(n));

So, here you see the same two queries with only the Select and SelectMany swapped out. Then we write the results out to the console, so, what happens when this is executed (with a bit extra formatting code)?

image

What just happened there? The Select just returned the list of names that we filtered out. In fact, the select is entirely unneeded. If we left it off, then the query would execute identically, I only put it in there so you can see the difference. The SelectMany on the other hand got the list of names with four letters and since String is an IEnumable<Char> it then projects each String as an IEnumerable and then combines the results. So, as you can see, we got a single result with each character of each name in our list.

But how is this useful to us? Actually there are quite a few wonderful uses for this. Lets look at what we could do if we changed our example above to have several lists of names instead of just one.

var nameList = new List<List<string>>()
                   {
                       new List<string>
                           {
                               "Matt",
                               "Adam",
                               "John",
                               "Peter",
                               "Owen",
                               "Steve",
                               "Richard",
                               "Chris"
                           },
                       new List<string>
                           {
                               "Tim",
                               "Jim",
                               "Andy",
                               "Fred",
                               "Todd",
                               "Rob",
                               "Richard",
                               "Ted"
                           }
                   };           

You can now see that we have two different lists of names and how do we get the list of names with only four characters now? Without using SelectMany we could do this:

var names1 = nameList.Select(l => l.Where(n => n.Length == 4));
foreach (var list in names1)
{
    foreach (string name in list)
    {
        Console.WriteLine(name);
    }
}

Hmmm. You see where we are doing a sub query to select the names from each list. We then have a nested IEnumerable<IEnumerable<string>> that we have to use nested loops in order to access our names. But with SelectMany we should be able to select our names from our list and avoid the nested IEnumerable.

var names2 = nameList.SelectMany(n => n)
    .Where(n => n.Length == 4)
    .Select(n => n).ToList();
 
names2.ForEach(n => Console.WriteLine(n));

Pretty easy! Here we are actually using the SelectMany operator first in order to flatten the lists and then we are filtering it using a Where statement. Again, the Select here is not required, but I have thrown it in to be more explicit. What I am doing above though may hide a little bit of what is happening. In order to better show that you are actually passing an IEnumerable into SelectMany I am going to show you an example where we are splitting up a few sentences about the people in our list above. What we are going to do is to have a list of sentences, and then split those sentences into words.

var sentences = new List<string> {"Bob is quite excited.", "Jim is very upset."};

This is our list of sentences, and now we need to get our individual words out of this:

var words = sentences.SelectMany(w => w.TrimEnd('.').Split(' ')).ToList();

Here we do a SelectMany on our list of strings, and then we are removing the period from the end of the sentence, and then we split them on a space and the result of the call to "Split" is what SelectMany then operates on. If we took the Split off then it would simply treat each item as a String and then we would end up enumerating over each character like we did in our first example.

I hope that this has cleared up the SelectMany operator for you, and so now we have just one last thing to do which is to show you how to do a SelectMany using the native C# LINQ syntax. But, how do you do a SelectMany if there is no native query operator for it? Well, you do it by chaining from statements. That may sound weird, but check it out, it actually works quite well.

var words = from s in sentences
            from w in s.TrimEnd('.').Split(' ')
            select w;

See? You are simply funneling the first "from" into the second "from". The reason why this works so well is that it lets you nest even further very easily. For example, what if we wanted each individual character from the above query.

var characters = from s in sentences
                from w in s.TrimEnd('.').Split(' ')
                from c in w
                select c;

Now all we did was split the words, and then select each character out of our list of words. Pretty sweet, and it can allow you to drill down into multiple levels of data very very easily. To me it also gives a good visual flow as to what is happening, but you don't get any indication that a new operator is being involved.

Well, I hope this helps clear things up a bit, and I hope that you get some good use out of this!

Dependency Injection Presentation

For those of you that do not know, I gave a presentation last weekend at the Richmond Code Camp 2008.1. Yep, we have more than one Code Camp per year in Richmond, we are quite lucky!  For those of you who could not attend my presentation I am putting up my slide deck and sample code, but I will also be turning my presentation into a series of posts. I wanted to do a screencast of the presentation, but Camtasia is about 300 dollars, so it is a bit out of my price range for free work. I feel like the posts will present the information better than a slide deck and sample code, but if you want to download the presentation anyways, feel free!

Update: I have now added a pdf version of the slides for those who cannot use the pptx file. The only issue is that the pdf lacks the notes that the powerpoint presentation has on them. So, it may be hard to tell what certain parts of the presentation mean. Hopefully though it will still be of some use to you!

Presentation Slides (pptx)

Presentation Slides (pdf)

Presentation Source

Immutability and tail recursion

I was in Amanda Laucher's F# talk yesterday and I asked about whether F# was able to tell if a function was pure or semi-pure. (A pure function is simply a function with no side effects, and a semi-pure function is one that only modifies locals) I was just a bit curious if F# was able to tell if there were any immutable variables that had been introduced into method, or if any mutable variables had been changed during a method. Well, I got my answer and then I made a statement that I probably definitely should have shut my mouth on. I essentially asked if introducing mutable variables would cause the F# compiler to not do its tail recursion optimization.

Yeah, I don't know what I smoked that morning. So, if you know all about tail calls and tail recursion optimization, then feel free to stop reading now. Otherwise, you can continue on to learn just how dumb of a statement that was. First of all let me just say that I have actually found a few blog posts where people have made the same mistake, and that makes me feel at least a *bit* better. I guess it is the fact that tail recursion optimization is usually associated with functional languages, since they mostly do looping with recursion, you would run into a lot of problems with even relatively small loops if you couldn't find a way to remove the recursion. But that doesn't really excuse me for not thinking about my question before I asked it.

So, anyways, we all know what recursion is. It is just when a method calls itself (below is direct recursion, there is also indirect recursion. I can't find a good link for a definition, so suffice to say that indirect recursion is when method A calls method B which in turn calls back to method A. More than two methods can be involved, but they don't have to be.)

static void increment(int n)
{            
    increment(n + 1);
}

Now, that method is infinitely recursive because we have no base case, but it is recursive nonetheless.

Some of us probably know what a tail-call is, it is when we make a call to a method at the very end of another method:

static void Method1()
{
    int i = 1 + 2;
    Method2();
}
 
static void Method2()
{
    Console.WriteLine("hello");
}

There are some optimizations that can be made when we are doing a tail call, mainly due to the fact that we can reuse the stack frame of the method we are calling from, since it is no longer needed. (Since there are no instructions after our method call).

Well, there is a little thing called tail recursion as well. And tail recursion is when your recursive call is a tail call. Our "increment" method above is actually tail recursive, since we are calling increment at the end of the method and there is nothing after it. But this method would be considered tail recursive as well:

static void increment(int n)
{            
    if (n > 500000000)
    {
        Console.WriteLine(n);
    }
    else
    {
        increment(n + 1);    
    }    
}

And you may even be surprised to know that this is considered tail recursive, since after increment nothing else will ever be called.

static void increment(int n)
{                
    if (n > 500000000)
    {
        Console.WriteLine(n);
    }
    else if (n > 1000000)
    {
        increment(n + 2);
    }
    else
    {
        increment(n + 1);    
    }    
}

To make it more clear, look at this in IL:

image

Now you can see that after each call to "increment" we are going to return. This way it is easy for the compiler to tell this is the last instruction we are going to execute. Now, the beauty of tail recursion, is that since the recursion happens at the end of our method, we can essentially eliminate the recursion by using a loop. This is why (as Amanda showed us in her F# presentation) the F# compiler will turn our original increment method into something that looks like this:

static void increment(int n)
{                
    while (true)
    {
        n++;
    }    
}

And when you run it in C# our original "increment" method will blow up with a StackOverflowException, right? Well, kinda. So, lets build our app and run it under Debug mode. Well, we quickly get this:

StackOverflowException

But, like a good programmer, we know that running things in Debug mode often disables many compiler optimizations. Sooooo, we change our build type to "Release" and try it again. Suddenly our app is now running, happily pegging one of my CPU's...

image

What is that you say? On your machine you still got a StackOverflowException? Well, maybe you should get a new machine, I think yours is broken! Ha ha, just kidding! There is a mystery afoot. What happened on my release build, that you are not seeing on your machine? Or maybe you are seeing it? Well, it happens that I am running Vista 64-bit, and if you are seeing this app run fine as well, then you too are running on 64-bit hardware. (In order to prove this to yourself if you are running on a 64-bit machine, you can switch your target CPU to x86, then run in release mode and the StackOverflowException will appear quite quickly!!)

As a side note, after I ran this and realized that I was getting tail recursion optimization, I set about to find out why. Since I did not write the .net runtime (although it would be cool if I did) I had to rely on info from a few different places, but a great one was this post by Jomo Fisher. A lot of the info below this point was gleaned from his post!

But why should that make any difference? Well, let us take a look at our compiled IL:

64-bit:

image

32-bit:

image

Hmmm, they look the same... I thought you said that the 64-bit release would optimize our tail recursion? Well, you have to remember that in managed code there is more than one chance for optimization. Optimizations can occur at build time when the IL is generated (hence what F# is doing) and optimizations can also be done by the JITer (Just In Time compiler) when the code is turned from IL into native machine code (which the 64-bit JITer is doing for us).

In order for us to tell that this is actually happening, lets add some code at the end of our method in order to stop the JITer from optimizing our tail recursion and see if we can get our StackOverflowException. So we add this change:

static void increment(int n)
{                    
    increment(n + 1);
    int i = 5;
    i = i + 2;
}

I had to add the second line because just putting in "int i = 5;" the compiler was smart enough to optimize it out because it wasn't being used. Tricky compiler. So, now that we have made this change our application now blows up with a StackOverflowException even when running under the 64-bit JITer!

So there you go, you can now see that tail recursion is completely useless in C#! Since the optimization is done by the JITer and not by the compiler, the code above will run on 64-bit machines, but will crash quickly on 32-bit machines. These are the stuff that nightmares are made of. Anyways, the F# compiler obviously had to do these optimizations for itself since tail recursion is much more prevalent in functional languages and they can't assume that you'll always be running under a 64-bit OS. Although you SHOULD be!

And with that, we are back to my original point for this article. Immutable types and tail recursion optimization have nothing to do with each other. There are some other optimizations that can be done when a method is pure (such as automatic parallelization), but nothing to do with tail recursion. Well, at least nothing that I could find, so please prove me wrong! Hope you enjoyed this post, and hopefully you aren't laughing at me too hard right now. :-)

Digging into the Linq "let" keyword

In my last post I talked abou the Linq "let" keyword and showed you how it could be used to simplify linq queries. Well, this post was thrown up pretty quick and I didn't get to research the topic in the way that I normally do, and I quickly got called out by a few readers who pointed out that there are some implications with using the "let" keyword that may not be initially apparent.

In the last post I used a query that looked like this:

var names = (from p in nameList
            let vowels = new List<string> { "A", "E", "I", "O", "U" }
            let startsWithVowel = vowels.Any(v => p.ToUpper().StartsWith(v))
            let endsWithVowel = vowels.Any(v => p.ToUpper().EndsWith(v))
            let fourCharactersLong = p.Length == 4
            let fiveCharactersLong = p.Length == 5
            where
            (startsWithVowel || endsWithVowel) &&
            (fourCharactersLong || fiveCharactersLong)
            select p).ToList();

In this case the "let" keyword greatly simplifies what would have been an otherwise very ugly "where" clause. So, I started thinking, how can I show what happens when this query is turned into an Expression object? I first started off by using the Expression visualizer that ships in the samples with VS2008 (click for full size):

Query Expression

That might give you a hint as to what is happening, if you are familiar with compiler generated anonymous types, but otherwise that just looks like a bunch of gibberish. You have to open up the app in reflector to really see what is going on with these anonymous types. What this is essentially telling us is exactly what "Skup" in my last post was trying to tell me. That each time you use the "let" keyword it is just generating an anonymous type which wraps the variable assigned in the let statement along with the item being queried.

After thinking a bit I realized that just rewriting this query with explicit Linq methods would probably be the best way to explain this. There is no "let" query method, you have to accomplish all of this through select statements.

var names2 = nameList
    .Select(a => new { a, vowels = new List<string> { "A", "E", "I", "O", "U" } })
    .Select(b => new { b, startsWithVowel = b.vowels.Any(v => b.a.ToUpper().StartsWith(v)) })
    .Select(c => new { c, endsWithVowel = c.b.vowels.Any(v => c.b.a.ToUpper().EndsWith(v)) })
    .Select(d => new { d, fourCharactersLong = d.c.b.a.Length == 4 })
    .Select(e => new { e, fiveCharactersLong = e.d.c.b.a.Length == 5 })
    .Where(w => (w.e.d.c.startsWithVowel || w.e.d.endsWithVowel) 
              && (w.e.fourCharactersLong || w.fiveCharactersLong))
    .Select(s => s.e.d.c.b.a);

So, as you can see, the "let" keyword cleans this up quite a bit. But you can also see what is actually happening. You can also see why Frans said in my last post that when you are running this against Linq to Sql it is going to generate a sub-query for each "let" statement. I hope that this helps a bit when trying to decide whether or not to use the Linq "let" statement.