Don’t Get Bit By Lazy Evaluation

I’ve written numerous posts in the past about using “yield return” and delayed execution, but yes I was still stumped for a few minutes today when I was implementing a sample application. I had included the normal “Each” extension method that I have written in the past, since it is so useful when you want to easily iterate over a collection easily:

public static void Each<T>(this IEnumerable<T> list, Action<T> action)
{
    foreach (T item in list)
    {
        action(item);
    }
}

I was using this and I got to the point where I wanted to chain something in the same way that jQuery lets you do:

$('#something')
    .each(function(item) { //do something })
    .each(function(item) { //do something else });

So I modified my “Each” extension method to simply pass the last operated item out, and tried to get cute by using “yield return”:

public static IEnumerable<T> Each<T>(this IEnumerable<T> list, Action<T> action)
{
    foreach (T item in list)
    {
        action(item);
        yield return item;
    }
}

And then I walked away cause I had to do something else. I came back a while later and ran my application and nothing happened. Unfortunately I couldn’t remember what I had changed. So I started going through my code, and eventually I came across the code I had modified and it jumped right out at me! I was calling the above method like this:

myCollection.Each(item => item.Clear());

If you haven’t had to deal very often with delayed execution you may still be staring at that wondering what the problem is. And if you picked up on it right away then I applaud you. The problem is that nothing is “pulling” the values out of the “Each” method. Nothing is iterating over the iterator that is coming out of this method. That is the beauty of “yield return”, if the results are never used, then the code is never executed. Here though, the results weren’t important, but the side effects of the method that is the problem.

Side effects… ugh. Yet another reason why side effects and a functional style of programming will slap you in the face. If you still wondering what the deal is exactly, then take a look at the code generated by the above source:

public static IEnumerable<T> Each<T>(this IEnumerable<T> list, Action<T> action)
{
    <Each>d__0<T> d__ = new <Each>d__0<T>(-2);
    d__.<>3__list = list;
    d__.<>3__action = action;
    return d__;
}

Notice that there is a compiler generated class called <Each>d__0 that is being declared has the list and action assigned to it, then this class is returned. It is just a container for the action to be performed. If you then go and look at what the compiler generated for this class (I’m not going to paste the whole thing here), but it is implementing IEnumerable<T> and IEnumerator<T>. So this class is just being passed back from the method and unless this class is iterated over, then the Action delegate is never even being called.

So how do we fix this? Well, there are two ways we could fix this. One is to simply change the Each method so that it iterates the whole list each time:

public static IEnumerable<T> Each<T>(this IEnumerable<T> list, Action<T> action)
{            
    var result = new List<T>();
    foreach (T item in list)
    {
        action(item);
        result.Add(item);
    }
    return result;
}

When this is called we will iterate over the entire list, and perform an action on each item before the method returns. When the method returns, we can just return the original list since we aren’t changing it in any way.

The second way to fix it, which may or may not be preferable depending on your situation, is to force execution on the iterator. This can be done in a slightly odd looking way by calling ToArray(), ToList(), etc… Linq extension methods. Since these methods turn the iterator into a set list, they obviously have to iterate through the whole list:

myCollection.Each(item => item.Clear()).ToArray();

Since we aren’t using the result of this, it looks a bit odd. We can perform a similar action using a custom extension method called “Apply”. This was originally suggested to me a long time ago by Mono.Rocks author Jonathan Pryor. The Apply extension method simply iterates the list and forces execution:

public static void Apply<T>(this IEnumerable<T> list)
{
    foreach (T item in list)
    {
        //ignore T
    }
}

It still returns an IEnumerable so that it can be chained, in case you want intermediate steps executed in a chain of methods. As I was reminded by Sebastian below, it is very bad form to iterate an IEnumerable more than once. We could again cache the result in a list, so I’ll leave it up to you to implement that version.

So, why would you want to solve it in this way? Well, with the “Each” extension method still being delay executed, it means that we can chain a bunch of them together and a single item will get passed down through the chain. So something like this would work:

GetStreamOfNeverendingObjects()
    .Each(o => o.PerformAction())
    .Each(o => o.PerformOtherAction())
    .Take(5)
    .Apply();

Hmmm. So we get a list of some random objects take the first one, call two methods on it. Take the second, call two methods on it. And you’ll notice that we only do this for the first five items in the list, since we are using the “Take” method. Interesting stuff.

Wrap Up

This is really interesting stuff, and can be very confusing. The powers of delayed execution can amaze and confound. Remember, these tools are sharp, so make sure that you’re careful.

Be Sociable, Share!

13 comments

  1. I’m going to have to go out on a limb here and say that you’ve gone completely overboard and wrote something that is ridiculously over complicated for no reason and extremely difficult to read.

  2. I think it’s nifty — but I adore $.each and $.map, too. One thing that concerns me about this (and how I’d be trying to use it) though: how can you flag a non-fatal exception — "Item #5 is rejected because of #F00" while processing all other items — when doing chained yield returns? I’m having difficulty picturing the flow of control there, unless something a bit gawky is going on (like setting a checkable .WeDisapproveOfThisObject property or somesuch).

  3. @Sean

    Use jQuery or Ruby and you might change your mind.

    @Justin

    I ran into this exact issue about a year ago when I wrote my own "Each" method. Got clever about chaining Each’s, and forgot about deferred execution. Just wound up what you had, and returned the list immediately, instead of yielding each item. It made more sense anyway, you’d expect each "Each" to finish before going to the next one.

  4. @Sean Actually you’d probably be correct. I’m going to blog about it sometime over the next few days.

    @NNeko You’d have to either carry state on the object (like you said), or do something like implement a "zip" function and zip the list of objects with a list of statuses.

  5. @Jimmy Yep, jQuery and Ruby will certainly change your mind about this. And so will Linq. My current use for this is probably a bit over the top, but it is a pet project.

    And yeah, I’ve been bit by this before. It is just so freakin’ subtle. :-)

  6. You might find Eric White’s recent blog post interesting, "Why I Don’t use the ForEach Extension Method"
    http://blogs.msdn.com/ericwhite/archive/2009/04/08/why-i-don-t-use-the-foreach-extension-method.aspx

  7. I love your blog, however your two fixes are pure evil (sorry :) ): Never ever iterate over an IEnumerable more than once. If you really can’t avoid it, first cache the sequence, eg. using ToArray.
    You mentioned Mono.Rocks, so let’s look at the current API:

    public static void ForEach<TSource> (this IEnumerable<TSource> self, Action<TSource> action)
    public static IEnumerable<TSource> Each<TSource> (this IEnumerable<TSource> self, Action<TSource> action)

    One method for immediate, one for lazy evaluation. The former consumes the sequence without returning it, so no danger of iterating it twice. A clear separation of these two worlds.
    seq.Each(DoA).ForEach(DoB);

    Once there was an Apply method, but it wasn’t evil :) :
    public static IEnumerable<TSource> Apply<TSource> (this IEnumerable<TSource> self, Action<TSource> action)
    public static void Apply<TSource> (this IEnumerable<TSource> self)

    This is just a variant of the ForEach/Each pairing:
    seq.Apply(DoSomething).Apply();

    PS: "So something like this would work" @ infinite sequence:
    No, it won’t :) . "Take" is lazy (doesn’t consume the sequence), so your lambdas will never be executed.
    Using Jonathan’s Apply:
    …Take(5).Apply();
    Using Each/ForEach and a bit of reordering:
    GetStreamOfNeverendingObjects().Each(…).Take(5).ForEach(…);

  8. @Sebastian Thanks! And please tell me when I do something wrong, I did completely screw up the end of this post.

    I updated the "Apply" so that it doesn’t pass back an IEnumerable and I updated the "Each" method so that it places the values in an intermediate list before it passes the results back so that we don’t iterate over the enumerable twice. I think I confused the last implementation of "Apply" with the "Each" since I made the Apply return an IEnumerable and then said showed an example that didn’t use it. But anyways, thanks for pointing out my errors.

    As far as the last call, yep I know that "Take" is delayed, I was more showing that you could write something which passed through like that. This is why I think I might have been confusing the two calls. I added a call to "Apply" for clarity.

  9. [i]As I was reminded by Sebastian below, it is very bad form to iterate an IEnumerable more than once[/i]

    Why? Isn’t this context dependent?

  10. @Chuck Yep, it is context dependent. Some IEnumerables cannot be iterated over more than once and will throw an exception. Or even worse, produce unexpected results. This method has no context, so there is no way for it to know whether or not it can iterate over it more than once. The way it was designed was guiding the consumer down this path.

    It is much better to capture the results of iterating over the enumerator into a list and then return the list, since you know that (under most circumstances) this is safe to be iterated over again.

  11. @Oran Thanks, that is a great post.

  12. The Each() extension method is a rather cool idea, but I just feel like adding the "yield return" is simply over-complicating it to the point that it’s easy to create bugs without realizing it. To each, his own, but I’m a firm believer that consumers of our code shouldn’t have to worry about the implementation details behind the scenes. In other words, if a consumer has to structure his code in a way that isn’t natural (or isn’t known without reading YOUR code), I’d argue that we’ve done something wrong.

  13. @Sean To a certain extent I agree with you, in most cases the consumer should not have to be aware of the exact implementation of the code. But if you follow that rule to an extreme, then Linq wouldn’t really exist, since it is all based on lazy execution. For instance, if I were to do something like this, where I call some methods and don’t care about their results (and this is not proper form, but I’m just showing an example):

    people.Where(p => p.FirstName == "John")
    .Select(p => p.DoSomeTaskWithResult());

    Due to the mixture in styles of programming, you are going to get the same bug. This goes very much in line with the link in Oran’s comment up above talking about how "Each" is bad because it encourages mutation, whereas select doesn’t encourage this. While I agree with the sentiment, I think that an implementation of "Each" can clean up some code. Most languages have it, and it doesn’t cause problems. What I am coming around to though, is the fact that returning a result from "Each" is probably bad form because it encourages a more functional styles, which can be confusing. I am going to have to stew on that for a while.

Leave a comment