Added parallel abilities to Dizzy

I have added an “AsParallel()” method to Dizzy which mimics the way that the Parallel Extensions to the .net framework works. So, if you want to call a parallel map function you simply need to do this:

list.AsParallel().Map(n => n.ToUpper());

The AsParallel() method just looks like this:

public static IParallelEnumerable<T> AsParallel<T>(this IEnumerable<T> list)
{
    return new ParallelEnumerable<T>(list);
}

As you can see, it just takes an IEnumerable<T> and replaces it with a IParallelEnumerable<T>. This Interface/Class combo doesn’t provide any sort of work, it is just a flag to use the Parallel version of methods. Currently “Map” is the only method with a parallel version. The ParallelEnumerable class just looks like this:

public class ParallelEnumerable<T> : IParallelEnumerable<T>
{
    private IEnumerable<T> enumerable;        
 
    public ParallelEnumerable(IEnumerable<T> list)
    {
        if (list == null) throw new ArgumentNullException("list");
        this.enumerable = list;
    }
 
    IEnumerator IEnumerable.GetEnumerator()
    {
        return this.enumerable.GetEnumerator();            
    }
 
    public IEnumerator<T> GetEnumerator()
    {
        return this.enumerable.GetEnumerator();            
    }
}

So, you can see that we are just wrapping an IEnumerable and returning the items we need to support the IEnumerable interface. Then we just define a “Map” method that takes an IParallelEnumerable instead of an IEnumerable and voila! Instant parallelism without the need for different method names. The map method is just the same as it was in this post, only the method signature is now change to look like this:

public static IEnumerable<TResult> Map<TArg, TResult>(
    this IParallelEnumerable<TArg> list,
    Func<TArg, TResult> func)

So, whenever we have an IParallelEnumerable we will call this method instead of our other Map method. We still return an IEnumerable because we don’t want additional methods chained to this call to automatically try to use parallel versions.

So, I hope you found this useful, and I hope you also check out Dizzy. I’ve added a few other methods, and plan to keep adding methods regularly. If you have any ideas for new methods, please let me know.

Be Sociable, Share!

12 comments

  1. Hey Justin,

    I’m really impressed with your library and with the work you’re doing with lists in C#. It seems that a lot of what you do is rooted in set theory. Do you think that an intro to sets and set theory would be useful to help understand what your library does?

    Cheers,
    Luke

  2. It is funny that you say that, because I actually had some set operations planned for future additions to dizzy. I hadn’t contemplated an intro to set theory, but I will consider a post on that in the future. Good idea! Thanks!

  3. Hi Justin,

    Can you elaborate on how the AsParallel extension function causes the Map function to run in parallel?

  4. @David Sure, no problem! The "AsParallel" method doesn’t really cause the Map function to run in parallel. What it does is take an IEnumerable and wrap it in an IParallelEnumerable. Then we have another overload of the Map function which takes an IParallelEnumerable instead of an IEnumerable. So, anytime you call "AsParallel()" on a list or array, then when you add ".Map(" on the end, you are looking at the version of Map that takes an IParallelEnumerable.

  5. I would suggest, actually, that your .Map<T>(IParallelEnumerable<T>,…) method return an IParallelEnumerable<T>, not an IEnumerable<T>. The reasoning is twofold:

    1. You want to gives threads a reasonable amount of work, and odds are that a single method call won’t give much of a performance win. An entire method call chain, however, likely will.

    For example, given list.Filter(n => n > 5).Map(n => n*2), it would likely be better if the Filter+Map could be done in a parallelized fashion, instead of just the .Filter, then an (implicit) thread join, followed by the .Map and a thread join.

    Granted this would take more work and isn’t currently implemented, but it’s something to think about. Using IParallelEnumerable<T> as the return value allows you to make this change in the future without breaking existing clients; IEnumerable<T> doesn’t.

    2. Since IParallelEnumerable<T> extends IEnumerable<T>, you can still call IEnumerable<T> extension methods on the returned value, so you don’t "lose" anything.

  6. @Jonathan I can see both sides of the fence on this. On one hand, having Parallel operations automatically chain without having to keep doing AsParallel() would be nice, but there are many situations where you wouldn’t want to use parallel operations.

    Let say for instance we have a parallel Filter method and we had a very long array and filtered it down to a very short list. Then since we have a short list we would most likely not want to use parallel operations since they would be very slow compared to the serial implementations. The only way around this would be to explicitly assign the IParallelEnumerable to an IEnumerable variable in order to call the serial method overloads.

    So, how do we approach this? Do we introduce an operator called AsSerial() or do we introduce separate AsParallel() and AsChainedParallel()? Or something similar to that?

    Thoughts?

  7. @Justin,

    Thanks for the extended info. What had me lost at first was my misunderstanding of the fact that IParallelEnumerable is part of PLINQ. Now that I have dug deeper this now makes more sense. Thanks!

    Also, regarding > "So, how do we approach this? Do we introduce an operator called AsSerial() or do we introduce separate AsParallel() and AsChainedParallel()? Or something similar to that?"

    I can see benefits to both approaches, though if AsParallel() was chained by default and AsSerial() were used to break out of that pattern it seems it would be easier to maintain. i.e. My guess is that the default behavior that most people will want will be chained operations, so breaking out of that chain takes a single call. On the other hand, if you later decide you want to run a bunch of operations AsChainedParallel() you’re going to have to go back and update a bunch of code. In many ways I could see AsSerial() becoming a quick and simple way to optimize code that you later find is performing slower than it could if run AsSerial().

  8. [quote]On the other hand, if you later decide you want to run a bunch of operations AsChainedParallel() you’re going to have to go back and update a bunch of code. In many ways I could see AsSerial() becoming a quick and simple way to optimize code that you later find is performing slower than it could if run AsSerial().[/quote]

    Of course on the other, other hand, I could see the argument for the exact opposite behavior. The "When you have a hammer everything looks like a nail." argument could easily be invoked. I wonder if future versions of the ParallelFX library will take on a similar approach to the way the Windows kernel handles asynchronous requests where it will run the code synchronously if it determines it would be faster?

  9. @David First, it looks like I need to work on my sites styling of the "cite" tag.

    Secondly, I think you can see the issue I am having with this. Although I am leaning toward the AsSerial() way of doing it, I think that the fact that you aren’t explicitly saying when parallel operations are used can detract from the implementation.

    Now my original idea of using "ParallelMap" as the method name doesn’t seem so bad. At least then you could look at a query and tell what is parallel and what isn’t. I’m going to have to think on this for a bit.

    Anyone else have any feedback on this?

  10. We don’t need a .AsSerial() extension method: we have it already — .AsEnumerable(), which accepts any IEnumerable<T> and returns…an IEnumerable<T>.

    It exists so that you can obtain the extension method version of various methods instead of the class’ versions, e.g.:

    List<int> a = new List<int> ();
    // List<T>.ToArray()
    int[] b = a.ToArray ();
    // Enumerable.ToArray<T> (IEnumerable<T>)
    int[] c = a.AsEnumerable().ToArray ();

    So if you want to return to the serial versions, just stick in .AsEnumerable:

    list.AsParallel()
    .Filter(…) // parallel filter
    .Map(…) // parallel map
    .AsEnumerable()
    .Map(…); // serial map

  11. Not sure you saw this already; I just haven’t seen it posted yet.

    There is no need for a .AsSerial() extension method, as it already exists: .AsEnumerable().

    .AsEnumerable() takes any IEnumerable<T> (or subtype) and returns an IEnumerable<T>, so this would work to convert an IParallelEnumerable<T> into an IEnumerable<T>.

  12. @Jonathan Yep, "AsEnumerable()" will work, but you don’t think it would make it more explicit if an "AsSerial()" method was added? That way you would have "AsParallel" and "AsSerial" complementing methods.

Leave a comment