Zip method added to Dizzy

I just checked a Zip method in to Dizzy. It takes any number of lists and returns a sequence of arrays, where the nth array holds the nth item from each list. Got that? Ha. Let me show you:

image

Did that clear it up? Good. Let's take a quick look at the code (before you freak out, scroll down below the code):

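// Needs "using System;" and "using System.Collections.Generic;",
// and must live inside a static class because Zip is an extension method.
public static IEnumerable<T[]> Zip<T>(this IEnumerable<T> list, params IEnumerable<T>[] lists)
{
    // Argument checks run immediately because this method is not an iterator.
    if (list == null) throw new ArgumentNullException("list");

    // One enumerator for the first list, plus one for each extra list.
    int enumeratorsCount = lists == null ? 1 : lists.Length + 1;

    var enumerators = new IEnumerator<T>[enumeratorsCount];
    enumerators[0] = list.GetEnumerator();

    if (lists != null)
    {
        for (int i = 0; i < lists.Length; i++)
        {
            if (lists[i] == null)
                throw new ArgumentNullException("lists",
                    String.Format("Item at index {0} is null", i));
            enumerators[i + 1] = lists[i].GetEnumerator();
        }
    }

    return ZipImplementation(enumeratorsCount, enumerators);
}

// Iterator method: nothing in here runs until the result is enumerated.
private static IEnumerable<T[]> ZipImplementation<T>(int enumeratorsCount, IEnumerable<IEnumerator<T>> enumerators)
{
    for (;;)
    {
        int current = 0;
        // A fresh array per row, so callers can safely keep earlier rows.
        var result = new T[enumeratorsCount];
        foreach (var enumerator in enumerators)
        {
            // Stop as soon as any list runs out of items.
            if (!enumerator.MoveNext())
                yield break;
            result[current++] = enumerator.Current;
        }
        yield return result;
    }
}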

Look a little crazy? Why two methods? The first method validates its arguments (the first list and every entry in lists must be non-null) and collects an enumerator for each list. It is an ordinary method, so all of those checks run as soon as Zip is called. ZipImplementation, on the other hand, uses “yield return”, which makes it an iterator: none of its body runs until someone starts enumerating the result. Had the null checks lived in the same method as the “yield return”, they would have been deferred too, and a bad argument would not throw until the first item was requested. ZipImplementation does all of the heavy lifting. On each pass it walks the array of enumerators, advances each one, and copies its current item into a fresh result array. If any enumerator runs out of items we just “yield break” and quit, so the output is only as long as the shortest list.
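To make the timing difference concrete, here is a minimal sketch that is separate from Dizzy itself. The class and method names (DeferredCheckDemo, LazyChecked, EagerChecked, ThrowsArgumentNull) are made up for illustration; only the pattern matches the Zip code above.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredCheckDemo
{
    // Single-method version: because the body contains "yield return",
    // the whole method becomes an iterator, null check included.
    public static IEnumerable<T> LazyChecked<T>(IEnumerable<T> list)
    {
        if (list == null) throw new ArgumentNullException("list");
        foreach (var item in list)
            yield return item;
    }

    // Two-method version: the check runs at call time, only the loop is deferred.
    public static IEnumerable<T> EagerChecked<T>(IEnumerable<T> list)
    {
        if (list == null) throw new ArgumentNullException("list");
        return Iterate(list);
    }

    private static IEnumerable<T> Iterate<T>(IEnumerable<T> list)
    {
        foreach (var item in list)
            yield return item;
    }

    // Helper: returns true if running the action throws ArgumentNullException.
    public static bool ThrowsArgumentNull(Action action)
    {
        try { action(); return false; }
        catch (ArgumentNullException) { return true; }
    }
}
```

Calling LazyChecked<int>(null) returns normally and only throws once the result is enumerated, while EagerChecked<int>(null) throws at the call site. That is the behavior the two-method split buys.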

If we successfully fill the current array then we “yield return” it. And that is pretty much it. I hope you find the method useful; go check out Dizzy if you feel the urge!
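One caveat with the code as posted: the enumerators are created when Zip is called, so the returned sequence can only be walked once, and the enumerators are never disposed. Below is a sketch of one possible variant that creates the enumerators inside the iterator and disposes them in a finally block. It is not what is checked in to Dizzy. The names ZipSketch, ZipAll and ZipAllIterator are hypothetical, and ZipAll was picked only to avoid clashing with other Zip methods.

```csharp
using System;
using System.Collections.Generic;

public static class ZipSketch
{
    // Hypothetical variant of Zip: same eager argument checks,
    // but the enumerators are not created until enumeration starts.
    public static IEnumerable<T[]> ZipAll<T>(this IEnumerable<T> first, params IEnumerable<T>[] rest)
    {
        if (first == null) throw new ArgumentNullException("first");

        var sources = new List<IEnumerable<T>> { first };
        if (rest != null)
        {
            for (int i = 0; i < rest.Length; i++)
            {
                if (rest[i] == null)
                    throw new ArgumentNullException("rest",
                        String.Format("Item at index {0} is null", i));
                sources.Add(rest[i]);
            }
        }
        return ZipAllIterator(sources);
    }

    private static IEnumerable<T[]> ZipAllIterator<T>(List<IEnumerable<T>> sources)
    {
        var enumerators = new List<IEnumerator<T>>(sources.Count);
        try
        {
            // Fresh enumerators for every enumeration of the result.
            foreach (var source in sources)
                enumerators.Add(source.GetEnumerator());

            for (;;)
            {
                var row = new T[enumerators.Count];
                for (int i = 0; i < enumerators.Count; i++)
                {
                    // Stop as soon as any source runs out of items.
                    if (!enumerators[i].MoveNext())
                        yield break;
                    row[i] = enumerators[i].Current;
                }
                yield return row;
            }
        }
        finally
        {
            // Runs when iteration finishes or the consumer stops early.
            foreach (var e in enumerators)
                e.Dispose();
        }
    }
}
```

For example, zipping { 1, 2, 3 } with { 10, 20, 30 } and { 100, 200 } yields the arrays { 1, 10, 100 } and { 2, 20, 200 }, then stops because the third list has run out. Enumerating the same result a second time yields the same two arrays again.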


5 comments

  1. There’s one problem with your Zip implementation: it requires that everything be the same type.

    For an example of why this is problematic, see:
    http://groups.google.com/group/mono-rocks/browse_frm/thread/f4c88de3749cf656, where someone wanted a reasonable equivalent of Python’s `enumerate`, which returns a tuple of integer and T.

    For example, while reading lines from a file you’d want the line number as an int, and the line itself as a string.

    Your zip won’t support this. :-/

    For the above `enumerate` request, a simple LINQ query + anonymous types will support it:

    int lineCount = 0;
    var e = from line in readfile (file)
            select new { Line = line, LineCount = ++lineCount };

    foreach (var l in e) {
        Console.WriteLine ("{0,4}: {1}", l.LineCount, l.Line);
    }

    This can be generalized, which I’ve done here:

    http://groups.google.com/group/mono-rocks/browse_frm/thread/2d93041744f55093

    so that with an additional generator:

    static IEnumerable<int> Integers()
    {
        int i = 0;
        while (true) yield return i++;
    }

    We can get the equivalent LINQ query without the extra lineCount variable:

    var e = Integers()
        .SelectFromEach(readfile(file),
            (n,s) => new { LineNumber = n+1, Line = s });

    The .SelectFromEach() extension method also seems more in line with what Zip does in Haskell, as (again) it doesn’t require that all elements be of the same type.

    An alternate approach would be to define .Zip() to return a tuple, instead of taking a lambda which can return an anonymous type as .SelectFromEach() does. Tuples would also allow strong typing without all lists being of the same type.

    The obvious advantage to your zip, though, is that it can support an arbitrary number of lists: list1.Zip(list2, …, list100) is quite possible with your code, as long as they’re all of the same type.

    A Tuple-based .Zip() or .SelectFromEach() has limits, by design; .SelectFromEach() could support at most 4 lists (assuming we stick with using Func`N to create the return type), and a Tuple-based .Zip() would still be limited by the largest Tuple your framework provides. (Even if you go insane and add 100 Tuple types to your assembly, it still wouldn’t support 101 different types…)

    (For an example of some Tuple types: http://monoport.com/17592, though that won’t be valid for long…)

  2. Sorry, this is the current Tuple prototype: http://monoport.com/17599

  3. I can see how this could limit the ability of the Zip method. I will think more about this later and see what I can come up with. Thanks for the code and suggestions.

  4. Do you use IRC at all?

    ##csharp on freenode.net tends to have a number of decent C# developers with a variety of backgrounds, including Microsoft developers working on Visual Studio; you might find it helpful to discuss things there. I certainly do…

    My nick is `jonp’.

  5. I’ll try and get on there in the evenings when I have time. Currently the employer I am working at does not allow me to get on IRC.
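For readers following comment 1, here is a rough sketch of what a two-list SelectFromEach could look like. The signature is an assumption based on the call shown in that comment; the version in the linked mono-rocks thread may well differ. The class name SelectFromEachSketch and the helper SelectFromEachIterator are made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class SelectFromEachSketch
{
    // Endless 0, 1, 2, ... generator, as in comment 1.
    public static IEnumerable<int> Integers()
    {
        int i = 0;
        while (true) yield return i++;
    }

    // Assumed signature: pairs up items from two differently typed
    // sequences and lets the caller decide the shape of each result.
    public static IEnumerable<TResult> SelectFromEach<T1, T2, TResult>(
        this IEnumerable<T1> first,
        IEnumerable<T2> second,
        Func<T1, T2, TResult> selector)
    {
        // Checked eagerly, using the same two-method split as Zip.
        if (first == null) throw new ArgumentNullException("first");
        if (second == null) throw new ArgumentNullException("second");
        if (selector == null) throw new ArgumentNullException("selector");
        return SelectFromEachIterator(first, second, selector);
    }

    private static IEnumerable<TResult> SelectFromEachIterator<T1, T2, TResult>(
        IEnumerable<T1> first, IEnumerable<T2> second, Func<T1, T2, TResult> selector)
    {
        using (var e1 = first.GetEnumerator())
        using (var e2 = second.GetEnumerator())
        {
            // Stops when the shorter sequence ends, so pairing with the
            // endless Integers() generator is safe.
            while (e1.MoveNext() && e2.MoveNext())
                yield return selector(e1.Current, e2.Current);
        }
    }
}
```

With this in place, SelectFromEachSketch.Integers().SelectFromEach(lines, (n, s) => new { LineNumber = n + 1, Line = s }) numbers each line without an extra counter variable. Passing (n, s) => Tuple.Create(n, s) instead gives the tuple-returning flavor the comment mentions, on frameworks that provide System.Tuple.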
