Higher Order Function Demo

In my presentation I gave the other night at the Richmond .net users group one of the topics that I touched on very heavily was higher order methods. Higher order methods are simply methods that take a function as a parameter or return a function as a result. The key idea here is that we are not passing the results of functions in or out of functions, we are actually passing the functions themselves. I have a blogged a few times in the past about this topic and there was a demo that I had but ran out of time and did not get to show. I went back after my presentation and realized how cool the demo was, and knew that I really should have shown it, even if I ran over. Hindsight is 20/20 I guess. So anyways, the best I can do is to show you here.

The demo that I am getting ready to show you does this… It takes a list of domain names and it pings each of them n number of times. It then filters out any results that it gets that are 0 (meaning the ping failed), then it divides the results of each domain up by n (to get the average ping time) and then it spits out each domain with its average ping time. Pretty cool, huh? Well the task is not particularly cool, but the code is.

First I wanted to show the code in an imperative style.

var domains = new List<string> {"www.google.com", 
                                "www.yahoo.com", 
                                "www.akamai.com", 
                                "www.ask.com"};
 
var pinger = new FakePing();
int pingCount = 20;
 
Console.WriteLine("Imperative:");
 
var sw = new Stopwatch();
sw.Start();
 
long time = 0;
foreach (string domain in domains)
{
    int successfulPingCount = 0;
    for (int i = 0; i < pingCount; i++)
    {
        FakePingReply reply = pinger.Send(domain);
        if (reply.RoundtripTime != 0)
        {
            successfulPingCount++;
            time += reply.RoundtripTime;    
        }                    
    }
    time = time / successfulPingCount;
    Console.WriteLine(domain + ": " + time);    
}
sw.Stop();
 
Console.WriteLine("Elapsed: " + sw.ElapsedMilliseconds);

So, there it is. Pretty much what you probably expected. It is slightly different from my presentation, since I noticed one bug where it was using the total ping count and not the successful ping count in order to divide by to get the averages. I fixed it, and so above you can see that we are looping through each domain, then looping through the total number of pings, and sending a ping out, adding up the totals as we go skipping over pings that come back with 0 roundtrip times (these are failed pings). We then take the total time and divide by the number of successful pings to get our average ping time and then we print that out with the domain name.

So, lets look at the code that is going to use higher order methods and is going to have a more functional flavor to it. I’m not going to use any of the Linq extension methods here, since I wanted to use the code from my presentation, but keep in mind that most of this stuff is possible using the Linq extension methods. I’ll consider a source listing using Linq at the end of this post. Let us first take a look at the sample code:

var totals = 
    domains.Map(d =>
                    d.Repeat(domain => new FakePing()
                        .Send(domain).RoundtripTime, pingCount)
                    .Filter(p => p != 0).Apply()
               );
 
var averages = 
    totals.Map(p => 
                (p.Fold((p1, p2) => p1 + p2) / p.Count()).ToString()
            );
 
domains.Zip(averages).ForEach(p => Console.WriteLine(p[0] + ": " + p[1]));

Looks pretty good huh? I left out the stopwatch code around it so you wouldn’t have as much noise, but I think it represents what we are trying to do pretty well. I broke it into three different statements so that it was easier to follow. Let me go through it now and show you what each part is doing. First we are taking our list of domains, which is the variable “domains”, and we are calling Map. Map loops through each item in our list and applies a method to it, returning all of the results in an IEnumerable.

The method that we are passing to our Map method takes each domain in the list and calls “Repeat” on it. Repeat is a higher order method that simply takes an item, a function that accepts that item as a parameter, and a count for the number of times to repeat. In this case, we are passing “pingCount” to it, which is the number of times that we want to ping each domain. The method going into repeat is what actually creates our “FakePing” class (which can be replaced with “Ping” if you’d like to do actual pinging) and sends the ping request. We get the RoundtripTime off of the result.

We are then passing the result of this into our “Filter” method. This takes a predicate, which is simply a function that returns a boolean value, and uses that method to determine whether an item should be included in the resulting IEnumerable. Here we are testing to see if the ping time is not 0. If it isn’t zero then we include it in our result IEnumerable.

We then call “Apply” which is a higher order method that simply executes our list immediately so that in the next part it won’t be executed more than once. (Thanks to Jonathan Pryor, he posted this method on my blog just yesterday!)

So, now what we  have is an IEnumerable of lists containing ping times for each domains. It would look something like this if you were to print it out:

{ { 25, 27, 27, 25, 25 }, { 26, 28, 29, 31, 30 }, { 24, 25, 29, 31 } }

There would be more in there, one list for each domain, and the number of pings would be a bit higher, but you get the point. Also notice that the last list doesn’t have the same number of items. This would happen if one of the pings came back as 0.

So, we take this list of totals and we pass them again to our “Map” function. This time though we are taking each list and padding it to “Fold”. Now, what fold does is it takes each item in the list and calls a method that takes an accumulator variable and the current item in the list. Here our method that we are passing is adding our accumulator to the next item in the list, and since our items are integers, they get added and the result is an integer. If they were strings then our call to “Fold” would have concatenated them.

We then divide the result of each of our lists being folded by “p.Count()”, which is the number of items in the list of pings. If we had not called “Apply” earlier, then this is where our IEnumerable would have been executed twice, and we would have pulled all the pings twice for each domain! Delayed execution is a wonderful thing, but it is like beer… it will give you plenty of good times, but if you don’t respect it, it will kick you in the butt. So, after we divide to get our average, we then call “ToString()” to get a string result of our average. We now have the “averages” variable that holds a list of average values. One for each of our domains.

We now call our “domains” list again (not our averages!) and call “Zip” on it, passing in our averages. What zip does is it takes two lists and returns a pair for each nth item in the list. So, if we had lists { a, b, c } and { d, e, f } we would get { { a, d }, { b, e }, { c, f } }. This allows us to now print out our domains along with our results. Once, our application is executed we will see this on the screen:

image

You will notice that the imperative and functional times took practically the same amount of time. If we ran it a few times these numbers would vary slightly, but would always be virtually identical. What is neat though is that we could make one small change to our functional version:

var totals = 
    domains.ParallelMap(d =>
                    d.Repeat(domain => new FakePing()
                        .Send(domain).RoundtripTime, pingCount)
                    .Filter(p => p != 0).Apply()
               );
 
var averages = 
    totals.Map(p => 
                (p.Fold((p1, p2) => p1 + p2) / p.Count()).ToString()
            );
 
domains.Zip(averages).ForEach(p => Console.WriteLine(p[0] + ": " + p[1]));

Do you see it? I changed the first call to “Map” to be “ParallelMap”. And since we are developing this in a manner where we are not using any mutable state, we can do this change and not have to worry about any potential issues. So, now when we run this…

image

Hmm, I’d say that is a pretty good jump for a few extra keystrokes. It cut the time to run in half! This time we were only doing 20 pings, but what happens when we jump that number to 1000 pings?

image

Wow. It ran almost four times faster. That was actually a bigger speed bump than I was expecting. If you are curious, my ParallelMap function was blogged about here but I have made a few key changes that I will put up on the website soon (it will also be in the source below). As you can see it just uses the ThreadPool to schedule its tasks. Now this kind of parallel processing problem simply involves low CPU, long running processes, but with database calls and web service calls, this is a lot of what business developers see these days.

I am going to go ahead and attach the source, with the updated methods, but I first want to post the promised Linq version using the Linq extension methods like “Select”, “Where”, etc…

IEnumerable<IEnumerable<int>> totals =
    domains.Select(d =>
                        d.Repeat(domain => new FakePing()
                                               .Send(domain).RoundtripTime, pingCount)
                            .Where(p => p != 0).Apply()
        );
 
IEnumerable<string> averages =
    totals.Select(p =>
               (p.Aggregate((p1, p2) => p1 + p2) / p.Count()).ToString()
        );
 
domains.Zip(averages).ToList().ForEach(p => Console.WriteLine(p[0] + ": " + p[1]));            

 

Here is the source:

MapFold Source

Have fun!

Be Sociable, Share!

Leave a comment