Posted on 8/2/2008 9:31:15 AM by Justin Etheredge
I was reading Matt Podwysocki's blog yesterday (which, by the way, is awesome. Go subscribe.) and he had up a post called "Recursing into Recursion - Memoization". Very good post if you want to learn about memoization. I made a post about a generic memoize function a while back, so memoization is not what I am here to talk about. The thing that piqued my interest was when I got toward the end of the post and saw this:
Don't forget that proper memoization requires the functions that you are writing to be pure. What that means is that given the same input, the same output will always be calculated and returned.
My first thought was "Oh Matt, pure means that a function has no side effects, while deterministic means that a function always returns the same results for a give set of parameters". Then, like any good programmer, my second thought was "Actually, you are probably the one who is wrong." So, I went and looked it up, and not surprisingly I was the one who was wrong.
Here is the definition of a pure function from wikipedia:
- The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change as program execution proceeds, nor can it depend on any external input from I/O devices.
- Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices.
And the definition of a deterministic algorithm by the NIST:
An algorithm whose behavior can be completely predicted from the input.
So pure functions are actually a subset of deterministic functions. All pure functions are deterministic functions but not all deterministic functions are pure functions. For example, we could have a function that takes a specific set of parameters, and then returns a deterministic result, yet writes the values out to disk or to a console. By definition it would no longer be pure, but it would be deterministic. For example:
Deterministic and pure:
public int Add(int val1, int val2)
{
return val1 + val2;
}
Deterministic, but not pure:
public int Add(int val1, int val2)
{
Console.WriteLine(val1 + " " + val2);
return val1 + val2;
}
Now, deterministic methods can also not consume external state, because that would effect their functioning. But most definitions of deterministic functions deal simply with the fact that you can determine the output based on the input.
As Matt stated in his blog post, in memoization the need for a function to be deterministic is very important. This is clear because of the fact that if we have a setup of inputs and then we cache the set of outputs, we better hope that the function will return the same result for every call with those inputs, otherwise we just changes the methods behavior with our memoization.
Pure functions are important for many reasons, but number one being that it allows us to more easily reason about the behavior of a function. If a function is pure and has no side effects, then we are more likely to be able to reason about the performance of said functions, since we will have less variables to account for. You also cannot effectively make a non-pure function run in parallel without heavily analyzing its behavior. A pure function, on the other hand, should not be depending on anything other than the values you passed into it and the values that is holds. It should also not be changing any shared state, and therefore can be run in parallel without need for locking.
I hope that if you didn't know what pure methods were before, that you now have a good idea of what they are. But I also hope that if you had any misconceptions like I did, that they are all cleared up now.
You know, it is truly awesome to learn something new, but it is even better to learn that something you thought you knew was wrong.