Posted on 1/10/2010 8:27:59 PM by Justin Etheredge
In order to challenge others to the TekPub LINQ challenge, I felt like I had to create my own solution first, before I challenged anyone else to the task. What I came up with is an algorithm that works, and is also terribly inefficient. There are a few optimizations that can be made in order to speed it up though.
My original solution looked like this:
var primes = Enumerable.Range(1, 20)
.Where(i => i != 1 && !Enumerable.Range(2, i - 2).Any(j => i % j == 0));
Here you can see that we take our range, check to make sure that 1 is not involved (since 1 is not prime!) and then take the range from 2 to one less than the number and then use “Any” LINQ extension method to check to see if any of the numbers in between divide evenly into it. A brute force approach, which works, but it is not super efficient.
Let’s Pull In Someone A Bit Smarter
Well, my friend Nate over at Kohari.org created an implementation very similar to mine, but had one major difference, and that was leveraging the fact that for any number, in order to determine if the number is prime, you only need to check for divisors less than the square root of the given number. Like this:
var primes = Enumerable.Range(1, 20)
.Where(x => x != 1 &&
!Enumerable.Range(2, (int)Math.Sqrt(x)).Any(y => x != y && x % y == 0));
This is a very interesting law, so let’s look at that a bit closer. Let’s say that we want to check if 10 is prime. The square root of 10 is about 3.2, so we only need to check 3 and 2 to see if 10 is evenly divisible by these numbers. So why is this? Well, it is fairly simple, but not immediately obvious.
If we are only checking 3 and 2 to see if 10 is prime, then what about 5? Well, the idea here is that 10 is evenly divisible by 5, and so there must be some number n where 5 x n = 10. This number is 2. Since 2 is smaller than 5, we don’t need to try 10 against 5 because we will figure out 10 is not prime by dividing by 2. In this case, no matter what number we change 5 to, if n is smaller than that number, n is always going to be smaller than the square root of the number we are checking, or it is going to also have a factor smaller than the square root of the number we are checking. Pretty cool, huh?
As you can imagine, this has a huge impact on the performance of this algorithm. How much? Well, I’m glad you asked.
| Primes Up To |
My Algorithm |
Nate's Algorithm |
| 2000 |
.011 seconds |
.001 seconds |
| 200,000 |
31 seconds |
.2 seconds |
| 2,000,000 |
2,400 seconds (40 minutes) |
4.5 seconds |
Let’s Use All Of Those Cores
Wow. You really start to see the difference when you hit 2 million! Hmmm, now what would happen to these numbers if we were to use Nate’s algorithm, but with parallel LINQ? All we would have to do is add “AsParallel” to the LINQ query like this:
var primes = Enumerable.Range(1, 20).AsParallel()
.Where(x => x != 1 &&
!Enumerable.Range(2, (int)Math.Sqrt(x)).Any(y => x != y && x % y == 0));
And look at what we get!
| Primes Up To |
Nate's Algorithm With Parallel LINQ |
| 2000 |
.029 seconds |
| 200,000 |
.155 seconds |
| 2,000,000 |
2.6 seconds |
| 20,000,000 |
65 seconds |
As expected, the parallel version is quite a bit slower for the smaller numbers due to the required overhead. When we get into larger numbers, we see that on my dual core machine we see an almost 2 times speedup.
Summary
So there you have it. A tiny bit of research, and a little bit of parallel LINQ goodness had led to a speedy algorithm that beats out my naive implementation by an absolutely absurd margin. Now, there are other algorithms out there that are even more efficient, especially for larger primes. What do you all think, are you up for creating a LINQ query which will beat this algorithm? What are your trade-offs? How does your algorithm work? Let us all know!