Keep Your IQueryable In Check

Updated: I’ve posted a follow-up on this post here.

There is a good chance that you have seen Oren’s recent “Repository Is The New Singleton” post. I wanted to have a quick discussion of his approach, and discuss why I don’t 100% agree with it. Let me first say that I don’t have a problem with the repository pattern. I do have some issue with the way that it is implemented sometimes, but overall I think it provides a good layer in which to encapsulate the queries of your application.

On that note, I do see value in the pattern which Oren refers to as the Query Object pattern. Some people may look at the pattern Oren is using and see “class explosion”, but I think that it all depends on what your preferences are. I think that the repository method can be implemented in most medium sized application without much trouble, but when you get to really large applications, you can get into trouble with lots of little queries that can create huge repositories. I’d much rather have one hundred query objects than to have repositories with a hundred methods.

Okay, so I agree with Oren that encapsulating queries into objects is not a bad idea, especially if your application has a large number of complex queries. My problem with Oren’s approach comes from the example that he gives of one of these “Query Objects” in a later post (shortened to save space):

public class LatePayingCustomerQuery
{
    /* snip */
    public IQueryable<Customer> GetQuery(ISession session)
    {
        var payments =     from payment in s.Linq<Payment>()
                     where payment.IsLate
                    select payment;
    
        /* snip */
        
        return     from customer in s.Linq<Customer>
                where payments.Any( payment => customer == payment.Customer )
                select customer;
    }
}

My problem is in returning IQueryable from the “GetQuery” method. Oren does it in order to provide the ability to sort, page, etc… closer to the UI layer, and I agree that this is necessary, but I think that the result needs to be constrained a bit more. Passing back the IQueryable allows much more than just sorting and paging, it really allows heavy modification of the query itself, and in a way that you are really no longer able to control. So it seems to me that Oren’s problem really isn’t so much with the repository pattern as it is with the idea of completely encapsulating all interaction with the ORM inside of the repository. He thinks that this causes us to lose much of the power and flexibility that ORMs provide us. But why can’t there be a middle ground?

So let’s assume that we have a fairly large application with a bunch of these finely crafted query objects, and you have a developer who has the above query, but decides that they just need to add one more filter to it. Now I know that people will say that if you have a developer who breaks the rules then you probably don’t want them to work on your team, but this is the real world. And in the real world people cut corners. So they do something like this:

latePayingCustomQuery
  .Where(c => c.LastName.StartsWith("E"))
  .Skip((page - 1) * itemCount)
  .Take(10);

This developer may have known better, or maybe they didn’t, but what they did was change the query so that we are now potentially filtering against a non-indexed string column in the database. And there is really no limit to what the developer could do to the query. I can already hear people screaming about bad developers and not wanting someone on the team who cuts corners, etc…. but I honestly don’t want to hear it. Quite often I don’t get to choose who I have to work with, especially when working with clients. So I don’t want to hear about your fairytale land where all developers do the right thing all the time.

So we really have two conflicting requirements, we want to provide the flexibility of paging and sorting up closer to the UI, but we also want to lock down the queries a bit. So what if we defined our own interfaces which we could use to wrap IQueryable? Something like this:

public interface IPageable<T>: IEnumerable<T>
{
    ISortable<T> Page(int page, int itemCount);
}

public interface ISortable<T>: IEnumerable<T>
{
    ISortable<T> OrderBy<U>(Expression<Func<T,U>> orderBy);
    ISortable<T> OrderByDescending<U>(Expression<Func<T,U>> orderBy);
}

This would allow us to have a result type of IPageable<T> which the developer could use like this:

query.FindAll().Page(1, 3).OrderBy(q => q.PostedDate);

Looks pretty good. The implementation of the IPageable interface could look something like this:

public class Pageable<T>: IPageable<T>
{
    private readonly IQueryable<T> queryable;

    public Pageable(IQueryable<T> queryable)
    {
        this.queryable = queryable;
    }

    public ISortable<T> Page(int page, int itemCount)
    {
        return new Sortable<T>(
            queryable
                .Skip((page - 1) * itemCount)
                .Take(itemCount));
    }

    public IEnumerator<T> GetEnumerator()
    {
        return queryable.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

Creating a pageable item would be as simple as just passing any IQueryable to it. Then all you have to do is pass a page number and an item count and it returns a sortable object. The sortable object would look something like this:

public class Sortable<T>: ISortable<T>
{
    private IQueryable<T> queryable;

    public Sortable(IQueryable<T> queryable)
    {
        this.queryable = queryable;
    }

    public ISortable<T> OrderBy<U>(Expression<Func<T,U>> orderBy)
    {
        queryable = queryable.OrderBy(orderBy);
        return this;
    }

    public ISortable<T> OrderByDescending<U>(Expression<Func<T,U>> orderBy)
    {
        queryable = queryable.OrderByDescending(orderBy);
        return this;
    }

    public IEnumerator<T> GetEnumerator()
    {
        return queryable.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

Here you can see that it allows us to chain sortable calls so that we can still order by multiple columns like this:

sortable.OrderBy(q => q.PostedDate).OrderBy(q => q.Title);

At this point you may be thinking that by exposing the “OrderBy” and “OrderByDescending” methods we have opened up the bad querying box just as much as passing back IQueryable. Well, with the code provided this is true…but if you use a bit of imagination you can also see how we could use these classes to limit the columns on which different types are allowed to be ordered by simply walking the expression tree when the enumerator is requested or intercepting the calls to “OrderBy” and “OrderByDescending”. It is also possible to create similar interfaces in this same way to provide constrained interfaces to other types of functionality such as projections.

Overall I think that ORMs have certainly revolutionized the way that developers think about data access, but I think we really need to be careful about letting IQueryable (or any other querying mechanism) flow up through our applications because all we are doing is letting spreading queries all over. I pretty much agree with Oren in his thoughts that the repository might not be the solution to everything, and that we need to think about data access in a slightly different way, but I also think that there is some serious need for a bit of control.

So what are your thoughts? What are the problems that you see in this approach? Do you feel like we should let IQueryable go throughout our applications? Do you think I am being too Byzantine? Let me know!

Be Sociable, Share!

23 comments

  1. As someone who does not have a large amount of experience but has worked on projects where I’ve seen a varying number of strategies work and not work, I’ve come to the conclusion that a lot of developers who are all very much smarter than me are still ironing out the best way to do things (just read Oren’s posts from ~ 2005 and the ones ~ today), and it is depressing.

    After using the tried-and-true 3-layer Microsoft design that Oren mentions in one of his posts and finding it to be somewhat suboptimal (there is real business logic in those SQL where clauses and update statements, and in a lot of LOB applications that’s pretty much where almost all of your logic lies), I have gravitated to a pattern that uses both repositories (or at least classes *called* repositories) and first-class query objects. Keep in mind that none of this is in the context of using an ORM; I’ve always generally worked on legacy projects where an ORM was not used and adding one would probably add more complexity and flailing around because I simply have not yet and don’t know how to use one. (Not a slam against ORMs; just an admission of my incompetence relating to them.)

    The repositories are for loading and saving specific "aggregate roots" (I’ve always thought that this was kind of a silly term) to and from the data store. I have always been unclear on the exact difference between a DAO and a Repository, but to me a repository is quite like a DAO except that unlike a DAO it is responsible for saving and retrieving the whole suite of objects that are beneath that aggregate root. So the OrderRepository’s Save() method is also responsible for inspecting the Payments on the order object and determining which ones are updated and need to be saved, etc.

    But the problem I quickly encountered with our company’s Web site is that to display a simple product page, we need an absolute crapton of information from the database–product reviews, fabric contents, care instructions, "customer also bought" products, matching products, similar products, product images, colors, sizes–and my naive attempt at loading a full cream object graph for this from a repository (1) wasted a lot o’ memory and (2) took a long time sucking information from the database that was never even looked at before being discarded.

    The result was a ProductDetailsQuery object that returns a giant value object ("ProductDetailsDocument"). The query object is unashamedly tailored to this particular page of the Web site; it grabs just the information it needs; and it does this by executing a series of SQL statements in a single batch statement. Execution of the product details page went from 3 seconds on incredible hardware to "instant," but it still left a bad taste in my mouth. I hadn’t created a "ProductDetailsDocumentRepository"–that seemed stupid, it’s not persisted, it’s more of a view than anything–and it didn’t really fit as a method on the ProductRepository because, well, it’s not a Product–but it felt dirty having data access code in this object that seemed to not fit well in any of the 3 Microsoft layers. Was I doing it wrong?

    The conclusion that I’ve come to is "well, I don’t know anything" but I do know that I’ve come up with something that seems similar to what you describe here. UI can’t query and filter things directly as with an IQueryable, but there should be no qualms about writing a Query object whose capabilities are precisely what that view needs to display–and if paging is part of that requirement, then the methods on the Query object have paging capability, and whether or not that is added as a specific IPageable interface as described here depends on the situation. Since my query object is only probably going to be used in one place, I don’t see the point of the interface in my scenario, and I don’t have one.

    The repositories are still used for loading specific "full cream" objects when actually interacting with them to make changes in the system.

    In other words:

    // Why you would want to set one product’s
    // similar products to inactive, I don’t know,
    // it’s just an example
    using (ConnectionScope cs = new ConnectionScope())
    using (TransactionScope tx = new TransactionScope())
    {
    ProductRepository products = new ProductRepository();
    ProductDetailsDocument doc = new ProductDetailsQuery("5647589").Execute();

    Product product = products.Load(doc.SimilarProducts[0]);
    product.ActiveStatus = ProductActiveStatus.Inactive;
    products.Save(product);

    }

    The ConnectionScope class has some internal methods that the Query and Repository classes can use to retrieve an appropriate open SqlConnection to the database, which allows UI code to still be able to clearly mark the boundaries of an open connection and transaction. It’s not perfect, and it’s not completely unit testable (a lot of my tests people would consider to be integration tests), but I don’t think it’s bad for the evolution of a legacy system.

    Wow, that was long and kind of rambling. Hope it made for good reading and at least gave insight onto another developer’s Quest for the Golden Data Access Solution. I think in general I’m trying to agree with you; yours is the ORM-backed, first-class solution, and I flail around at a baser level. Great post!

  2. I hear you, Nicholas. I managed to get my head around NHibernate enough a couple of years to roll out a VB.NET solution that used, nominally, the repository or service approach. "Nominally" because I borrowed code that was clearly beyond my ability to comprehend at the time and of course now I see that it suffers from bloat beyond belief. After all, how many methods can a repository have in it before it just gets a little ridiculous? (answer: about 10)

    Anyway, I have since some to the same conclusion, which is I’m still grappling with software architecture while trying to lead a development team by example — now in C# thank goodness. The big difference is that the advise I seek from bloggers who I appreciate immensely (and in no small measure admire for their understanding) is taken with a much larger grain of salt. So Justin, thanks for continuing to offer solutions to the seemingly intractable problems facing your average technical lead when the moving sands of "what’s right" shift once more. Don’t get me wrong, I see why Oren, you, and Nicholas are saying what you’re saying and I agree on this point, but what keeps frustrating me is the complexity that our platform of choice (C#) seems to be throwing my way. I swear, the Django stuff I’ve been tinkering with recently just seems pretty darn straightforward! Mixins – get it. IoC – don’t need it. Fluent API – sure thing.

    Who am I kidding? I [i]love[/i] all this crazy stuff :-) But lesson learned: think long and hard before taking the collective wisdom as gospel, and trust your gut. And thanks again. Truly an excellent post.

  3. I’m sure I’m missing something, as SQL design is hardly my strong point, but what’s [i]wrong[/i] with passing an IQueryable to the View layer? (Or if not the View specifically, a layer on the same tier as the View layer, such as the Model or Controller, though intuitively the View seems best as .Skip() and .Take() would belong there for data binding/etc.).

    From reading your article, it seems that the biggest problem with it is that…you don’t trust your other developers. This seems to be a curious inversion in belief, instead of trusting developers to do what’s right (and discussing it with them when they do wrong), you assume that they’ll do something wrong.

    We should remember that people have a nasty habit of living down to their expectations, so by assuming you’re surrounded by idiots, you’re only ensuring that you’ll be surrounded by idiots.

    (This shouldn’t be taken to mean that you should write interfaces that require domain expertise to use properly and are trivially used incorrectly, on the assumption that those around you aren’t idiots and thus will figure it out; you should certainly follow FxDG conventions and make The Right Thing as easy to do as possible.)

  4. @Jonathan I’m not necessarily saying that developers around me are idiots, but whenever I design a system I often want to design things to guide developers down the correct path. In my opinion, by passing IQueryable up through the layers, you are giving implicit permission to use the functionality that IQueryable provides.

    Why should I pass IQueryable up to the view when I don’t want queries being formed in the view? If all I want in the view is ordering and sorting then I should try to limit that layer to doing these actions.

    I completely agree with you that people live down to their expectations, but on the other hand I think that it is important to keep honest people honest and guide them down the path to success. There are a million ways to write horrible software, and I shouldn’t add to that with the design of my software. :-)

  5. Hi Justin,

    Yep I have a very similar scheme to yours.
    For me it’s about trying to keep query logic in queries (or specifications).
    But there are several other nice benefits, for example both your paging method and mine offer arguments in terms of page number and page size which is the way the developer will be thinking at that point, not skip (pn-1) * ps if they were working with raw IQueryable.
    I also expose properties on the base interface that shows such things as PageNumber, PageCount, TotalCount, TotalPages, a feature which as has been kicked around on Rob Conerys blog and such. Again it’s exposing value at that particular point in the process.

    Also as the dal I have is designed to work with Linq2Sql and EF, if the dev calls the page method without setting a sort order then I implicitly sort by the Id column of the entity. This is because you must set a sort order before paging in EF. I can get away with that as I keep returning the sortable interface until the page method is called at which point I return an interface with only ToList and IEnumerable<T>, thus forcing a convention where paging is the last operation.

    Like you I just see it as a way guiding me to do the right thing.

    Paul

  6. Well, going back to the layered design why not break the Queryable interface after passing the repository level? What I mean is i’m fine with my Repository<T> returning IQueryable<T>, which allows me to have 1 repository for the complete application then the service layer all consume this repository using filters (LateThingyQuery) and from here up something along the lines of what Justin proposed. The only thing I don’t like about LateThingyQuery is that it would make (in the case of NHibernate) the dependencies for ISession bleed up one level into the application.

  7. Hi,

    I agree with Jonathan, in the end your adding a whole lot of constraints and complexity that’s not necessary. If the others developers mess with the IQueryable in a stupid way, changing the expected behavior, the problem is not passing the IQueryable but having developers who don’t understand the three tier model, simple as that.

    People should know what they’re doing, and the system should have enough automated tests so this kind of misuse can be detected early, even though
    testing is not the pain point here.

    I read Oren comment many times that in the software development business we are consenting adults, we should assume people know what they are doing instead of protecting them from making mistakes. They should be able to make them and as we should be able to detect them, and after that, learn something so everybody can make it right the next time. This can be an "impossible" task when dealing with cubicle-minded-developers, but I’d rather assume they can evolve too.

    Cheers

  8. @Rei I think that once you do that you aren’t using a repository anymore. Or at least you aren’t really using it in the way it was meant. I think that forming queries in the service layer is probably not where you want them, and if you are going to go this route then you are probably better off with the query per object route that Oren took.

    @Rafael All I can say is that I really honestly wish that I agreed with you. The problem is that when corners are cut, we can’t detect it, because logically the software probably works. I want to work in an environment where all the developers are good and care, but that just isn’t reality. So yeah, I wish I agreed with you. :-)

  9. @Justin,

    This has nothing to do with "cutting corners", I view this as an XP approach: respect your fellow developers and have the courage to let them work without needless restraints, assuming he is a capable professional. If he doesn’t fit this profile, teach him, fire him or go to another project. It’s just not worth the headache.

    This is true to dynamic languages coding style and the "Ruby Way", this is somewhat the same discussion about emulating static typing in dynamic languages. But I’m pretty sure this is not possible to apply to the regular "MS way" and I’m sure it won’t fit most MS shops, and I’m damn happy to be far far away from any place like that.

    Hope one day you can agree with me.

    Cheers.

  10. @Rafael But you’re making the assumption that the person consuming your code is going to be anyone that you are even in contact with. It could be someone consuming your code 6 months from now when you have left the company.

    I know where you are coming from, this is an argument as old as time itself. This is in the same vein as the virtual versus final argument, or , as you alluded to, the approach in most dynamic language programming environments. Since the developers aren’t locked out of doing anything. Unfortunately I’m not sure there is any "right" answer to this.

  11. @Justin,

    I don’t think there’s a definitive answer to this, most design choices are not definitive, and that’s ok to me. I believe we agree on that.

    I’m not assuming we’ll ever get in contact with the people maintaining the code, but I’m assuming the people hired to do that will be decent coders, nothing special, but at least people who understand how a 3-tier architecture works and that they shouldn’t, for example, mess with the query in the interface layer except for adding interface concerns, like pagination, just that.

    Adding constructs, like the custom pagination stuff, is easier to explain to these unknown developers? Maybe using standard interfaces, like IQueryable, would make this kind of transition easier and less dependent on custom, ex-employee code, for all they will need is the MSDN help or something like that. The simpler the code (which includes avoiding custom framework extensions/additions), the easier to maintain.

    If a developer wants to code stupid things, they will find a way to accomplish that, even with detailed interfaces and static typing, Murphy Law will make sure of that, resistance is futile :)

    Cheers

  12. I think you are trying to "protect" your fellow team members, when you should be sharing/teaching them instead.

    There is an interface for Paging and sorting, its called IQueryable ;-)

    If another team really wanted to do a "where" outside the Query, there is nothing stopping him, creating an overloaded method in the query and returning an IQueryable.

    just my 2 cents…

  13. my 2 cents,
    YAGNI
    I prefer studying instead of protecting.

  14. @Justin
    I got your point and I partially agree with you. Although I felt you touched a dangerous subject: developers don’t wanna be taken by fools. I know you didn’t mean that but the comments above just talk for yourselves. Actually I feel the very same pain as architect of a large company with lots of developers.

    Instead I’d prefer to say that I like your ideas just because they are cleaner and elegant way of doing things.

  15. I’m strongly on Nicholas side,
    Domain Driven Design patterns are all about managing domain state, and the Repository pattern is the domain view of persistance concerns in the domain. It’s main purpose is to find aggregate roots so that we can change there state when things happen in the domain.

    Displaying paged lists is not an event in the domain that make thing change, this is simply a query on the domain.

    This kind of query often spans on several aggregate roots/entities and often consolidates data on it.
    A real flexible Query service can provide this capabilities using the query object pattern.

    But the repository should be focused on its primary mission that is totally Domain (or Business) oriented.

    I blogged about that here :http://thinkbeforecoding.com/post/2009/04/08/Back-on-Repositories-and-Paging-Introducing-reporting

  16. @Justin

    I totally agree that passing IQueryables to the view is a no-no, but I still implement IQueryable returns in my repositories (when using that pattern). The reason for this is that I usually build a service-layer in-between the repository and the view where I return IList, and create more spesific methods. GetOrders(), GetOrders(int orderId), GetOrders(DateTime since), for instance. Further-more I create extension-methods for the IQueryable<Order> in order to link methods and create a nice easy-to-read method call in my service-layer:

    return _orderRepository.GetOrders().WithProduct(productId).OrderedAfter(someDate).ToList()

    (this is just an example written on the fly, but it states my point).

    I also use strongly typed ViewData-models, usually a super-model with aggregate-roots and sub-models, filling them when I need them with ActionFilters. I feel this gives me a tidy layered project-structure, and I always know where to debug stuff. As well as the IQueryable enables lazy loading (which is then triggered in the service-layer that returns IList or SingleOrDefault)

    Oh, I also keep my repositories really skinny. (Mostly a Get(), Save() and Delete() method)

    What do you think?

    Regards
    Yngve

  17. @Ynvve If it is working for you, then I hesitate to say it is a bad thing. Historically speaking, the repository pattern is there to contain your data access concerns. If you are returning IQueryable then you are not doing this.

    But, it rarely pays to be so rigid. If you find that in your application it makes more sense for you to do what you are doing, then do it. If you find that it starts causing you issues with troubleshooting query performance or controlling what queries are formed, then look at a different solution.

  18. Very interesting discussion. I’m using a generic repository pattern on two MVC projects at the moment (one with NHibernate and one (somewhat painfully) with Entity Framework), and after thinking briefly about ISortableAndPageable<T> or some other such abstraction a while back, in both cases I’m now ‘leaking’ IQueryable<T>’s right into the Controller. It really doesn’t feel that bad, or *evil*, although in both cases it’s a small team of developers so I guess that makes it more manageable.

    I have one question though.

    What about object navigation from the domain model? Assuming transparent lazy-loading applies, is having a reference to a domain object that is attached to an open ISession (open session in view pattern) less dangerous than a reference to an IQueryable? It seems to me that if you are saying an IQueryable in your UI layer (or Controller at any rate) could be ‘dangerous in the wrong hands’, then open-session-in-view should be viewed with the same suspicion.

    With a reference to a domain object at an aggregate root, there’s really nothing (aside from common sense or a very ‘partitioned’ domain model) to stop someone writing code in the Controller that recurses through half of the database. This of course could kill your application performance quite nicely. Is that a situation you would deal with by aggressively partioning the domain model or by some other method, or is this a situation where you ‘trust’ a developer not to do the wrong thing?

  19. *Justin please delete me first post, re-submitting to be notified on comments, thnx:)*

    My head hurts.

    I’m using the repository pattern and I more or less hate it, summed up, as Dylan said, when you hit about 10 repo methods it starts getting ridiculous.

    I also pass an IQueryable<T> and use Linq 2 Sql and thought I hit the ‘universal’ golden nugget but have been constantly, like most here in two minds about whether or not it’s the best thing to do..

    I guess I’m of two opinions:
    1) Teach, then trust your developers otherwise he/she should not be working on the project (easier said than done, but put measures into place so rules are followed)
    2) I guess in light of point one, I don’t see a problem passing around IQueryable’s. It adds flexibility, and removes complexity and like a wizard’s wand, if harnessed correctly could deliver awesome power, but in the wrong hands could cause an incredible disaster. (May or may not be true, any wands handy? lol) But seriously, I believe taking on the query objects approach could be a good thing and I’m going to take at a look at it at least, and I believe as long as you have locked down the -maximum- a developer, or implementer of the query could use (so limiting results or bottom line where clauses) not too much damage could be caused.

    That’s my 2c.
    PS. Hi Nicholas, you helped me out over email with that WCF stuff =) Hope you’re doing well. Small world.

    .. It seems there is so much confusion when it comes to design patterns and it really blows my mind how we can’t just get some definitive rules out there in stone for certain application make ups. Then so everyone is clued up maybe we need dotnetdesignpatterns.com or something. *registers (seriously)* I know one size won’t fit all, but some decent boilerplate samples. That’s it, stay tuned I’m going to make that a place to submit info & samples.. Who knows, one day we might be able to choose application architecture defaults from within Visual Studio…. the dream (or not?)

  20. @Graham Our industry is so nascent that it really isn’t surprising that there isn’t anything decided upon yet. We are still in the construction equivalent of ancient egypt.

    I would worry about the changing patterns and designs that people propose. Everyone is basically experimenting and you just need to do what works for you. If you feel pain, then look for a better way to do things.

    And just for the record, I seriously hope that we never choose our architectural defaults from within visual studio. :-)

  21. Microsoft love putting things into little manageable boxes, so it probably shits them they can’t control this :)
    But yeah it’s just like what’s happened with an ORM, 80% of tasks become easier, quicker to write and automated, then you have new problems for anything complicated.

  22. I agree with you Justin, exposing IQueryable is leaky. Not only I worry about the missuses of other developers, but also myself can end up making bad queries. I always return a closed graph. Furthermore, I never use deferred loading.

    The custom specification object is a good approach. The class doesn’t have to serve only 1 view, but it can contain the union of options for all the calling views.

    Sometimes I also use enums, which is an even more constrained approach:

    enum ProductsSort {
    LatestRelease,
    BestSelling,
    BestRating
    }

Leave a comment