O/R, databases and "leaky abstractions"
O/R has been on my mind in varying degrees since I first heard Scott Ambler (http://www.ambysoft.com/mappingObjects.html) speak at SD Expo a few years back. Every time I have gotten close to taking the leap, by attempting to implement Scott's UML model, learning EJB (poorly), and now by evaluating various .net O/R products, I am struck more by what I'd be losing than what I'd be gaining. Sometimes it makes me feel like a Luddite, but lately I have been putting more thought into my discomfort. Joel Spolsky's clever "Law of Leaky Abstractions" has put more perspective on the problem for me. Seeing my own systems grow to many millions of records has had an impact too. The whole java vs .net petshop discussion contributed to my thinking. Part of the problem is that simpler cases used to prototype technologies all work, every one. Every one works, so long as the data size remains smallish, the assumptions remain close the original assumptions, the complications minimal. After that, things start to get interesting.
As much as I have admonished, browbeaten, berated my programmers over the years to reduce the importance of relational databases and especially stored procedures in their designs, I have come to the conclusion that ignoring the power of a relational database system is foolish. Failure to understand and embrace the very important features they offer is foolish. It seems to me that part of the spirit of O/R is to make the database an invisible persistence mechanism, unimportant to the overall architecture. In most cases, I think this is a mistake. This I think could easily become a huge "Leaky Abstraction", on a scale that makes COM+ transactions seem like a good idea. If you look at a database server as a system with tables and indexes for which you must use this clunky 1960's vintage language to populate and query, then the view that making this transparent is probably very appealing. However, if you look at a database as a powerful system of storing, caching, indexing, optimizing the search of values, I think you will view the database as a first class asset, to be used to it's full potential. This isn't a question of whether or not to use stored procedures or not. It's a question of how to include the ability to use the power that hand tuned sql, which could be expressed as stored procedures, parameterized dynamic sql or plain sql.
This view is not the same as believing that the database is the "first tier", a phrase used by Frans Bouma to describe those who prefer to use stored procedures for all their database access calls in a recent advanced-dotnet post. I would never choose to base my domain model directly on my database schema. I think you have to recognize that stored procs and dynamic sql are both commands to a database. It's a recognition that the database can do things that are very difficult or impossible to accomplish in memory in your domain model. And, that some of those commands will require hand tuned queries, and intimate knowledge of the structure and context of a particular database to adequately construct. So, in my ideal O/R mapping scheme, while I may be perfectly fine with generated statements, I insist on having the ability to do my object mapping based on hand written code when needed. I insist on knowing that if a database performance problem appears, as it inevitably will, I will have a way to carefully examine a query, tune it, and replace it with a different query or a stored proc call. Whether or not it's dynamic sql is not the issue, it's whether or not I can substitute a bad query that's the issue. And, it's not trying to hide the database as an ugly step child.
In my mind, it should be possible to have transparent database access and still allow customized sql. In my database layer, all (read most) sql is stored outside the code. The statements are referred to by name with parameters, but the names do not have to correspond to anything database specific. The names can be thought of as domain model data commands. The statements could easily be generated from domain model metadata, just as I can generate statements from tables and stored procedures now. A factory produces the actual command calls and executes them, an architecture that allows for interception, multiple databases and data providers, chains, decorators and all the other OO goodness you would like to apply to a problem like this. Calling code has no idea if it's using a stored procedure or parameterized sql. Since I don't have an O/R layer, the programmer must know which commands to call. If an O/R tool used a DAL approach like mine, I don't think I'd have the same sort of hesitation about using the tool.
Finally, in my experience, databases and code evolve at different timeframes, especially if many applications share the same data. Even if you have the ability to architect every data call to use a shared domain model, in any large organization, you will not be able to migrate all of your applications at the same time. If project timeframes overlap, you always have a long cycle of retiring old, maintaining current and rolling out new code. In most environments, you also have software that simply will never be able to use your domain model directly. And, in my experience, the database will outlive every specific application, architecture and hot trend. The trick then is to acknowledge this basic fact, acknowledge the power of the database, but not let it dictate how any particular application or domain model should execute.