A Lightweight Alternative to ORM:
Mapping database entities to in-process data objects

Programming languages come and go, programming abstractions come and go, but the data is there to stay. The application may be rewritten multiple times over the course of decades, but each iteration will still have to interface to the same relational database and relational structure. This means that the RDBMS's structure is much more important than the program's structure. It is therefore perplexing that each programming language's interface to the RDBMS attempts to force programming language data structures (such as Object-Orientation) onto the RDBMS's structure, rather than mapping the RDBMS's structure onto the programming language's data structures.

Object-relational Mapping (ORM) is a family of techniques and methods for mapping data stored in a RDBMS to data objects stored and manipulated in a running program by way of representing incomplete Objects within the relational structure of the database. This article examines a few of the difficulties in creating a model that maps the data stored in a programming language's OO structure onto the RDBMS's relational structure.

The main reason for most of the difficulties inherent in mapping RDBMS entities to Object-Oriented objects is due to the fact that the two data models have almost nothing in common other than serving as a model for data storage. There is very little in the OO conceptual model that is present in the RDBMS conceptual model (and vice versa) thus it makes no sense to attempt a mapping between the two models. Attempts to create a map between the RDBMS and in-process data structures, such as current ORMs, attempt to force an Object-Oriented data model into the RDBMS model.

This article presents a data model simpler than OO that approaches the problem from the other direction, i.e. mapping the relational data model onto the programming language's data model. The module that implements this mapping is intended to be used as a standalone library that is easily incorporated into other software and thus uses a very simple SQL library wrapper interface.

TODO: incomplete abstract

Table of contents

1 Introduction
2 The Problem with ORMs
3 Heavyweight wrappers
4 A Lightweight Wrapper

There are multiple problems with mapping database records to in-process data objects. This difficulty is often termed Object-Relational Impedance Mismatchireland2009classification in the literature. Wikipedia lists nine conceptual incompatibilities between an Object-Oriented model and a Relational model. The only major point of convergence of these two data models is the fact that they model data. The two models differ in almost every respect:

Concept Object-Oriented Relational
Type of data model Hierarchy and compositions Relational
Typing system Nested types Composite types
Data manipulation Programming language Methods Declarative query language operators
Data retrieval Via interface Via declarative views

Many of the above problems can, with enough grim determination, be worked around sufficiently to present a relational model as an object-oriented model. However it is simply not possible to paper over the cracks that appear when trying to squash a large relational dataset into a small hierarchical dataset.

Moving data from a RDBMS into an OO object is problematic. By this I means that most queries that do anything interesting (other than CRUD) are carefully structured to bring in only the information required for that particular situation. Using an ORM will get you boilerplate code implementing CRUD for every individual table and every individual object, but frequently we don't have a need for a collection of records in an individual table. Instead we want to answer questions that span multiple tables; we want know, for example, "What are the subjects-enrolled fields for all users active prior to the last week?", while an ORM is more suitable to "Give me a collection of user objects who did not log in for the last week, and also the subjects those students were enrolled in".

Moving data from a programming language object back into a database is equally problematic. The original data from the database that was used to populate the programming language function may not map easily back to the database - after all the data may have been retrieved from a view or stored procedure. You can't update a view and the ORM is certainly not going to be able to figure out which tables contributed to the view, and how.

The other issue related to queries - there will be significantly more queries that are made across a group of tables than there are queries to retrieve single objects from a table. ORMs help significantly when you have dozens or hundreds of objects that you want to serialise to a database, but they help much less when you wish to draw reports from the data in all the tables. When you don't have an object that maps to each of the columns in a query the ORM simply gets in the way. In almost all cases you'll find yourself writing a single-use object, just to contain the results in a single query that may not be used again anywhere else in the program.

In all the cases above the abstraction is discarded for all practical purposes when a relational results set cannot fit into the OO model. This is usually the type of results one can expect when pounding a square peg into a round hole - you may get the peg in, but it isn't going to be easy, neat or as pretty as using a round peg.

To fix most problems ORMs provide their own Query Language2004hibernate so that the caller can ignore the ORM abstraction and directly use a query language. Some go further and drop even the pretence of an OO abstraction and just lets the caller directly execute SQL on the database. All of them provide a cursor object that the caller can use to iterate across the rows of a results set a single row at a time.

Many of the example mappings used in ORM literature are simple enough that one of the following approaches is taken when mapping the OO structure to a relational structure:the_vietnam_of_computer_science

  1. Table-per-class: Each derived class gets its own table. Retrieval and manipulation perform multiple joins on the RDBMS to fully populate all member objects in any particular object.
  2. Table-per-hierarchy: Each base class is mapped to a table that has columns for all the derived classes using a discriminator column to identify which rows belong to which classes. A large OO structure has the potential to massively increase the number of columns in a table and also prevent any further changes to the structure because changes to the OO structure would require changes to the structure of an existing table which already contains data.
  3. Table-per-concrete-class: A single table is created for each most-derived class. While this alleviates some of the problems faced in the table-per-hierarchy approach with regards to ballooning column counts per table, it still results in a few tables with a large number of columns which will have to be changed each time the OO structure is modified with new classes within the same hierarchy.

Regardless of the option chosen, none of them represent a clean map between the OO's hierarchical structure and the relational structure of the stored data. When using OO it might make more sense to use an Object-Oriented Database Management System (OODBMS) rather than a Relational Database Management System (RDBMS).

To say that the ORM is a heavy and complex interface to a database is almost an understatement when you consider the trade-offs that need to be made in the RDBMS structure. The disadvantages become more apparent when the program is eventually replaced but the database remains, because these trade-offs continue to live on in new systems that are designed to work with the broken object-to-table mappings we see above.

The database should be a data store that enforces referential integrity so that all data that is needed is stored, and little to no redunancies exist. It should not be limited in structure because a program needs to serialise objects. After all, the program is not as important as the data, and the database will interact with multiple programs. Restricting the structure of the database means that the ORMs limited capabilities extend to other programs. Even programs that don't use the ORM but use the database are limited to the structure imposed by the ORM.

When we're programming with objects most database questions become "How do we represent this object in the database? How would we update it from the database? How would we push our updated object back into the database?". What we actually need the data for is almost a secondary concern, to be solved after we've solved the serialisation problem.

If you wanted to simplify things to make a lightweight wrapper we'd ask only two questions: "How do we get data from the database into program? How do we send program data into the database?".

By distilling the database usage to these two questions we can ignore many of the issues that the OO approach has to deal with. For example, we don't need to create a once-off object for a single difficult query if we have an easy and generic way of retrieving any list of rows from the database. We don't need to worry about trying to push in data that was initially retrieved from a view.

We will however need to know SQL, which is something we would have needed to know anyway to draw those reports that an ORM cannot draw. As a bonus, using SQL rather than the methods in an ORM means that there are more people who can visually inspect a section of code in the program and determine what that code will do (there are more SQL users than there are users of any particular ORM).

For a start:




Task: Copy the ninth word of this article (including title) into the textbox below: