Dealing With Database Differences

One of the common debates in database programming is whether to write your code for a specific type of database (such as Oracle or Microsoft SQL Server) or instead to use a middle software layer to help abstract the logic so that you can use the same application with multiple vendor's databases. Those that want to port their applications from one database to another or who need to write a cross platform application will need to make some design decisions at the beginning of a project or do some research when it comes time to port.

In a recent project at Media Net Link, we decided to provide AthenaRMS, our help desk application, as open source and make it cross platform so that a potential user could use one of the free databases like Postgres or MySQL, or, if corporate requirements mandated, their corporate Oracle database. We chose these three databases because of their popularity and widespread use: MySQL and PostgreSQL are free open-source database solutions that provide varying levels of complexity, while Oracle maintains a strong presence in large business installations. By making it compatible with these database systems we made sure that the product was accessible by as many potential users as possible, from small businesses to Fortune 500 companies. Attaining that level of compatibility wasn't easy, however. During the process we discovered various pitfalls that developers need to avoid when building cross-platform database applications.

The first step in managing code that interfaces with multiple databases is to use a database-independent interface, such as Perl's DBI or PHP's PearDB. Both modules provide a consistent API for database operations, allowing developers to send queries to the database in a standard fashion. For example, in PHP you might have to write code such as:

switch ($DB_TYPE) { case 'mysql': $dbconn = mysql_connect($DB_HOST,$DB_USER,$DB_PASS); mysql_select_db($DB_NAME, $dbconn); break; case 'pgsql': $dbconn = pg_connect("host=$DB_HOST dbname=$DB_NAME user=$DB_USER password=$DB_PASS"); break; default: error_out("Unknown database type: $DB_TYPE"); break; }

PearDB abstracts away the different databases and provides a simple interface instead:

$dbconn = DB::connect("$DB_TYPE://${DB_USER}:$DB_PASS@$DB_HOST/$DB_NAME");

As useful as this is, it does not solve many other problems inherent in cross-platform development. We'll address a few in this article, and how you can work around them.

USING vs. ON

While all three databases support the USING keyword, PostgreSQL and MySQL differ from Oracle in a subtle way. Consider the following SQL query:

SELECT a.id FROM apples a LEFT JOIN bananas b USING (id)

This works fine in PostgreSQL and MySQL, but not in Oracle (which returns the error "ORA-25154: column part of USING clause cannot have qualifier"). The obvious solution of replacing "a.id" with just "id" (removing the offending qualifier) is insufficient -- although it works in Oracle and PostgreSQL, it fails in MySQL, returning the error "ERROR 1052: Column: 'id' in field list is ambiguous".

The solution is to avoid the USING keyword completely and rely on the ON keyword instead. The ON keyword is similar to the USING keyword, except it is more versatile, allowing the user to specify joins based on any column (as opposed to USING, which requires that the specified column have the same name in both tables). Rewriting the above query, we have:

SELECT a.id FROM apples a LEFT JOIN bananas b ON a.id = b.id

SEQUENCES

Often, data within a table needs to be identified using a unique identifer. This allows easier access to the data, as well as allowing the database to optimize accesses. These identifiers are usually generated using sequences. A sequence is a construct that automatically generates a sequential number each time it is accessed, which is perfect for creating unique IDs.

In Oracle and PostgreSQL, the process of assigning a sequence to a table (particularly, to the ID column of that table) involves creating a sequence object in the database and then creating a trigger that accesses the sequence whenever new data is inserted into the table. MySQL, however, does not support sequence objects. Instead, columns can have the AUTO_INCREMENT property, which means that whenever data is inserted into a row, the column will automatically be populated with a value created from within MySQL's internal sequence generator.

Faced with two such different paradigms, how did we manage them both? We took sequence generation out of the database and into the code. Instead of relying on the database to supply the sequence, we created a table which contained two columns: the sequence name and the current sequence value. Whenever we needed to get a new value we would access the table, find the correct value, and increment it. This method does result in a few more database transactions than if we used the internal sequences, but made the code much more standardized.

DATE FORMATS

Date formats are another point of contention among the three databases. While Oracle and PostgreSQL share similar date formats, MySQL uses a format similar to that used by the POSIX strftime(3) library function (i.e. date directives prefixed with a percent character). One way to deal with this difference is to pick one format and use a function to translate to another database format as needed. For example, in the following Perl code snippet the function to_date() accepts a strftime-style date string and formats it based on the current database type, which is kept as a global variable. The returned string is then used in the SQL query:

$DB_TYPE = 'oracle'; $date_str = &to_date("%m/%d/%Y"); ## will be translated to MM/DD/YYYY
$sql = qq{ SELECT * FROM sales WHERE sale_date = to_date(?, '$date_str') };

There are many other small differences between databases that will invariably pop up when developing any kind of cross-platform software solution. By leveraging common language constructs (using ON instead of USING),abstracting out functions (with sequences and dates), and relying on
a common programming language or operating system to move the logic away from the database and into a middle tier, you can reduce redundancy as well as maintain clean, readable code.

For more information, check out the following resources:

A Short Guide to DBI:
http://www.perl.com/pub/a/1999/10/DBI.html

Porting from Oracle PL/SQL:
http://developer.postgresql.org/docs/postgres/plpgsql-porting.html

Geordan Rosario - Media Net Link