Helpful Information
 
 
Category: Database Management
The future of RDBMS

"Object-oriented databases are the next step in evolution of data management", "The relational data model is dying, because it is too limited and non-intuitive for modern data needs", "...the industry is moving toward a more XML- or object-centric approach to data management", "Heirarchical, tree-structured, complex data types, instead of just columns and rows ..." Blah, blah, etc...

For the past couple of years, every time I heard or read about this sort of industry hype, I became just a little more uneasy. At first I couldn't put my finger on it. I thought it might be because I was just afraid I would have to learn something new, even if it must be better. (I mean, it's the next step). But every time I read someone going on about how easy it is to work with "arbitrary levels and attributes" and "non-structured data types", etc... I couldn't help thinking this is exactly what I didn't want my database to do. The whole reason I liked using a database was to prevent myself from just arbitrarily adding information, and the above just sounded like a recipe for headaches.

I now believe my feelings are justified, after taking a closer look at the original concept of the relational data model, proposed by Codd & Date. It's not just "one way to look at things". It is more like a self-evident mathematical theorem. In other words, they didn't invent it, they discovered it. By definition, there is no other way to really truly make sense of your data, to get rid of all redundancy, and to maintain true integrity of your data.

Before anyone jumps on me for making such a "generalization". You might want to spend some time at the following site: http://www.firstsql.com/dbdebunk/

This site is run by a couple of interesting people, one of whom happens to be C.J. Date himself. Yes... one of the original two guys who started the whole thing in motion. He and his two associates have some very interesting things to say about the current state of databases and the computer industry in general. For example:

1. SQL != Relational Data Management. according to them, SQL is actually an incomplete implementation of the true relational data model, and so the study of SQL itself is not enough to truly understand relational data. Yes, it's the best we can do, for now, but we have hung on to SQL far too long, and it is time to come up with a truly capable declarative language for data managemen.

2. XML databases -- it's already been tried. It's called the Heirarchical data model, and the problems associated with it in the 60s are exactly what led to the need for the relational data model

3. Object-oriented databases don't have a well-defined data model, or even a good concept of what a data type is. Thus there is literally no way they can provide true data integrity.

4. The universities and the computer companies are losing sight of the importance of understanding data fundamentals. Instead, they are just pushing developers to learn a vendor-specific implementation: it's a cookbook approach to learning: all how-tos, with no "why" or "why not". They all tend to dismiss any serious discussion of the fundamentals as "academic theory", and not "practical" for business usage. Companies don't look for a database designer, but for an Oracle person, or a Sybase person, etc...

The above is just a glance at what you will find in the topics and articles on the site. I found it very inspirational material, in the sense that these guys cut through all the industry BS, buzzwords, etc... and focus on what is really at stake. It's surprisingly readable, too for a bunch of "theorists" ;).

Anyway, I am glad that we have a General forum in the Database section, now. I hope we can have a few good discussions here about how to approach the whole concept of building databases for our applications. I am here to learn, as much as anyone else.

This is really thought invoking post.

Currently I am reading http://www.firstsql.com/dbdebunk/

JD

I visited the dbdebunk site some time ago and read several of the articles. Very interesting stuff. Have you read "the 3rd manifesto" by Date? I don't think Date and Codd invented the Relational model, that was the work of Codd only, Date recognized the beauty of it early on and started working with it.

I firmly believe in the relational model having a future. It's such a simple and elegant model that it can never become unfashionable. XML has it's uses, Object oriented databases are as far as I can see (and to put it bluntly) ****e. The presentations I have seen just make me think something like "Oh no, not this mess again".

It's interesting to read what Date and Pascal says because it shows that there still is room for improvement. I'm hoping to write my Master thesis around relational databases, and currently it's definitely the field that interests me the most.

What especially interests me is the fact that today we have a one-to-one relationship between the logical view and the physical view. This leads to nasty things like denormalization for the sake of speed. Yuck. Such details should be hidden deep inside any RDBMS. We should only work with the logical view of the data, which of course is perfectly normalized.

And of course, it's always interesting when someone says that an entire industry is doing things the wrong way... :)

Should anyone happen to hit upon other interesting articles on the subject I hope they post them here.

Yes, I suppose you're right. Codd was the discoverer, and Date helped him refine the theory. And yes, it is always fun when someone has the guts to take on an entire industry. Their Quotes of the Week (http://www.firstsql.com/dbdebunk/quotes.htm) section is merciless.

http://www.intelligententerprise.com/ports/search_date.shtml has some more good articles by Date.

I noticed that they mention Ingres as one of the few databases that was designed with the possibility of expanding into a true relational database. Interestingly enough, PostgreSQL is the descendant of Ingres. I found out about the Database Debunkings website from one of the PostgreSQL mailing lists. It seems like PostgreSQL might show some promise in these areas in the future. I have been learning a lot about PostgreSQL lately, and it is pretty impressive.

Also, see them have some fun with MySQL: http://www.firstsql.com/dbdebunk/innodb1.htm (It's kind of sad, actually)

I think this is going to be an interesting forum. I had just sent an email to ZeUs asking for a general database theory forum. Well, a general database forum will do just as well ;).

:eek: Scary conversation... are we talking about the implementation or the concept of relational databases? In other words you keep talking about actualities such as Oracle, Sybase, Postgres but I thought that the relational database was a concept and not an actual product.

As far as I understood the idea of RDBs, is that it's the concept of creating a data storage method whereby data is separated into smaller chunks by means of identifying links between those chunks of data.

I've always been confused about object-orientated databases, does this mean that you control the data not through SQL but through object-orientated programming? Hence the act of updating a record is simply a case of calling a method within the database that will do the update? I don't really get how that works, though I can see some logic. Is the database still tabular or are we now thinking databases in a different way to how we used to?

I could understand the theory of having a BLOB field in a database that contains a further database or XML file, thereby making the data quasi-3D. So a table of employees could have a field called pet that would contain an XML file that details the persons pet in a structured way that allows for much more flexibility/locality than 1 or more tables.

Help, need to cool brain down now. :(

We are talking about both the implementation and the concept. Specifically, no one has ever succeeded in a complete implementation. And, rather than pursue this complete implementation, the vendors instead are creating a plethora of add-on "solutions", application servers and whatnot, in the hopes of re-directing people away from the main problem.

See Codd's 12 (actually 13) rules which define a relational database: http://members.tripod.com/er4ebus/sql/ch01.htm

Yes, you are right about O-O databases. One of the chief complaints is that in order to access the data, you will need to use programming, rather than simple declarative statements, such as in SQL. This breaks the concept of data independence, where any program or any person can access the data, whichout depending on a specific piece of custom software. Those who are pushing this concept say it will be much easier to do "employee.getRecord(12)", instead of "SELECT * FROM employees WHERE id=12;". The problem is, that may sound nice and intuitive at first (to the programmer), but when you have to combine data in complex ad hoc queries, the O-O method will be a nightmare. Also, now your database requires that much more work to connect with other applications, because methods are not standard. A method such as "employee.getRecord()"belongs on the application level, where the application makes the SQL query internally.

The problem with storing an XML file in a BLOB column is: guess what? You have just broken the relational model. Now, in order to get the details on that pet, you have to rely on additional programming methods. Programming should be used for decision-making and user interaction, not complex storing and retrieval. Yes, SQL doesn't give us the easiest way to deal with a "tree-structured" relationship, but that is a shortcoming of SQL, not the relational data model itself.

Don't get me wrong: I can see how in certain circumstances, the data in the BLOB might not be an integral part of your main data needs, so this might be an expedient way to deal with it, but if that data becomes more important later, such as in integrating with reports and statistics from your main database, you are going to have problems. In the end, the relational data model applied properly, allows for more flexibility and efficiency than any other approach.

I see your point about O-O databases, SQL is easy to learn and can be put together to form very complicated statments in one line that conventional programming would take lots of lines to do. One problem though is that SQL does tease people into doing long lined queries that when you go back to them are difficult to follow - they work, yes, but you can't break them up. Yes you can do transactions, but people don't always do use them when they should.

I'm talking from a PHP point of view, dealing with an SQL database when I'm thinking of blobbing some XML. As everything would be processed afterwards then some extra XML processing wouldn't matter. It would also mean that different entries could have different fields (as it were) so the data setup is not so set in stone. This is essentially an RDB though with an added twist.

I think that with the world looking towards less complicated programming procedures then O-ODBs will not replace SQL driven RDBs. 4th generation language superceded by 3rd, that's a backwards step in my book.

:)

Of course, Date argues that SQL is an ad-hoc solution and completely useless, and that it should be replaced by something much better.

Have you ever done any relational algebra? If you have, writing long SQL statements is very easy, even if they grow very big. SQL doesn't tease people into doing anything. You query for the data you want, nothing more and nothing less.

Transactions are *not* meant to make queries easier to read, transactions are something you use to ensure *data integrity*. You don't really mean to say that people should use transactions to make queries easier to read do you?

So if I have a select nested inside a select nested inside another select I should do it like that instead of taking the information from select one into a variable then using that.. oh, it's not SQL is it? It's programming, maybe I should be doing OO databases.

Transactions, though ahh, now...erm let me see...
Right, an example. If I want to update lots of tables then instead of joining them all in a massive long SQL statement, why not use a lot of single updates inside a transaction?

No one ever said that SQL should be used entirely without programming. And yes, while SQL has its problems, it is the best we have at the moment.

If you have lots of SELECTS nested inside other selects, then there is probably another, better way to get at your data. Re-examine your queries, or even your data structure itself. See some further discussion on trees and SQL here: http://www.pgro.uk7.net/fp3a2.htm

I personally am not the biggest fan of tree structures. I know that they are sometimes useful, but I don't see why programmers consider them so intuitive. For example, I think it is a pain to constantly navigate through a file heirarchy in Windows Explorer. Yes, that's why shortcuts were invented, but that is just a band-aid solution that leads to a proliferation of unorganized icons on your desktop.

So IMHO, nested tree views of information definitely have their place, but when information becomes endlessly nested, I question it's usefulness to the user. Maybe the structure of what is being presented to the user needs to be rethought. I rarely see the use for more than 3 to 5 levels.

And trees are not always that great for mapping to the "real world". Actual trees have either branches or end nodes (leaves). But the forking of each branch doesn't really constitute a node, with it's own information (leaf). (Not to mention, the tree's "root" is not singular, but composed of many branchings in the other direction). And the analogy often gets worse when trying to deal with more complex real-world issues. In the real world, things have a way of re-connecting back into the structure in many ways, which can't be represented by a tree, and which deftly "skip" many levels of a heirarchical model. (Secretary is pretty low in the company heirarchy, but often can get you in to talk to the boss without going through all the other pointy-haired bosses).

There's nothing wrong with using a programming language to loop through multiple result sets, if it accomplishes your purpose. For example, if you want a multidimensional array, you won't get that from any one SQL query, so there is no reason not to use a programming language to extract that into a multidimensional array and work with it. I left a (very) simple example (http://sourceforge.net/snippet/detail.php?type=snippet&id=100993) of this in the code snippets at SourceForge.

But the question the relational data experts would tell you to ask yourself is "why do I need that multidimensional array?" Maybe there is another way. In the end, people do process information "flatly". When you click on different folders in the Explorer left hand pane, to open up this or that folder, you are still reading the folder names as a flat list, in actuality, even if you read from the root level down into a lower directory. You can't read more than one thing at a time. Queries accomplish the same thing.

Originally posted by binky
So if I have a select nested inside a select nested inside another select I should do it like that instead of taking the information from select one into a variable then using that.. oh, it's not SQL is it? It's programming, maybe I should be doing OO databases.

I guess all you need is a very rudimentary preprocessor. Why not write one? :)


Originally posted by binky
Transactions, though ahh, now...erm let me see...
Right, an example. If I want to update lots of tables then instead of joining them all in a massive long SQL statement, why not use a lot of single updates inside a transaction?

Forgive my ignorance, but what DBMS allows you to update several tables by joining them? You have to use a lot of single updates, and if the constraints are such, do them inside a transaction.

Originally posted by andnaess
Forgive my ignorance, but what DBMS allows you to update several tables by joining them? You have to use a lot of single updates, and if the constraints are such, do them inside a transaction.

Transact SQL supports updating JOINed tables (although I think it is limited to updating a single table at once):


UPDATE sometable
SET is_active = 0
FROM sometable,
inactive_parts
WHERE inactive_parts.id = sometable.id
AND inactive_parts.type = "brake_pads"


You can also use sub-queries and such instead of a join, however it still is that you can only update a single table. Updating multiple tables would be a nightmare although they are looking at adding it. I think maybe (in relation to transactions) he was thinking something like this:



BEGIN TRAN
-- Update bob's entries
UPDATE sometable
SET owner = "Bob"
WHERE owner = ( SELECT owner_name
FROM owner
WHERE prev_owner = "Joe" )

-- Now do other updates

COMMIT TRAN

I wholeheartedly agree with the vein of this posting in regard to OO db's. Let's look at this another way: what are the benefits of doing OO-programming? If you knock out the intellectual or dork prestige factor of being a gOOrOO you've got the big advantage of modularity. Sure there are other advantages but the BIG one for me is the Lego block approach to programming (and I've always loved Lego).

SQL, flawed as it is, is relatively standardized. I can take SQL scripts from an Oracle or Postgre or MYSQL site and use the logic with any SQL capable db. Sure there are a few syntax things (that's how we make the big... well that's how we make bucks anyway, right?) but by and large SQL99 compliant code is a standard that cuts through a lot of boundaries.

I can even give users a basic SQL window to play with, I wonder how much unecessary coding in an OO-db that would take! :p

Just to prove my point about the IT industry, (and IT media's lack of understanding), here is an article that speaks volumes: Dawn of a New Database (http://ww1.infoworld.com/cgi-bin/fixup.pl?story=http://www.infoworld.com/articles/op/xml/02/03/18/020318opwireless.xml&dctag=database) (InfoWorld).

Somehow, the author has gotten the idea that "transactional databases" are the key to the future, because they work faster than relational databases.

In the article, we see that a wireless carrier is looking for a way to store all of its SMS messages, "...something that a relational database can't really handle."

I did a Google search for "transactional database", and found that it is variously described as the following:

1. A standard SQL database engine with transactions
2. A specialized application (which uses a database) that records and analyzes consumer spending habits, or ... transactions.
3. Any database that is associated with commerce in any way)
4. A term that has been used by just about every database vendor (of any genre of database) to vaguely assure that their database enforces consistency. This includes the OODB vendors, of course.

So what is the definition of a transactional database? There is none.

The Caché database mentioned in the link above apparently gets its performance by using "super-efficient multidimensional arrays" (I believe I rightly detect this to mean heirarchies). See http://intersystems.com/cache/fusion/cache-prd_02.html. Gee, multidimensional arrays... sounds impressive. Wish I could do that with PHP-- oh wait! Yes, I can (http://www.php.net/manual/en/function.array.php).

Interesting news: it seems that finally a commercial developer has attempted to go beyond SQL and create a true relational database implementation. (see discussion at http://www.dbdebunk.com/x_trdbms_impl.htm).

The company is called Alphora (www.alphora.com) and the product in question is called Dataphor. To my knowledge, this is the first serious commercial attempt at providing a true relational method of handling data. The interesting part about this is that this company is not a traditional RDBMS vendor at all. They actually approached this from the perspective of needing a better application development environment. They discovered some of C.J. Date's writing, including "The Third Manifesto", and decided this was the best possible method. Their description:


Complete applications virtually “fall out” of the database definition, yet they can also be tailored to fit the requirements of any project.

This is exactly what I have been thinking would be possible from a complete relational data implementation. Dataphor actually modeled their query language on Tutorial D, from "The Third Manifesto". This also includes object-relational type inheritance.

One caveat to this development. Dataphor is only a data manipulation environment, and lacks its own physical storage layer, at this point. Thus for data persistence, it uses existing SQL engines in the background. Again, let me stress that this is on the physical implementation level, not on the logical data level. As Date, Darwen, Pascal, et all... say, a database can do anything it wants on the physical storage level, as long as it provides relational constraints on the interactive level. The only problem is that it still suffers from some of the inefficiency of current SQL implementations, for physical storage. However, during the data's "lifetime", it can actually reside in live memory, so performance can be good, up to the moment you need physical storage. Also, the physical storage is an open API, so you can write your own.

This sounds like a very interesting project. The only disappointment on my part is that there is currently no project by any open source developers that attempts anything equivalent to this. It seems to me that it would be a natural fit for PostgreSQL, but the developers are apparently more interested in "good SQL implementation" as being good enough for the present.

Also, Alphora only runs on Windows, being a .NET add-on. I hope they realize that there is a significant market for we Unix lovers too. If it could provide me a full relational query environment, with an C/C++ API, and support for PostgreSQL on the backend, I would willingly pay for it. (And you know the PHP and Perl developers would have a module for it in no time)

On a side note, there is a "placeholder" project at SourceForge, for anyone willing to assist in starting an open source implementation of Tutorial D. (Yes, there is absolutely nothing there at present) Leandro, the project maintainer, is simply hoping to provide a focal point where these things can be discussed, and perhaps even planned. I hope anyone here who is interested will provide whatever support they can. (that means you too, André ;)).

Alphora had a booth at Borland's conference a few weeks ago and I got to see a demo of their product. It was impressive and I wish I had had more time to look at it (at least I got one of their free t-shirts).


Also, Alphora only runs on Windows, being a .NET add-on. I hope they realize that there is a significant market for we Unix lovers too...(And you know the PHP and Perl developers would have a module for it in no time)


The product is written in C#. If Ximian (http://ximian.com/devzone/projects/mono.html) can get a Linux C# compiler working, maybe you'll get to see Dataphor running in the *nix environment soon. Dataphor is extendable by any language that conforms to the specifications for Microsoft's Common Language Runtime (of which Perl and about another dozen languages comply).

Actually, I believe there is a FreeBSD implementation (http://msdn.microsoft.com/msdnmag/issues/02/07/SharedSourceCLI/default.asp) also. I just don't know how well these implementations work. Somehow, I distrust the concept of running something non-native on Unix. It should work, but I will have to be convinced that it will be stable, and perform up to par with native C/C++ applications.

But, I will be doing everything I can to figure this out.










privacy (GDPR)