Category: Database Management
More Food for Thought: database vs filesystem?

See the comments (http://reports.newsforge.com/article.pl?sid=02/03/22/204220&tid=9) about Microsoft's plans to incorporate SQL Server into Office, so that all user data is stored in a database instead of in files and directories. (Yes, some of the comments are mine.)

I think it's time GUI developers thought about the concept of all user data being stored in a relational database instead of flat files. And, of course, I have some exceedingly elegant arguments in favor of this, which I will post when I have the time ;).

Please read the article and comment. I think this sort of approach to user data will become critical in the next generation, so we should face it now.

Thanks.

We web developers have been doing this for years... Just think about the web apps you've written: they normally store their data in a database, right?

At both the places I work, we have a database where everyone keeps Word documents, Excel spreadsheets, images, text and just about everything else that can be saved to disk as a file. By keeping it in a database, users can access their data remotely via the web, dialing in from home or from any workstation on the network. This file database is the largest database we have. It's called the Microsoft Exchange Server. Locally it's known as the Outlook.pst file.

According to the article rycamor linked to, Microsoft wants to extend this paradigm to their desktop products. I think this will definitely make it harder for third-party and open source products to wean users away from a Microsoft desktop. My copy of Visio automatically loads the Microsoft Data Engine to store information in. Getting other products to do the same shouldn't be that difficult to do.

It appears that the anti-trust lawsuits don't have Microsoft very worried.

We web developers have been doing this for years... Just think about the web apps you've written: they normally store their data in a database, right?
True, and in some ways we may be ahead of the curve, but the difference is that a web app usually works from a fairly narrow predefinition of what type of information the user will be saving and retrieving.

What I am talking about will be a little more of a challenge, because it is a shift in paradigms for the complete user environment. The challenge is to keep the simplicity and ad hoc nature of saving to files while breaking away from the metaphors of "files in containers" and "containers in containers" that we have in present-day filesystems. We need to find a way to give the user the freedom and flexibility to save any type of data, but still have an underlying organization that is largely hidden from the user and that enforces relational database rules. I believe the application that would manage this needs to have some AI capabilities, to continually refine the user's data and relationships, as well as the views, queries, and data integrity rules that would allow users to define what is most important to [b]them[/b].

The current hierarchical filesystem is an ancient construct, which forces you to go through every level of a structure to get to your final data inside a folder. This is especially bad in GUI applications where you have to click and expand a tree repeatedly. (I have ranted against the tree concept already in this forum.) The only workaround we have for this limitation is the concept of "shortcuts", which also comes from the Stone Age of computing.

Come on! The average personal computer is a thousand times more powerful than the first ones available in the 80s. In all this time, developers have tried to use up this power simply by adding features to software. Now the average user-level application has literally a thousand commands, widgets and menu items, but we still save files in the same old, tired way.

The time is more than ripe for this to change. I think if such a thing were done right, then computing might just become fun again. At present we only have a couple of basic GUI widgets for displaying the filesystem: the navigational tree, and the folder-inside-folder view. Then we have a form to do file searches, but it offers only the most basic of searches, where we can look for certain filenames and filename patterns, or for certain patterns of text inside files. That's it. Think of all the cool widgets we could come up with for the whole user environment if the whole data store were relational.
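To make "more than filename searches" concrete, here is a minimal sketch of what a relational document store could answer. It assumes SQLite (via Python's standard library) purely as a stand-in for whatever engine a desktop might use, and the `documents` and `tags` tables, their columns, and the `find_documents` helper are all hypothetical names invented for illustration.

```python
import sqlite3

# Hypothetical schema: one row per saved document, plus free-form tags.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (
        id      INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        kind    TEXT NOT NULL,      -- e.g. 'letter', 'spreadsheet'
        author  TEXT NOT NULL,
        created TEXT NOT NULL,      -- ISO 8601 date, so text comparison sorts correctly
        body    BLOB
    );
    CREATE TABLE tags (
        document_id INTEGER NOT NULL REFERENCES documents(id),
        tag         TEXT NOT NULL,
        PRIMARY KEY (document_id, tag)
    );
""")

# Made-up sample data.
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Q1 budget", "spreadsheet", "alice", "2002-01-15", b"binary content"),
        (2, "Offer letter", "letter", "bob", "2002-02-03", b"binary content"),
        (3, "Q2 budget", "spreadsheet", "alice", "2002-03-20", b"binary content"),
    ],
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?)",
    [(1, "finance"), (2, "hiring"), (3, "finance"), (3, "draft")],
)


def find_documents(conn, tag, author, since):
    """Return titles carrying a tag, by an author, created on or after a date."""
    rows = conn.execute(
        """
        SELECT d.title
        FROM documents AS d
        JOIN tags AS t ON t.document_id = d.id
        WHERE t.tag = ? AND d.author = ? AND d.created >= ?
        ORDER BY d.created
        """,
        (tag, author, since),
    )
    return [title for (title,) in rows]


result = find_documents(conn, "finance", "alice", "2002-03-01")
print(result)
```

Nothing in the query mentions a folder or a path; a search widget built on something like this would filter on whatever attributes the store happens to record.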

Would this work? Well, anyone who has administered Microsoft Exchange will tell you that all Exchange/Outlook folders, shared data, etc. are stored in a database, which can be queried at will. It's been done, and whatever we may say about the bugginess and vulnerability of Outlook/Exchange, the user interface is a success. I know businesses that have migrated their whole document management environment into Exchange for this very reason. This is a very primitive implementation, though. Users must still navigate folders for most things, but administrators and developers can build advanced search forms and categorization on top of this, which far outstrip any simple filesystem method.

Any KDE developers out there listening? Please take this rant as a petition. We know that some of you must have been thinking about this already, because the Qt tools and the KDE widgets themselves are now database-aware, so why not just take the step and do this for the whole user environment, while Microsoft is still figuring out how to do this just for Office? And please, don't use MySQL. Use PostgreSQL, or at least something that attempts true data integrity.

OK, ok...

bash# killall -HUP rantd

Heh, while I was previewing this, dcaillouet posted a piece which actually ties in with my rant. Yes, Microsoft may be trying to close the door to third-party software, but think about this database thing a moment: Are you trying to tell me that Microsoft will make this whole database-backed storage method, but will NOT provide any sort of ODBC, COM, ADO, .NET, interfaces into it? Of course they will, or MCSEs would stage an open revolt!

So I'm not at all worried about whether our open source stuff can interface with it. If anything, it will probably be easier to get at the information, and understand the format. But even if not, some sharp open source developer will figure it out anyway. Why shoot down a good concept just because it is done by Microsoft? That sounds to me a little like FUD in reverse. Let's be gracious and acknowledge good ideas, wherever they might originate.

It's a great idea. If there were an open source version, it would be even better. I would love to be able to save a file to a database at work. When I plugged my laptop in at home, started the program again and opened the same file, the program would connect via TCP/IP to the repository at work and give me my file. The same problems with security that you have with a database would apply, but it would be great if I could save my files to a universal location (the address of the database) instead of the C drive (how many C drives are there in the world?).

I have a lot of respect for Microsoft. They realize that most users aren't computer gurus and produce products that are easy for them to use. But their products usually work best with other Microsoft products. Whenever they release a new file version, it takes a while for the third-party products to reverse engineer the format and get their products to work.



Are you trying to tell me that Microsoft will make this whole database-backed storage method, but will NOT provide any sort of ODBC, COM, ADO, .NET, interfaces into it?
No, I'm not trying to say that. I'm just saying: go fire up your Linux box. Now read an email off the Exchange server. Open a public folder and set up a meeting. Save some data to the SQL Server. Modify an Excel spreadsheet. Is it possible? Sure. Is it easy? Is it robust? Is it as good as doing it from a Windows workstation? Probably not. If you were going to be doing database work on a Linux box, would you rather have Oracle or SQL Server as your backend? I was just saying that Microsoft products generally don't play well with others. I do a lot of work with ADO and it's really tough getting it to work on my SuSE box.

Contrary to what other people say, Microsoft has some great products and has had some good ideas (stolen or otherwise). Over the years I've just gotten more suspicious of them and believe they will do as much as possible to keep open source products off the desktop. That territory belongs to them and everyone else is unwelcome. Period. Hopefully the release of Mac's Unix-based OS X / Aqua will get some people pumped up about improving the current desktops for the Linux platform and give people a serious alternative to Windows.

Aren't database-based publishing systems almost standard in most corporate workplaces these days? (I think Lotus has been using this type of system, or a similar one, for a long while now.)

The only pitfall to something like Office using this approach is that it would be a P.I.A. transporting a document between one machine and another one at home or school, or even sending it to a prospective employer, a client, or work.

In particular it would be an even bigger P.I.A for end-users, who barely know how to use that funny thing with a cord attached called a mouse (I know not all end-users are like this, but I have dealt with people like this in the past in both professional and home user environments).

It could also open more security holes and possibly make espionage even easier... After all, we have seen how great Windows security has been in the past, e.g. the Code Red family and Nimda.


--- end rant ---

Your points are well made with me. I know that the implementation of any method of tying in with Microsoft products is very difficult. However, why does Microsoft owe us any more than that? This is a philosophical issue, not a technical one.

On the technical level, though, what I am saying is: if a relational database is at the core, then we actually have a standard by which to operate, which is much better than some arbitrary data dumped in a physical file. The whole point of a database at the core of userland applications is that now we can separate the logical operations from the physical operations. I believe there is already one Unix-based replacement for Exchange (http://www.caldera.com/products/volutionmsg/datasheet.html), which was made possible by this very thing.

And generally, I think Microsoft's days of fully proprietary interfaces are over, and they realize it. Thus the game now for them is "how can we keep a balance while giving away certain territory, in order to keep our overall usage strong?". The whole .NET strategy itself is a perfect example. Let's face it, .NET is an implementation that lets anyone do anything they want with the data at the other end. So, once all these products are ".NETized", it will be even easier to trade data around from platform to platform.

Not that I am concerned much either way: I intend to keep on writing applications which use relational databases, open protocols, and open source languages, so I will never be boxed in. Yes, I hope that between OS X and KDE, the desktop oligarchy will be brought down. While Microsoft does well at some things, I can't think of a single specific area where they have produced the BEST. That's why I think they should be pursuing what I consider to be the single most important next step for user interface design of an OS: relational DB storage methods instead of physical files. All of the attention paid to this or that graphical widget is a waste of time if you don't provide a comprehensively better way for users to store their data. This also allows for much better interoperability, because the whole "separation of logical from physical" concept allows each system to have its own implementation, while still sharing data through well-established database methods, such as ODBC, JDBC, etc.

At the moment, what I am talking about is mostly theoretical. Now, this could all go very well, if the industry follows some good sense about what exactly constitutes a database. What I fear, though, is that instead we will see a mishmash of XML and "Object-Oriented" data polluting these waters, along with some muddy thinking about "unstructured" data, and "media objects" and what-have-you. But then again, maybe necessity will force industries to examine their approach a little more rigorously, if our government actually starts holding software makers responsible for living up to their claims.

<not holding breath... />

My last post was of course directed to dcaillouet ;).

In response to deepspring's post, these are all implementation issues, not problems with the core concept. It is quite possible to shield the user from having to deal with SQL statements, etc., and all databases have ways of transferring data from one point to another, even temporarily dumping data to a file, which can be imported back in at the other end. To the end user, if this is done right, the only things they would notice are that it is now much easier to group documents and to categorize or track down any document or document fragment they have ever created, that searches are much faster, and that distribution of the data from one network to another is much easier. Also, document versioning problems can be resolved much better.
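As a rough illustration of "dump to a file, import at the other end", here is a small sketch using SQLite from Python's standard library. `Connection.iterdump()` and `executescript()` are real `sqlite3` calls; the `documents` table, the file name, and the two helper functions are hypothetical, and a real desktop store would presumably hide all of this behind a "send to..." action.

```python
import os
import sqlite3
import tempfile


def export_to_file(conn, path):
    """Write the whole database to a text file as SQL statements."""
    with open(path, "w", encoding="utf-8") as f:
        for statement in conn.iterdump():
            f.write(statement + "\n")


def import_from_file(path):
    """Rebuild a fresh in-memory database from a dump file."""
    target = sqlite3.connect(":memory:")
    with open(path, encoding="utf-8") as f:
        target.executescript(f.read())
    return target


# "Work" machine: a tiny store with one document (made-up data).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
source.execute("INSERT INTO documents (title) VALUES ('Resume')")
source.commit()

# Carry the dump across as an ordinary file, then load it on the "home" machine.
with tempfile.TemporaryDirectory() as tmp:
    dump_path = os.path.join(tmp, "documents.sql")
    export_to_file(source, dump_path)
    copy = import_from_file(dump_path)

titles = [title for (title,) in copy.execute("SELECT title FROM documents")]
print(titles)
```

The file in the middle is only a transport format; the user never has to know its name or layout.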

Of course Microsoft will have all kinds of security problems here, but essentially, these security problems should be no more difficult than any other security problems. Networked data is networked data; encryption is encryption.

Rycamor,

No disagreement here, I was just voicing a concern from a common standing point that's all (newbie's p.o.v).

Hopefully M$ would make such a system (the data dumping especially) transparent to the user, so the concerns I mentioned before (regarding people who barely know how to use a mouse) would be eliminated from the equation.

Another alternative, or one that could probably be used alongside the database idea, is the use of journaling filesystems like ext3 and ReiserFS, which are extremely fast compared to M$'s common ones and are much safer for critical data.

Well enough of my rant....


"ban those evil M$ .NETzie's!"

Well, all discussions of Microsoft aside, here are a couple more interesting links in regards to "database as filesystem":

1. There was a lively discussion about relational database-backed filesystems (http://slashdot.org/comments.pl?sid=25435&threshold=0&commentsort=0&tid=130&mode=thread&pid=2766866#2766901) on Slashdot a few months ago:

if you were going to add some additional functionality to your 'filesystem database', such as the ability to make usefull links and associations, like 'see also' or 'see related documents' or 'see documents by this author', then a relational database is just the trick as you can make 'many to many' joins between the various files on your network. Add a simple Web interface and voila! you have a really neat way of navigating through the filesystem.
2. And at one time, there was apparently someone working on a virtual filesystem for Linux, called "pgfs". (You guessed it: using PostgreSQL as a journaling, versioning filesystem.) See http://www.linuxjournal.com/article.php?sid=1383 This was intended to be used as an NFS mount, so it is perfect for a shared filesystem... say, where you have many developers updating a lot of source code and you want to enable versioning, user permissions, on-change triggers, file groupings across directories, and more, all while maintaining absolute data integrity. This sounds to me like the future of filesystems. Think about it: something like this, done right, could make CVS obsolete (Blasphemy!? hehe...). I want to find out if pgfs was ever continued. If this is available, I want to migrate my source code repository to it ASAP.
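The "see also" and "documents by this author" links from the Slashdot comment can be sketched with one link table and a self-join. This is only an illustration under assumed names: the `files` and `see_also` tables and both helper functions are hypothetical, SQLite stands in for the database, and none of it reflects how pgfs itself was built.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        author TEXT NOT NULL
    );
    -- Many-to-many: any file may point at any number of other files.
    CREATE TABLE see_also (
        from_id INTEGER NOT NULL REFERENCES files(id),
        to_id   INTEGER NOT NULL REFERENCES files(id),
        PRIMARY KEY (from_id, to_id)
    );
    INSERT INTO files VALUES
        (1, 'design.txt', 'ann'),
        (2, 'schema.sql', 'ann'),
        (3, 'notes.txt',  'raj');
    INSERT INTO see_also VALUES (1, 2), (1, 3), (3, 2);
""")


def related_files(conn, name):
    """'See also': names of the files that the given file links to."""
    rows = conn.execute(
        """
        SELECT target.name
        FROM files AS source
        JOIN see_also AS link ON link.from_id = source.id
        JOIN files AS target ON target.id = link.to_id
        WHERE source.name = ?
        ORDER BY target.name
        """,
        (name,),
    )
    return [n for (n,) in rows]


def by_same_author(conn, name):
    """'Documents by this author': other files sharing the given file's author."""
    rows = conn.execute(
        """
        SELECT other.name
        FROM files AS base
        JOIN files AS other
          ON other.author = base.author AND other.id <> base.id
        WHERE base.name = ?
        ORDER BY other.name
        """,
        (name,),
    )
    return [n for (n,) in rows]


print(related_files(conn, "design.txt"))
print(by_same_author(conn, "design.txt"))
```

Neither lookup depends on which directory a file lives in, which is the point of the "file groupings across directories" idea.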

Actually, it's quite amusing how we developers keep storing our source code in regular files, using hopeless text editors to work on it. I mean, here you have the most structured data possible, and still we handle it as if it has no structure, as if the computer can't aid us in handling it. Sure, there are things like IDEs, but these don't really offer much help. For more, see this excellent article: http://mindprod.com/scid.html

I believe integrating a SQL server into office is only a crutch.

The database should be down in the center of the OS. I guess that 90% of OS functionality fits easily into the database regime.

Ever since UNIX, most OSes have been pretty much file-oriented. A database-oriented OS would be a huge step. We would need to think about (raw) data instead of files.
Having OS functionality as transactions which can be logged and rolled back seems appealing to me.
Security could profit from having only one access path for all data. Also, I believe that a sandbox (or a set of different security levels, for that matter) would be much easier to implement in a complete and consistent way in a database OS.
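A toy version of "operations as transactions that are logged and can be rolled back" might look like the following. It is a sketch only: SQLite stands in for the hypothetical OS-level database, and the `objects` and `audit_log` tables and the `rename_object` function are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (name TEXT PRIMARY KEY, content TEXT NOT NULL)")
conn.execute("CREATE TABLE audit_log (action TEXT NOT NULL, name TEXT NOT NULL)")
conn.execute("INSERT INTO objects VALUES ('report', 'v1')")
conn.commit()


def rename_object(conn, old, new):
    """Rename an object and log the action; both happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO audit_log VALUES ('rename', ?)", (old,))
            cur = conn.execute(
                "UPDATE objects SET name = ? WHERE name = ?", (new, old)
            )
            if cur.rowcount != 1:
                raise LookupError(f"no object named {old!r}")
        return True
    except LookupError:
        return False


ok = rename_object(conn, "report", "summary")    # succeeds and is logged
failed = rename_object(conn, "missing", "other")  # fails; its log entry is undone

log_rows = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
names = [n for (n,) in conn.execute("SELECT name FROM objects")]
print(ok, failed, log_rows, names)
```

The failed rename leaves no trace in either table, which is the property a file-oriented API would need extra bookkeeping to provide.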

But MSoft being what they are, I fear that a database will mostly be used to further shield competitors from essential information and to reduce other programmers to mere add-on producers.

As a user, I have to fear that data stored with such a system would no longer be readable by a system running newer versions of the software. If I forget to update my old archives with each new Office version, the data could be lost. This of course would encourage users to use central web-based commercial data management/data storage applications offered by ...

You might have a point there. However, I think it could just be another way to skin a cat.

My point was not necessarily about how a database should integrate with the OS itself, but more on the user level. I think this is what Microsoft is attacking. Maybe later they will attempt a complete OS integration. IMHO handling this sort of thing for an application suite is much easier than doing it for the whole OS. The main benefit of a relational database is to separate logical implementation from physical implementation, so the user doesn't need to know "where" files are, or what type of file contains what type of data, but merely what type of information one is saving and retrieving. It's a "what not how" thing.
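One way to picture the logical/physical separation is a view: the application asks for "letters", and the physical tables underneath can be reorganized without that question changing. The sketch below assumes SQLite, and every name in it (`store_v1`, `meta`, `content`, the `letters` view, `letter_titles`) is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Physical layout, version 1: everything in one wide table.
conn.executescript("""
    CREATE TABLE store_v1 (id INTEGER PRIMARY KEY, kind TEXT, title TEXT, body TEXT);
    INSERT INTO store_v1 VALUES
        (1, 'letter', 'To the bank', 'Dear Sir or Madam'),
        (2, 'memo',   'Lunch',       'Pizza on Friday');
    CREATE VIEW letters AS
        SELECT id, title, body FROM store_v1 WHERE kind = 'letter';
""")


def letter_titles(conn):
    """The application's question: what letters do I have? No table names leak out."""
    return [t for (t,) in conn.execute("SELECT title FROM letters ORDER BY id")]


before = letter_titles(conn)

# Physical layout, version 2: metadata and content split into two tables.
# Only the view definition changes; letter_titles() is untouched.
conn.executescript("""
    DROP VIEW letters;
    CREATE TABLE meta (id INTEGER PRIMARY KEY, kind TEXT, title TEXT);
    CREATE TABLE content (id INTEGER PRIMARY KEY REFERENCES meta(id), body TEXT);
    INSERT INTO meta SELECT id, kind, title FROM store_v1;
    INSERT INTO content SELECT id, body FROM store_v1;
    DROP TABLE store_v1;
    CREATE VIEW letters AS
        SELECT m.id, m.title, c.body
        FROM meta AS m JOIN content AS c ON c.id = m.id
        WHERE m.kind = 'letter';
""")

after = letter_titles(conn)
print(before, after)
```

The caller states what it wants, and how the rows are laid out is free to change behind the view.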

I don't see how this could be construed as a "crutch", though. SQL databases have a much more consistent API than any filesystem, so if anything, it should be easier for open source applications to integrate with Microsoft products. And as far as users' fear of losing data with different Office versions, how is that any different than nowadays, where different versions of software can produce incompatible file formats? Not that I am a Microsoft-trusting kinda guy, by any means, but I fail to see a reason to fear this change, and prefer the existing mess.

In the end, though, I agree that a deep integration of the OS with a relational database could be a major step forward in data management and development in general. It's just that it would require a major change in how most application development is pursued, and I don't think the programming world is ready for such a thing (yet).









