Helpful Information
 
 
Category: Firebird SQL Development
joining and copying across multiple databases

I need to store my tables in two (or more) separate database files, and make queries across them. Does interbase let you do this, and if so, what does the syntax look like?

I also need to be able to copy data across the databases like this:

INSERT INTO ...(table in database A)... SELECT ....(data in database B)...

Is that possible? If it matters, i'm doing all this through JDBC.

Also, is there likely to be a noticeable performance penalty on this? And yes, i am dealing with very large amounts of data.

I need to store my tables in two (or more) separate database files, and make queries across them.
I could not find a way to do a join across two separate database files. The only solution I could think of was replication (see the Operations Guide). You could set up replication so that the data in B is copied at set intervals to A. Then you could join it locally.

Depending on what you're doing, Interbase allows the use of external files as tables (see the EXTERNAL FILE option under Tables in the Data Definition Guide). An external file can only be used for inserting and selecting (no updating or deleting). Database B could insert data into the external file and database A could select from it.
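An external file table is just a plain file of fixed-length records on disk, which is what makes the handoff possible. The sketch below mimics that handoff with plain Python file I/O standing in for the two databases' INSERT and SELECT sides; the field widths (10 + 20 bytes) and file name are made up for illustration:

```python
# Firebird EXTERNAL FILE tables are plain files of fixed-length records.
# "Database B" appends fixed-width rows; "database A" reads them back.
# Field widths here (ID as CHAR(10), NAME as CHAR(20)) are made up.
REC = "{:<10}{:<20}"

# B side: INSERT INTO the external table.
with open("transfer.dat", "w") as f:
    for rec in [(1, "alpha"), (2, "beta")]:
        f.write(REC.format(*rec))

# A side: SELECT * FROM the external table.
rows = []
with open("transfer.dat") as f:
    data = f.read()
    for i in range(0, len(data), 30):  # each record is 30 bytes
        chunk = data[i:i + 30]
        rows.append((int(chunk[:10]), chunk[10:].rstrip()))

print(rows)  # [(1, 'alpha'), (2, 'beta')]
```

The real feature requires the record layout declared in the CREATE TABLE to match the file byte-for-byte, which is why it is typically restricted to CHAR columns.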

I also need to be able to copy data across the databases
I use a commercial product called Database Workbench, which has a tool called the Data Pump to migrate data from one database to another (http://www.upscene.com/).

There are scripts that would allow you to transfer data using external files:
http://www.mers.com/IB_0158.HTML

There are some utilities to perform various tasks:
http://www.ibphoenix.com/main.nfs?a=ibphoenix&page=ibp_projects

Hmm, thanks. Interesting, but not quite what i needed.

What i'm doing is generating caches to make my searches smoother. I have a very large table and a rather complex stored procedure (with several arguments) to decide which records are relevant to a particular query - which is a distinctly smaller but still very large number. These both work fine, but when combined in a complex query it seems to be running the procedures and doing the joins many, many times instead of just once. Probably something to do with the, ahem, optimizer. So, i'm caching the results of table-joined-to-procedure in temporary tables. These tables may have a much longer lifespan than one query, so they aren't really the same as the "temporary tables" that interbase doesn't offer anyway.

But these temporary tables are taking up a lot of space and dropping them takes time, so i'd like to store them in a separate file. That way, when the program quits, the file can just be deleted, rather than laboriously dropping a hundred large tables.

So what i need is the ability to quickly run an INSERT / SELECT where the target table is in a separate database from the original data.
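Since Firebird has no cross-database INSERT ... SELECT, the usual workaround is to do the copy client-side: select from one connection and run a parameterized batch insert on the other. A minimal sketch of that pattern, using Python's stdlib sqlite3 purely as a stand-in for two separate database connections (table names are made up); with JDBC the shape is the same - executeQuery on one Connection, addBatch/executeBatch on a PreparedStatement from the other:

```python
import sqlite3

# Two independent connections stand in for database A and database B.
db_a = sqlite3.connect(":memory:")
db_b = sqlite3.connect(":memory:")

db_b.execute("CREATE TABLE src (id INTEGER, name TEXT)")
db_b.executemany("INSERT INTO src VALUES (?, ?)", [(1, "x"), (2, "y")])
db_a.execute("CREATE TABLE cache_t (id INTEGER, name TEXT)")

# Client-side INSERT ... SELECT: read from B, batch-insert into A.
rows = db_b.execute("SELECT id, name FROM src ORDER BY id").fetchall()
db_a.executemany("INSERT INTO cache_t VALUES (?, ?)", rows)
db_a.commit()

print(db_a.execute("SELECT COUNT(*) FROM cache_t").fetchone()[0])  # 2
```

For very large row counts the fetch and insert would normally be batched in chunks rather than pulled into memory all at once.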


Interbase allows the use of external files as tables

Ooh, i've never seen this one before. Any idea what the performance is like?

Any idea what the performance is like?
No. I just found out about them while trying to research the answer to your question. I have zero experience with them.

Since you're having performance problems, here are some links that might be helpful:
http://firebird.sourceforge.net/download.php?op=file&id=makeibscream.pdf
http://www.ibphoenix.com/main.nfs?a=ibphoenix&page=ibp_tip_perf

Firebird does not allow cross-database queries, but one transaction can span several databases (useful if you must ascertain that something is moved from one database to another - e.g. for banking applications).

Why and how you generate the cache is probably more relevant to your problem. Are you coming from a different database background where this would normally be necessary? And when you say a very large table, are we talking about 1 million rows or hundreds of millions? Another thing that could be useful is to see the SQL and the generated plan (together with an explanation of indexes, selectivity and table sizes). Arno Brinkman has done a great job in optimizing the optimizer, but that doesn't mean that it isn't necessary to tweak the statements every now and then.

HTH,

Showing you the real SQL would be a long story - it's all auto-generated for a start. And i'm constantly tweaking indexes and so on. Plus there's stuff in there i'm not supposed to show the general public :) But it looks something like this:

SELECT * FROM A
INNER JOIN A_PROC(...) AP ON A.ID = AP.ID
INNER JOIN B ON ...
INNER JOIN B_PROC(...) BP ON B.ID = BP.ID
WHERE ...

I'm pretty sure of what's going on, because execution time for a query like the one shown above increases exponentially with the size of the table - which suggests that the expensive second proc is being run many more times than it needs to be. I'm doing what i can with indexes, but of course the results of a proc can't be indexed.

There's another complication - too many joins in one query and firebird seems to choke.

So you can see why i'm caching the results of the A INNER JOIN A_PROC(...) in one table, and B INNER JOIN B_PROC(...) in another. Initial testing shows that it does seem to be helping.
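The caching trick described here - run the expensive procedure once, materialize its output, and let later queries join against the small result - can be sketched like this. Python's stdlib sqlite3 again stands in for the database, and `expensive_proc` is a hypothetical stand-in for the stored procedure, modeled as a Python function because sqlite has no stored procedures:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE a (id INTEGER, payload TEXT)")
db.executemany("INSERT INTO a VALUES (?, ?)",
               [(i, f"row{i}") for i in range(100)])

# Hypothetical stand-in for the expensive stored procedure: it decides
# which ids are relevant. It is run ONCE, not once per joined row.
def expensive_proc():
    return [(i,) for i in range(0, 100, 10)]  # 10 relevant ids

# Materialize the proc's result into a cache table (indexable, unlike
# the proc's output itself).
db.execute("CREATE TABLE a_cache (id INTEGER PRIMARY KEY)")
db.executemany("INSERT INTO a_cache VALUES (?)", expensive_proc())

# Later queries join against the small cache instead of re-running the proc.
rows = db.execute(
    "SELECT a.id, a.payload FROM a JOIN a_cache ON a.id = a_cache.id "
    "ORDER BY a.id"
).fetchall()
print(len(rows))  # 10
```

The PRIMARY KEY on the cache table is the point of the exercise: the materialized result can be indexed, where a procedure's output cannot.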


And when you say a very large table

It has to keep a record of everything it ever does. Test data right now is a few million, but that's only a month or two of data. It's going to keep growing indefinitely.

(edit: grammar)

Joining across multiple databases has long been an interest of mine too.
The only way interbase does such joins is with the command-line tool qli, using an old syntax.

I don't see why all this is now lost in the new implementations of firebird.

Documentation on qli is quite spartan, and it permits only interbase-to-interbase (firebird) joins.

Another way of joining in code is with the Borland BDE, which by the way has also been left to die.

So heterogeneous joins are really difficult to do now with current technologies.

Recently I found that PHP ADOdb has simple syntax for performing such selects/inserts across multiple databases, but joins are not easy.

Maybe one should take his destiny into his own hands and code this join himself.

For me, heterogeneous joins are useful in joining two systems operating on Oracle and Firebird.

You can do heterogeneous selects/inserts/joins using Visual FoxPro. This is an amazing tool. You can have remote views/cursors to any backend database, create temporary indexes on local cursor columns, join local cursors, work offline and then update data on the server when online, and it has full hierarchical XML support (dataset/cursor to XML, XML to dataset/cursor). And it is the fastest local database engine on earth (much faster than MySQL - locally, of course).
You can:
orclConn = SQLCONNECT({oracle database connection string})
SQLEXEC(orclConn, [SELECT ... FROM oracle_table], 'oracle_cursor')
fbConn = SQLCONNECT({firebird database connection string})
SQLEXEC(fbConn, [SELECT ... FROM firebird_table], 'firebird_cursor')
SELECT ... FROM oracle_cursor INNER JOIN firebird_cursor ON {join condition}
And I didn't say anything about its full OOP language .
:)

Originally posted by Sadie
What i'm doing is generating caches to make my searches smoother. [...] So what i need is the ability to quickly run an INSERT / SELECT where the target table is in a separate database from the original data.

Ooh, i've never seen this one before. Any idea what the performance is like?
I can ask the architect Jim how we can reload (oh, not the Matrix :)) the old code that worked across many machines (and databases like Oracle, DEC Rdb... now Oracle Rdb). From my memory you could use multiple databases, but not in SQL. And by the way, forget about M$ FoxPro :cool: and XML. We are the relational guys.
PS: here is a nice email from the dev list; maybe it will help you.



Subject: [Firebird-devel] RE: Anyone is using FB 1.5 with 1Gb+ database?
Date: Tue, 3 Jun 2003 10:29:40 +0200
Hi there
Hayk Petrosyan wrote:
> I was wondering if anyone is using FB 1.5 with a 1Gb+ database or even a
> bigger one?
> If so can you please share how it works? How stable? How fast? Are there any
> tricks need to be done to tune it up?

Yes, I have a few "live" production databases now on FB 1.5 RC3
One of them is 38GB, one is 1.5GB. All of them FLY! ...in the larger one, to do a select, with a few joins, etc, is practically instant. ...yes, it takes quite a while to do a backup. Client has an IBM LTO Ultrium backup tape, that helps. Have had this database on Interbase 6, FB beta, 1.0, and now 1.5. 1.5 definitely much faster, the users came and told me, asked me what I'd done... ;-) Also much faster to backup and restore.
About tuning, the normal tuning you would do on any FB database. It's more, I think, about good database design. If you designed it right, no problems.
And here is the very ugly solution: open two databases, query the first, and feed its data (for example an id) as a parameter into a query on the second. You do the join by hand
(doing the engine's work :( )
http://groups.google.com/groups?q=interbase+query+across+multiple+databases&hl=en&lr=&ie=UTF-8&oe=utf-8&selm=20000725.19330072%40mis.configured.host&rnum=4
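That by-hand join - fetch rows from the first database and feed each key into a parameterized query against the second - looks roughly like this. The sketch uses two stdlib sqlite3 connections standing in for the two databases; table and column names are made up:

```python
import sqlite3

# Two separate connections stand in for the two databases.
db1 = sqlite3.connect(":memory:")
db2 = sqlite3.connect(":memory:")
db1.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
db1.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 10), (2, 20), (3, 10)])
db2.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db2.executemany("INSERT INTO customers VALUES (?, ?)",
                [(10, "Ann"), (20, "Jim")])

# Nested-loop join done in client code: one parameterized lookup in db2
# per row from db1 - exactly the engine's work, done by hand.
joined = []
for order_id, cust_id in db1.execute(
        "SELECT id, customer_id FROM orders ORDER BY id"):
    row = db2.execute("SELECT name FROM customers WHERE id = ?",
                      (cust_id,)).fetchone()
    if row is not None:
        joined.append((order_id, row[0]))

print(joined)  # [(1, 'Ann'), (2, 'Jim'), (3, 'Ann')]
```

This is why it is "very ugly": one round trip to the second database per row of the first, which is exactly the N-times-too-many-proc-calls behavior the original poster was trying to escape.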

Originally posted by mariuz
One of them is 38GB, one is 1.5GB. All of them FLY! ...in the larger one, to do a select, with a few joins, etc, is practically instant...
About tuning, the normal tuning you would do on any FB database. It's more, I think, about good database design. If you designed it right, no problems.

I guess that answers my question. Do the design properly, and multiple databases won't matter. Thanks!

At 04:23 PM 6/5/03 -0300, Daniel Rail wrote:
>Hello Marius,
>
>Thursday, June 5, 2003, 3:18:05 PM, you wrote:
>
>MP> sorry for the first email
>MP> What i want to ask is about connecting to multiple
>MP> databases in interbase - is it possible for this to be
>MP> implemented in firebirdsql?
>
>I've asked that question myself and many others, and still I couldn't
>find those discussions. I know Ann gave a good point as to why it
>wasn't implemented yet.
>
>But, I think with the introduction of aliases in FB 1.5, it might be
>one small step forward for this feature to be implemented. But,
>there's a lot more work to do to get there. If my memory serves me
>well, the biggest hurdle in cross database selection is to be able to
>get the proper optimization of the data retrieval and proper
>synchronization of the data retrieval between the databases, and
>having a common cache between the databases for the result set.
>
>So I believe that the task at hand is not small and would require
>quite some work and a very good knowledge of the engine.



The short answer is no. The longer answer is that an unimplemented part of the grand architecture was a beast called the "mega-database manager."

The mega-database manager is a data manager that sat under the Y-valve parallel to the engine, remote interface, and gateways. The mega-database manager, in essence, was a virtual database defined in terms of two or more databases, each of which was independently administered. The mega-database manager managed virtual database definition, consistent meta-data presentation, communication, request decomposition, request execution/synchronization, and two-phase commit.

The original concept was, of course, designed around BLR. Since there is fairly convincing evidence that BLR is unlikely to kill off SQL in the very near future, any serious thinking about the mega-database manager should be SQL-centric, which, unfortunately, precludes any significant reuse of existing engine code.

All in all, the mega-database manager is probably the same order of magnitude in size and complexity as the current engine, though the availability of large amounts of memory for meta-data caching would simplify implementation. The really hard part, however, comes with the realization that 85% of the mega-database manager is not Interbase/Firebird specific, and with just a little bit more work, it could work with arbitrary ODBC/JDBC (pick one) compliant data managers, giving it an independent existence and changing the economics drastically. And, unfortunately, probably tripling the size * complexity * reliability. Oh well.

da Wolf









