Helpful Information

Category: PostgreSQL Help

Unique identifiers question

Hi !!

I just want to ask for your opinion of what do you think it's the best way.

If I have 2, 3 or more tables, each one having an identifier field (usually as primary key), I see three ways of managing it:

1.- Have a sequence for each table (id) and fetch the nextval in every insert. The common and book way.

2.- Have a unique sequence for all tables (ids). You need only unique identifiers, not sequentially ones. Every time you insert a row in a table that shares the sequence, fetch a nextval.

3.- Using internal OIDs.

What do you think it's best? Can you give me pros & cons?

Thanx !!

Because some things go astray in transactions. Let each table maintain it's state and current increment.

That is cleaner than trying to keep three tables in sync.

Also, if there is a reason to have different objects (tables), then the day may come that you would add an instance (row) that does not require an instance in each table.

HTH

I think I didn't explained well.

Suppose you have 3 tables: customers, languages and articles. The tables have no relationship between them.

Let's see the tables

customers: id_customer, name, telephone
languages: id_language, description
articles: id_article, header, content

Every entry in a table has a unique id field, in order to identify this row. The most common thing is to use 3 seqs, one for each table.

If you want an auto increment id, you would use

CREATE SEQUENCE id_customer_seq;
CREATE TABLE customers (
id_customer INTEGER PRIMARY KEY DEFAULT nextval(id_customer_seq),
name CHAR(16)
...
)

CREATE SEQUENCE id_language_seq;
CREATE TABLE languages (
id_language INTEGER PRIMARY KEY DEFAULT nextval(id_language_seq),
description CHAR(24)
...
)

CREATE SEQUENCE id_article_seq;
CREATE TABLE articles (
id_article INTEGER PRIMARY KEY DEFAULT nextval(id_article_seq)
header CHAR(32),
...
)

The second solution I proppose is to have only one sequence:

CREATE SEQUENCE id_seq;
CREATE TABLE customers (
id_customer INTEGER PRIMARY KEY DEFAULT nextval(id_seq),
name CHAR(16)
...
)

CREATE TABLE languages (
id_language INTEGER PRIMARY KEY DEFAULT nextval(id_seq),
description CHAR(24)
...
)

CREATE TABLE articles (
id_article INTEGER PRIMARY KEY DEFAULT nextval(id_seq),
header CHAR(32)
...
)

This way, you still have unique id's in each table. There is no need to keep in sync the tables, as they are not related. You do not need to keep the id numbers sequentially in a table. Understand the sequence as a "random number generator".

And the other fields of every table are not the same.

What do you think?

I think the 'cleanest' approach would be to use 3 sequence numbers. The overhead of sequence numbers is very low so there's no real reason to kludge it in there.

OK, I'll have to say that it's the common way.

But I see a great benefit from the point of view of bugs. Suppose you have to make a query such as ...

update table1 set col = 2 from table2 t2, table3 t3
where id = 4 and parent = t2.id and t2.parent = 5 and t2.brother = t3.brother and t3.id = 7

(Warning: this query may be perfectly wrong or a silly one, but it is intended only for explaining purposes)

The constant values in this query are set by a script, so they vary from one query to another. You mistake the order of the variables and swap 4 with 7.

If you have N sequences, it's quite probable that 4 and 7 exists in the two sequences. So the database won't complaint about this and execute the query, returning an "OK" state, and updating some mistaken rows.

But, if you have only one sequence, it's impossible to make mistakes of this sort. Why? Because the databse would tell you (I can't find a row with id 4), and won't update anything.

And I think it's better to share the sequence, because it's not so easy to find the next row in a table just adding one to the last id.

There's only one case in which I see mandatory to have N seqs. When you really need the id number, not just for identifying rows, but, for example, a member number to put in a card.

In my university, I've always been told to use N seqs. But I think nobody thought about it seriously, they follow the common way without asking themselves why. I want to convince myself that it can be done without danger, so I'm looking for expert comments.

Can it affect transaction rollbacks or some database integrity?