Helpful Information

Category: Database Management

Reuseable integer keys

I'd like to get some opinions on the best kind of primary key to use for records that need to be uniquely identified, but temporarily and transiently.

For example, consider the case that you are designing a database to store classified ads which are going to be displayed on a website. Each ad will run for, say, 60 days. During the time that it is in the database it needs to be uniquely identified. It seems to me that it would be a bad idea to use an auto incrementing primary key because, assuming you're using an RDBMS that doesn't reuse key values, the keys are going to get burned up fairly quick.

Question: if I create a primary key field that is defined as type INTEGER (of whatever magnitude) AUTO_INCREMENT, what happens when the keys are exhausted?

The thinking that I have done so far leads me to believe that perhaps the best way to handle this situation is to create a primary key that is a large magnitude integer, but not auto incremented and calculate new keys by finding the current max and adding one, reusing key values that become available because of deletion, and periodically compacting the keys to reclaim "pockets".

Any thoughts/suggestions/questions?

Hi,

I just generally go with the 'auto_incrementing' field for unique ID's. You can generally use a pretty big numbers for the field, with unsigned int you get up to 4000000000 plus before you get near the edge!

As for checking the last value and doing some calculations for the next value IMHO, you are just re-inventing the wheel when auto_inc does it already? Also there may be concerns in a multi threaded system with duplicate values being created if for example over an internet connection the last value is read, next user gets the same last value, system generates next value the same for both users, both write the same value as the identifier? auto _inc retains integrity.

You may be able to trim a few k off the database with your own routines to manage unique i.d's, but you add extra processing time? 4billion plus ad's would be nice I guess, but in reality will this ever run out?

Regards, Ed.

Answer #1: this depends on which DBMS you use: Some databases, such as PostgreSQL, Oracle, etc... allow for primary key re-use, simply by restarting the cycle once the top integer is reached. It will restart the cycle looking for the lowest possible free value, and start moving upwards through the available values.

Answer #2:

There is a lot of discussion in database design theory about what types of primary keys are best for different situations. My approach, before seriously reading some of this, was to simply stick an auto_increment field everywhere, thinking this was the most efficient use of the database resources, rather than maintaining my own primary key. I now think differently.

The main thrust of these discussions center around the question "when do you use a natural key, and when do you need an artificial key?". An artificial key is one that doesn't directly relate to the data itself, such as an auto_increment field. The auto_increment field is meaningless beyond its job as a provider of unique integers. However, there are many times when the nature of the data itself means that you don't need an auto_increment field at all, because somewhere in that table you have other unique values. Why muddy your data model by requiring artificial unique values when you have a unique field such as "username", or "SS_number" for example. The very nature of a username in a system requires it to be unique. The very nature of a social security number requires it to be unique. Thus you achieve your unique values, and hey there's a plus: these values are actually meaningful in other ways as you manipulate the data. This can actually make certain types of queries much more readable.

So the question to ask yourself is: "is there any attribute of a classified ad record that is already unique?". If so, then that is probably your best choice for a primary key. For example, is there an invoice number generated for each classified ad?

Answer #3:

You can definitely manage your own primary key, using whatever algorithm you want, as long as you use efficient methods of checking for uniqueness (keep your re-tries low). But, whatever you do, don't start "periodically compacting the keys to reclaim pockets". A primary key is your main data integrity mechanism. Once set for a certain record, it should be left alone. Once you start reshuffling keys, you will have to make sure that you change any other tables with records referring to those keys, and you will end up with one big headache.

Thanks for your thoughtful and informative replies.

Mr. T,

The duplicate values issue you bring up is valid, but doesn't really apply to my case in which there will be single administrator, so there will never be more than one user attempting to update the database at a given time.

Will it ever actually run out? Good question. I have a very poor sense of scale -- it seems very abstract to me. I would be much more confident creating a generalized system that I knew could sustain itself indefinitely. Perhaps you are right though, 4 billion values would probably last a very long time in this situation. A quick calculation suggests that even if you entered 10000 ads per month (which is waaay more than this system will need to accomodate) it would take approximately 33,000 years to exhaust the keys.

rycamor,

That's good to know. In this case it's MySQL or MS Access.

I've also considered the kinds of issues you're talking about and do not arbitrarily create an artifical key all of the time.

In this case it is unlikely that there will be an attribute of the ad which would naturally identify it uniquely or that would make for efficient queries. (These are just free ads, so there would not, for example, be an invoice number).

I understand and agree with your third point, but in this case it really would not compromise data integrity for the following reasons:

* The primary key for each add will not be a foreign key in any other tables.
* The ads only need to be uniquely identified from each other at any given time, they do not need to be uniquely identified over time. For example, the primary key would be used to create links to the ads that would be dynamically output on a web page each time the page is requested.

Thanks again. I welcome further discussion on this subject, though I've already gained some valuable knowledge and perspective.

You could make your primary key use some combination of the date, including hour,minute,second, plus a few digits of randomization afterwards. The likelihood of a random key being identical to another random key which is submitted at the exact same second is so low that the system would probably never have to re-try a unique key. Your primary key would now be useful as a human-readable identifier, you wouldn't need it to be an integer field.