Best practices on primary key, auto-increment, and UUID in RDBMS and SQL databases

We're designing a table for a user entity. The only non-trivial requirement is that there should be a permanent URL to the user entity (for example, their profile). There's a lot on the web about int/long vs. UUID, but it is still unclear to me.

  1. Considering that the profile contains private information, it's not a good idea to have a predictable ID embedded in the URL. Am I right?
  2. To satisfy the first point, I can make the primary key a UUID and embed it in the URL. But that raises a question: should I be worried about the performance penalty of having a UUID as the primary key in any way (indexing, inserting, selecting, joining)?

That said, which of the following is better (with respect to the above)?

CREATE TABLE users(
  pk UUID NOT NULL,
  .....
  PRIMARY KEY(pk)
);

or

CREATE TABLE users(
  pk INT NOT NULL AUTO_INCREMENT,
  id UUID NOT NULL,
  .....
  PRIMARY KEY(pk),
  UNIQUE(id)
);


Solution 1:[1]

It's a matter of choice, actually, and from my point of view this question invites opinion-based answers. What I always do, even if it's redundant, is create the primary key on an auto-increment column (I call it the technical key). This keeps things consistent within the database, allows the "primary key" to change in case something went wrong at the design phase, and consumes less space when that key is referenced by foreign key constraints in other tables. I also make the candidate key unique and not null.

A technical key is something you don't normally show to end users, unless you decide to. The same can apply to other technical columns that you keep only at the database level for whatever purpose you need, such as modify date, create date, version, the user who changed the record, and more.

In this case I would go for your second option, but slightly modified:

CREATE TABLE users(
  pk INT NOT NULL AUTO_INCREMENT,
  id UUID NOT NULL,
  .....
  PRIMARY KEY(pk),
  UNIQUE(id)
);
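
A minimal sketch of this layout, assuming Python's built-in sqlite3 as a stand-in for the real RDBMS. SQLite has no UUID type or AUTO_INCREMENT keyword, so here the UUID is stored as TEXT and `INTEGER PRIMARY KEY` plays the auto-increment role. The `orders` table, the `name` column, and `create_user` are hypothetical names added for illustration.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    pk   INTEGER PRIMARY KEY,      -- technical key, assigned by the database
    id   TEXT NOT NULL UNIQUE,     -- candidate key (UUID), the one you may expose
    name TEXT NOT NULL             -- hypothetical payload column
);
CREATE TABLE orders (              -- hypothetical child table
    pk      INTEGER PRIMARY KEY,
    user_pk INTEGER NOT NULL REFERENCES users(pk)  -- FK uses the small int key
);
""")

def create_user(name):
    """Insert a user and return (technical key, public UUID)."""
    public_id = str(uuid.uuid4())
    cur = conn.execute(
        "INSERT INTO users (id, name) VALUES (?, ?)", (public_id, name)
    )
    return cur.lastrowid, public_id

pk, public_id = create_user("alice")
conn.execute("INSERT INTO orders (user_pk) VALUES (?)", (pk,))

# Outside callers identify the user by UUID; the join itself runs on pk.
row = conn.execute(
    "SELECT u.name, COUNT(o.pk) FROM users u "
    "LEFT JOIN orders o ON o.user_pk = u.pk "
    "WHERE u.id = ? GROUP BY u.pk",
    (public_id,),
).fetchone()
```

Only `users.id` would ever appear in a URL; `pk` stays inside the database and is what other tables point at.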

Solution 2:[2]

This question is quite opinion-based so here's mine.

My take is to use the second one, a separate UUID from the PK. The thing is:

  • The PK is unique and not exposed to the public.
  • The UUID is unique and may get exposed to the public.

If, for any reason, the UUID gets compromised, you'll need to change it. Changing a PK may be expensive and can have a lot of side effects. If the UUID is separate from the PK, then changing it (though not trivial) has far fewer consequences.
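
To illustrate why the change is cheaper, here is a rough sketch, again assuming sqlite3 as a stand-in database. The `posts` table and the `rotate_public_id` helper are hypothetical. Rotating the UUID is a single-row update, and rows in other tables keep pointing at the unchanged integer PK.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (pk INTEGER PRIMARY KEY, id TEXT NOT NULL UNIQUE);
CREATE TABLE posts (
    pk      INTEGER PRIMARY KEY,
    user_pk INTEGER NOT NULL REFERENCES users(pk)
);
""")

old_id = str(uuid.uuid4())
user_pk = conn.execute("INSERT INTO users (id) VALUES (?)", (old_id,)).lastrowid
conn.execute("INSERT INTO posts (user_pk) VALUES (?)", (user_pk,))

def rotate_public_id(conn, user_pk):
    """Replace a compromised public UUID; the PK and all FKs stay as they are."""
    new_id = str(uuid.uuid4())
    conn.execute("UPDATE users SET id = ? WHERE pk = ?", (new_id, user_pk))
    return new_id

new_id = rotate_public_id(conn, user_pk)

# The related rows are untouched, and the old UUID no longer resolves.
posts_after = conn.execute(
    "SELECT COUNT(*) FROM posts WHERE user_pk = ?", (user_pk,)
).fetchone()[0]
old_lookup = conn.execute(
    "SELECT pk FROM users WHERE id = ?", (old_id,)
).fetchone()
```

Anything outside the database that stored the old UUID (bookmarks, other systems) still breaks, which is the "not trivial" part.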

Solution 3:[3]

I came across a nice article that explains both the pros and cons of using a UUID as a primary key. In the end, it suggests using both: an incremental integer for the PK and UUIDs for the outside world. Never expose your PK to the outside.

One solution used in several different contexts that has worked for me is, in short, to use both. (Please note: not a good solution — see note about response to original post below). Internally, let the database manage data relationships with small, efficient, numeric sequential keys, whether int or bigint. Then add a column populated with a UUID (perhaps as a trigger on insert). Within the scope of the database itself, relationships can be managed using the usual PKs and FKs.

But when a reference to the data needs to be exposed to the outside world, even when “outside” means another internal system, they must rely only on the UUID. This way, if you ever do have to change your internal primary keys, you can be sure it’s scoped only to one database. (Note: this is just plain wrong, as Chris observed)

We used this strategy at a different company for customer data, just to avoid the “guessable” problem. (Note: avoid is different than prevent, see below).

In another case, we would generate a “slug” of text (e.g. in blog posts like this one) that would make the URL a little more human friendly. If we had a duplicate, we would just append a hashed value.
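
The slug idea could look roughly like the sketch below. The article does not say what was hashed or how long the suffix was, so hashing a per-row key (for example the row's UUID) and keeping 8 hex characters are assumptions, and `slugify` and `unique_slug` are hypothetical helper names.

```python
import hashlib
import re

def slugify(title):
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def unique_slug(title, unique_key, taken):
    """Return a slug for title; on a duplicate, append a short hash suffix.

    unique_key is any value unique to the row (assumed here: its UUID).
    taken is the set of slugs already in use; it is updated in place.
    """
    slug = slugify(title)
    if slug in taken:
        suffix = hashlib.sha256(unique_key.encode("utf-8")).hexdigest()[:8]
        slug = f"{slug}-{suffix}"
    taken.add(slug)
    return slug

taken = set()
first = unique_slug("UUID or GUID as Primary Keys?", "row-1", taken)
second = unique_slug("UUID or GUID as Primary Keys?", "row-2", taken)
```

A real implementation would also want a UNIQUE constraint on the slug column, since a truncated hash can still collide.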

Even as a “secondary primary key”, naively using UUIDs in string form is wrong: use the built-in database mechanisms, as the values are then stored in compact binary form (a UUID is 128 bits, i.e. 16 bytes) rather than as text.
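
The size difference between the two forms is easy to check with Python's standard `uuid` module. This only shows the raw value sizes; what a given database actually stores per row depends on its UUID or binary column type.

```python
import uuid

u = uuid.uuid4()

as_text = str(u)     # canonical hex form with hyphens: 36 characters
as_bytes = u.bytes   # raw 128-bit value: 16 bytes

# Both forms round-trip to the same UUID, so nothing is lost by storing bytes.
from_text = uuid.UUID(as_text)
from_bytes = uuid.UUID(bytes=as_bytes)
```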

Use integers because they are efficient. Use the database implementation of UUIDs in addition for any external reference to obfuscate.

https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439

Solution 4:[4]

Don’t make it your database primary key: this will cause problems in the future if you want to change your database technology. And if you make it an increasing number, your competitors will know how many users you have and how fast you are adding new ones.
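
The leak from an increasing number is plain arithmetic. In the sketch below, the dates and IDs are made-up illustration values and `estimate_growth` is a hypothetical helper: someone who signs up twice, a month apart, and reads the ID out of their profile URL each time can estimate the signup rate.

```python
from datetime import date

def estimate_growth(first_obs, second_obs):
    """Estimate new users per day from two (date, sequential_id) observations."""
    (day_a, id_a), (day_b, id_b) = first_obs, second_obs
    days = (day_b - day_a).days
    return (id_b - id_a) / days

# Made-up observations: ID 10000 seen on Jan 1, ID 13000 seen on Jan 31.
rate = estimate_growth(
    (date(2024, 1, 1), 10_000),
    (date(2024, 1, 31), 13_000),
)
# The second ID alone also bounds the total user count at roughly 13000.
```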

Solution 5:[5]

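Using UUID as pk: The first problem is that a UUID takes much more storage than an int: about 9x when stored as a 36-character string next to a 4-byte int, or 4x when stored as a native 16-byte value. The second problem is that if you frequently need to sort by pk, don't even think about UUID. Using a UUID as pk doesn't affect the time complexity of WHERE conditions or other operations, except sorting.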

Using int as pk: Easily guessable. A brute-force attacker will love this. This is the only problem, but it is the biggest one.
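
A small simulation of the guessing problem, under the assumption that the alternative is a random (version 4) UUID, which carries 122 random bits. The user counts are arbitrary illustration values.

```python
import uuid

# 1000 users keyed by sequential ints vs. 1000 users keyed by random UUIDs.
existing_int_ids = set(range(1, 1001))
existing_uuids = {uuid.uuid4() for _ in range(1000)}

# An attacker walking /users/1, /users/2, ... hits a real profile every time.
int_hits = sum(1 for guess in range(1, 1001) if guess in existing_int_ids)

# The same number of random UUID guesses is overwhelmingly unlikely to hit one.
uuid_hits = sum(1 for _ in range(1000) if uuid.uuid4() in existing_uuids)

# Size of the space a UUIDv4 guess is drawn from.
uuid4_keyspace = 2 ** 122
```

An unguessable ID only makes enumeration impractical. It does not replace an authorization check on the profile itself.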

Using int as pk but keeping a UUID as well: If the UUID is not the pk, searching by UUID can take longer than searching by pk, even though all the relations are maintained by int. A UNIQUE constraint on the UUID column is normally backed by an index, so the lookup is still an index search rather than a full scan, but it is an extra index to store and maintain. As the relations are on int, the 9x storage issue is solved here.
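
How a lookup by UUID is executed can be checked rather than guessed. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 as a stand-in. Other engines have their own `EXPLAIN` output and may behave differently, and the `name` column and the `plan` helper are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "pk INTEGER PRIMARY KEY, id TEXT NOT NULL UNIQUE, name TEXT)"
)
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(str(uuid.uuid4()), f"user{i}") for i in range(1000)],
)
target = conn.execute("SELECT id FROM users WHERE pk = 500").fetchone()[0]

def plan(sql, params):
    """Return SQLite's query plan for sql as one string (last column = detail)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

# Lookup through the UNIQUE index on the UUID column.
by_uuid = plan("SELECT name FROM users WHERE id = ?", (target,))

# Lookup directly by the integer primary key.
by_pk = plan("SELECT name FROM users WHERE pk = ?", (500,))

print(by_uuid)
print(by_pk)
```

In SQLite, the first plan reports a search using the automatically created unique index and the second a search on the integer primary key. Neither is a full table scan.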

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 nsv
Solution 4 vy32
Solution 5