Search This Blog

Thursday, April 19, 2018


GUID or Integer Primary Key?


If you are designing a new data model or maybe you are looking to improve an existing one.  One of the key decision you would be making is what should be the primary key in each table.

Now first rule or best practice is that it should be a surrogate key i.e. arbitrary/meaningless unique value. In other words, it should not be based on a business key.

Globally Unique Identifier aka GUID (also known as UUID) and auto generated Integer  are two of the popular methods used for a surrogate primary key.

You may know that using GUID has certain advantages, its natural and virtually guaranteed to be globally unique, meaning across all tables, databases, servers... basically the whole universe. Whereas Integer is only guaranteed in a single table and only when defined as a primary or unique key.  And, even then you could end up having duplicates in bizarre and rare circumstances. Crazy things happens and I am speaking from a painful experience!


So is it settled that GUID would make a great data type for a primary key? well, not so fast.

GUID has extra performance cost due to its 4 times larger size and for being non-sequential. This is much more pronounced if the tables cluster index is based off GUID column, which usually is the case.

Why GUIDs performs poorly especially when compared to Integers?

GUID is a 16-byte type while INT is a 4-byte type. With storage nowadays very cheap, this normally is not a problem from the storage point of view. The problem is that reads (and writes) will be multiplied by 4 times.

And biggest performance hit is when you use the GUID as not only as a primary key  but it is the clustered index of the table. In this case, with every INSERT you would potentially be changing the internal layout of the data as it is stored, SQL Server potentially having to move many data pages. This is unlike primary key based on IDENTITY INT clustered indexes where data is stored sequentially providing the best performance.

Now some of this performance hit can be mitigated by using the sequential GUID feature. However, the sequential counter gets reset when SQL Server is restarted, which does not affect the uniqueness but can lead to the internal fragmentation of the data.

Because of the performance hit, the cluster key for table is not recommended to be a GUID. In other words, even if you choose to go with GUID as record identifier, the cluster index on the table should be based off an auto incremental integer value.

Here is what it would look like, as an example.

Field Name
Data type
Default
Constraint Type
Index Type
ID
Integer
Auto Increment
Primary Key
Clustered
GUID
Unique Identifier
NEWID()
Unique
Non-clustered