12/03/2009

Common Mistakes in SQL Server - Part 2

Last week we discussed about whether to choose variable or fixed length data type and when to use when to use a Unicode data type like nchar or nvarchar in Common Mistakes in SQL Server – Part 1.
Let’s discuss about the 2nd common mistake I have observed - using Identity column as primary key.

Identity is a property which can be defined on an integer, decimal, numeric, small integer, big integer or tiny integer data type, an identity column is a column which - generally is used as primary key.  The  Identity property creates an incremental value for the specified column automatically, which is why it is widely used by developers when they designed the table and a primary key – an identifier column.

How does it work?
The Identity column property has two components: a seed value and an incremental value. A seed is the intiial value  set for that particular column , where as an incremental is added to last inserted value when a new row is created. 

When we set the Identity properties, we have to define the value for first record aka seed value (for example, 0) and the incremental value (for example 1). Once you have defined this property and you insert a record into the table for the first time, it will add 0 for the Identity column.  upon the 2nd time you insert a record it, will have a value 1 (seed of 0 plus the incremental of, 1),  and at 3rd record insertion it would have a value  of 2  (1+1). (Download the demo script here Part2_1 which illustrates the table definition with Identity column)

Advantage of using an Identity column as a Primary Key:
There are couple of advantages using an Identity column as a Primary Key, including:

  • SQL Server itself manages the  Identity values
  • Developers don’t need to write a code for new primary key value because Identity column value is created automatically
  • The size of index will be low as compared to index size on char based columns

Are there any catches?
While there are couple of advantages, there are some disadvantages which makes managing Identity columns a bit harder including:


1. You cannot control Identity values, because this column and its values are controlled and managed by SQL Server. Please note, however, we may use 'set Identity_insert tblName on' to insert an Identity value explicitly.

2. For any reason, if insertion fails, a value for Identity will get created. Then the next time you insert a record it will create gap in numbering.

Let us check this out with an example. Create a table called TestIdentity with three columns: one Identity column, one varchar(10) and  one varchar(5) column. We’ll insert 10 records here, and then 11th record will fail because we’ll try to insert a value which is greater in size then what is defined in table definition. Once it has failed, we’ll again pass the correct values. Please notice the gap created in the Identity value. (Download the demo script here Part2_2)

3. You cannot add an Identity property to an existing column. There are only two options that you have if you need to add an Identity column to a table - drop and recreate the table or add a new column with Identity value (and then drop the existing column, if applicable). (Download the demo script here Part2_3)

4. There has been a lot of said about the disadvantage of Identity columns , especially when it comes to a replicated database., you have to alter your identity property and say “Not for Replication”.






Then, if you have to restore your replicated database for any reason you will have to be very careful, as you will have lost all your replication settings unless you specify that you wish to  keep_replication. For further information on what all we have to aware of when we back up or restore replicated database please refer to this article.

Whenever the Identity has reached its threshold value you will have to reseed the Identity values and adjust Identity range at publisher end. This happens when we restore backup at subscriber end; you will have to find the last inserted Identity value and adjust the Identity range at publisher, using these steps:
a. Execute select ident_current(‘tblName’) for every subscriber you have
b. Note down the highest value
c. Go to your publication server and execute
dbcc checkident(‘tblName’,reseed,value*+1) where value = value found in step b
d. Finally execute sp_adjust
e. Now, go to your publication server and
execute sp_adjustpublisherIdentityrange PublicationName, TblName

Note: above case is for Merge Replication.

Here is a nice article on MSDN on Replicating Identity Columns, and you may also refer Hilary Cotter’s article Identity Crisis (generally known as Identity Crisis).

I personally believe that one should avoid using  the Identity property , especially as a primary key when you have replicated database to over come  the issues you face with Identity columns. Rob Volk has written an interesting observation on Identity and Primary Key. Yes, doing this will require some more efforts on your part to create incremental values for your Primary key; but it really is just a matter of writing a few more line of code. Additionally, there are alternate ways of doing this - for more information you may refer an article here on how to generate auto incremental values in SQL Server. For further reading you can refer 
http://msdn.microsoft.com/en-us/library/aa175244(SQL.80).aspx
 


4 comments:

  1. I rather prefer unique identifier as primary key which is very well supported by ORM tools like Hibernate /Nhibernate. With support to generate new unique identifier within t-sql (newid() method) you will never run out of option anywhere.. be it t-sql or application code. Yes I know it takes more memory than int but personally I have never seen it to be a big issue in any app. For indexing and performance, ORM tools supports Guid.Comb generator which results in almost same as int field performance.

    ReplyDelete
  2. Mahendra,

    True, but using GUID as primary key will unnecessarily create fragmentation and occupies more size. The bigger the table - bigger the size and waste of disk resource.

    However this *depends* on the application requirement, in that case NEWSEQUENTIALID() function should be used to generate IDs as newid() will not generate sequential ids.

    Regards

    ReplyDelete
  3. I agree about fragmentation issue with guid types. Following article by Jimmy explains meat behind the reasoning and suggest best practice as well.
    http://www.informit.com/articles/article.aspx?p=25862

    My perspective on this in general is driven by the fact that I see it as whole system as domain driven and not just database so with use of ORM in my system I make sure even with guid type as primary key, I can ensure sequential keys (almost sequential, I would say) and take least hit there but at the same time take full advantage of GUID type in whole domain.

    Hope that clarify my point here

    ReplyDelete
  4. yes, in this case this is justified.

    Regards

    ReplyDelete

I appreciate your time, thanks for posting your comment. I will review and reply to your comment as soon as I can.

Thank you
Hemantgiri