Last week we discussed advantages of recompilation in Common Mistakes in SQL Server – Part 3. This week I will discuss the Null Value and how it can cause an issue!!
What is NULL?
Null is the value which indicates the value is unknown. Null is different than zero or an empty string, as a zero or empty string indicates that the value is known (refer http://en.wikipedia.org/wiki/Null_%28SQL%29 for further reading.)
Why should we avoid NULL values?
In one of my previous projects I was asked to perform some tests and do analysis on a specific database which had a tremendous growth rate, and it seemed that either the estimation related to growth was wrong, or the application inserts data more often than is expected. The database growing more than 300% in just 2 years wasn’t a good sign - something was really wrong. After I finished the analysis I found that there was a single table which had significantly contributed to the database size - that table itself was 450+ gigs in size; it had 217 columns and more than 50 million records, most of the columns allowing null values, which is where the problem was.
There is a misconception that if we have the NULL values in a table it doesn't occupy storage space. The fact is, a NULL value occupies space – 2 bytes. If we do not consider this point at the estimation stage we will have trouble with the growth of the database. Sometimes, however, application design demands some columns to be there for future use, and in this case, there will be NULL values.
Let me give you an example – create a table “tblEmployee” having thirteen columns of fixed length data type. I will add 150000 records and check the size of table. Remember, we are not creating a primary key/clustered index - thus this will be the Heap.
1. Specify the number of rows that will be present in the table
Num_Rows = 150000
2. Specify the number of fixed length and variable length columns:
Num_cols = 13
Fixed_data_size= 4+15+15+4+25+25+10+10+3+4+4+35+35 = 189
Num_Variable_Cols = 0
Max_Var_Size = 0
3. Part of the row, known as the null bitmap, is reserved to manage column nullability:
Null_Bitmap=2 + ((13+7)/8) = 4.5 = 4
4. Calculating the variable-length data size:
Variable_Data_size: 0
5. Calculate the total row size:
Row_size = Fixed_data_size + variable_data_size + Null_bitmap + 4
Row_size = 189 +0 + 4 + 4
Row_size = 197
6. Calculate the number of rows per pages (8096 free bytes per page)
Rows_per_page = 8096 / (122+2) = 40.27 = 40
7. Calculate the number of pages required to store all the rows
Num_Pages = Num_rows / Rows_per_page
Num_Pages = 150000 / 40 = 3750
8. Calculate the amount of space required to store the data in the heap (8192 total bytes per page)
Heap Size (bytes) = 8192 * 3750 = 30720000 Bytes
The size of the table tblEmployee is 30776KB which means storing 150,000 records would occupy 30 MB. Now, lets calculate this for 50 million records - you will get 9990 MB ((50000000/150000)*30). Remember we have calculated this for 13 columns only; if your table has more columns the size could be bigger than this. Download the demo script here
The above example shows how your estimation will be wrong if you do not consider NULL values while estimating database size. To estimate your table size correctly, you must consider this point, and you should try to narrow down the possibility of having NULL values in your tables.
Resources: