Why are Denali sequences supposed to perform better than identity columns?
In his answer to Which is better: identity columns or generated unique id values? mrdenny says:
When SQL Denali comes out it will support sequences which will be more efficient than identity, but you can't create something more efficient yourself.
I'm not so sure. Knowing Oracle's sequences, I have either to create a trigger for insert, encapsulate each insert into a call of a stored procedure, or pray that I do not forget to properly use the sequence when I do an ad-hoc insert.
I doubt that the advantages of sequences are so obvious.
Answers 2
I'll answer here as well. It has to do with the internals of how
IDENTITY
andSEQUENCE
work.With
IDENTITY
, SQL Server pre-caches values into memory so that they are readily available. See Martin Smith's answer for the details. As values are used, a background process generates more values. As you can imagine this pool can run out pretty quickly, leaving the application at the mercy of the background process that is generating the values.With
SEQUENCE
, SQL Server allows you to define how large the cache should be. While SQL Server doesn't actually keep the values in the cache, it only keeps the current value and the top end value, this will greatly reduce the amount of IO that is needed to create values.Don't set the cache too high, as this will reduce the number of numbers which can be used: if SQL Server were to crash, any values specified in the current cache range which weren't used would be lost.
As for row insertion, just specify a default value for the column, like so:
Since the Itzik Ben Gan article was written the hardcoded cache size of 10 for
IDENTITY
seems to have been changed. From the comments on this connect itemThe T-SQL Querying book contains the following table but emphasises that these values are not documented or guaranteed to be unchanged.
The article here tests various sequence cache sizes and insert batch sizes and comes up with the following results.
Which appears to show that for large inserts
IDENTITY
out performsSEQUENCE
. It doesn't test cache size 1,000 however and also those results are just one test. Looking specifically at cache size 1,000 with various batch sizes of inserts I got the following results (trying each batch size 50 times and aggregating the results as below- all times in ?s.)For larger batch sizes the
IDENTITY
version seems generally faster.The TSQL Querying book also explains why
IDENTITY
can have a performance advantage over sequence.The
IDENTITY
is table specific andSEQUENCE
isn't. If disaster was to strike mid insert before the log buffer was flushed it doesn't matter if the recovered identity is an earlier one as the recovery process will also undo the insert, so SQL Server doesn't force flushing the log buffer on every identity cache related disc write. However for Sequence this is enforced as the value might be used for any purpose - including outside the database. So in the example above with a million inserts and cache size of 1,000 this is an additional thousand log flushes.Script to reproduce