Why are Denali sequences supposed to perform better than identity columns?

In his answer to Which is better: identity columns or generated unique id values? mrdenny says:

When SQL Denali comes out it will support sequences which will be more efficient than identity, but you can't create something more efficient yourself.

I'm not so sure. Knowing Oracle's sequences, I either have to create an insert trigger, encapsulate each insert in a stored procedure call, or pray that I don't forget to use the sequence properly when I do an ad-hoc insert.

I doubt that the advantages of sequences are so obvious.

2 Answers

  • I'll answer here as well. It has to do with the internals of how IDENTITY and SEQUENCE work.

    With IDENTITY, SQL Server pre-caches values into memory so that they are readily available. See Martin Smith's answer for the details. As values are used, a background process generates more values. As you can imagine, this pool can run out pretty quickly, leaving the application at the mercy of the background process that is generating the values.

    With SEQUENCE, SQL Server allows you to define how large the cache should be. SQL Server doesn't actually keep the values in the cache; it keeps only the current value and the top end of the cached range. This greatly reduces the amount of IO that is needed to create values.

    Don't set the cache too high, though, as that can reduce the number of usable values: if SQL Server were to crash, any unused values in the current cache range would be lost.

    As for row insertion, just specify a default value for the column, like so:

    DEFAULT (NEXT VALUE FOR Audit.EventCounter),
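    Putting those pieces together, a minimal sketch might look like the following. The sequence name `Audit.EventCounter` comes from the snippet above; the schema, table, and column names around it are invented for illustration:

    ```sql
    -- Hypothetical illustration: everything except Audit.EventCounter is
    -- an assumed name, and the Audit schema is assumed to exist.
    CREATE SEQUENCE Audit.EventCounter
        AS INT
        START WITH 1
        INCREMENT BY 1
        CACHE 1000;  -- only the current and top-of-range values hit disk

    CREATE TABLE Audit.Event
    (
        EventId   INT NOT NULL
            CONSTRAINT DF_Event_EventId DEFAULT (NEXT VALUE FOR Audit.EventCounter),
        EventText NVARCHAR(100) NOT NULL
    );

    -- Ad-hoc inserts pick up the sequence value automatically,
    -- with no trigger or stored procedure wrapper needed:
    INSERT INTO Audit.Event (EventText) VALUES (N'Example event');
    ```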

  • Since the Itzik Ben-Gan article was written, the hardcoded cache size of 10 for IDENTITY seems to have been changed. From the comments on this Connect item:

    The size of the pre-allocation is based on the size of the data type of the column the identity property is defined on. For a SQL Server integer column, the server pre-allocates identities in ranges of 1000 values. For the bigint data type the server pre-allocates in ranges of 10000 values.

    The T-SQL Querying book contains the following table but emphasises that these values are not documented or guaranteed to be unchanged.

    | DataType        | CacheSize |
    |-----------------|----------:|
    | TinyInt         | 10        |
    | SmallInt        | 100       |
    | Int             | 1,000     |
    | BigInt, Numeric | 10,000    |
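    Unlike the identity pre-allocation size, the cache size of a sequence is user-controlled and documented. A quick way to inspect it is the `sys.sequences` catalog view; a sketch:

    ```sql
    -- cache_size is NULL for sequences created with NO CACHE.
    SELECT name,
           cache_size,
           current_value
    FROM   sys.sequences;
    ```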

    The article here tests various sequence cache sizes and insert batch sizes and comes up with the following results.

    [Chart: insert times for various sequence cache sizes and insert batch sizes]

    This appears to show that for large inserts IDENTITY outperforms SEQUENCE. It doesn't test cache size 1,000, however, and those results are from a single test run. Looking specifically at cache size 1,000 with various insert batch sizes, I got the following results (trying each batch size 50 times and aggregating the results as below; all times in µs).

    | Batch Size | Seq Min   | Seq Max   | Seq Avg   | Id Min    | Id Max    | Id Avg    |
    |-----------:|----------:|----------:|----------:|----------:|----------:|----------:|
    | 10         | 2,994     | 7,004     | 4,002     | 3,001     | 7,005     | 4,022     |
    | 100        | 3,997     | 5,005     | 4,218     | 4,001     | 5,010     | 4,238     |
    | 1,000      | 6,001     | 19,013    | 7,221     | 5,982     | 8,006     | 6,709     |
    | 10,000     | 26,999    | 33,022    | 28,645    | 24,015    | 34,022    | 26,114    |
    | 100,000    | 189,126   | 293,340   | 205,968   | 165,109   | 234,156   | 173,391   |
    | 1,000,000  | 2,208,952 | 2,344,689 | 2,269,297 | 2,058,377 | 2,191,465 | 2,098,552 |

    For larger batch sizes the IDENTITY version seems generally faster.

    The T-SQL Querying book also explains why IDENTITY can have a performance advantage over SEQUENCE.

    IDENTITY is table specific; SEQUENCE isn't. If disaster were to strike mid-insert before the log buffer was flushed, it wouldn't matter that the recovered identity is an earlier one, because the recovery process would also undo the insert. So SQL Server doesn't force a log buffer flush on every identity-cache-related disk write. For SEQUENCE, however, the flush is enforced, as the value might be used for any purpose, including outside the database. So in the example above, with a million inserts and a cache size of 1,000, that is an additional thousand log flushes.
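    One rough way to observe this (a sketch, assuming a single-user test database so no other activity inflates the counter) is to snapshot the cumulative log-flush counter before and after each test batch:

    ```sql
    -- 'Log Flushes/sec' in sys.dm_os_performance_counters is, despite its
    -- name, a raw cumulative count; the delta between two snapshots gives
    -- the number of log flushes caused by the work in between.
    SELECT cntr_value
    FROM   sys.dm_os_performance_counters
    WHERE  counter_name  = 'Log Flushes/sec'
           AND instance_name = DB_NAME();
    ```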

    Script to reproduce

    DECLARE @Results TABLE(
      BatchCounter INT,
      NumRows      INT,
      SequenceTime BIGINT,
      IdTime       BIGINT);

    DECLARE @NumRows      INT = 10,
            @BatchCounter INT;

    WHILE @NumRows <= 1000000
    BEGIN
          SET @BatchCounter = 0;

          WHILE @BatchCounter <= 50
          BEGIN
                --Do inserts using SEQUENCE
                DECLARE @SequenceTimeStart DATETIME2(7) = SYSUTCDATETIME();
                INSERT INTO dbo.t1_Seq1_cache_1000 (N) --explicit column list so Id is generated by its default
                SELECT N
                FROM   [dbo].[TallyTable] (@NumRows)
                OPTION (RECOMPILE);
                DECLARE @SequenceTimeEnd DATETIME2(7) = SYSUTCDATETIME();

                --Do inserts using IDENTITY
                DECLARE @IdTimeStart DATETIME2(7) = SYSUTCDATETIME();
                INSERT INTO dbo.t1_identity (N)
                SELECT N
                FROM   [dbo].[TallyTable] (@NumRows)
                OPTION (RECOMPILE);
                DECLARE @IdTimeEnd DATETIME2(7) = SYSUTCDATETIME();

                INSERT INTO @Results
                SELECT @BatchCounter,
                       @NumRows,
                       DATEDIFF(MICROSECOND, @SequenceTimeStart, @SequenceTimeEnd) AS SequenceTime,
                       DATEDIFF(MICROSECOND, @IdTimeStart, @IdTimeEnd)             AS IdTime;

                TRUNCATE TABLE dbo.t1_identity;
                TRUNCATE TABLE dbo.t1_Seq1_cache_1000;

                SET @BatchCounter += 1;
          END
          SET @NumRows *= 10;
    END

    SELECT NumRows,
           MIN(SequenceTime) AS MinSequenceTime,
           MAX(SequenceTime) AS MaxSequenceTime,
           AVG(SequenceTime) AS AvgSequenceTime,
           MIN(IdTime)       AS MinIdentityTime,
           MAX(IdTime)       AS MaxIdentityTime,
           AVG(IdTime)       AS AvgIdentityTime
    FROM   @Results
    GROUP  BY NumRows;
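    The script references a sequence-backed table, an identity table, and a tally function whose definitions are not shown. A possible setup, reconstructed from the names used above (column names, the sequence itself, and the tally implementation are all assumptions, not the original author's exact DDL):

    ```sql
    -- Assumed setup: definitions guessed from the object names in the script.
    CREATE SEQUENCE dbo.Seq1
        AS INT
        START WITH 1
        CACHE 1000;  -- the cache size 1,000 being tested

    CREATE TABLE dbo.t1_Seq1_cache_1000
    (
        Id INT NOT NULL CONSTRAINT DF_t1_Seq1 DEFAULT (NEXT VALUE FOR dbo.Seq1),
        N  INT NOT NULL
    );

    CREATE TABLE dbo.t1_identity
    (
        Id INT IDENTITY(1, 1) NOT NULL,
        N  INT NOT NULL
    );
    GO

    -- Inline table-valued function returning @N sequential integers.
    CREATE FUNCTION dbo.TallyTable (@N INT)
    RETURNS TABLE
    AS
    RETURN
        SELECT TOP (@N)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
        FROM   sys.all_columns AS a
        CROSS JOIN sys.all_columns AS b;
    ```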
