Would using varchar(5000) be bad compared to varchar(255)?

Since varchar allocates space dynamically anyway, my question is whether using varchar(255) would be more efficient or save more space than using varchar(5000). If so, why?

1 Answer

  • Yes, varchar(5000) can be worse than varchar(255) if all values will fit into the latter. The reason is that SQL Server will estimate data size and, in turn, memory grants based on the declared (not actual) size of the columns in a table. When you have varchar(5000), it will assume that every value is 2,500 characters long, and reserve memory based on that.
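    If you want to see those grants directly, one option (a generic sketch, not part of the demo below) is to watch sys.dm_exec_query_memory_grants from a second session while a sort- or hash-heavy query is running:

    -- Run from a separate session while the query of interest executes.
    -- Shows how much workspace memory each active request asked for,
    -- was actually granted, and would ideally like to have.
    SELECT session_id, requested_memory_kb, granted_memory_kb, ideal_memory_kb
    FROM sys.dm_exec_query_memory_grants;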

    Here's a demo from my recent GroupBy presentation on bad habits that makes it easy to prove for yourself. (It requires SQL Server 2016 for some of the sys.dm_exec_query_stats output columns, but should still be provable with SET STATISTICS TIME ON or other tools on earlier versions.) It shows larger memory grants and longer runtimes for the same query against the same data; the only difference is the declared size of the columns:

    -- create three tables with different column sizes
    CREATE TABLE dbo.t1(a nvarchar(32),   b nvarchar(32),   c nvarchar(32),   d nvarchar(32));
    CREATE TABLE dbo.t2(a nvarchar(4000), b nvarchar(4000), c nvarchar(4000), d nvarchar(4000));
    CREATE TABLE dbo.t3(a nvarchar(max),  b nvarchar(max),  c nvarchar(max),  d nvarchar(max));
    GO -- that's important
    
    -- How the sample data is populated is irrelevant and unimportant.
    INSERT dbo.t1(a,b,c,d)
      SELECT TOP (5000) LEFT(name,1), RIGHT(name,1), ABS(column_id/10), ABS(column_id%10)
      FROM sys.all_columns ORDER BY object_id;
    GO 100
    INSERT dbo.t2(a,b,c,d) SELECT a,b,c,d FROM dbo.t1;
    INSERT dbo.t3(a,b,c,d) SELECT a,b,c,d FROM dbo.t1;
    GO
    
    -- no "primed the cache in advance" tricks
    DBCC FREEPROCCACHE WITH NO_INFOMSGS;
    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
    GO
    
    -- Redundancy in the query doesn't matter! It just has to create a need for sorts etc.
    GO
    SELECT DISTINCT a,b,c,d, DENSE_RANK() OVER (PARTITION BY b,c ORDER BY d DESC)
    FROM dbo.t1 GROUP BY a,b,c,d ORDER BY c,a DESC;
    GO
    SELECT DISTINCT a,b,c,d, DENSE_RANK() OVER (PARTITION BY b,c ORDER BY d DESC)
    FROM dbo.t2 GROUP BY a,b,c,d ORDER BY c,a DESC;
    GO
    SELECT DISTINCT a,b,c,d, DENSE_RANK() OVER (PARTITION BY b,c ORDER BY d DESC)
    FROM dbo.t3 GROUP BY a,b,c,d ORDER BY c,a DESC;
    GO
    
    SELECT [table] = N'...' + SUBSTRING(t.[text], CHARINDEX(N'FROM ', t.[text]), 12) + N'...',
           s.last_dop, s.last_elapsed_time, s.last_grant_kb, s.max_ideal_grant_kb
    FROM sys.dm_exec_query_stats AS s
    CROSS APPLY sys.dm_exec_sql_text(s.sql_handle) AS t
    WHERE t.[text] LIKE N'%dbo.' + N't[1-3]%'
    ORDER BY t.[text];
    

    So, yes, right-size your columns, please.
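    To find over-declared columns in your own schema, a minimal sketch like the following compares a column's declared size with the longest value actually stored (dbo.Customers and its nvarchar column notes are hypothetical placeholders; substitute your own names):

    -- Hypothetical table/column names; adjust to your schema.
    -- max_length is in bytes: 2 bytes per character for nvarchar, -1 for (max).
    SELECT declared_chars = c.max_length / 2,
           longest_actual = (SELECT MAX(LEN(notes)) FROM dbo.Customers)
    FROM sys.columns AS c
    WHERE c.[object_id] = OBJECT_ID(N'dbo.Customers')
      AND c.name = N'notes';

    If the gap is large, the column is a candidate for ALTER TABLE ... ALTER COLUMN to a tighter size.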

    Also, I re-ran the tests with varchar(32), varchar(255), varchar(5000), varchar(8000), and varchar(max). Similar results, though the differences between 32 and 255, and between 5,000 and 8,000, were negligible:

    [Chart: memory grants and elapsed times for varchar(32), varchar(255), varchar(5000), varchar(8000), and varchar(max)]

    Here's another test, with the TOP (5000) change, for the more fully reproducible test I was being incessantly badgered about:

    [Chart: results of the re-run with TOP (5000), across the same column sizes]

    So even with 5,000 rows instead of 10,000 (and there are 5,000+ rows in sys.all_columns at least as far back as SQL Server 2008 R2), a relatively linear progression is observed: even with the same data, the larger the declared size of the column, the more memory and time are required to satisfy the exact same query (even if it does have a meaningless DISTINCT).
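
    A quick way to confirm how many rows your own instance exposes there (a trivial sanity check, not part of the demo):

    -- Counts the rows available to populate the demo tables.
    SELECT COUNT(*) FROM sys.all_columns;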

