Are there any disadvantages to always using nvarchar(MAX)?
In SQL Server 2005, are there any disadvantages to making all character fields nvarchar(MAX) rather than specifying a length explicitly, e.g. nvarchar(255)? (Apart from the obvious one that you aren’t able to limit the field length at the database level)
22 Solutions collect form web for “Are there any disadvantages to always using nvarchar(MAX)?”
Same question was asked on MSDN Forums:
- Varchar(max) vs Varchar(255)
From the original post (much more information there):
When you store data to a VARCHAR(N) column, the values are physically stored in the same way. But when you store it to a VARCHAR(MAX) column, behind the screen the data is handled as a TEXT value. So there is some additional processing needed when dealing with a VARCHAR(MAX) value. (only if the size exceeds 8000)
VARCHAR(MAX) or NVARCHAR(MAX) is considered as a ‘large value type’. Large value types are usually stored ‘out of row’. It means that the data row will have a pointer to another location where the ‘large value’ is stored…
Its a fair question and he did state apart from the obvious…
Disadvantages could include:
Query optimizer uses field size to determine most efficent exectution plan
“1. The space alloction in extends and pages of the database are flexible. Thus when adding information to the field using update, your database would have to create a pointer if the new data is longer than the previous inserted. This the database files would become fragmented = lower performance in almost everything, from index to delete, update and inserts. ”
Integration implications – hard for other systems to know how to integrate with your database
Unpredictable growth of data
Possible security issues e.g. you could crash a system by taking up all disk space
There is good article here:
Sometimes you want the data type to enforce some sense on the data in it.
Say for example you have a column that really shouldn’t be longer than, say, 20 characters. If you define that column as VARCHAR(MAX), some rogue application could insert a long string into it and you’d never know, or have any way of preventing it.
The next time your application uses that string, under the assumption that the length of the string is modest and reasonable for the domain it represents, you will experience an unpredictable and confusing result.
I checked some articles and find useful test script from this: http://www.sqlservercentral.com/Forums/Topic1480639-1292-1.aspx
Then changed it to compare between NVARCHAR(10) vs NVARCHAR(4000) vs NVARCHAR(MAX) and I don’t find speed difference when using specified numbers but when using MAX. You can test by yourself. Hope This help.
SET NOCOUNT ON; --===== Test Variable Assignment 1,000,000 times using NVARCHAR(10) DECLARE @SomeString NVARCHAR(10), @StartTime DATETIME; --===== SELECT @startTime = GETDATE(); SELECT TOP 1000000 @SomeString = 'ABC' FROM master.sys.all_columns ac1, master.sys.all_columns ac2; SELECT testTime='10', Duration = DATEDIFF(ms,@StartTime,GETDATE()); GO --===== Test Variable Assignment 1,000,000 times using NVARCHAR(4000) DECLARE @SomeString NVARCHAR(4000), @StartTime DATETIME; SELECT @startTime = GETDATE(); SELECT TOP 1000000 @SomeString = 'ABC' FROM master.sys.all_columns ac1, master.sys.all_columns ac2; SELECT testTime='4000', Duration = DATEDIFF(ms,@StartTime,GETDATE()); GO --===== Test Variable Assignment 1,000,000 times using NVARCHAR(MAX) DECLARE @SomeString NVARCHAR(MAX), @StartTime DATETIME; SELECT @startTime = GETDATE(); SELECT TOP 1000000 @SomeString = 'ABC' FROM master.sys.all_columns ac1, master.sys.all_columns ac2; SELECT testTime='MAX', Duration = DATEDIFF(ms,@StartTime,GETDATE()); GO
Think of it as just another safety level. You can design your table without foreign key relationships – perfectly valid – and ensure existence of associated entities entirely on the business layer. However, foreign keys are considered good design practice because they add another constraint level in case something messes up on the business layer. Same goes for field size limitation and not using varchar MAX.
A reason NOT to use max or text fields is that you cannot perform online index rebuilds i.e. REBUILD WITH ONLINE= ON even with SQL Server Enterprise Edition.
Based on the link provided in the accepted answer it appears that:
100 characters stored in an
nvarchar(MAX)field will be stored no different to 100 characters in an
nvarchar(100)field – the data will be stored inline and you will not have the overhead of reading and writing data ‘out of row’. So no worries there.
If the size is greater than 4000 the data would be stored ‘out of row’ automatically, which is what you would want. So no worries there either.
- You cannot create an index on an
nvarchar(MAX)column. You can use full-text indexing, but you cannot create an index on the column to improve query performance. For me, this seals the deal…it is a definite disadvantage to always use nvarchar(MAX).
If you want a kind of “universal string length” throughout your whole database, which can be indexed and which will not waste space and access time, then you could use
The only problem I found was that we develop our applications on SQL Server 2005, and in one instance, we have to support SQL Server 2000. I just learned, the hard way that SQL Server 2000 doesn’t like the MAX option for varchar or nvarchar.
Bad idea when you know the field will be in a set range- 5 to 10 character for example. I think I’d only use max if I wasn’t sure what the length would be. For example a telephone number would never be more than a certain number of characters.
Can you honestly say you are that uncertain about the approximate length requirements for every field in your table?
I do get your point though- there are some fields I’d certainly consider using varchar(max).
Interestingly the MSDN docs sum it up pretty well:
Use varchar when the sizes of the
column data entries vary considerably.
Use varchar(max) when the sizes of the
column data entries vary considerably,
and the size might exceed 8,000 bytes.
There’s an interesting discussion on the issue here.
The job of the database is to store data so that it can be used by the enterprise. Part of making that data useful is ensuring that it is meaningful. Allowing someone to enter an unlimited number of characters for their first name isn’t ensuring meaningful data.
Building these constraints into the business layer is a good idea, but that doesn’t ensure that the database will remain intact. The only way to guarantee that the data rules are not violated is to enforce them at the lowest level possible in the database.
One problem is that if you are having to work with multiple versions of SQL Server, the MAX will not always work. So if you are working with legacy DB’s or any other situation that involves multiple versions, you better be very careful.
As was pointed out above, it is primarily a tradeoff between storage and performance. At least in most cases.
However, there is at least one other factor that should be considered when choosing n/varchar(Max) over n/varchar(n). Is the data going to be indexed (such as, say, a last name)? Since the MAX definition is considered a LOB, then anything defined as MAX is not available for indexing. and without an index, any lookup involving the data as predicate in a WHERE clause is going to be forced into a Full Table scan, which is the worst performance you can get for data lookups.
1) The SQL server will have to utilize more resources (allocated memory and cpu time) when dealing with nvarchar(max) vs nvarchar(n) where n is a number specific to the field.
2) What does this mean in regards to performance?
On SQL Server 2005, I queried 13,000 rows of data from a table with 15 nvarchar(max) columns.
I timed the queries repeatedly and then changed the columns to nvarchar(255) or less.
The queries prior to the optimization averaged at 2.0858 seconds. The queries after the change returned in an average of 1.90 seconds. That was about 184 milliseconds of improvement to the basic select * query. That is an 8.8% improvement.
3) My results are in concurrence with a few other articles that indicated that there was a performance difference. Depending on your database and the query, the percentage of improvement can vary. If you don’t have a lot of concurrent users or very many records, then the performance difference won’t be an issue for you. However, the performance difference will increase as more records and concurrent users increase.
I had a udf which padded strings and put the output to varchar(max). If this was used directly instead of casting back to the appropriate size for the column being adjusted, the performance was very poor. I ended up putting the udf to an arbitrary length with a big note instead of relying on all the callers of the udf to re-cast the string to a smaller size.
Interesting link: Why use a VARCHAR when you can use TEXT?
It’s about PostgreSQL and MySQL, so the performance analysis is different, but the logic for “explicitness” still holds: Why force yourself to always worry about something that’s relevant a small percentage of the time? If you saved an email address to a variable, you’d use a ‘string’ not a ‘string limited to 80 chars’.
legacy system support. If you have a system that is using the data and it is expected to be a certain length then the database is a good place to enforce the length. This is not ideal but legacy systems are sometime not ideal. =P
If all of the data in a row (for all the columns) would never reasonably take 8000 or fewer characters then the design at the data layer should enforce this.
The database engine is much more efficient keeping everything out of blob storage. The smaller you can restrict a row the better. The more rows you can cram in a page the better. The database just performs better when it has to access fewer pages.
The main disadvantage I can see is that let’s say you have this:
Which one gives you the most information about the data needed for the UI?
CREATE TABLE [dbo].[BusData]( [ID] [int] IDENTITY(1,1) NOT NULL, [RecordId] [nvarchar](MAX) NULL, [CompanyName] [nvarchar](MAX) NOT NULL, [FirstName] [nvarchar](MAX) NOT NULL, [LastName] [nvarchar](MAX) NOT NULL, [ADDRESS] [nvarchar](MAX) NOT NULL, [CITY] [nvarchar](MAX) NOT NULL, [County] [nvarchar](MAX) NOT NULL, [STATE] [nvarchar](MAX) NOT NULL, [ZIP] [nvarchar](MAX) NOT NULL, [PHONE] [nvarchar](MAX) NOT NULL, [COUNTRY] [nvarchar](MAX) NOT NULL, [NPA] [nvarchar](MAX) NULL, [NXX] [nvarchar](MAX) NULL, [XXXX] [nvarchar](MAX) NULL, [CurrentRecord] [nvarchar](MAX) NULL, [TotalCount] [nvarchar](MAX) NULL, [Status] [int] NOT NULL, [ChangeDate] [datetime] NOT NULL ) ON [PRIMARY]
CREATE TABLE [dbo].[BusData]( [ID] [int] IDENTITY(1,1) NOT NULL, [RecordId] [nvarchar](50) NULL, [CompanyName] [nvarchar](50) NOT NULL, [FirstName] [nvarchar](50) NOT NULL, [LastName] [nvarchar](50) NOT NULL, [ADDRESS] [nvarchar](50) NOT NULL, [CITY] [nvarchar](50) NOT NULL, [County] [nvarchar](50) NOT NULL, [STATE] [nvarchar](2) NOT NULL, [ZIP] [nvarchar](16) NOT NULL, [PHONE] [nvarchar](18) NOT NULL, [COUNTRY] [nvarchar](50) NOT NULL, [NPA] [nvarchar](3) NULL, [NXX] [nvarchar](3) NULL, [XXXX] [nvarchar](4) NULL, [CurrentRecord] [nvarchar](50) NULL, [TotalCount] [nvarchar](50) NULL, [Status] [int] NOT NULL, [ChangeDate] [datetime] NOT NULL ) ON [PRIMARY]
One disadvantage is that you will be designing around an unpredictable variable, and you will probably ignore instead of take advantage of the internal SQL Server data structure, progressively made up of Row(s), Page(s), and Extent(s).
Which makes me think about data structure alignment in C, and that being aware of the alignment is generally considered to be a Good Thing (TM). Similar idea, different context.
MSDN page for Pages and Extents
MSDN page for Row-Overflow Data
My tests have shown that there are differences when selecting.
CREATE TABLE t4000 (a NVARCHAR(4000) NULL); CREATE TABLE tmax (a NVARCHAR(MAX) NULL); DECLARE @abc4 NVARCHAR(4000) = N'ABC'; INSERT INTO t4000 SELECT TOP 1000000 @abc4 FROM master.sys.all_columns ac1, master.sys.all_columns ac2; DECLARE @abc NVARCHAR(MAX) = N'ABC'; INSERT INTO tmax SELECT TOP 1000000 @abc FROM master.sys.all_columns ac1, master.sys.all_columns ac2; SET STATISTICS TIME ON; SET STATISTICS IO ON; SELECT * FROM dbo.t4000; SELECT * FROM dbo.tmax;
This will cause a performance problem, although it may never cause any actual issues if your database is small. Each record will take up more space on the hard drive and the database will need to read more sectors of the disk if you’re searching through a lot of records at once. For example, a small record could fit 50 to a sector and a large record could fit 5. You’d need to read 10 times as much data from the disk using the large record.
It will make screen design harder as you will no longer be able to predict how wide your controls should be.