Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
789 views
in Technique[技术] by (71.8m points)

tsql - Weird SQL Server 2005 Collation difference between varchar() and nvarchar()

Can someone please explain this:

SELECT 
  CASE WHEN CAST('iX' AS nvarchar(20)) 
      > CAST('-X' AS nvarchar(20)) THEN 1 ELSE 0 END,
  CASE WHEN CAST('iX' AS varchar(20)) 
      > CAST('-X' AS varchar(20)) THEN 1 ELSE 0 END

Results: 0 1

SELECT 
  CASE WHEN CAST('i' AS nvarchar(20)) 
      > CAST('-' AS nvarchar(20)) THEN 1 ELSE 0 END,
  CASE WHEN CAST('i' AS varchar(20)) 
      > CAST('-' AS varchar(20)) THEN 1 ELSE 0 END

Results: 1 1

On the first query, the nvarchar() result is not what I'm expecting, and yet removing the X make the nvarchar() sort happen as expected.

(My original queries used the '' and N'' literal syntax to distinguish varchar() and nvarchar() rather than CAST() and got the same result.)

Collation setting for the database is SQL_Latin1_General_CP1_CI_AS.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

When comparing unicode text, hyphens are treated specially. Unicode comparison uses "dictionary order", which ignores hyphens. This is not the case with non-unicode text comparison.

Comparing -X and iX, is like comparing X and iX, so -X, the left side, is greater. When comparing "-" and "i", is like comparing "" and "i", so "i", the right side is greater.

From MSDN,

A SQL collation's rules for sorting non-Unicode data are incompatible with any sort routine that is provided by the Microsoft Windows operating system; however, the sorting of Unicode data is compatible with a particular version of the Windows sorting rules. Because the comparison rules for non-Unicode and Unicode data are different, when you use a SQL collation you might see different results for comparisons of the same characters, depending on the underlying data type. For example, if you are using the SQL collation "SQL_Latin1_General_CP1_CI_AS", the non-Unicode string 'a-c' is less than the string 'ab' because the hyphen ("-") is sorted as a separate character that comes before "b". However, if you convert these strings to Unicode and you perform the same comparison, the Unicode string N'a-c' is considered to be greater than N'ab' because the Unicode sorting rules use a "word sort" that ignores the hyphen.

SELCT body From MSDN_Articles WHERE url IN ("http://support.microsoft.com/kb/322112")


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...