Source code editor What Is Ajax
↑
The following sections discuss various aspects of character set collations.
With the COLLATE
clause, you can override whatever the default collation is for a comparison. COLLATE
may be used in various parts of SQL statements. Here are some examples:
With ORDER BY
:
SELECT k FROM t1 ORDER BY k COLLATE latin1_german2_ci;
With AS
:
SELECT k COLLATE latin1_german2_ci AS k1 FROM t1 ORDER BY k1;
With GROUP BY
:
SELECT k FROM t1 GROUP BY k COLLATE latin1_german2_ci;
With aggregate functions:
SELECT MAX(k COLLATE latin1_german2_ci) FROM t1;
With DISTINCT
:
SELECT DISTINCT k COLLATE latin1_german2_ci FROM t1;
With WHERE
:
SELECT * FROM t1 WHERE _latin1 'Mьller' COLLATE latin1_german2_ci = k;
SELECT * FROM t1 WHERE k LIKE _latin1 'Mьller' COLLATE latin1_german2_ci;
With HAVING
:
SELECT k FROM t1 GROUP BY k HAVING k = _latin1 'Mьller' COLLATE latin1_german2_ci;
The COLLATE
clause has high precedence (higher than ||
), so the following two expressions are equivalent:
x || y COLLATE z x || (y COLLATE z)
The BINARY
operator casts the string following it to a binary string. This is an easy way to force a comparison to be done byte by byte rather than character by character. BINARY
also causes trailing spaces to be significant.
mysql>SELECT 'a' = 'A';
-> 1 mysql>SELECT BINARY 'a' = 'A';
-> 0 mysql>SELECT 'a' = 'a ';
-> 1 mysql>SELECT BINARY 'a' = 'a ';
-> 0
BINARY
is shorthand for str
CAST(
. str
AS BINARY)
The BINARY
attribute in character column definitions has a different effect. A character column defined with the BINARY
attribute is assigned the binary collation of the column's character set. Every character set has a binary collation. For example, the binary collation for the latin1
character set is latin1_bin
, so if the table default character set is latin1
, these two column definitions are equivalent:
CHAR(10) BINARY CHAR(10) CHARACTER SET latin1 COLLATE latin1_bin
The effect of BINARY
as a column attribute differs from its effect prior to MySQL 4.1. Formerly, BINARY
resulted in a column that was treated as a binary string. A binary string is a string of bytes that has no character set or collation, which differs from a non-binary character string that has a binary collation. For both types of strings, comparisons are based on the numeric values of the string unit, but for non-binary strings the unit is the character and some character sets allow multi-byte characters. Section 11.4.2, “The BINARY
and VARBINARY
Types”.
The use of CHARACTER SET binary
in the definition of a CHAR
, VARCHAR
, or TEXT
column causes the column to be treated as a binary data type. For example, the following pairs of definitions are equivalent:
CHAR(10) CHARACTER SET binary BINARY(10) VARCHAR(10) CHARACTER SET binary VARBINARY(10) TEXT CHARACTER SET binary BLOB
In the great majority of statements, it is obvious what collation MySQL uses to resolve a comparison operation. For example, in the following cases, it should be clear that the collation is the collation of column x
:
SELECT x FROM T ORDER BY x; SELECT x FROM T WHERE x = x; SELECT DISTINCT x FROM T;
However, when multiple operands are involved, there can be ambiguity. For example:
SELECT x FROM T WHERE x = 'Y';
Should this query use the collation of the column x
, or of the string literal 'Y'
?
Standard SQL resolves such questions using what used to be called “coercibility” rules. Basically, this means: Both x
and 'Y'
have collations, so which collation takes precedence? This can be difficult to resolve, but the following rules cover most situations:
An explicit COLLATE
clause has a coercibility of 0. (Not coercible at all.)
The concatenation of two strings with different collations has a coercibility of 1.
The collation of a column or a stored routine parameter or local variable has a coercibility of 2.
A “system constant” (the string returned by functions such as USER()
or VERSION()
) has a coercibility of 3.
A literal's collation has a coercibility of 4.
NULL
or an expression that is derived from NULL
has a coercibility of 5.
The preceding coercibility values are current as of MySQL 5.0.3. In MySQL 5.0 prior to 5.0.3, there is no system constant or ignorable coercibility. Functions such as USER()
have a coercibility of 2 rather than 3, and literals have a coercibility of 3 rather than 4.
Those rules resolve ambiguities in the following manner:
Use the collation with the lowest coercibility value.
If both sides have the same coercibility, then:
If both sides are Unicode, or both sides are not Unicode, it is an error.
If one of the sides has a Unicode character set, and another side has a non-Unicode character set, the side with Unicode character set wins, and automatic character set conversion is applied to the non-Unicode side. For example, the following statement will not return an error:
SELECT CONCAT(utf8_column, latin1_column) FROM t1;
It will return a result, and the character set of the result will be utf8
. The collation of the result will be the collation of utf8_column
. Values of latin1_column
will be automatically converted to utf8
before concatenating.
Although automatic conversion is not in the SQL standard, the SQL standard document does say that every character set is (in terms of supported characters) a “subset” of Unicode. Because it is a well-known principle that “what applies to a superset can apply to a subset,” we believe that a collation for Unicode can apply for comparisons with non-Unicode strings.
Examples:
column1 = 'A' | Use collation of column1 |
column1 = 'A' COLLATE x | Use collation of 'A' COLLATE x |
column1 COLLATE x = 'A' COLLATE y | Error |
The COERCIBILITY()
function can be used to determine the coercibility of a string expression:
mysql>SELECT COERCIBILITY('A' COLLATE latin1_swedish_ci);
-> 0 mysql>SELECT COERCIBILITY(VERSION());
-> 3 mysql>SELECT COERCIBILITY('A');
-> 4
Each character set has one or more collations, but each collation is associated with one and only one character set. Therefore, the following statement causes an error message because the latin2_bin
collation is not legal with the latin1
character set:
mysql> SELECT _latin1 'x' COLLATE latin2_bin;
ERROR 1253 (42000): COLLATION 'latin2_bin' is not valid
for CHARACTER SET 'latin1'
Suppose that column X
in table T
has these latin1
column values:
Muffler Mьller MX Systems MySQL
Suppose also that the column values are retrieved using the following statement:
SELECT X FROM T ORDER BY X COLLATE collation_name
;
The following table shows the resulting order of the values if we use ORDER BY
with different collations:
latin1_swedish_ci | latin1_german1_ci | latin1_german2_ci |
Muffler | Muffler | Mьller |
MX Systems | Mьller | Muffler |
Mьller | MX Systems | MX Systems |
MySQL | MySQL | MySQL |
The character that causes the different sort orders in this example is the U with two dots over it (ь
), which the Germans call “U-umlaut.”
The first column shows the result of the SELECT
using the Swedish/Finnish collating rule, which says that U-umlaut sorts with Y.
The second column shows the result of the SELECT
using the German DIN-1 rule, which says that U-umlaut sorts with U.
The third column shows the result of the SELECT
using the German DIN-2 rule, which says that U-umlaut sorts with UE.