Summary: in this tutorial, we’ll explore MySQL character sets and their importance in how MySQL stores text data.
In MySQL, the character set and collation are essential concepts that define how data is stored and sorted in text columns.
Understanding character sets and collations are crucial for designing databases that can handle different languages and text requirements.
Introduction to MySQL Character Sets
A character set is a collection of characters with a unique encoding. It defines the set of characters that can be used in a text column, such as letters, numbers, symbols, and special characters.
MySQL supports various character sets, and the choice of character set determines the range of characters that can be stored in a column.
Common character sets include utf8
, utf8mb4
, latin1
, utf16
, and many others. To list all character sets in the current MySQL server, you use the following statement:
SHOW CHARACTER SET;
Code language: SQL (Structured Query Language) (sql)
Here’s the partial output:
+----------+---------------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+---------------------------------+---------------------+--------+
| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |
| ascii | US ASCII | ascii_general_ci | 1 |
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| binary | Binary pseudo charset | binary | 1 |
...
Code language: PHP (php)
The Maxlen
column specifies the number of bytes that a character in a character set holds.
Some character sets contain single-byte characters e.g., latin1
, latin2
, cp850
, etc., whereas other character sets contain multi-byte characters.
The default character sets are utf8mb4
and utf8mb4_0900_ai_ci
. However, you can change them accordingly.
Converting between different character sets
MySQL provides two functions that allow you to convert strings between different character sets: CONVERT
and CAST
.
The syntax of the CONVERT
function is as follows:
CONVERT(expression USING character_set_name)
Code language: SQL (Structured Query Language) (sql)
The CAST
function is similar to the CONVERT
function. It converts a string to a different character set:
CAST(string AS character_type CHARACTER SET character_set_name)
Code language: SQL (Structured Query Language) (sql)
Take a look at the following example of using the CAST
function:
SELECT CAST(_latin1'MySQL character set' AS CHAR CHARACTER SET utf8);
Code language: SQL (Structured Query Language) (sql)
Setting character sets for client connections
To configure a character set for a client connection, you can do one of the following ways:
1) Using the SET NAMES statement
Issue the SET NAME
statement after the client connected to the MySQL database server. For example, to set a Unicode character set utf8mb4
, you use the following statement:
SET NAMES 'utf8mb4';
Code language: SQL (Structured Query Language) (sql)
2) Using –default-character-set option
If the application supports the --default-character-set
option, you can use it to set the character set.
For example, mysql client tool supports --default-character-set
option and you can set it up in the configuration file as follows:
[mysql]
default-character-set=utf8mb4
Code language: plaintext (plaintext)
3) Using the charset in connection strings
Some MySQL connectors allow you to specify a character set. For example, if you use PHP PDO, you can set the character set in the data source name as follows:
$dsn ="mysql:host=$host;dbname=$db;charset=utf8mb4";
Code language: PHP (php)
Regardless of which way you use it, make sure that the character set used by the application matches the character set stored in the MySQL database server.
Summary
- Character sets define how MySQL stores data in text columns.
- Use the
SHOW CHARACTER SET
statement to list all character sets. - Use the
CONVERT()
orCAST()
function to convert between character sets.