Cameron Yule

Configuring MySQL to use UTF-8

A project I’m working on at the moment is going to have multiple language options available, not all of which use the same alphabet (e.g. Russian and Chinese).

To lessen the pain commonly associated with internationalisation on the web, it’s beneficial to use the UTF-8 character set. This short summary from the Unicode Consortium may help explain better;

Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.

Unicode enables a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering. It allows data to be transported through many different systems without corruption.

Thankfully MySQL has supported Unicode for quite some time now, even if it’s not configured to use it by default.

First, let’s check what our settings are at the moment;

mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.01 sec)
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     | 
| character_set_connection | latin1                     | 
| character_set_database   | latin1                     | 
| character_set_filesystem | binary                     | 
| character_set_results    | latin1                     | 
| character_set_server     | latin1                     | 
| character_set_system     | utf8                       | 
| character_sets_dir       | /usr/share/mysql/charsets/ | 
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

That’s to be expected, but it’s not really what we wanted.

Find your MySQL configuration file (on most Linux/BSD systems it’s /etc/my.cnf) and make sure it’s got the following statements under the relevant headers.

[mysqld]
default-character-set=utf8
default-collation=utf8_general_ci
character-set-server=utf8
collation-server=utf8_general_ci
init-connect='SET NAMES utf8'
 
[client]
default-character-set=utf8

Restart MySQL and make sure it’s working;

service mysql restart
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci | 
| collation_database   | utf8_general_ci | 
| collation_server     | utf8_general_ci | 
+----------------------+-----------------+
3 rows in set (0.00 sec)
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       | 
| character_set_connection | utf8                       | 
| character_set_database   | utf8                       | 
| character_set_filesystem | binary                     | 
| character_set_results    | utf8                       | 
| character_set_server     | utf8                       | 
| character_set_system     | utf8                       | 
| character_sets_dir       | /usr/share/mysql/charsets/ | 
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

Update; Mo pointed out that it’s worth demonstrating setting the charset and collation when creating tables too (good point!).

CREATE TABLE `content` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `language` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `title` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '',
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM CHARACTER SET utf8 COLLATE utf8_general_ci;

Published on July 15, 2008 in Programming, Server
Tags: , , , ,

Comments

Join the discussion by leaving a comment, or trackback from your own site. (Comment feed)

Enrico
22/09/2008

Hi,
thanks for sharing this meaningful example of how to manage databases and charsets.
I tried what you say here, and I got this strange “limbo” solution:

mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database   | utf8_general_ci   |
| collation_server     | utf8_general_ci   |
+----------------------+-------------------+
3 rows in set (0.00 sec)
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

Oh, and
Your MySQL connection id is 6836 to server version: 5.0.27
just to take into account different versions of mysql.

Now what can you think should I do?

Bye,
Enrico

Cameron
24/09/2008

Hi Enrico, it looks as if the line:

init-connect=’SET NAMES utf8′

is not being executed – is this definitely in your my.cnf?

From the MySQL command prompt, try this to check it’s working;

mysql> show variables like 'init%';
+---------------+----------------+
| Variable_name | Value          |
+---------------+----------------+
| init_connect  | SET NAMES utf8 | 
| init_file     |                | 
| init_slave    |                | 
+---------------+----------------+
3 rows in set (0.00 sec)

Gavin
13/01/2009

Bear in mind that connecting are root will bypass the init-connect config setting. Just one of the many reasons why it’s better to set up secondary users on the server, even if it’s just a local development box.

masha
18/05/2009

i did all but when i run my site on localhost, it says connection failed. Please tell me what to do for this.
thanks

Stan
24/02/2010

Hi Cameron,

I got the same issue as Enrico. here my outputs:

[quote]
mysql> show variables like “%character%”;
+————————–+—————————-+
| Variable_name | Value |
+————————–+—————————-+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+————————–+—————————-+
8 rows in set (0.00 sec)
[/quote]

[quote]
mysql> show variables like ‘init%’;
+—————+—————-+
| Variable_name | Value |
+—————+—————-+
| init_connect | SET NAMES utf8 |
| init_file | |
| init_slave | |
+—————+—————-+
[/quote]

[quote]
mysql> show variables like ‘init%’;
+—————+—————-+
| Variable_name | Value |
+—————+—————-+
| init_connect | SET NAMES utf8 |
| init_file | |
| init_slave | |
+—————+—————-+
[/quote]

Do you have any suggestions?

Regards,
Stan

Stan
24/02/2010

nvm, I figured something out. It only showed that output when logged into mysql as root. If logged in as user it shows correctly :)

Loki Clock
11/06/2010

I managed to wrestle MySQL into Unicode submission using this advice in conjunction with my.cnf specifications mentioned in Getting out of MySQL Character Set Hell. I FEEL LIKE AN EXPERT. Much love to you, Yule, for showing me how to be sufficiently pedantic.