Posts Tagged ‘internationalisation’

Internationalisation in edge Rails

Wednesday, July 23rd, 2008

After the announcement yesterday that Rails 2.2 will have support for internationalisation (i18n) in the core, I read the linked articles and was slightly surprised that rather than unifying/replacing the i18n plugins currently available, it will actually provide an API which makes creating a back-end for handling your locales easier.

I’ve been using Globalite in the Rails project I’m working on and while it’s the best solution I’ve tried, I was always frustrated by the use of YAML for it’s locale files. Thankfully the changes in Rails 2.2 will likely mean a move to pure Ruby locale files, introducing the benefits of namespaces and the ability to override/extend generic files with site-specific ones.

Now we just have to wait a few months for the release and see what plugins adapt/appear to take advantage of it.

Configuring MySQL to use UTF-8

Tuesday, July 15th, 2008

A project I’m working on at the moment is going to have multiple language options available, not all of which use the same alphabet (e.g. Russian and Chinese).

To lessen the pain commonly associated with internationalisation on the web, it’s beneficial to use the UTF-8 character set. This short summary from the Unicode Consortium may help explain better;

Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.

Unicode enables a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering. It allows data to be transported through many different systems without corruption.

Thankfully MySQL has supported Unicode for quite some time now, even if it’s not configured to use it by default.

First, let’s check what our settings are at the moment;

mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.01 sec)
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     | 
| character_set_connection | latin1                     | 
| character_set_database   | latin1                     | 
| character_set_filesystem | binary                     | 
| character_set_results    | latin1                     | 
| character_set_server     | latin1                     | 
| character_set_system     | utf8                       | 
| character_sets_dir       | /usr/share/mysql/charsets/ | 
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

That’s to be expected, but it’s not really what we wanted.

Find your MySQL configuration file (on most Linux/BSD systems it’s /etc/my.cnf) and make sure it’s got the following statements under the relevant headers.

[mysqld]
default-character-set=utf8
default-collation=utf8_general_ci
character-set-server=utf8
collation-server=utf8_general_ci
init-connect='SET NAMES utf8'
 
[client]
default-character-set=utf8

Restart MySQL and make sure it’s working;

service mysql restart
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci | 
| collation_database   | utf8_general_ci | 
| collation_server     | utf8_general_ci | 
+----------------------+-----------------+
3 rows in set (0.00 sec)
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       | 
| character_set_connection | utf8                       | 
| character_set_database   | utf8                       | 
| character_set_filesystem | binary                     | 
| character_set_results    | utf8                       | 
| character_set_server     | utf8                       | 
| character_set_system     | utf8                       | 
| character_sets_dir       | /usr/share/mysql/charsets/ | 
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

Update; Mo pointed out that it’s worth demonstrating setting the charset and collation when creating tables too (good point!).

CREATE TABLE `content` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `language` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `title` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '',
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM CHARACTER SET utf8 COLLATE utf8_general_ci;