charset: utf-8 with 1.0x series

A place to discuss Joomla! translation matters.

Moderator: wendhausen

Locked
55thinking
Joomla! Enthusiast
Posts: 183
Joined: Mon Sep 05, 2005 8:58 am
Location: Madrid

Post by 55thinking » Wed Apr 12, 2006 8:35 am

I have been reading quite a few posts, but I could not figure out what the final status is.

1. It seems we can change the charset in the language file from iso to utf-8
2. Joomla 1.0.x does not support utf-8

What kind of problems can we expect if we still decide to go utf-8 with the 1.0.x series? Actually, I have seen quite a few live sites using Joomla 1.0.x and utf-8 with no apparent problems.

Looking for a definitive answer. Thanks.
55 Thinking - Strategy Design Technology 
Good looking, Fast and Usable web solutions   
http://www.55thinking.com/

eyesofkids
Joomla! Enthusiast
Posts: 238
Joined: Tue Aug 23, 2005 6:04 am
Location: Taipei , Taiwan

Re: charset: utf-8 with 1.0x series

Post by eyesofkids » Wed Apr 12, 2006 9:49 am

55thinking wrote:
1. It seems we can change the charset in the language file from iso to utf-8
2. Joomla 1.0.x does not support utf-8

Yes, you can do it.
But you need to take care of the database charset and the IE utf-8 bugs.
Some PHP functions, such as substr and strlen, and some regular expressions do not support utf-8 strings.
So there are some small problems when Joomla! 1.0.x handles non-English strings.
IMO, utf-8 support in Joomla 1.0.x is better than in the old Mambo.
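
To make the substr and strlen point concrete, here is a minimal sketch. It assumes the mbstring extension is loaded, and the sample word is made up for illustration. The native functions count and cut bytes, while the mb_ versions count and cut characters.

```php
<?php
// Assumption: the mbstring extension is available.
// "\xC3\xA9" is the two-byte UTF-8 encoding of "é".
$word = "caf\xC3\xA9";                        // "café": 4 characters, 5 bytes

$byteCount = strlen($word);                   // counts bytes
$charCount = mb_strlen($word, 'UTF-8');       // counts characters

$byteCut = substr($word, 0, 4);               // cuts after 4 bytes, splitting "é"
$charCut = mb_substr($word, 0, 4, 'UTF-8');   // cuts after 4 characters

echo $byteCount, "\n";                          // 5
echo $charCount, "\n";                          // 4
var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false): broken byte sequence
var_dump($charCut === $word);                   // bool(true): whole word kept
```

The half character left behind by substr is what typically shows up as a garbage character in the browser.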

Eddy Chang
All day long the superior man is creatively active. At nightfall his mind is still beset with cares. ~ I-CHING
Eddy Chang (Taipei, Taiwan)
Member of the Traditional Chinese Joomla Translation Team
http://www.joomla.org.tw

davidgal
Joomla! Guru
Posts: 963
Joined: Sat Aug 20, 2005 9:19 am
Location: Israel

Re: charset: utf-8 with 1.0x series

Post by davidgal » Wed Apr 12, 2006 11:57 am

Hi there,

I'll do my best to point out the issues of utf-8 in Joomla 1.0.x series.

To be fully utf-8 compatible, the following conditions need to be fulfilled:
- The database needs to be utf-8 compliant, otherwise there is a danger of data truncation. A 20-character string in utf-8 may be up to 60 bytes long (MySQL's utf8 character set uses at most 3 bytes per character). A varchar field defined as utf-8 with a length of 20 can safely store 20 utf-8 characters, because the field adapts to the byte length. In a non-utf-8 database the same varchar(20) field will truncate the string after 20 bytes.

- The connection between the database and the PHP application needs to use utf-8 encoding, otherwise unwanted conversions will occur and data corruption will result.

- Multibyte string functions need to be used when the data is encoded as utf-8. Unfortunately, PHP's native string functions are not utf-8 aware and can seriously corrupt data (see http://www.phpwact.org/php/i18n/utf-8). There is an extension for PHP 4 and 5 that provides utf-8 aware string functions ('mbstring'). However, this extension is not always installed or loaded, and the PHP code needs to be modified to call the mb_ versions of the string functions. (PHP 6 is planned to be fully Unicode and utf-8 aware.)

- The HTML page encoding needs to be set to utf-8 (by setting the charset in the language file).
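
The truncation risk in the first point can be sketched in a few lines of PHP. This is only an illustration under stated assumptions: mbstring is loaded, and truncate_to_bytes is a hypothetical helper that imitates a column keeping 20 bytes. It is not how MySQL itself is implemented.

```php
<?php
// Assumption: the mbstring extension is available.
// "\xE4\xB8\xAD" is the three-byte UTF-8 encoding of the CJK character U+4E2D.
$text = str_repeat("\xE4\xB8\xAD", 20);        // 20 characters

$charCount = mb_strlen($text, 'UTF-8');        // 20 characters
$byteCount = strlen($text);                    // 60 bytes

// Hypothetical helper: keep only the first $maxBytes bytes of a value.
function truncate_to_bytes($value, $maxBytes)
{
    return substr($value, 0, $maxBytes);
}

$stored = truncate_to_bytes($text, 20);        // 6 whole characters + 2 stray bytes

echo $charCount, "\n";                          // 20
echo $byteCount, "\n";                          // 60
echo strlen($stored), "\n";                     // 20
var_dump(mb_check_encoding($stored, 'UTF-8'));  // bool(false): last character is cut
```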

Why does Joomla 1.0.x seem to work fine when only the charset is set to utf-8?
This in fact occurs only if pure English is used. The reason is that all English characters are in 7-bit ASCII and do not include any extended (8-bit) characters. In this special case utf-8 is byte-for-byte identical to iso-8859-1, as all characters are single-byte characters. The problems begin with European languages that use Latin characters with diacritics (umlauts, accents etc.) and with non-Latin languages. If you are only going to use English, you might as well stay with iso-8859-1. If you are going to use other languages, please check out the workaround below.
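
A quick way to see this for yourself, assuming PHP's iconv extension is available (the sample strings are made up): converting pure ASCII text from iso-8859-1 to utf-8 leaves the bytes unchanged, while an accented character grows from one byte to two.

```php
<?php
// Assumption: the iconv extension is available.
$ascii  = "Joomla";      // 7-bit ASCII only
$latin1 = "\xE9";        // "é" as a single ISO-8859-1 byte

$asciiAsUtf8  = iconv('ISO-8859-1', 'UTF-8', $ascii);
$accentAsUtf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

var_dump($asciiAsUtf8 === $ascii);    // bool(true): identical bytes in both encodings
echo bin2hex($latin1), "\n";          // e9
echo bin2hex($accentAsUtf8), "\n";    // c3a9: two bytes in UTF-8
```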

How is all this solved for Joomla 1.5?
See: http://dev.joomla.org/component/option, ... d,33/p,16/

Is there a workaround to apply utf-8 in the Joomla 1.0.x series?
Yes. Here is a quick guideline for getting Joomla 1.0.x to work with utf-8:
- Use MySQL version 4.1.2 or newer (older versions don't support utf-8).
- Create an empty database manually before installing Joomla. Set its character set to utf8 and choose a collation (utf8_general_ci is the default and should be OK).
- Convert the language files to utf-8 (all language files, including those for editors, components etc.).
- Install Joomla using the pre-existing database. After installation, check that all text fields in the database have utf8 encoding (just in case Joomla created a new database and is not working on the pre-created one).
- Set 'charset=utf-8' in the _ISO define in the language file.
- Uncomment one line of code in the includes/database.php file at about line 102 (the second line below):

$this->_table_prefix = $table_prefix;
//@mysql_query("SET NAMES 'utf8'", $this->_resource);    // THIS IS THE LINE TO UNCOMMENT
$this->_ticker = 0;
$this->_log = array();

Please note that the above does not make Joomla 1.0.x fully utf-8 compatible. All string handling will still use single-byte string functions. This works well in most cases (no guarantees). There will be some instances of garbage characters, especially with Latin characters with diacritics, and logical errors in search and filtering features.
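
For the language file conversion step, a rough sketch follows. The to_utf8 helper and the _EXAMPLE_PREVIOUS define are hypothetical and not part of Joomla, and the sketch assumes that any text which is not already valid utf-8 is iso-8859-1. It also shows why a file should not be converted twice: a second blind conversion is exactly the kind of unwanted conversion that produces garbage characters.

```php
<?php
// Hypothetical helper, not part of Joomla: returns $text as UTF-8.
// Assumption: anything that is not already valid UTF-8 is ISO-8859-1.
function to_utf8($text)
{
    // With the /u modifier, preg_match() returns false on invalid UTF-8.
    if (preg_match('//u', $text) === 1) {
        return $text;                 // already UTF-8 (or plain ASCII)
    }
    return iconv('ISO-8859-1', 'UTF-8', $text);
}

// Made-up language-file line containing two ISO-8859-1 "é" bytes (0xE9).
$latin1Line = "DEFINE('_EXAMPLE_PREVIOUS','Pr\xE9c\xE9dent');";

$utf8Line = to_utf8($latin1Line);
$again    = to_utf8($utf8Line);       // a second pass leaves the text alone

// A blind second conversion treats each UTF-8 byte as a Latin-1 character.
$garbled  = iconv('ISO-8859-1', 'UTF-8', $utf8Line);

var_dump($again === $utf8Line);                     // bool(true)
var_dump(strlen($utf8Line) - strlen($latin1Line));  // int(2): each é grew by one byte
var_dump(strlen($garbled) - strlen($utf8Line));     // int(4): each é is now four bytes
```

In practice the helper would be applied to the contents of each language file, for example read with file_get_contents and written back with file_put_contents, after making a backup.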

I sincerely hope that this is definitive enough :)
Last edited by davidgal on Wed Apr 12, 2006 10:27 pm, edited 1 time in total.
David Gal

Re: charset: utf-8 with 1.0x series

Post by eyesofkids » Wed Apr 12, 2006 7:24 pm

Very clear description from davidgal.
It's a good article for all translators and non-English Joomla! users.
I will translate it into Chinese and post it on my site....  ;D
Thank you, davidgal!

Re: charset: utf-8 with 1.0x series

Post by 55thinking » Sat Apr 15, 2006 10:44 am

Thanks a lot David

You've provided an excellent definitive answer !!!