Adding UTF-8 support is so easy!

xender
Joomla! Apprentice
Posts: 26
Joined: Wed Sep 28, 2005 10:15 am

Adding UTF-8 support is so easy!

Post by xender » Wed Sep 28, 2005 10:21 am

Hello Joomlars!

I found the pieces required to UTF-enable mambo/joomla sites, and I thought it might be useful for someone to have it all in one place.

In fact, I'm hoping the developer team will consider adding such support to Joomla very soon now!

1. In database.php, after


$this->_table_prefix = $table_prefix;
add


@mysql_query("SET NAMES 'utf8'", $this->_resource);
(without this line, multibyte characters will not work on your site)

2. Shorten the over-long keys in the mambo.sql / joomla.sql file:


UNIQUE KEY `section_value_value_aro` (`section_value`(100),`value`(100)),
UNIQUE KEY `mos_gacl_section_value_value_aro` (`section_value`(100),`value`(100)),
(without the length constraint setup fails)
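
To see why the (100) prefixes help, here is the arithmetic as a small Python sketch. It assumes MySQL's utf8 charset reserves up to 3 bytes per character and that MyISAM caps an index key at 1000 bytes; the 240-character column width is a hypothetical value for illustration, not taken from the actual schema.

```python
# Assumptions (not from the thread): MyISAM key limit and utf8 byte width.
MAX_KEY_BYTES = 1000       # assumed MyISAM limit per index key
UTF8_BYTES_PER_CHAR = 3    # MySQL "utf8" reserves up to 3 bytes per character


def key_bytes(*prefix_lengths, bytes_per_char=UTF8_BYTES_PER_CHAR):
    """Worst-case size in bytes of an index over the given character lengths."""
    return sum(prefix_lengths) * bytes_per_char


# Hypothetical: two 240-character columns indexed in full.
full = key_bytes(240, 240)
# The same two columns indexed with the (100) prefixes from step 2.
prefixed = key_bytes(100, 100)

print(full, full <= MAX_KEY_BYTES)
print(prefixed, prefixed <= MAX_KEY_BYTES)
```

With single-byte latin1 the full index would need only 480 bytes, which is why the problem shows up only after switching to UTF-8.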

Cheers,
Grzegorz

spacemonkey
Joomla! Enthusiast
Posts: 182
Joined: Fri Aug 12, 2005 7:50 pm
Location: Turin, Italy

Re: Adding UTF-8 support is so easy!

Post by spacemonkey » Wed Sep 28, 2005 9:18 pm

Thanks for the suggestion!  8)

This solves some nagging problems for mysqli users for sure, including me!

Tested locally on a clean 1.0.1 install, and Jean-Marie has already forwarded it to the Stability Team for incorporation into the next release. Provided there are no incompatibility issues with the mysql interface, that is  ;)

xender
Joomla! Apprentice
Posts: 26
Joined: Wed Sep 28, 2005 10:15 am

Re: Adding UTF-8 support is so easy!

Post by xender » Thu Sep 29, 2005 9:46 am

Hi again,
spacemonkey wrote: If there are no incompatibility issues with the mysql interface that is
I've been using the above UTF-8 hack on several websites for 3+ months now, on both WAMP and LAMP setups with MySQL 4.0 and 4.1, and have not found a single issue. Tested heavily with double-byte characters (http://www.cjk.pl) and it still works like a charm! 8)  Looking forward to seeing it become part of a standard release!


Also, I forgot to add one more requirement for full UTF-8 support:

3. In the "language" folder, in your selected language file, e.g. "english.php", set the correct ISO definition:


DEFINE('_ISO','charset=utf-8');
(without this line the browser might not recognize the correct encoding)

Cheers,
Grzegorz
Last edited by xender on Thu Sep 29, 2005 9:56 am, edited 1 time in total.

Abstraiko
Joomla! Apprentice
Posts: 32
Joined: Sun Sep 18, 2005 4:58 pm

Re: Adding UTF-8 support is so easy!

Post by Abstraiko » Fri Oct 07, 2005 12:10 pm

This was added in Joomla 1.0.2:

$this->_table_prefix = $table_prefix;
        //@mysql_query("SET NAMES 'utf8'", $this->_resource);


However I still have the following problem:

I made some changes to the language file english.php in order to add some Portuguese words with special characters like "ç" or "ó", etc. The file is already saved as UTF-8, yet those special chars still don't display correctly.

As I had also char probs in the main body of my site, I also changed the index.php and put before and this:

" />

In this way I could repair the main body charset, but I still cannot see special chars defined in the language file.

Could anyone help me, please?

Thanks a lot,
Abstraiko


Forgot to say that my local version of MySQL is 4.3 and it works fine. But my server's MySQL version is 4.0 and it has the problem described. However, I cannot change the 4.0 version because I am not the server admin.
Last edited by Abstraiko on Fri Oct 07, 2005 12:12 pm, edited 1 time in total.

xender
Joomla! Apprentice
Posts: 26
Joined: Wed Sep 28, 2005 10:15 am

Re: Adding UTF-8 support is so easy!

Post by xender » Fri Oct 07, 2005 1:01 pm

Hi!

AFAIK MySQL 4.1 onwards "officially" supports UTF-8. However, you can easily import UTF-8 characters into older versions; they just won't be displayed correctly, e.g. in phpMyAdmin. I've been doing it for a long time with no problems at all.

Also, see the post: http://forum.joomla.org/index.php/topic,10156.0.html on fixing pre-1.0.2 installations, and removing the BOM marker from the import files, which I think will be necessary to trick an old MySQL into importing utf contents.

Hope it helps,

G.

Abstraiko
Joomla! Apprentice
Posts: 32
Joined: Sun Sep 18, 2005 4:58 pm

Re: Adding UTF-8 support is so easy!

Post by Abstraiko » Fri Oct 07, 2005 4:20 pm

xender wrote: they just won't get displayed correctly e.g from phpMyAdmin. I've been doing it for a long time, no problems at all.
What do you mean by this?

I export the DB using phpMyAdmin and MySQL 4.3.

Then I import it using phpMyAdmin and MySQL 4.0.

The strange part is that the database content appears fine... the problem is the stuff inside the language file (english.php).

If I remove utf-8 from index.php, then the words from the language file display OK, but the DB content starts to appear with strange chars instead of the special chars like ç, á, etc...

Should I switch index.php to ISO-8859-1 and change only the language file? And then what should I do when importing the DB?

tkx

xender
Joomla! Apprentice
Posts: 26
Joined: Wed Sep 28, 2005 10:15 am

Re: Adding UTF-8 support is so easy!

Post by xender » Fri Oct 07, 2005 8:10 pm

Hi again,

From what you write it seems to me that you have only one of the two converted to utf8 - db import file or language file. Convert BOTH of them to UTF (e.g. using UltraEdit File->Convert->Ascii to UTF8), and then it should be ok.

I suspect you might also need to get rid of the BOM marker from the UTF-8 language file.
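
For anyone without UltraEdit, the same conversion can be sketched in a few lines of Python. This is only an illustration: it assumes the source bytes really are ISO-8859-1, and the sample word is made up for the demo. Converting a file that is already UTF-8 a second time would corrupt it, so the sketch also checks validity first.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (no BOM is added)."""
    return data.decode("iso-8859-1").encode("utf-8")


# Made-up sample: a Portuguese fragment as an ISO-8859-1 editor would save it.
legacy = "ação".encode("iso-8859-1")

if not looks_like_utf8(legacy):
    converted = latin1_to_utf8(legacy)
else:
    converted = legacy

print(legacy.hex(), "->", converted.hex())
```

Note that pure ASCII text passes the UTF-8 check as well, so the check only tells you something when the file contains accented characters.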

G.

Abstraiko
Joomla! Apprentice
Posts: 32
Joined: Sun Sep 18, 2005 4:58 pm

Re: Adding UTF-8 support is so easy!

Post by Abstraiko » Sat Oct 08, 2005 12:47 pm

OK, I confess that I don't know anything about this :( Can you explain how I can do this:

"I suspect you might also need to get rid of the BOM marker from the UTF-8 language file."

I'm also trying to convert the files using UltraEdit... but is it possible that the error is caused by the server's collation being different from the one I use locally? Or by the different version of MySQL?


Tkx

PS - converting both files using UltraEdit didn't help.
Last edited by Abstraiko on Sat Oct 08, 2005 3:40 pm, edited 1 time in total.

xender
Joomla! Apprentice
Posts: 26
Joined: Wed Sep 28, 2005 10:15 am

Re: Adding UTF-8 support is so easy!

Post by xender » Sat Oct 08, 2005 8:25 pm

Hi again,

It's quite hard for me to help without seeing the actual problem  :(

BOM = Byte Order Mark. When you view/edit a UTF-8 file in a HEX editor, you may see that it starts with three bytes (EF BB BF) that are not visible in the text content of the file. They are, in practice, rather useless, aside from making it easy for some editors to recognize the file as UTF-8 encoded, and making the Java compiler fail upon encountering a file with a BOM  ;)

Removing the BOM is necessary ONLY if the SQL import, e.g. via phpMyAdmin or other means, fails, telling you there is a problem with the content of the file: the importer reads the first three bytes but might be unable to recognize them as a BOM, and tries to interpret them as part of an SQL command.
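
If your editor cannot save without the BOM, it can be stripped with a short script. A minimal Python sketch follows; it works on bytes in memory, and the sample SQL text is made up for the demo. The UTF-8 BOM is the byte sequence EF BB BF, which Python exposes as codecs.BOM_UTF8.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Made-up sample: an SQL dump that an editor saved with a BOM in front.
with_bom = codecs.BOM_UTF8 + b"INSERT INTO t VALUES (1);"
clean = strip_utf8_bom(with_bom)

print(with_bom[:3].hex())   # the three BOM bytes
print(clean[:6])            # the file now starts with the SQL itself
```

For a real file you would read it in binary mode ("rb"), pass the bytes through the function, and write the result back in binary mode ("wb").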
Abstraiko wrote: The strange part is that the database is appearing well..the problem is the stuff inside the language file (english.php).
Let me try some questions to help find the cause:
* What does your browser say - is it recognizing the page as UTF-8 encoded?
* Why are you having problems with the file "english.php"? Shouldn't it be spanish.php or some other file?
* Is your english.php file REALLY UTF-8 encoded? I.e. when you load it into, let's say, UltraEdit and switch to HEX view (Ctrl+H), are the problematic non-ISO characters taking two bytes or one? Are all regular characters taking just one byte?
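
The byte counts asked about in the last question can also be checked without a HEX editor. Here is a small Python sketch; the sample characters are just examples. In UTF-8, plain ASCII letters take one byte, accented Latin letters such as ç take two, and CJK characters take three.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# Example characters: ASCII, two accented Latin letters, one CJK character.
for ch in ["a", "ç", "ó", "中"]:
    print(repr(ch), ch.encode("utf-8").hex(), utf8_width(ch))

# The same accented letter is a single byte in ISO-8859-1, which is how
# a file that was never really converted gives itself away.
print(len("ç".encode("iso-8859-1")))
```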

I really can't think of anything else.

G.
Last edited by xender on Sat Oct 08, 2005 8:27 pm, edited 1 time in total.

Abstraiko
Joomla! Apprentice
Posts: 32
Joined: Sun Sep 18, 2005 4:58 pm

Re: Adding UTF-8 support is so easy!

Post by Abstraiko » Tue Oct 11, 2005 3:10 pm

You can see at the site...there are strange characters all over the place http://rscc.fw380.com

You can switch between encodings using the right mouse button. If you pick ISO, the main window chars appear right and the left side (the login form, for example) appears wrong, and vice versa if you pick UTF-8 encoding. STRANGE and annoying :(

I use the english.php although I have changed some words into portuguese..for example instead of "Remember me" I put "Login Automático".

Tkx
Abstraiko

xender
Joomla! Apprentice
Posts: 26
Joined: Wed Sep 28, 2005 10:15 am

Re: Adding UTF-8 support is so easy!

Post by xender » Tue Oct 11, 2005 9:58 pm

Hmmm... Ok, maybe getting closer ;)

Let's exclude conversion problems on import/export: the poll title on the right, which displays incorrectly in UTF-8 encoding, did you write it on the home computer or on the target server (I understood there are two)? If on the home one, can you try writing some content in the target (online) version of the website? If that fails, then it's a database problem, and possibly the only thing you can do (aside from making your provider upgrade MySQL) is to try both the commented-out and the active version of this line of code: "@mysql_query("SET NAMES 'utf8'", $this->_resource);"

Cheers,

G.

P.S. Maybe you can Skype me at "grzegorzkrol", as (1) conversation would be more effective and (2) we are probably getting into details not so interesting to the public?

Abstraiko
Joomla! Apprentice
Posts: 32
Joined: Sun Sep 18, 2005 4:58 pm

Re: Adding UTF-8 support is so easy!

Post by Abstraiko » Wed Oct 12, 2005 10:35 am

Thanks.

I have added you to my skype.

Cya later.

Abstraiko

xender
Joomla! Apprentice
Posts: 26
Joined: Wed Sep 28, 2005 10:15 am

Re: Adding UTF-8 support is so easy!

Post by xender » Wed Oct 12, 2005 11:15 am

Ok, problem solved.

Although I already explained it over skype, maybe others can find the solution useful, if they encounter similar problem:

there is a problem with the template you are using. You have:


content="text/html";charset=utf-8" 
and you should have:


content="text/html;charset=utf-8"
After fixing, all content will display correctly. Please note that this bug is actually promoted within one of the publicly-available and widely used templates, namely the "247portal-broad".
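
Why one stray quote matters can be shown with a short Python sketch. It assumes the browser ends the content attribute at the first closing quote, so the broken tag effectively carries only "text/html"; the standard library's email.message.Message is used here merely as a convenient Content-Type parser, not as a model of any particular browser.

```python
from email.message import Message


def charset_of(content_value: str):
    """Extract the charset parameter from a Content-Type style value."""
    msg = Message()
    msg["Content-Type"] = content_value
    return msg.get_content_charset()  # None when no charset is present


# Assumed effective value of the broken tag: the attribute stops at the
# first closing quote, so the charset part is lost.
broken = "text/html"
# Value of the corrected tag.
fixed = "text/html;charset=utf-8"

print(charset_of(broken))
print(charset_of(fixed))
```

With no charset declared, the browser falls back to guessing or to its current setting, which matches the half-right, half-wrong display described earlier in the thread.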

Cheers,
G.

Abstraiko
Joomla! Apprentice
Posts: 32
Joined: Sun Sep 18, 2005 4:58 pm

Re: Adding UTF-8 support is so easy!

Post by Abstraiko » Wed Oct 12, 2005 12:16 pm

Yeah! :D Once again, thank you.

Let me just add, if anyone has the same prob:

In includes/database.php we have to change:

$this->_table_prefix = $table_prefix;
//@mysql_query("SET NAMES 'utf8'", $this->_resource);


to this:

$this->_table_prefix = $table_prefix;
@mysql_query("SET NAMES 'utf8'", $this->_resource);



Also in the template I've left:

content="text/html;

if I add charset=utf-8 it gives me errors.

Thanks...

Abstraiko

emeyer
Joomla! Explorer
Posts: 352
Joined: Thu Sep 29, 2005 2:37 am

Re: Adding UTF-8 support is so easy!

Post by emeyer » Wed Nov 02, 2005 7:37 pm

I don't know if others are having the same problem: menu titles cannot contain extended characters? For example (TM) etc. cause menus to break if in menu title.

Note: I am importing strings from an external file, so this problem may not occur in WYSIWYG editors.

Hackwar
Joomla! Virtuoso
Posts: 3788
Joined: Fri Sep 16, 2005 8:41 pm
Location: NRW - Germany

Re: Adding UTF-8 support is so easy!

Post by Hackwar » Wed Nov 02, 2005 10:00 pm

Characters such as (tm) are written with an ampersand (as far as I know) in HTML, and when you have this in the URL, you get a new variable. For example, for index.php?option=com_content&title=Hello_&amp;_goodbye you would get the variable title with the value Hello_ and the variable amp;_goodbye with the value 0. This is under review by the devs.
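
The splitting described here can be reproduced in a few lines of Python. This is a sketch only: naive_parse is a hypothetical helper that splits on "&" the way a simple query-string parser would, and the example assumes the URL contained the HTML entity &amp; as in the post. It also shows the usual remedy, percent-encoding the value before putting it in the URL.

```python
from urllib.parse import quote, unquote


def naive_parse(query: str):
    """Hypothetical helper: split a query string into (name, value) pairs on '&'."""
    pairs = []
    for part in query.split("&"):
        name, _, value = part.partition("=")
        pairs.append((name, value))
    return pairs


# The entity's own ampersand is taken as a separator, cutting the title short.
broken = "option=com_content&title=Hello_&amp;_goodbye"
print(naive_parse(broken))

# Percent-encoding the value keeps it in one piece.
safe = "option=com_content&title=" + quote("Hello_&_goodbye", safe="")
print(safe)
print([(name, unquote(value)) for name, value in naive_parse(safe)])
```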
god doesn't play dice with the universe. not after that drunken night with the devil where he lost classical mechanics in a game of craps.

Since the creation of the Internet, the Earth's rotation has been fueled, primarily, by the collective spinning of English teachers in their graves.

emeyer
Joomla! Explorer
Joomla! Explorer
Posts: 352
Joined: Thu Sep 29, 2005 2:37 am

Re: Adding UTF-8 support is so easy!

Post by emeyer » Thu Nov 03, 2005 2:05 am

So I tried changing them to utf8 (multibyte) characters but it appears the menu text rendering does not work with utf8, it trashes the menus instead.

rsphaeroides
Joomla! Ace
Posts: 1369
Joined: Sun Aug 21, 2005 2:57 pm
Location: Colorado, USA

Re: Adding UTF-8 support is so easy!

Post by rsphaeroides » Sat Nov 12, 2005 4:57 am

I'm new at this and learning as I go, but I wrote a couple of modules and got UTF-8 working with them as well as in mambo 4.5.2.3 by studying this and other posts, but I've still got a problem. 

My installation created the SQL database with the character set and collation as Latin1 Swedish ci.  My situation now is that I can use special characters just fine as long as the content comes in and out through mambo (special characters look like gibberish if I view them in the database with phpMyAdmin, but they look fine in mambo front and back).

However, if I edit the database with phpMyAdmin, via an external program, or by uploading CSV or other files, then the special characters look fine in the database (as viewed with phpMyAdmin) but show up as gibberish in mambo (front end or back).

This situation persists when I change the character set and collation of the database table to any of the many UTF-8 options I have (still looks OK when I add or display via mambo, still looks like gibberish if I add any other way).
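
One possible explanation for this pattern, sketched in Python: if one tool writes UTF-8 bytes and another reads the same bytes as Latin-1, each sees the other's data as gibberish even though nothing is lost. This is an assumption about what is happening on this server, not a confirmed diagnosis, and the sample text is made up.

```python
# Made-up sample text with two accented characters.
text = "ç á"

# One side (assumed here to be the CMS) stores the text as UTF-8 bytes.
stored = text.encode("utf-8")

# The other side (assumed to be the admin tool) reads the bytes as Latin-1.
misread = stored.decode("latin-1")
print(misread)

# Nothing was destroyed: reversing the wrong step recovers the original.
recovered = misread.encode("latin-1").decode("utf-8")
print(recovered == text)
```

If that is the cause, the fix is to make every client and every table agree on one encoding rather than to convert the data again.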

Any suggestions how to go about correcting this would be most appreciated as I need the ability to interact with the database outside mambo.
¡Pura Vida!
Ray,
joomla in testing at Costa Rica Travel: http://costaricamap.net
http://costa-rica-guide.com

xender
Joomla! Apprentice
Posts: 26
Joined: Wed Sep 28, 2005 10:15 am

Re: Adding UTF-8 support is so easy!

Post by xender » Sat Nov 12, 2005 11:23 pm

Hello!

The symptoms you described are absolutely clear.

If your phpMyAdmin doesn't allow you to use UTF-8, this most probably means you have an old version. That usually happens when you use the phpMyAdmin offered by your hosting provider, which is too lazy to upgrade. In that case, simply copy a new version of phpMyAdmin to your website and use your own.

If, however, you can use UTF-8 in phpMyAdmin, just click the "home" icon and, first of all, switch the language to English (UTF-8). Collation doesn't matter very much at this stage, as it (AFAIK) mostly describes sorting order. Even if your provider doesn't allow you to create a new, blank database (at which stage you can decide to make it UTF-8), you will now be able to SEE the characters OK in phpMyAdmin.

My suggestions for fixing the database are as follows:

1) If you can create a blank database with UTF-8 selected, just do so.
2) Make sure each table in your database has the UTF-8 charset forced. You can do this either for the Joomla installation files BEFORE installing Joomla, or by updating your existing installation.
  • export the database to a text file (if updating an existing installation; otherwise just use the sql file from the /install/ directory in Joomla, and fix it once and for all)
  • convert the contents to UTF-8
  • fix key lengths if necessary (see my opening post in this thread)
  • fix the db encoding of the tables by forcing each individual table to be UTF-8, like this:


CREATE TABLE IF NOT EXISTS `jos_table_name` (
...
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
  • for existing installations: re-import the database from PHPMyAdmin, making sure that either AUTO or UTF-8 is selected on the import screen
  • for new installations: just use the modified sql file with JOOMLA setup procedure
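
Editing every CREATE TABLE by hand gets tedious for a full dump, so the table-charset step above can be sketched as a small Python filter. It is deliberately narrow: it assumes each table definition ends with ") ENGINE=name;" or ") TYPE=name;" and nothing else, and it leaves alone any definition that already carries other options. The sample table is the placeholder from this post.

```python
import re

# Matches the closing of a table definition that has only an engine clause.
TABLE_END = re.compile(r"\)\s*(ENGINE|TYPE)=(\w+)\s*;", re.IGNORECASE)


def force_utf8(sql: str) -> str:
    """Append DEFAULT CHARSET=utf8 to table definitions that lack other options."""
    return TABLE_END.sub(r") \1=\2 DEFAULT CHARSET=utf8;", sql)


dump = (
    "CREATE TABLE IF NOT EXISTS `jos_table_name` (\n"
    "  `id` int(11) NOT NULL\n"
    ") ENGINE=MyISAM;"
)

patched = force_utf8(dump)
print(patched)
```

Running it twice changes nothing the second time, since a line that already has DEFAULT CHARSET no longer matches the pattern. Always check the result by eye before importing it.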
Let me know if it helps!

Cheers,
Grzegorz

rsphaeroides
Joomla! Ace
Joomla! Ace
Posts: 1369
Joined: Sun Aug 21, 2005 2:57 pm
Location: Colorado, USA

Re: Adding UTF-8 support is so easy!

Post by rsphaeroides » Sun Nov 13, 2005 12:29 am

Thanks xender,

I have a pretty much free rein on the server so I'll follow the new database path.  I installed whatever was the latest version of PHPMyAdmin a few weeks ago.

This is the first suggestion that I've gotten that sounds like it should provide a comprehensive fix. It makes perfect sense and I'll give it a try as soon as I can set aside a block of time to get through it all at once.  Thanks again.
¡Pura Vida!
Ray,
joomla in testing at Costa Rica Travel: http://costaricamap.net
http://costa-rica-guide.com

tnpa
Joomla! Enthusiast
Posts: 211
Joined: Tue Dec 13, 2005 4:16 pm

Re: Adding UTF-8 support is so easy!

Post by tnpa » Tue Dec 13, 2005 4:30 pm

Hello Xender and all,

My webpage can display UTF-8 correctly. But the problem is that when I surf the web, reading the news, IE switches to another encoding such as Western European (ISO), or Western Windows...  If I go back to my webpage, it does not switch to UTF-8 as it should, but stays on the current encoding, unless I select Auto-Select in the View menu. How can I set up auto-select, or make my webpage go to UTF-8, so that it displays correctly for whoever opens it?

Tnpa

tijs
Joomla! Enthusiast
Posts: 109
Joined: Mon Aug 29, 2005 7:59 pm

Re: Adding UTF-8 support is so easy!

Post by tijs » Tue Dec 13, 2005 4:39 pm

tnpa wrote:

My webpage can display UTF-8 correctly. But the problem is that when I surf the web, reading the news, IE switches to another encoding such as Western European (ISO), or Western Windows...  If I go back to my webpage, it does not switch to UTF-8 as it should, but stays on the current encoding, unless I select Auto-Select in the View menu. How can I set up auto-select, or make my webpage go to UTF-8, so that it displays correctly for whoever opens it?
Add this to your .htaccess:

AddDefaultCharset utf-8

Don't ask me any details, but this worked for me...

[email protected]
Joomla! Apprentice
Posts: 32
Joined: Mon Feb 27, 2006 4:07 am

Re: Adding UTF-8 support is so easy!

Post by [email protected] » Tue Mar 07, 2006 5:04 am

Has anyone had problems with the tinymce editor and UTF-8?

I am trying to track down a problem I'm having entering Chinese characters in UTF-8. Even though my site has been fully converted to UTF-8 using the stuff in this forum (thanks guys), when I click on HTML in the TinyMCE editor I see in the source of the JavaScript pop-up window that the character encoding is still iso-8859-1.

I think this is what is scrambling the characters going into the database, and they are returned as ???????????.

I am using the 1.07 version package.
All Southern Chile www.allsouthernchile.com

tnpa
Joomla! Enthusiast
Posts: 211
Joined: Tue Dec 13, 2005 4:16 pm

Re: Adding UTF-8 support is so easy!

Post by tnpa » Tue Mar 07, 2006 5:34 am

Hello [email protected],

You need to use the MosCE editor instead of TinyMCE, or better yet the JCE editor 1.0.4, which came out just a few days ago.

Go to the language folder and find the file english.php, or in your case maybe chinese.php, then find the line DEFINE('_ISO','charset=iso-8859-1');

and change it to: DEFINE('_ISO','charset=utf-8');

As mentioned in an earlier post, you need to have this line in your .htaccess file:

AddDefaultCharset utf-8

at the very end.

Tnpa


   
Last edited by tnpa on Wed May 17, 2006 4:31 pm, edited 1 time in total.

raymond
Joomla! Enthusiast
Posts: 245
Joined: Tue Jan 24, 2006 3:24 pm
Location: Philippines

Re: Adding UTF-8 support is so easy!

Post by raymond » Sun Mar 12, 2006 7:52 pm

Been trying it out and it works great on a test install of Joomla. However, Joomla seems to write item titles to the database differently. I tried inputting "1901—1950 Archives" in the Title field of a content item. Looking at the database, it turns into an ordinary em dash. Visiting the webpage confirms this. The W3C Validator also says it's the old invalid em dash character and not the "—" that shows up. Am I doing anything wrong? Or is it just plain impossible to get special characters into an item title?
http://raymond.santosestrella.net
Santos Estrella Personal Site
http://www.thecorpusjuris.com
The online repository of Philippine law, jurisprudence, administrative issuances and legal research tools.

rzelnik
Joomla! Fledgling
Posts: 1
Joined: Wed May 17, 2006 3:52 pm

Re: Adding UTF-8 support is so easy!

Post by rzelnik » Wed May 17, 2006 3:56 pm

[email protected] wrote: Has anyone had problems with the tinymce editor and UTF-8?
You can use tinymce editor with UTF-8, if you insert into mambots/editors/tinymce.php after
tinyMCE.init({
theme : "$theme",
language : "en",
...
this line:
entity_encoding : "raw",
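
To illustrate what the "raw" setting avoids, here is a rough Python sketch of the difference between entity-encoded and raw text. It is not TinyMCE's own code; it only assumes that an entity-encoding editor replaces non-ASCII characters with character references before the text is submitted, while "raw" passes the characters through unchanged. The sample string is made up.

```python
import html


def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


raw = "中文"                       # made-up sample, sent as-is with "raw"
encoded = to_numeric_entities(raw)  # what an entity-encoding editor might send

print(encoded)
print(html.unescape(encoded) == raw)
```

On a page that is UTF-8 end to end, the raw form is the simpler one to store and search, since the database then holds the characters themselves rather than their references.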

mzorali
Joomla! Intern
Posts: 53
Joined: Fri Mar 16, 2007 10:52 am

Re: Adding UTF-8 support is so easy!

Post by mzorali » Wed Dec 31, 2008 3:44 pm

Guys,

Grzegorz, I tried to find the file database.php in my Joomla 1.5.8 folder but couldn't. I downloaded the folder by FTP to my PC and searched for the file and for the code:
$this->_table_prefix = $table_prefix;
but found neither.
Can you tell me where I can find them to add the code, as I have a problem displaying the Arabic language?

Same for the language folder: I did not find english.php or arabic.php, and even when I searched the files for the text DEFINE('_ISO', to set it to UTF-8, I couldn't find it.

Will appreciate your help and tips.

Thanks
Zorali

