The Joomla! Forum ™





Posted: Tue Mar 25, 2008 9:00 am
Joomla! Master

Joined: Fri Aug 12, 2005 3:47 pm
Posts: 17321
Location: **Translation Matters**
Present status
Transliteration is implemented in 1.5.x for Latin-script languages only.

Concretely, when the alias field is left empty, it is automatically filled with:
1. Latin-script languages
The ASCII equivalent of the title, i.e. accented letters are replaced by their non-accented counterparts.
à -> a
ë -> e
Ŝ -> s
etc.
2. Non-Latin scripts (Greek, Cyrillic, Chinese, Arabic, etc.)
The date when the article is saved.
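
A minimal sketch of the behaviour described above, for illustration only. The function name `sketch_alias`, the three-entry table and the date format are assumptions, not the actual 1.5 core code.

```php
<?php
// Hypothetical helper, NOT Joomla! code: mimics the alias behaviour described above.
function sketch_alias($title, $now)
{
    // Tiny illustrative table; a real table would cover far more accented letters.
    $map = array('à' => 'a', 'ë' => 'e', 'Ŝ' => 's');
    $ascii = strtolower(strtr($title, $map));

    // Keep only URL-safe characters; any run of other bytes becomes a single dash.
    $alias = trim(preg_replace('/[^a-z0-9]+/', '-', $ascii), '-');

    // Nothing usable left (e.g. a Greek or Chinese title): fall back to the save date.
    // The date format used here is an assumption.
    return $alias !== '' ? $alias : date('Y-m-d-H-i-s', $now);
}

echo sketch_alias('Voilà Noël', time()), "\n"; // voila-noel
echo sketch_alias('首页', time()), "\n";        // a date-based alias
```

A title made up only of non-Latin glyphs leaves nothing after the clean-up step, which is why the date takes over.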

Proposal
Implement in core a way to apply transliteration to any language that provides the necessary ini file.

At first sight this looks easy: a parameter could decide whether or not to use an available ini file, based on the language defined for the site or the admin.
At second sight it is not so easy, as any Unicode glyph can be used in 1.5.x, whatever languages are chosen for the front end or back end.
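
A minimal sketch of what an ini-driven table could look like, assuming a plain `character=replacement` layout. The layout, the sample Cyrillic entries and both function names are hypothetical; nothing like this ships with 1.5.

```php
<?php
// Hypothetical ini content a language pack might provide (layout is an assumption).
$ini = "ж=zh\nа=a\nб=b\n";

function load_translit_table($iniText)
{
    // INI_SCANNER_RAW keeps values such as "no" or "on" as literal strings.
    $table = parse_ini_string($iniText, false, INI_SCANNER_RAW);
    return is_array($table) ? $table : array();
}

function transliterate_with_table($string, array $table)
{
    // strtr() with an array tries the longest keys first and leaves
    // characters that are not in the table untouched.
    return $table ? strtr($string, $table) : $string;
}

echo transliterate_with_table('жаба', load_translit_table($ini)), "\n"; // zhaba
```

Because unknown characters pass through unchanged, a title mixing several scripts would only be partly transliterated by a single language's table, which is the difficulty raised above.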

Yvolk proposed a solution a year ago, which was not followed up:
viewtopic.php?f=231&t=140836&st=0&sk=t&sd=a&hilit=transliteration&start=90#p749972

Could anyone have a look and consider its feasibility for 1.6?

_________________
Jean-Marie Simonet / infograf · http://www.info-graf.fr
Multilanguage in 2.5: http://help.joomla.org/files/EN-GB_multilang_tutorial.pdf
---------------------------------
Joomla Translation Coordination Team • Joomla! Production Working Group


Posted: Tue Mar 25, 2008 10:31 am
Joomla! Virtuoso

Joined: Fri Sep 02, 2005 10:06 am
Posts: 3177
Location: Solar system - Earth - European Union
Your thread about broader international language support is very interesting... I think this can become a big issue once a URL can be written directly in Russian, Chinese and so on...

_________________
former Q&T WorkGroup Joomla member - Italian Translation Team Member


Posted: Tue Mar 25, 2008 10:02 pm
Joomla! Champion

Joined: Wed Nov 22, 2006 3:35 pm
Posts: 7056
Location: Nebraska
I don't know what all this involves and how a solution would be designed, but I would welcome this improvement.

_________________
http://Twitter.com/AmyStephen
http://www.alltogetherasawhole.org/


Posted: Thu Mar 27, 2008 3:52 pm
Joomla! Guru

Joined: Thu Jun 01, 2006 1:52 pm
Posts: 979
Location: Moscow, Russia
Thank you, infograf768, for relaunching this topic 8)
I'm still ready to help if I'm needed.


Posted: Thu Jul 17, 2008 10:20 am
Joomla! Guru

Joined: Thu Jun 01, 2006 1:52 pm
Posts: 979
Location: Moscow, Russia
Hi, I have good news!
I've turned last year's work into a working multilingual plugin, yvTransliterate.
The plugin works, although its integration with Joomla! is not smart enough yet, and this may give the Joomla! core team new impetus to improve Joomla!'s multilingual support.

See more information in the thread.


Posted: Thu Jul 17, 2008 11:21 am
Joomla! Champion

Joined: Wed Nov 22, 2006 3:35 pm
Posts: 7056
Location: Nebraska
Free software communities are amazing creatures. When barriers are lowered so that anyone can participate, something very cool happens. People pick up problems to solve that they find interesting. Combined, as we share the results of our work, we end up with far more than we could begin to accomplish individually.

Currently, there are eleven members of the Joomla! core team. Recently, we welcomed our 200,000th forum member and someone downloaded the 5,000,000th copy of Joomla!. Not to discredit the considerable efforts of those who are and who have served on the core team, but, if anyone is waiting on *them* to improve anything without significant help from this great community, those people will be waiting a very, very long time.

Thanks, Yuri, for selecting another problem that you find interesting to solve. I am confident you will solve it. I hope you have fun in the process and that you learn cool things. It would be awesome if you improved this area of study in such a way that other free software communities learned from your improvements.

Amy :)


Posted: Thu Jul 17, 2008 12:14 pm
Joomla! Guru

Joined: Thu Jun 01, 2006 1:52 pm
Posts: 979
Location: Moscow, Russia
AmyStephen wrote:
Thanks, Yuri ... I hope you have fun in the process...

Yeah, that was a really fun game. I went to sleep at 3 AM last night, which is VERY unusual for me :)


Posted: Thu Jul 17, 2008 12:51 pm
Joomla! Champion

Joined: Wed Nov 22, 2006 3:35 pm
Posts: 7056
Location: Nebraska
That is so cool when a project is that interesting that we sacrifice sleep! There have been a number of times I worked all through the night to get something done. Talk about feeling alive! Being able to accomplish something that is important to me is fulfilling. Anyway, congratulations and good luck! I hope it continues to be fun.

Amy :)


Posted: Tue Oct 21, 2008 12:56 pm
Joomla! Fledgling

Joined: Tue Oct 21, 2008 12:03 pm
Posts: 4
Quote:
Concretely it means that when the alias field remains empty, it is automatically filled by:


I wrote some Chinese characters in the alias field, but it falls back to the date.

Is there any particular reason that the alias has to be ASCII only? Is it because of the way Joomla is designed, or for SEO/SEF purposes?

For SEO/SEF, search engines (e.g. Google) support CJK URLs such as "zh.wikipedia.org/wiki/首页". I don't think transliteration is necessary in this situation, and I would rather see a URL in Chinese than "zh.wikipedia.org/wiki/1111-22-3-4-5.html".

regards.


Posted: Sun Oct 26, 2008 8:19 am
Joomla! Guru

Joined: Thu Jun 01, 2006 1:52 pm
Posts: 979
Location: Moscow, Russia
dawnfantasy wrote:
Is there any particular reason that the alias has to be ASCII only? Is it because of the way Joomla is designed, or for SEO/SEF purposes?

IMHO it is for both reasons :)
1. Joomla! makes this string 'URLSafe'
2. It is used for SEO/SEF

In my opinion, this is a matter of compatibility. Yes, Google uses non-ASCII chars in URLs and some browsers understand them in some cases... but even the Chinese Wikipedia in your example, "zh.wikipedia.org/wiki/首页", converts such URLs to something like this:
"http://zh.wikipedia.org/w/index.php?title=%E9%A6%96%E9%A1%B5&variant=zh-cn".

Looking at this URL, I see that transliteration would be much nicer than hex codes...
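
For reference, the hex codes in that URL are simply the UTF-8 bytes of the two characters, percent-encoded. A short illustration using PHP's `rawurlencode()` (the variable names are arbitrary):

```php
<?php
// Each Chinese character here is three UTF-8 bytes, so each becomes three %XX groups.
$title   = '首页';
$encoded = rawurlencode($title);

echo $encoded, "\n";                // %E9%A6%96%E9%A1%B5
echo rawurldecode($encoded), "\n";  // 首页
```

Two visible characters turn into eighteen, which is where the readability concern comes from.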

_________________
Text of all my messages is available under the terms of the GNU Free Documentation License: http://www.gnu.org/copyleft/fdl.html


Posted: Sun Oct 26, 2008 2:59 pm
Joomla! Fledgling

Joined: Tue Oct 21, 2008 12:03 pm
Posts: 4
Would be nice if users could choose the method of transformation.

1. Leave it unchanged, saved as UTF-8 in the DB.
2. Change to date.
3. Use a transliteration engine.
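
A rough sketch of how such a choice could be wired up. The mode names (`keep`, `date`, `translit`), the function name and the date format are all hypothetical; no such setting exists in 1.5.

```php
<?php
// Hypothetical dispatcher for the three options listed above; not existing Joomla! API.
function build_alias($title, $mode, $transliterator = null)
{
    switch ($mode) {
        case 'keep':      // 1. leave the title as typed, stored as UTF-8
            return trim($title);
        case 'translit':  // 3. hand over to whichever engine is plugged in
            return is_callable($transliterator)
                ? call_user_func($transliterator, $title)
                : $title;
        case 'date':      // 2. ignore the title and use the save date
        default:
            return date('Y-m-d-H-i-s');
    }
}

echo build_alias(' 首页 ', 'keep'), "\n";               // 首页
echo build_alias('ABC', 'translit', 'strtolower'), "\n"; // abc (strtolower stands in for an engine)
```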

Regards


Posted: Thu Dec 11, 2008 5:54 pm
Joomla! Enthusiast

Joined: Sun Nov 27, 2005 9:57 am
Posts: 161
Location: Romania
Hi,
There are a number of php scripts that can transliterate for you.

The simplest method would be to check for the iconv library and transliterate the slugs into ASCII,
as simply as this:

In libraries/joomla/language/language.php

replace this line (around line 224)

Code:
$string = htmlentities(utf8_decode($string));

with
Code:
// TMJ MOD: use iconv transliteration when the extension is available
if (function_exists('iconv')) {
    $string = iconv('UTF-8', 'ASCII//TRANSLIT', $string);
    $string = htmlentities($string);
} else {
    // Fallback: the original line
    $string = htmlentities(utf8_decode($string));
}


This will work well for Romanian, Swedish, Danish, German, etc.
I don't know about Russian or Chinese.

_________________
TeachMeJoomla.net - Joomla tutorials, tips, mods, and extensions. Joomla freelance custom programming/development


Posted: Thu Dec 11, 2008 6:50 pm
Joomla! Master

Joined: Fri Aug 12, 2005 3:47 pm
Posts: 17321
Location: **Translation Matters**
tudorilisoi wrote:
The simplest method would be to check for the iconv library and transliterate the slugs into ASCII [...]

I am going to test this right now. ;)


Posted: Sat Dec 13, 2008 8:17 am
Joomla! Master

Joined: Fri Aug 12, 2005 3:47 pm
Posts: 17321
Location: **Translation Matters**
My tests show that it solves some issues but not all.
For example:
çĆĂğøµşçıÐöüŸÏ

gives

ccagouscidquotoquotuquotyquoti

i.e. only the first 10 glyphs are correctly transliterated, not öüŸÏ,

which would mean that we get Basic Latin, Latin-1 Supplement, and only part of Latin Extended-A.

Better than before indeed.
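
A possible explanation for the "quot" fragments, sketched under two assumptions: that iconv's //TRANSLIT renders "ö" as the two characters `"o` on this server, and that a later clean-up step drops every character outside a-z and 0-9. The `preg_replace()` call below is only a stand-in for that clean-up, not the actual core filter.

```php
<?php
// Assumption: iconv //TRANSLIT has already turned "ö" into a double quote followed by "o".
$afterIconv = '"o';

// htmlentities() encodes the double quote as an entity.
$encoded = htmlentities($afterIconv);

// Stand-in for the alias clean-up: "&" and ";" are dropped, the letters "quot" survive.
$alias = preg_replace('/[^a-z0-9]/', '', $encoded);

echo $encoded, "\n"; // &quot;o
echo $alias, "\n";   // quoto
```

Under those assumptions the result matches the "quoto" seen in the test output above.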

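BTW, the code should be:
Code:
function transliterate($string)
{
    // TMJ MOD: prefer iconv transliteration when the extension is available
    if (function_exists('iconv')) {
        $string = iconv('UTF-8', 'ASCII//TRANSLIT', $string);
        $string = htmlentities($string);
    } else {
        // Fallback: entity-encode, then reduce each entity to its base letter(s)
        $string = htmlentities(utf8_decode($string));

        // After htmlentities() the sharp s arrives as the entity &szlig;
        $string = preg_replace(
            array('/&szlig;/', '/&(..)lig;/', '/&([aouAOU])uml;/', '/&(.)[^;]*;/'),
            array('ss', "$1", "$1" . 'e', "$1"),
            $string);
    }
    return $string;
}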


Posted: Mon Dec 15, 2008 11:30 pm
Joomla! Enthusiast

Joined: Sun Nov 27, 2005 9:57 am
Posts: 161
Location: Romania
Iconv is locale-dependent.
So when the locale matches the language and its particular accented/special characters, it will work well.
If you set Romanian as the language and write some Turkish titles, it will not work as expected:
http://taschenorakel.de/mathias/2007/11 ... terations/
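
A minimal sketch of pinning the locale around the iconv call, assuming the requested locale is installed on the server. The function name and the locale strings are illustrative, and the actual transliteration output depends on the platform's iconv implementation, so none is shown.

```php
<?php
// Hypothetical wrapper: set LC_CTYPE for the duration of one iconv call, then restore it.
function translit_in_locale($string, $locale)
{
    if (!function_exists('iconv')) {
        return $string; // extension missing: leave the string untouched
    }

    $previous = setlocale(LC_CTYPE, '0');  // "0" only reads the current setting
    setlocale(LC_CTYPE, $locale);          // returns false if the locale is not installed

    $result = iconv('UTF-8', 'ASCII//TRANSLIT', $string);

    setlocale(LC_CTYPE, $previous);        // restore whatever was active before

    return $result === false ? $string : $result;
}

// Example calls; results vary by system, so no expected output is given here.
echo translit_in_locale('Brașov', 'ro_RO.UTF-8'), "\n";
echo translit_in_locale('İstanbul', 'tr_TR.UTF-8'), "\n";
```

Choosing the locale from the content's language rather than the site's language would be one way around the Romanian/Turkish mismatch, at the cost of needing to know the content's language.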


Posted: Mon Dec 15, 2008 11:36 pm
Joomla! Enthusiast

Joined: Sun Nov 27, 2005 9:57 am
Posts: 161
Location: Romania
Also, the strings should not be HTML-encoded. rawurlencode after transliteration would be better and would retain the special characters. I have not tested it, but I'm sure it's the best way to go.


Posted: Tue Dec 16, 2008 5:09 am
Joomla! Master

Joined: Fri Aug 12, 2005 3:47 pm
Posts: 17321
Location: **Translation Matters**
rawurlencode has unwanted results, in the sense that its percent-encoding can make a URL so long and so user-unfriendly that one could not pass it on by any means other than electronic devices.

Example:
Code:
http://el.wikipedia.org/wiki/%CE%9C%CE%B5%CE%B3%CE%AC%CE%BB%CE%B7_%CE%AD%CE%BA%CF%81%CE%B7%CE%BE%CE%B7

is in Greek
Quote:
http://el.wikipedia.org/wiki/Μεγάλη έκρηξη


Concerning the locale, I have tested with the French language pack for 1.5, whose locale takes care of the glyph " ë ".

I am still getting " quote " in the alias which, as we know from the sample posted above, is composed of " quot " and " e ".


Posted: Tue Dec 16, 2008 6:56 am
Joomla! Guru

Joined: Thu Jun 01, 2006 1:52 pm
Posts: 979
Location: Moscow, Russia
BTW, I published v.1.1 of the yvTransliterate plugin, which uses a different transliteration table (stored in an XML file) for each source language. So one may use a "standardized" (ISO...) transliteration or any custom one...

In v.1.1:
Added another 'hook' into Joomla! core: the parameter 'Extend JLanguage class'. Now yvTransliterate can transliterate not only aliases of articles (or comments, in the case of yvComment), but also aliases of other elements of the Joomla! site interface: menu items, sections, categories... In fact, yvTransliterate works everywhere Joomla! core calls the JLanguage::transliterate method.
Please note that in this case yvTransliterate uses the current user's language as the source language for transliteration. So, for example, if you want a section alias to be transliterated according to the Russian transliteration table, you have to log in to the Administrator site (back end) in Russian.
This feature works MUCH more efficiently under PHP5: it creates a proxy to the JLanguage object instead of creating (and populating...) a second instance of the JLanguage class.
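
The proxy idea can be illustrated roughly as follows. This is not the plugin's code: `DemoLanguage` is a tiny stand-in for the real JLanguage class, and the class names and the three-letter table are made up for the example.

```php
<?php
// Stand-in for the core language object; hypothetical and far smaller than JLanguage.
class DemoLanguage
{
    public function getTag() { return 'ru-RU'; }
    public function transliterate($string) { return $string; }
}

// Proxy: overrides transliterate() and forwards every other call to the wrapped object,
// so no second, fully populated language instance is needed.
class TransliterationProxy
{
    private $inner;
    private $table;

    public function __construct($inner, array $table)
    {
        $this->inner = $inner;
        $this->table = $table;
    }

    public function transliterate($string)
    {
        return $this->table ? strtr($string, $this->table) : $string;
    }

    // PHP5 magic method: called for any method not defined on the proxy itself.
    public function __call($method, $args)
    {
        return call_user_func_array(array($this->inner, $method), $args);
    }
}

$lang = new TransliterationProxy(new DemoLanguage(), array('ж' => 'zh', 'у' => 'u', 'к' => 'k'));
echo $lang->getTag(), ' ', $lang->transliterate('жук'), "\n"; // ru-RU zhuk
```

The `__call` magic method is a PHP5 feature, which may be part of why the PHP5 path avoids building a second language object.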


Posted: Fri Oct 02, 2009 9:45 am
Joomla! Master

Joined: Fri Aug 12, 2005 3:47 pm
Posts: 17321
Location: **Translation Matters**
For those interested, I have made a Unicode slugs system plugin which works OK in 1.5.

See: http://info-graf.fr/infografcvs/Des-url ... C3%A9.html
(Alas, this forum does not let it show as it could, i.e. info-graf.fr/infografcvs/Des-urls-de-toute-beauté.html)

FYI, core customizable transliteration was added yesterday in 1.6 SVN 12997 (thanks Ercan!)


Posted: Sun Nov 15, 2009 9:57 am
Joomla! Enthusiast

Joined: Sun Nov 27, 2005 9:57 am
Posts: 161
Location: Romania
Hi, it's been a while ;)
I stumbled upon a great ASCIIfier, built for performance.
It has an example covering accented characters in 130+ languages, including Greek, Hindi, Taiwanese and Chinese.
All transliterate nicely to their ASCII counterparts.
There are no PHP locale or extension dependencies.
Looks like the Holy Grail to me, as long as you take care to strip evil MS Word 3-byte illegal characters (such as the 0x96 long dash) before transliterating (otherwise the conversion fails and throws an error).
http://sourceforge.net/projects/phputf8/files/utf8_to_ascii/
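
One way to do that stripping step without any extension, sketched with a byte-level regular expression that keeps only well-formed UTF-8 sequences. The function name is hypothetical and this is not part of the utf8_to_ascii package; a lone 0x96 byte is dropped because it is not valid UTF-8 on its own.

```php
<?php
// Hypothetical pre-filter: keep well-formed UTF-8 sequences, silently drop everything else.
function strip_invalid_utf8($string)
{
    // No /u modifier on purpose: the pattern works on raw bytes.
    $valid = '/[\x00-\x7F]'                      // ASCII
           . '|[\xC2-\xDF][\x80-\xBF]'           // 2-byte sequences
           . '|\xE0[\xA0-\xBF][\x80-\xBF]'       // 3-byte, excluding overlongs
           . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
           . '|\xED[\x80-\x9F][\x80-\xBF]'       // 3-byte, excluding surrogates
           . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'    // 4-byte sequences
           . '|[\xF1-\xF3][\x80-\xBF]{3}'
           . '|\xF4[\x80-\x8F][\x80-\xBF]{2}/';

    preg_match_all($valid, $string, $matches);
    return implode('', $matches[0]);
}

// "café", a stray Windows-1252 dash byte, then "ok": the stray byte is removed.
echo strip_invalid_utf8("caf\xC3\xA9 \x96 ok"), "\n";
```

Running the filter first means the transliteration step only ever sees valid input, at the cost of silently losing whatever character the stray byte was meant to be.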


Posted: Mon Dec 14, 2009 2:52 am
Joomla! Explorer

Joined: Sun Feb 08, 2009 5:10 pm
Posts: 275
If I am not mistaken, this Alias/Transliteration issue is already fixed in SEF components like SH404SEF. However, it will be great to have it in the Joomla Core.

Many thanks for all your efforts.

_________________
- http://www.divulgaterium.com: Divulgaterium - Free Business Website Directory
- http://www.egliseprimitive.org: Christian Website

