•Language slugs transliteration

infograf768
Joomla! Master
Posts: 19133
Joined: Fri Aug 12, 2005 3:47 pm
Location: **Translation Matters**

•Language slugs transliteration

Post by infograf768 » Tue Mar 25, 2008 9:00 am

Present status.
Transliteration is implemented in 1.5.x for Latin languages only.

Concretely, this means that when the alias field is left empty, it is automatically filled with:
1. Latin languages
The ASCII equivalent of accented letters, i.e. the non-accented letters.
à -> a
ë -> e
Ŝ -> s
etc.
2. Non-Latin languages (Greek, Cyrillic, Chinese, Arabic scripts, etc.)
The date when the article is saved.
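
The behaviour described above can be sketched roughly as follows. This is an illustration only, not the Joomla! 1.5 source: the function name `makeAlias`, the tiny accent table and the date format are all assumptions made for the example.

```php
<?php
// Hypothetical sketch of the 1.5.x fallback described above; NOT the core code.

/**
 * Build an alias from a title: strip accents from Latin letters,
 * and fall back to the save date when nothing usable is left.
 */
function makeAlias($title, $savedAt)
{
    // Assumed, deliberately tiny accent table (a real list would be much longer).
    $map = array('à' => 'a', 'ë' => 'e', 'Ŝ' => 's', 'é' => 'e', 'ç' => 'c');

    $alias = strtolower(strtr($title, $map));
    // Keep ASCII letters and digits; turn every other run of bytes into one dash.
    $alias = trim(preg_replace('/[^a-z0-9]+/', '-', $alias), '-');

    // A non-Latin title leaves an empty string, so the date is used instead.
    return $alias !== '' ? $alias : $savedAt;
}

echo makeAlias('Noël à Paris', '2008-03-25-09-00-00'), "\n"; // noel-a-paris
echo makeAlias('Привет мир', '2008-03-25-09-00-00'), "\n";   // 2008-03-25-09-00-00
```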

Proposal
To implement in core a way to apply transliteration to any language that provides the necessary ini file.

At first sight, this could look easy: a parameter could decide whether or not to use an available ini file, based on the language defined for the site or the admin.
At second sight, it is not so easy, as any Unicode glyph can be used in 1.5.x, whatever the languages chosen for the front end or back end.
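
One way to picture the proposal is a lookup table per language, applied with `strtr`. The sketch below rests on several assumptions: the tables are hard-coded PHP arrays standing in for whatever a language pack's ini file would provide, and the names `$tables` and `transliterateWith` are invented here. The last call illustrates the "second sight" difficulty: text in one language is left untouched by another language's table.

```php
<?php
// Hypothetical per-language transliteration tables (stand-ins for ini files).
$tables = array(
    'ru-RU' => array('ж' => 'zh', 'у' => 'u', 'к' => 'k'),
    'de-DE' => array('ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss'),
);

/**
 * Apply the table of one language; characters it does not list pass through untouched.
 */
function transliterateWith(array $tables, $language, $string)
{
    if (!isset($tables[$language])) {
        return $string; // no table shipped for this language
    }
    return strtr($string, $tables[$language]);
}

echo transliterateWith($tables, 'de-DE', 'müßig'), "\n"; // muessig
echo transliterateWith($tables, 'ru-RU', 'жук'), "\n";   // zhuk
// German text on a site set to Russian is left as-is.
echo transliterateWith($tables, 'ru-RU', 'müßig'), "\n"; // müßig
```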

Yvolk proposed, a year ago, a solution which was not followed up:
http://forum.joomla.org/viewtopic.php?f ... 90#p749972

Could anyone have a look and consider its feasibility for 1.6?
Jean-Marie Simonet / infograf
---------------------------------
ex-Joomla Translation Coordination Team • ex-Joomla! Production Working Group

newart
Joomla! Virtuoso
Posts: 3177
Joined: Fri Sep 02, 2005 10:06 am
Location: Solar system - Earth - European Union

Re: •Language slugs transliteration

Post by newart » Tue Mar 25, 2008 10:31 am

Very interesting thread about more international language support... I think this can be a big issue when a URL can be written directly in Russian, Chinese and so on...
former Q&T WorkGroup Joomla member - Italian Translation Team Member

AmyStephen
Joomla! Champion
Posts: 7018
Joined: Wed Nov 22, 2006 3:35 pm
Location: Nebraska

Re: •Language slugs transliteration

Post by AmyStephen » Tue Mar 25, 2008 10:02 pm

I don't know what all this involves and how a solution would be designed, but I would welcome this improvement.

yvolk
Joomla! Guru
Posts: 979
Joined: Thu Jun 01, 2006 1:52 pm
Location: Moscow, Russia

Re: Language slugs transliteration

Post by yvolk » Thu Mar 27, 2008 3:52 pm

Thank you, infograf768, for relaunching this topic 8)
I'm still ready to help if I'm needed.

Re: Language slugs transliteration

Post by yvolk » Thu Jul 17, 2008 10:20 am

Hi, I have good news!
I've moved last year's work into a working multilingual plugin, yvTransliterate.
The plugin works, although its integration with Joomla! is not smart enough yet, and this may
give the Joomla! core team a new impulse to improve Joomla!'s multilingual support.

See more information in the thread.

Re: •Language slugs transliteration

Post by AmyStephen » Thu Jul 17, 2008 11:21 am

Free software communities are amazing creatures. When barriers are lowered so that anyone can participate, something very cool happens. People pick up problems to solve that they find interesting. Combined, as we share the results of our work, we end up with far more than we could begin to accomplish individually.

Currently, there are eleven members of the Joomla! core team. Recently, we welcomed our 200,000th forum member and someone downloaded the 5,000,000th copy of Joomla!. Not to discredit the considerable efforts of those who are and who have served on the core team, but, if anyone is waiting on *them* to improve anything without significant help from this great community, those people will be waiting a very, very long time.

Thanks, Yuri, for selecting another problem that you find interesting to solve. I am confident you will solve it. I hope you have fun in the process and that you learn cool things. It would be awesome if you improved this area of study in such a way that other free software communities learned from your improvements.

Amy :)

Re: •Language slugs transliteration

Post by yvolk » Thu Jul 17, 2008 12:14 pm

AmyStephen wrote:Thanks, Yuri ... I hope you have fun in the process...
Yeah, that was a really fun game. I went to sleep at 3 AM last night, which is VERY unusual for me :)

Re: •Language slugs transliteration

Post by AmyStephen » Thu Jul 17, 2008 12:51 pm

That is so cool when a project is that interesting that we sacrifice sleep! There have been a number of times I worked all through the night to get something done. Talk about feeling alive! Being able to accomplish something that is important to me is fulfilling. Anyway, congratulations and good luck! I hope it continues to be fun.

Amy :)

dawnfantasy
Joomla! Fledgling
Posts: 4
Joined: Tue Oct 21, 2008 12:03 pm

Re: •Language slugs transliteration

Post by dawnfantasy » Tue Oct 21, 2008 12:56 pm

infograf768 wrote:Concretely it means that when the alias field remains empty, it is automatically filled by:
I wrote some Chinese characters in the alias field, but it falls back to the date.

Is there any particular reason that the alias has to be ASCII only? Is it because of the way Joomla! is designed, or for SEO/SEF purposes?

For SEO/SEF, search engines (e.g. Google) support CJK URLs such as "zh.wikipedia.org/wiki/首页". I don't think transliteration is necessary in this situation, and I would rather see a URL in Chinese than "zh.wikipedia.org/wiki/1111-22-3-4-5.html".

regards.

Re: •Language slugs transliteration

Post by yvolk » Sun Oct 26, 2008 8:19 am

dawnfantasy wrote:Is there any particular reason that the alias has to be ASCII only? Is it because of the way Joomla! is designed, or for SEO/SEF purposes?
IMHO it is for both reasons :)
1. Joomla! makes this string 'URLSafe'
2. It is used for SEO/SEF

In my opinion, this is a matter of compatibility. Yes, Google uses non-ASCII characters in URLs, and some browsers understand them in some cases... but even the Chinese Wikipedia in your example, "zh.wikipedia.org/wiki/首页", converts such URLs to something like this:
"http://zh.wikipedia.org/w/index.php?tit ... iant=zh-cn".

Looking at this URL, I see that transliteration would be much nicer than hex codes...
Text of all my messages is available under the terms of the GNU Free Documentation License: http://www.gnu.org/copyleft/fdl.html

Re: •Language slugs transliteration

Post by dawnfantasy » Sun Oct 26, 2008 2:59 pm

It would be nice if users could choose the method of transformation:

1. Leave it unchanged, saved as UTF-8 in the DB.
2. Change it to the date.
3. Use a transliteration engine.
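
The three options above could be exposed as a single setting, along these lines. Everything here is hypothetical: `buildAlias`, the method names 'keep', 'date' and 'transliterate', and the toy engine do not exist in Joomla!; they only illustrate the idea of a user-selectable method.

```php
<?php
// Hypothetical sketch of a per-site "alias method" setting; all names are invented here.

function buildAlias($title, $method, $savedAt, $engine = null)
{
    switch ($method) {
        case 'keep':          // 1. store the title untouched, as UTF-8
            return $title;
        case 'date':          // 2. always use the save date
            return $savedAt;
        case 'transliterate': // 3. delegate to whatever engine is plugged in
            if (!is_callable($engine)) {
                throw new InvalidArgumentException('No transliteration engine given');
            }
            return call_user_func($engine, $title);
        default:
            throw new InvalidArgumentException("Unknown method: $method");
    }
}

// Toy engine for the demonstration only: maps two Chinese characters to pinyin.
function toyEngine($s)
{
    return strtr($s, array('首' => 'shou', '页' => 'ye'));
}

echo buildAlias('首页', 'keep', '2008-10-26'), "\n";                       // 首页
echo buildAlias('首页', 'date', '2008-10-26'), "\n";                       // 2008-10-26
echo buildAlias('首页', 'transliterate', '2008-10-26', 'toyEngine'), "\n"; // shouye
```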

Regards

tudorilisoi
Joomla! Enthusiast
Posts: 161
Joined: Sun Nov 27, 2005 9:57 am
Location: Romania

Re: •Language slugs transliteration

Post by tudorilisoi » Thu Dec 11, 2008 5:54 pm

Hi,
There are a number of PHP scripts that can transliterate for you.

The simplest method would be to check for the iconv library and transliterate the slugs into ASCII.
It is as simple as this:

libraries/joomla/language/language.php

Replace this line (around line 224)

Code:

$string = htmlentities(utf8_decode($string));

with

Code:

// TMJ MOD
if (function_exists('iconv')) {
	// Let iconv approximate non-ASCII characters with ASCII ones
	$string = iconv('UTF-8', 'ASCII//TRANSLIT', $string);
	$string = htmlentities($string);
} else {
	// Fall back to the original behaviour when iconv is unavailable
	$string = htmlentities(utf8_decode($string));
}

This will work well for Romanian, Swedish, Danish, German, etc.
I don't know about Russian or Chinese.
TeachMeJoomla.net - Joomla tutorials, tips, mods, and extensions. Joomla freelance custom programming/development

Re: •Language slugs transliteration

Post by infograf768 » Thu Dec 11, 2008 6:50 pm

I am going to test this right now. ;)

Re: •Language slugs transliteration

Post by infograf768 » Sat Dec 13, 2008 8:17 am

My tests show that it solves some issues but not all.
For example:
çĆĂğøµşçıÐöüŸÏ

gives

ccagouscidquotoquotuquotyquoti

i.e. only the first 10 glyphs are correctly transliterated, but not öüŸÏ,

which would mean that we get:
Basic Latin
Latin-1 Supplement
Only part of Latin Extended-A

Better than before indeed.
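
A possible explanation for the stray "quot" fragments in the result above, offered as an assumption rather than a confirmed diagnosis: some iconv implementations transliterate a diaeresis as a double quote placed before the letter (ö becomes "o), `htmlentities()` then encodes that quote as an entity, and a later URL-safe filter drops the ampersand and the semicolon. The sketch hard-codes the assumed iconv output so that it does not depend on the local iconv, and the final filter is a simplified stand-in, not Joomla!'s own.

```php
<?php
// Assumed iconv output for "ö" on some systems: a double quote followed by "o".
$afterIconv = '"o';

// htmlentities() encodes the double quote as an HTML entity.
$afterEntities = htmlentities($afterIconv);
echo $afterEntities, "\n"; // &quot;o

// Simplified stand-in for a URL-safe filter: keep only ASCII letters and digits.
$alias = preg_replace('/[^a-z0-9]+/', '', strtolower($afterEntities));
echo $alias, "\n"; // quoto
```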

BTW, the code should be:

Code:

function transliterate($string)
{
	// TMJ MOD
	if (function_exists('iconv')) {
		$string = iconv('UTF-8', 'ASCII//TRANSLIT', $string);
		$string = htmlentities($string);
	} else {
		$string = htmlentities(utf8_decode($string));

		// Reduce the entities produced above to plain letters:
		// &szlig; -> ss, &aelig; -> ae, &ouml; -> oe, &eacute; -> e
		$string = preg_replace(
			array('/&szlig;/', '/&(..)lig;/', '/&([aouAOU])uml;/', '/&(.)[^;]*;/'),
			array('ss', "$1", "$1" . 'e', "$1"),
			$string);
	}
	return $string;
}

Re: •Language slugs transliteration

Post by tudorilisoi » Mon Dec 15, 2008 11:30 pm

iconv is locale dependent.
So when you use the correct language together with its particular accented/special characters, it will work well.
If you set Romanian as the language and write some Turkish titles, it will not work as expected.
http://taschenorakel.de/mathias/2007/11 ... terations/
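
To see the locale dependence on a given server, the same string can be transliterated under several `LC_CTYPE` locales and the results compared. This is only a sketch: the helper name `translitUnderLocale` is invented, the locale names are assumptions that must be installed on the machine, and the output differs between iconv implementations, so none is shown.

```php
<?php
// Sketch: run the same iconv transliteration under different LC_CTYPE locales.
// The locale names below are assumptions; they must be installed on the server.

function translitUnderLocale($string, $locale)
{
    $previous = setlocale(LC_CTYPE, '0');     // '0' only reads the current setting
    $applied  = setlocale(LC_CTYPE, $locale); // false when the locale is missing
    $result   = iconv('UTF-8', 'ASCII//TRANSLIT', $string);
    setlocale(LC_CTYPE, $previous);           // restore whatever was active before
    return array('locale' => $applied, 'result' => $result);
}

foreach (array('C', 'de_DE.UTF-8', 'ro_RO.UTF-8') as $locale) {
    $r = translitUnderLocale('Äpfel şi ğ', $locale);
    printf(
        "%-20s => %s\n",
        $r['locale'] === false ? "$locale (n/a)" : $r['locale'],
        $r['result'] === false ? '(conversion failed)' : $r['result']
    );
}
```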

Re: •Language slugs transliteration

Post by tudorilisoi » Mon Dec 15, 2008 11:36 pm

Also, the strings should not be HTML-encoded. Applying rawurlencode after transliteration would be better and would retain the special characters. I did not test it, but I'm sure it's the best way to go.

Re: •Language slugs transliteration

Post by infograf768 » Tue Dec 16, 2008 5:09 am

rawurlencode has unwanted results, in the sense that its encoding can make a URL so long and so user-unfriendly that one could not pass it on by any means other than electronic ones.

Example:

Code:

http://el.wikipedia.org/wiki/%CE%9C%CE%B5%CE%B3%CE%AC%CE%BB%CE%B7_%CE%AD%CE%BA%CF%81%CE%B7%CE%BE%CE%B7
is in Greek
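
The growth can be measured with PHP's own `rawurlencode`/`rawurldecode`. The sketch below starts from the encoded title of the Wikipedia URL above; the variable names are just for illustration.

```php
<?php
// Encoded title taken from the Wikipedia URL above.
$encoded = '%CE%9C%CE%B5%CE%B3%CE%AC%CE%BB%CE%B7_%CE%AD%CE%BA%CF%81%CE%B7%CE%BE%CE%B7';
$title   = rawurldecode($encoded); // the Greek title as UTF-8 text

// 12 Greek letters take 2 bytes each in UTF-8 but 6 ASCII characters each once encoded;
// only "_" survives unchanged.
echo strlen($title), ' bytes of UTF-8 -> ', strlen($encoded), " characters in the URL\n";
// 25 bytes of UTF-8 -> 73 characters in the URL

// The encoding is lossless: encoding the title again gives the same URL part.
var_dump(rawurlencode($title) === $encoded); // bool(true)
```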
Concerning the locale, I have tested with the French language pack for 1.5, whose locale takes care of the glyph " ë ".

I am still getting in the alias
" quote ", which, as we know from the sample posted above, is composed of " quot " and " e ".

Re: •Language slugs transliteration

Post by yvolk » Tue Dec 16, 2008 6:56 am

BTW, I published v.1.1 of the yvTransliterate plugin, which uses a different transliteration table (stored in an XML file) for each (source) language. So one may use a transliteration that is "standardized" (ISO...) or any custom one...

In v.1.1:
Added another 'hook' into the Joomla! core: the parameter 'Extend JLanguage class'. Now yvTransliterate may transliterate not only aliases of Articles (or comments, in the case of yvComment), but also aliases of other elements of the Joomla! site interface: menu items, sections, categories... In fact, yvTransliterate works in every place where the Joomla! core calls the JLanguage::transliterate method.
Please note that in this case yvTransliterate uses the language of the current user as the source language for transliteration. So, for example, if you want a Section alias to be transliterated according to the Russian transliteration table, you have to log in to the Administrator site (back end) in Russian.
This feature works MUCH more efficiently under PHP5: it creates a proxy to the JLanguage object instead of creating (and populating...) a second instance of the JLanguage class.
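
The "proxy" idea mentioned above can be pictured with PHP 5's `__call` magic method. This is a generic sketch of the pattern, not yvTransliterate's code (which is not shown in this thread): the classes `FakeLanguage` and `TransliteratingProxy` and the three-letter table are invented for the example.

```php
<?php
// Hypothetical stand-in for the core language object.
class FakeLanguage
{
    public function getTag() { return 'ru-RU'; }
    public function transliterate($string) { return $string; } // does nothing here
}

// Proxy: overrides transliterate() and forwards every other call to the wrapped object.
class TransliteratingProxy
{
    private $inner;
    private $table;

    public function __construct($inner, array $table)
    {
        $this->inner = $inner;
        $this->table = $table;
    }

    public function transliterate($string)
    {
        return strtr($string, $this->table);
    }

    // Called by PHP for any method not defined on the proxy itself.
    public function __call($method, $args)
    {
        return call_user_func_array(array($this->inner, $method), $args);
    }
}

$lang = new TransliteratingProxy(new FakeLanguage(), array('ж' => 'zh', 'у' => 'u', 'к' => 'k'));
echo $lang->getTag(), "\n";             // ru-RU (forwarded to the wrapped object)
echo $lang->transliterate('жук'), "\n"; // zhuk (handled by the proxy)
```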

Re: •Language slugs transliteration

Post by infograf768 » Fri Oct 02, 2009 9:45 am

For those interested, I have made a Unicode slugs system plugin which works OK in 1.5.

See: http://info-graf.fr/infografcvs/Des-url ... C3%A9.html
(Alas, this forum does not let it show as it could, i.e. info-graf.fr/infografcvs/Des-urls-de-toute-beauté.html)

FYI, customizable transliteration was added to the core yesterday, in 1.6 SVN 12997 (thanks, Ercan!)

Re: •Language slugs transliteration

Post by tudorilisoi » Sun Nov 15, 2009 9:57 am

Hi, it's been a while ;)
I stumbled upon a great ASCIIfier, built for performance.
It has an example covering accented characters in 130+ languages, including Greek, Hindi, Taiwanese and Chinese.
All transliterate nicely to their ASCII counterparts.
There are no PHP locale or extension dependencies.
Looks like the Holy Grail to me, as long as you take care to strip evil MS Word 3 byte illegal characters (such as the 0x96 long dash) before transliterating (otherwise the conversion fails and throws an error).
http://sourceforge.net/projects/phputf8 ... _to_ascii/
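
One way to guard against such bytes before transliterating is sketched below. It is not part of the linked library: the helper name `ensureUtf8` is invented, and it assumes that any input which is not valid UTF-8 was pasted as Windows-1252, where the single byte 0x96 is a dash. Unlike the ASCIIfier itself, this guard relies on the iconv extension.

```php
<?php
// Sketch: make sure a string is valid UTF-8 before handing it to a transliterator.
// Assumption: input that is not valid UTF-8 was pasted as Windows-1252 (typical for MS Word).

function ensureUtf8($string)
{
    // With the /u modifier PCRE rejects invalid UTF-8, so preg_match() returns false.
    if (preg_match('//u', $string) === 1) {
        return $string;
    }
    // Re-encode the whole string; byte 0x96 becomes the 3-byte UTF-8 en dash.
    $converted = iconv('CP1252', 'UTF-8', $string);
    return $converted === false ? '' : $converted;
}

echo bin2hex(ensureUtf8("\x96")), "\n"; // e28093
echo ensureUtf8('首页'), "\n";           // 首页 (already valid, returned untouched)
```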

joomfriend
Joomla! Explorer
Posts: 284
Joined: Sun Feb 08, 2009 5:10 pm

Re: •Language slugs transliteration

Post by joomfriend » Mon Dec 14, 2009 2:52 am

If I am not mistaken, this alias/transliteration issue is already fixed in SEF components like SH404SEF. However, it would be great to have it in the Joomla! core.

Many thanks for all your efforts.
- https://www.adelnipet.com: Adelni Pet - Your Social Pet Network
- https://www.egliseprimitive.org: Christian Website

