I was looking for a good way to transcode characters to support international URIs in Community Builder, e.g.
index.php?....&user=gabrièlle, since international URIs are always in UTF-8...
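To make the requirement concrete: in a UTF-8 URI the è of gabrièlle (U+00E8) travels as the two bytes C3 A8, percent-encoded as %C3%A8. After PHP decodes the query string, $_REQUEST['user'] holds those UTF-8 bytes, not the single Latin-1 byte E8 that an ISO-8859-1 site expects. Below is a minimal check. The bytes are written as escapes so the result does not depend on the encoding of the source file, and the variable names are only illustrative:

```php
<?php
// "gabrièlle" spelled out as UTF-8 bytes: è (U+00E8) is 0xC3 0xA8 in UTF-8.
$utf8Name = "gabri\xC3\xA8lle";
// The same name in ISO-8859-1 (Latin-1), where è is the single byte 0xE8.
$latin1Name = "gabri\xE8lle";

// What ends up in the query string of a UTF-8 URI.
$encoded = rawurlencode($utf8Name);
echo $encoded, "\n"; // gabri%C3%A8lle

// Decoding gives back bytes, not characters: 10 bytes for 9 characters,
// which differ from the 9-byte Latin-1 spelling the site may store.
$decoded = rawurldecode($encoded);
echo strlen($decoded), " bytes\n"; // 10 bytes
var_dump($decoded === $latin1Name); // bool(false)
```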
- Looking at the Joomla! 1.1 SVN, I also saw that all encoding settings have now been hardcoded to UTF-8.

- Looking at Joomla! 1.0.3, I saw that no real transcoding is done.
- Looking at the PHP manual, I saw that the handy character transcoding library is NOT part of a standard install. Built-in support (e.g. in html_entity_decode()) only starts with PHP 4.3, and support for multi-byte character sets (MBCS) only arrives with PHP 5.0.
I'm wondering whether UTF-8 support in the core (Mambo/Joomla!) and in third-party developer (3PD) extensions will stay backwards compatible with the PHP 4.1 prerequisite, or whether the switch to UTF-8 will implicitly make PHP 4.3 a prerequisite...
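Instead of assuming a PHP version, you can test at runtime which of these capabilities a host actually has. This is a hypothetical sketch: the helper name charsetSupport() is illustrative, the 4.3.0 and 5.0.0 thresholds are the ones quoted above from the PHP manual, and mbstring and iconv are assumed to be the optional transcoding extensions that may be missing on a given host. It sticks to PHP 4-compatible syntax (array(), no type hints), because it is meant to run on the old versions under discussion:

```php
<?php
// Hypothetical helper: reports which transcoding tools this PHP install offers.
function charsetSupport() {
    return array(
        // html_entity_decode() itself first appears in PHP 4.3.0.
        'html_entity_decode'   => function_exists('html_entity_decode'),
        // Version thresholds as quoted from the PHP manual in this post.
        'single_byte_entities' => version_compare(PHP_VERSION, '4.3.0', '>='),
        'utf8_entities'        => version_compare(PHP_VERSION, '5.0.0', '>='),
        // Optional extensions that may not be available on every host.
        'mbstring'             => extension_loaded('mbstring'),
        'iconv'                => function_exists('iconv'),
    );
}

// Print a small capability report, one feature per line.
foreach (charsetSupport() as $feature => $available) {
    echo str_pad($feature, 22), $available ? 'yes' : 'no', "\n";
}
```

version_compare() itself exists since PHP 4.1.0, so this check can run on the oldest version being discussed.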
Here is a small piece of library code that took me hours to put together. It may save others time when trying to get PHP 4.1 to 5.0 compatibility for this. It converts a UTF-8 request parameter to the internal encoding, e.g.: $username = utf8ToISO($_REQUEST['user']);
Code:
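function unhtmlentities ($string, $quotes, $charset) {
    // html_entity_decode() exists from PHP 4.3.0, but before PHP 5.0.0 it only
    // handles single-byte charsets (ISO-8859-x, Windows-125x), not UTF-8.
    $charset = strtoupper($charset);
    $singleByte = (strncmp($charset, "ISO-8859", 8) == 0) || (strpos($charset, "125") !== false);
    // version_compare() instead of "<" on strings: "4.10.0" < "4.3.0" as strings.
    if (version_compare(phpversion(), '5.0.0', '<')
        && (version_compare(phpversion(), '4.3.0', '<') || !$singleByte)) {
        // Fallback for 4.1.0 <= PHP < 4.3.0 (and unsupported charsets before 5.0):
        // reverse the entity table. Before PHP 5.3.4 this table is ISO-8859-1 based.
        $trans_tbl = get_html_translation_table(HTML_ENTITIES, $quotes);
        $trans_tbl = array_flip($trans_tbl);
        return strtr($string, $trans_tbl);
    } else {
        return html_entity_decode($string, $quotes, $charset);
    }
}

function utf8ToISO ($string) {
    // _ISO comes from the language file, e.g. "charset=iso-8859-1";
    // normalize the case, since language files do not agree on it.
    $iso = strtoupper(trim(str_replace("charset=", "", _ISO)));
    if ($iso == "UTF-8") {
        return $string;
    } else if ($iso == "ISO-8859-1") {
        // Exact match: a 9-character prefix test ("ISO-8859-") would also
        // send ISO-8859-2, ISO-8859-15, ... through utf8_decode().
        return utf8_decode($string);
    } else {
        // Other charsets: go through HTML entities as an intermediate form.
        return unhtmlentities(htmlentities($string, ENT_NOQUOTES, "UTF-8"), ENT_NOQUOTES, $iso);
    }
}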
Hope this saves others some time.
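The last branch of utf8ToISO() uses HTML entities as a neutral intermediate form. htmlentities() turns UTF-8 characters into named entities, and html_entity_decode() turns those entities back into bytes of the target charset. Below is a minimal sketch of that round trip on a current PHP. ISO-8859-15 is only an example target, and the variable names are illustrative:

```php
<?php
// "gabrièlle" as UTF-8 bytes: è (U+00E8) is 0xC3 0xA8.
$utf8Name = "gabri\xC3\xA8lle";

// Step 1: UTF-8 -> named entities. ENT_NOQUOTES leaves quotes alone, as in
// utf8ToISO(); &, < and > also become entities and are restored in step 2.
$entities = htmlentities($utf8Name, ENT_NOQUOTES, "UTF-8");
echo $entities, "\n"; // gabri&egrave;lle

// Step 2: entities -> bytes of the target charset.
// In ISO-8859-15, è is the single byte 0xE8.
$iso = html_entity_decode($entities, ENT_NOQUOTES, "ISO-8859-15");
echo bin2hex($iso), "\n"; // 6761627269e86c6c65
```

One limitation to keep in mind: htmlentities() only converts characters that have a named HTML entity. A character without one (for example ł, used with ISO-8859-2) passes through as raw UTF-8 bytes, so this branch does not convert it.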
In my quest to transcode character sets, I've come to the same conclusion as the core team: unless everything is UTF-8, it's a headache that can't be resolved correctly.
I've made the following assumptions about character sets in Joomla! 1.x:
1) Database storage and HTML output use the same encoding, which is also the encoding PHP uses internally.
2) Multilingual websites must use the same "_ISO" encoding in all their language files to work correctly and respect rule 1) above.
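Assumption 2) can be checked mechanically. Below is a hypothetical sketch: normalizeCharset() and charsetsAgree() are illustrative names, and it assumes, as utf8ToISO() does, that each language file's _ISO value has the form "charset=...". The map from language name to _ISO string stands in for however the installed language files are actually read:

```php
<?php
// Hypothetical helper: "charset=iso-8859-1" -> "ISO-8859-1".
function normalizeCharset($iso) {
    $iso = trim($iso);
    if (strncasecmp($iso, "charset=", 8) == 0) {
        $iso = substr($iso, 8);
    }
    return strtoupper($iso);
}

// Hypothetical check for assumption 2): all language files must declare the
// same charset. $isoByLanguage maps a language name to its _ISO string.
function charsetsAgree($isoByLanguage) {
    $seen = array();
    foreach ($isoByLanguage as $iso) {
        $seen[normalizeCharset($iso)] = true;
    }
    return count($seen) <= 1;
}

var_dump(charsetsAgree(array('english' => 'charset=iso-8859-1',
                             'french'  => 'charset=ISO-8859-1'))); // bool(true)
var_dump(charsetsAgree(array('english' => 'charset=iso-8859-1',
                             'polish'  => 'charset=iso-8859-2'))); // bool(false)
```

Comparing normalized names rather than raw strings avoids flagging a mismatch that is only a difference in case.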
Questions:
a) Are my assumptions right?
b) If not, where is the corresponding transcoding code in Joomla!?
c) Is there a way in the $database class to find out which encoding the database uses?
d) Will PHP 4.3 be a prerequisite for Joomla! 1.1, or will it stay with PHP 4.1 and up?