The Joomla! Forum ™

Posted: Fri May 18, 2007 5:47 pm
Joomla! Apprentice

Joined: Mon Nov 20, 2006 12:41 pm
Posts: 39
Location: London
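Many places have this problem, mostly in Firefox: you can see it whenever UTF-8 text is shortened, in the search results or in any other component that uses a byte-based "substr". The cut can land in the middle of a multi-byte character, which then shows up as a broken character.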
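A minimal sketch of the problem, assuming a PHP build whose PCRE has UTF-8 support. The sample word and the helper name is_valid_utf8 are made up for illustration and are not part of Joomla.

```php
<?php
// Hypothetical sample: the Hebrew word "shalom", 4 letters, 2 bytes each in UTF-8.
$text = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

// substr() counts bytes, so a length of 3 ends in the middle of the second letter.
$cut = substr($text, 0, 3);

// Hypothetical helper: preg_match() in /u mode fails on invalid UTF-8,
// so it can serve as a simple validity check.
function is_valid_utf8($s) {
  return preg_match('//u', $s) === 1;
}

var_dump(is_valid_utf8($text)); // bool(true)
var_dump(is_valid_utf8($cut));  // bool(false), the dangling byte is the broken character
```

The dangling first byte of the second letter is what the browser renders as a broken character.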

Here is a short, practical fix for Joomla's search results, which can easily be generalized to similar problems:

File: includes/joomla.php

Function: mosSmartSubstr

Change to:
Code:
function mosSmartSubstr($text, $length=200, $searchword) {
  // Original byte-based body, kept for reference:
  /* $wordpos = strpos(strtolower($text), strtolower($searchword));
  $halfside = intval($wordpos - $length/2 - strlen($searchword));
  if ($wordpos && $halfside > 0) {
    return '...' . substr($text, $halfside, $length) . '...';
  } else {
    return substr( $text, 0, $length);
  } */
  return UTFSmartSubstr($text, $length, $searchword);
}

// This is Peleg's edition!!!
// Cuts on byte offsets as before, then drops the partial word at each cut
// edge, so a character split by substr() is never output.
function UTFSmartSubstr($text, $length=200, $searchword) {
  $wordpos = strpos(strtolower($text), strtolower($searchword));
  $halfside = intval($wordpos - $length/2 - strlen($searchword));
  if ($wordpos && $halfside > 0) {
    $output = substr($text, $halfside, $length);
    $op_arr = explode(" ", $output);
    array_shift($op_arr); // the first word may start mid-character
    array_pop($op_arr);   // the last word may end mid-character
    $output = '...' . implode(" ", $op_arr) . '...';
  } else if (strlen($text) <= $length) {
    $output = $text;      // nothing is cut, so keep the text whole
  } else {
    $output = substr($text, 0, $length);
    $op_arr = explode(" ", $output);
    array_pop($op_arr);   // the last word may end mid-character
    $output = implode(" ", $op_arr) . '...';
  }
  return $output;
}


Hope it helps.

Cheers,
Peleg.

_________________
No rain, no rainbows.
http://www.freeall.org - sorry - still only in Hebrew...


Posted: Fri May 18, 2007 8:53 pm
I've been banned!

Joined: Thu Aug 25, 2005 2:33 pm
Posts: 1868
Pretty cool.


Posted: Sat May 19, 2007 4:48 am
Joomla! Master

Joined: Fri Aug 12, 2005 3:47 pm
Posts: 16631
Location: **Translation Matters**
Forwarding to Q&T

_________________
Jean-Marie Simonet / infograf · http://www.info-graf.fr
Multilanguage in 2.5: http://help.joomla.org/files/EN-GB_multilang_tutorial.pdf
---------------------------------
Joomla Translation Coordination Team • Joomla! Production Working Group


Posted: Thu May 24, 2007 6:58 pm
Joomla! Apprentice

Joined: Mon Nov 20, 2006 12:41 pm
Posts: 39
Location: London
This is for UTF-8 users of the SMF bridge.

In some places SMF shortens the subject of a post and, as usual, we get the annoying � (a broken character).

SMF claims that it can handle this (http://support.simplemachines.org/funct ... nction=284) - but it doesn't.

So here is my solution:

This part is optional. Double the subject length limit in these places:

/Sources/Recent.php - lines 74, 130 - change 24 to 48

/Sources/BoardIndex.php - line 220 - change 24 to 48

/Sources/MessageIndex.php - lines 184, 268 - change 24 to 48

/Themes/default/MessageIndex.template.php - line 769 - change 24 to 48

SSI.php - lines 289, 391 - change 25 to 50

---

The doubling compensates for subjects that came out too short, because UTF-8 characters in some languages take 2 bytes each and so count as 2 chars against the limit.

---
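To see why the doubled limits help, here is a minimal sketch comparing bytes and characters. It assumes PCRE with UTF-8 support; the sample strings and the helper name count_chars_utf8 are invented for illustration and are not part of SMF.

```php
<?php
// Hypothetical subjects: one ASCII, one Hebrew (2 bytes per letter in UTF-8).
$ascii  = "Hello world";
$hebrew = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D \xD7\xA2\xD7\x95\xD7\x9C\xD7\x9D"; // "shalom olam"

// Hypothetical helper: counts characters by matching one at a time in /u (UTF-8) mode.
function count_chars_utf8($s) {
  return preg_match_all('/./us', $s, $m);
}

echo strlen($ascii), ' bytes, ', count_chars_utf8($ascii), " chars\n";   // 11 bytes, 11 chars
echo strlen($hebrew), ' bytes, ', count_chars_utf8($hebrew), " chars\n"; // 17 bytes, 9 chars
```

With a byte limit of 24, a subject made only of 2-byte letters is cut after about 12 characters, which is roughly half of what an ASCII subject gets.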

This part is a must:
/Sources/Subs.php - line 911.

Instead of:
Code:
return $func['substr']($subject, 0, $len) . '...';


Put:
Code:
// return $func['substr']($subject, 0, $len) . '...';
// Peleg's addition: drop the last, possibly cut-off, word before the ellipsis
$output = $func['substr']($subject, 0, $len);
$op_arr = explode(" ", $output);
array_pop($op_arr); // the last word may end mid-character
$output = implode(" ", $op_arr) . '...';
return $output;


Hope it helps.

Cheers,
Peleg.

Posted: Sun Sep 23, 2007 9:07 am
Joomla! Fledgling

Joined: Sun Sep 23, 2007 6:42 am
Posts: 1
Thanks for the code, it works now out of the box. :D


Posted: Thu Dec 06, 2007 5:55 pm
Joomla! Apprentice

Joined: Mon Nov 20, 2006 12:41 pm
Posts: 39
Location: London
One more fix, this time for the "smf_recenttopics" module of the SMF bridge:

We shall start by adding the following function:

Code:
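// Shortens $subject to at most $len bytes and drops the last,
// possibly cut-off, word so that no broken character is output.
function freeSubstr( $subject, $start, $len ) {
  if (strlen($subject) <= $len)
    return $subject;

  $output = substr($subject, $start, $len);
  $op_arr = explode(" ", $output);
  array_pop($op_arr); // the last word may end mid-character
  $output = implode(" ", $op_arr) . '...';
  return $output;
}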


at the beginning of the module file.
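A quick way to check what the function does, as a sketch only. The function is repeated here so the snippet runs on its own, and the sample subjects are made up.

```php
<?php
// Same logic as the freeSubstr() posted above, repeated to keep this runnable alone.
function freeSubstr( $subject, $start, $len ) {
  if (strlen($subject) <= $len)
    return $subject;

  $output = substr($subject, $start, $len);
  $op_arr = explode(" ", $output);
  array_pop($op_arr);
  $output = implode(" ", $op_arr) . '...';
  return $output;
}

// Hypothetical subject: "shalom olam" in Hebrew, 17 bytes in UTF-8.
$hebrew = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D \xD7\xA2\xD7\x95\xD7\x9C\xD7\x9D";

// A 12-byte cut ends inside the second word, so that word is dropped whole.
echo freeSubstr($hebrew, 0, 12), "\n";  // first word followed by "..."

// Shorter than the limit: returned untouched.
echo freeSubstr("short", 0, 12), "\n";  // short

// Caveat: with no space inside the cut, nothing is left but the ellipsis.
echo freeSubstr("abcdefghijklmnop", 0, 5), "\n"; // ...
```

The last call shows the trade-off of cutting on spaces: a long subject without spaces loses all of its text.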

After that, make the following changes.

Around line 275, replace the
Code:
'short_subject' => ...


with:
Code:
'short_subject' => mb_strlen(un_htmlspecialchars($row['subject']),'UTF-8') > 25 ? htmlspecialchars(freeSubstr(un_htmlspecialchars($row['subject']), 0, 48)) : $row['subject'],


And in about line 573, replace the
Code:
echo substr(...


with:
Code:
echo freeSubstr(html_entity_decode($post['subject']), 0, $int_num_char*3);


It should work now.

Good luck!

Peleg.

Posted: Wed Dec 12, 2007 9:37 am
Joomla! Intern

Joined: Fri Nov 03, 2006 7:39 pm
Posts: 57
Location: Thessaloniki, Greece
Here's a solution that respects the number of characters that ought to be displayed after a search:

Copy the following functions to the end of your index.php (just before doGzip();):

Code:
function substr_utf8( $string, $start, $length=0 ) {
   // Convert numeric entities to UTF-8 first. preg_replace_callback() is used
   // instead of the "/e" modifier, which was removed in PHP 7.
   $s = preg_replace_callback('/&#([0-9]{3,10});/', '_html_to_utf8_cb', $string);
   $pattern =
     '/^(
        [\x09\x0A\x0D\x20-\x7E]            # ASCII
      | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
      |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
      |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
      |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
      | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
      |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
     )/x';
   // Skip $start characters, then collect $length characters (0 = the rest).
   while ( $start-- ) { getChr( $pattern, $s ); }
   if ( $length ) {
      $ret = '';
      while ( $length-- ) { $ret .= getChr( $pattern, $s ); }
      return $ret;
   }
   else { return $s; }
}

// Removes the first character from $s and returns it.
function getChr( $pattern, &$s ) {
   if ( preg_match( $pattern, $s, $match ) ) {
      $ret = $match[0];
      // Cut by byte length. A regex built from the character itself
      // fails on unescaped metacharacters such as "|".
      $s = substr( $s, strlen( $ret ) );
   }
   else {
      // Not a valid UTF-8 sequence: consume a single byte.
      $ret = substr( $s, 0, 1 );
      $s = substr( $s, 1 );
   }
   return $ret;
}

// Callback adapter for preg_replace_callback(): passes on the entity number.
function _html_to_utf8_cb ($match) {
    return _html_to_utf8($match[1]);
}

// Taken from http://www.php.net/manual/en/function.utf8-encode.php#75942
function _html_to_utf8 ($data) {
    if ($data > 127)
        {
        $i = 5;
        while (($i--) > 0)
            {
            if ($data != ($a = $data % ($p = pow(64, $i))))
                {
                $ret = chr(base_convert(str_pad(str_repeat(1, $i + 1), 8, "0"), 2, 10) + (($data - $a) / $p));
                for ($i; $i > 0; $i--)
                    $ret .= chr(128 + ((($data % pow(64, $i)) - ($data % ($p = pow(64, $i - 1)))) / $p));
                break;
                }
            }
        }
        else
        $ret = "&#$data;";
    return $ret;
}


Go to lines 5721 and 5723 in includes/joomla.php and line 79 of file components/com_search/search.php and change "substr" to "substr_utf8" (without the quotes).
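For comparison, on a server where PCRE has UTF-8 support (an assumption, check your own setup), the same idea of counting characters instead of bytes can be sketched in a few lines. This is not the code above and not part of Joomla. The name substr_chars_sketch is made up, and it does not convert numeric entities.

```php
<?php
// Hypothetical helper: returns up to $length characters starting at
// character offset $start, counting UTF-8 characters rather than bytes.
function substr_chars_sketch($string, $start, $length) {
  // In /u mode "." consumes one whole character; /s lets it match newlines too.
  $pattern = '/^.{' . (int) $start . '}(.{0,' . (int) $length . '})/us';
  if (preg_match($pattern, $string, $match)) {
    return $match[1];
  }
  return ''; // invalid UTF-8, or $start is beyond the end of the string
}

// Hypothetical sample: "shalom olam" in Hebrew, 9 characters, 17 bytes.
$hebrew = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D \xD7\xA2\xD7\x95\xD7\x9C\xD7\x9D";

echo substr_chars_sketch($hebrew, 0, 4), "\n"; // the first word, 4 whole letters
echo substr_chars_sketch($hebrew, 5, 3), "\n"; // the first 3 letters of the second word
```

Unlike the functions above, this sketch returns an empty string for input that is not valid UTF-8 instead of passing the bad bytes through.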


Posted: Wed Dec 12, 2007 7:33 pm
Joomla! Intern

Joined: Fri Nov 03, 2006 7:39 pm
Posts: 57
Location: Thessaloniki, Greece
Just a small improvement to the code offered above, to avoid the possibility of com_search quoting extracts that contain none of the matched words. $start is now treated as a byte offset and moved back to the nearest character boundary.

Code:
function substr_utf8( $string, $start, $length=0 ) {
   if ( $start ) {
      // $start is a byte offset into the original string. Adjust it for the
      // bytes gained or lost when entities before it are converted.
      $first = substr( $string, 0, $start );
      $orig_length = strlen( $first );
      $first = preg_replace_callback('/&#([0-9]{3,10});/', '_html_to_utf8_cb', $first);
      $post_length = strlen( $first );
      $start = $start + $post_length - $orig_length;
   }
   // preg_replace_callback() is used instead of the "/e" modifier,
   // which was removed in PHP 7.
   $s = preg_replace_callback('/&#([0-9]{3,10});/', '_html_to_utf8_cb', $string);
   $pattern =
     '/^(
        [\x09\x0A\x0D\x20-\x7E]            # ASCII
      | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
      |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
      |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
      |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
      | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
      |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
     )/x';
   // Move back until $start sits on the first byte of a character.
   while ( $start and !preg_match( $pattern, substr( $s, $start) ) ) { $start--; }
   $s = substr( $s, $start );
   if ( $length ) {
      $ret = '';
      while ( $length-- ) { $ret .= getChr( $pattern, $s ); }
      return $ret;
   }
   else { return $s; }
}

// Removes the first character from $s and returns it.
function getChr( $pattern, &$s ) {
   if ( preg_match( $pattern, $s, $match ) ) {
      $ret = $match[0];
      // Cut by byte length. A regex built from the character itself
      // fails on unescaped metacharacters such as "|".
      $s = substr( $s, strlen( $ret ) );
   }
   else {
      // Not a valid UTF-8 sequence: consume a single byte.
      $ret = substr( $s, 0, 1 );
      $s = substr( $s, 1 );
   }
   return $ret;
}

// Callback adapter for preg_replace_callback(): passes on the entity number.
function _html_to_utf8_cb ($match) {
    return _html_to_utf8($match[1]);
}

// Taken from http://www.php.net/manual/en/function.utf8-encode.php#75942
function _html_to_utf8 ($data) {
    if ($data > 127)
        {
        $i = 5;
        while (($i--) > 0)
            {
            if ($data != ($a = $data % ($p = pow(64, $i))))
                {
                $ret = chr(base_convert(str_pad(str_repeat(1, $i + 1), 8, "0"), 2, 10) + (($data - $a) / $p));
                for ($i; $i > 0; $i--)
                    $ret .= chr(128 + ((($data % pow(64, $i)) - ($data % ($p = pow(64, $i - 1)))) / $p));
                break;
                }
            }
        }
        else
        $ret = "&#$data;";
    return $ret;
}
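
The step of moving a byte offset back to a character boundary can also be illustrated on its own. This is a simplified sketch, not the code above: it tests for UTF-8 continuation bytes directly instead of using the regex pattern, and the name align_to_char_start is made up.

```php
<?php
// Hypothetical helper: moves a byte offset back to the first byte of the
// UTF-8 character it falls into. Continuation bytes always look like 10xxxxxx.
function align_to_char_start($s, $offset) {
  while ($offset > 0 && $offset < strlen($s) && (ord($s[$offset]) & 0xC0) === 0x80) {
    $offset--;
  }
  return $offset;
}

// Hypothetical sample: "ab" followed by two 2-byte Hebrew letters.
$text = "ab\xD7\xA9\xD7\x9C";

echo align_to_char_start($text, 3), "\n"; // 2, byte 3 continues the letter that starts at byte 2
echo align_to_char_start($text, 4), "\n"; // 4, already on a boundary
```

Starting the extract on a boundary like this keeps the offset of the matched word meaningful, which is the point of the improvement.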

