The Joomla! Forum ™






Posted: Tue May 17, 2011 7:34 am
Joomla! Apprentice

Joined: Thu May 12, 2011 1:51 am
Posts: 6
The default core Joomla search module doesn't recognize 2-byte (full-width) spaces. When searching for two or more words in Japanese (or Korean or Chinese), the spaces between the words must be 1-byte characters, or the search returns no results.
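
To make the symptom concrete, here is a minimal sketch, assuming the search splits the query on the ASCII space (as a later reply in this thread describes). The sample words and variable names are made up for illustration and are not taken from Joomla. Note that the full-width space (U+3000) is 2 bytes in legacy encodings such as Shift_JIS, but 3 bytes (E3 80 80) in UTF-8.

```php
<?php
// Illustration only: mimics splitting a search query on the ASCII space.
$ideographicSpace = "\xE3\x80\x80"; // U+3000 encoded as UTF-8

// Same two words, separated by an ASCII space and by a full-width space.
$asciiQuery = "東京 大阪";
$wideQuery  = "東京" . $ideographicSpace . "大阪";

$asciiWords = explode(' ', $asciiQuery);
$wideWords  = explode(' ', $wideQuery);

echo count($asciiWords), "\n"; // 2: the query is split into two words
echo count($wideWords), "\n";  // 1: the full-width space is not a separator
```

With the full-width space, the whole input is treated as a single word, so a search for "all words" or "any word" looks for one long string that rarely exists in the content.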

What should I do to make it accept 2-byte spaces?

Hope you can help me with this one.

Thanks in advance :)


Posted: Mon May 23, 2011 1:44 am
Joomla! Master

Joined: Fri Aug 12, 2005 3:47 pm
Posts: 16668
Location: **Translation Matters**
That is an interesting question indeed.

Looks like it would require changes in the search function.
Can't you enter single-byte spaces at all in Japanese?

_________________
Jean-Marie Simonet / infograf · http://www.info-graf.fr
Multilanguage in 2.5: http://help.joomla.org/files/EN-GB_multilang_tutorial.pdf
---------------------------------
Joomla Translation Coordination Team • Joomla! Production Working Group


Posted: Wed May 25, 2011 3:42 am
Joomla! Apprentice

Joined: Thu May 12, 2011 1:51 am
Posts: 6
Is there a plug-in for this, so that the search engine can accept two-byte characters?

I've searched but found nothing :(


Posted: Wed May 25, 2011 11:26 am
Joomla! Master

Joined: Fri Aug 12, 2005 3:47 pm
Posts: 16668
Location: **Translation Matters**
This requires changes in Joomla core files.
Could you please answer this simple question?

As of today, the single-byte space is used as the separator between the words to search for.

Do you need 2-byte spaces to be INCLUDED in the search itself, or would it be OK if any 2-byte space character were changed to a single-byte one BEFORE the actual search is done, so that you get the right results for the words actually entered?

Hence my question above: can you, or can you NOT, enter a single-byte space character on your PC?

Posted: Wed May 25, 2011 12:11 pm
Joomla! Master

Joined: Fri Aug 12, 2005 3:47 pm
Posts: 16668
Location: **Translation Matters**
Let me rephrase this. Basically, the question is:

In Japanese, do we want a search for

wordA(space)wordB

to return only occurrences of the exact phrase

wordA(space)wordB

or any occurrence of

wordA
AND
wordB

If you folks can enter a single-byte space, then you can easily choose to enter either:

wordA(2bytes_space)wordB

which would return any occurrence of

wordA(2bytes_space)wordB

OR

wordA(1byte_space)wordB

which would return any occurrence of wordA AND wordB.

Posted: Fri May 27, 2011 11:00 am
Joomla! Intern

Joined: Tue Jul 08, 2008 3:17 am
Posts: 56
infograf768,

In Japanese, we can produce both 2-byte and 1-byte spaces. However, it is more convenient to use 2-byte spaces between 2-byte character words, and this is very commonly done.

So 2-byte spaces should be automatically converted to, or treated as, single-byte spaces. I believe this function should be added.


Posted: Fri May 27, 2011 12:39 pm
Joomla! Master

Joined: Fri Aug 12, 2005 3:47 pm
Posts: 16668
Location: **Translation Matters**
If you ALWAYS prefer wordA AND wordB, the patch can be made in trunk and released for 1.7.
It's just a matter of replacing any occurrence of the double-byte space in the search string with a single-byte one.

I am not sure we will do that for 1.5 at this stage.

Posted: Fri May 27, 2011 2:23 pm
Joomla! Master

Joined: Fri Aug 12, 2005 3:47 pm
Posts: 16668
Location: **Translation Matters**
Could you test by replacing plugins/search/content.php with this attached file?
Attachment:
content.php.txt


(Delete the .txt suffix to use)

This replaces the double-byte space with a single-byte one when the search is set to All words or Any word. When the search is for the exact phrase, it leaves the text alone.
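
Since the attachment is not reproduced here, the behaviour described above can be sketched roughly as follows. This is not the attached patch: `splitSearchWords` is a hypothetical helper, and the phrase value `'exact'` is an assumption about how the exact-phrase mode is named. The sketch uses `str_replace` because the target is a fixed byte sequence.

```php
<?php
/**
 * Hypothetical helper (not the attached patch): splits a search string into
 * the words to look for, depending on the search mode.
 */
function splitSearchWords($text, $phrase)
{
    // Exact-phrase search: leave the text untouched, full-width spaces included.
    if ($phrase === 'exact') {
        return array($text);
    }

    // 'all' / 'any': turn each full-width space (UTF-8 bytes E3 80 80)
    // into an ASCII space so it acts as a word separator.
    $text = str_replace("\xE3\x80\x80", ' ', $text);

    // Split on runs of ASCII spaces and drop empty entries.
    return preg_split('/ +/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Made-up query: two words separated by a full-width space.
$query = "東京\xE3\x80\x80大阪";

print_r(splitSearchWords($query, 'all'));   // two separate words
print_r(splitSearchWords($query, 'exact')); // the original string, unchanged
```

Keeping the exact-phrase branch untouched means a user can still search for text that literally contains a full-width space.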



Posted: Fri May 27, 2011 3:46 pm
Joomla! Intern

Joined: Tue Jul 08, 2008 3:17 am
Posts: 56
infograf768,

Unfortunately, that doesn't seem to work.

I've checked on the net and got some tips.

//replace double byte whitespaces by single byte (Far-East languages)

$str = preg_replace('/\xE3\x80\x80/', ' ', $string);

This seems to be the edit used for the Sphinx search component. Unfortunately, I don't intend to use Sphinx, and I don't know much about PHP, so I couldn't find a way to implement this.

I hope this tip helps.


Posted: Fri May 27, 2011 8:22 pm
Joomla! Master

Joined: Fri Aug 12, 2005 3:47 pm
Posts: 16668
Location: **Translation Matters**
That code is exactly the code I added in that file.

Code:
case 'all':
case 'any':
default:
   // Convert full-width spaces (UTF-8 bytes E3 80 80) to ASCII spaces, then split into words
   $text = preg_replace('/\xE3\x80\x80/', ' ', $text);
   $words = explode( ' ', $text );


Try this instead:

Code:
case 'all':
case 'any':
default:
   // Same replacement, but stored in a separate variable so $text itself is left unchanged
   $txt = preg_replace('/\xE3\x80\x80/', ' ', $text);
   $words = explode( ' ', $txt );

Posted: Sat May 28, 2011 1:28 am
Joomla! Intern

Joined: Tue Jul 08, 2008 3:17 am
Posts: 56
Hello,

Unfortunately, that didn't work either.

I found another tip:

$a = strtolower(preg_replace("/[ ]+/", " ", $a));
$a = preg_replace("/\xe3\x80\x80/", " ", $a);


Posted: Sun May 29, 2011 4:57 pm
Joomla! Master

Joined: Fri Aug 12, 2005 3:47 pm
Posts: 16668
Location: **Translation Matters**
sushismb wrote:
Hello,

Unfortunately, that didn't work either.

I found another tip:

$a = strtolower(preg_replace("/[ ]+/", " ", $a));
$a = preg_replace("/\xe3\x80\x80/", " ", $a);


The original code is right. Unfortunately, I have difficulty producing double-byte spaces here to test with.
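
For anyone who cannot type a double-byte space, one possible way to build test input is sketched below. It assumes the site runs in UTF-8 and, for the `\u{...}` escape, PHP 7.0 or later; the variable names are made up for illustration.

```php
<?php
// Two ways to build U+3000 (IDEOGRAPHIC SPACE) without typing it.
$fromBytes     = "\xE3\x80\x80"; // UTF-8 bytes; the escapes need double quotes
$fromCodePoint = "\u{3000}";     // code point escape, PHP 7.0 or later

echo bin2hex($fromBytes), "\n";     // e38080
echo bin2hex($fromCodePoint), "\n"; // e38080

// Test input: two words separated by the full-width space.
$text = "wordA" . $fromBytes . "wordB";

// Single-quoted pattern: PCRE itself reads \xE3\x80\x80 as three bytes.
// This relies on the pattern having no /u modifier.
$converted = preg_replace('/\xE3\x80\x80/', ' ', $text);

echo $converted, "\n"; // wordA wordB
```

If the pattern is ever given the `/u` modifier, `\xE3` would be read as the code point U+00E3 rather than a byte, so `'/\x{3000}/u'` would be the form to use in that case.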

Posted: Fri Jun 10, 2011 6:32 am
Joomla! Master

Joined: Fri Aug 12, 2005 3:47 pm
Posts: 16668
Location: **Translation Matters**
We solved this and committed the patch for 1.7.

Please look here:
http://joomlacode.org/gf/project/joomla ... m_id=26118

This could be ported to 1.5.
