Joomla! Discussion Forums



Posted: Thu Sep 17, 2009 7:50 pm 
Joomla! Intern

Joined: Fri Jul 11, 2008 9:08 pm
Posts: 75
I have an article text that reads "u u" (3 characters), but when I look at the byte array it shows {3C, C2, A0, 3E}, which is 4 bytes.

This breaks plugins that call preg_match_all with the PREG_OFFSET_CAPTURE flag and then pass the captured offsets to JString::substr.
For example, when matching "u" on "u u":
preg_match_all reports 2 matches, at offsets 0 and 3, i.e. it counts bytes; whereas
JString::substr("u u", 0, 3) returns "u u", i.e. it counts characters.
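The mismatch can be reproduced outside Joomla with a few lines of plain PHP. This is only a sketch under assumptions: the middle character is taken to be U+00A0 (NO-BREAK SPACE, bytes C2 A0 in UTF-8), as the C2 A0 in the byte dump suggests, and mb_substr from the mbstring extension stands in for JString::substr.

```php
<?php
// "u", U+00A0 NO-BREAK SPACE (bytes C2 A0 in UTF-8), "u"
$text = "u\xC2\xA0u";

$byteLength = strlen($text);              // counts bytes: 4
$charLength = mb_strlen($text, 'UTF-8');  // counts characters: 3

// PREG_OFFSET_CAPTURE reports offsets in bytes, not characters.
preg_match_all('/u/', $text, $matches, PREG_OFFSET_CAPTURE);
$offsets = array();
foreach ($matches[0] as $match) {
    $offsets[] = $match[1];               // collects 0 and 3
}

// A character-based substr (stand-in for JString::substr) reads the
// length 3 as "three characters" and returns the whole string.
$byChars = mb_substr($text, 0, 3, 'UTF-8');

// The byte-based substr agrees with the preg offsets instead and
// returns only "u" plus the two bytes of the no-break space.
$byBytes = substr($text, 0, 3);
```

The two functions disagree as soon as any character before the match takes more than one byte.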

Why is a simple space encoded as two bytes? How do I correct that?


Last edited by Parvus on Wed Sep 23, 2009 7:07 pm, edited 1 time in total.

Posted: Fri Sep 18, 2009 9:02 pm 
Of course it has something to do with different encodings.
The text is UTF-8, and in this case some characters use two bytes. The JString::substr() function is fully UTF-8 aware, but the preg_match_all() call is not. Hence the discrepancy in the reported offsets.

I searched the PHP forums, but apparently preg_match_all() cannot be made UTF-8 aware for the string it searches; strangely enough, only for the search pattern.
What can I do to get the same functionality as preg_match_all(), but with correct handling of multi-byte characters?
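One possible approach, sketched here rather than taken from the thread, is to keep preg_match_all() and convert its byte offsets into character offsets before handing them to a character-based function. Note that the /u pattern modifier does make PCRE treat both pattern and subject as UTF-8, but the offsets returned with PREG_OFFSET_CAPTURE are still counted in bytes. The helper name byte_to_char_offset is hypothetical, and mb_substr (mbstring extension) again stands in for JString::substr.

```php
<?php
// Hypothetical helper: converts a byte offset (as reported by
// PREG_OFFSET_CAPTURE) into a character offset in a UTF-8 string.
function byte_to_char_offset($subject, $byteOffset)
{
    // Count the characters in the bytes that precede the match.
    return mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');
}

// "u", U+00A0 NO-BREAK SPACE (bytes C2 A0), "u"
$text = "u\xC2\xA0u";

// Even with the /u modifier the reported offsets are bytes: 0 and 3.
preg_match_all('/u/u', $text, $matches, PREG_OFFSET_CAPTURE);

$charOffsets = array();
foreach ($matches[0] as $match) {
    $charOffsets[] = byte_to_char_offset($text, $match[1]);
}

// The converted offsets (0 and 2) are safe for character-based calls.
$second = mb_substr($text, $charOffsets[1], 1, 'UTF-8');
```

The alternative is to stay in bytes throughout and use PHP's plain substr() with the offsets exactly as preg_match_all() reports them; either way, the point is not to mix the two units.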


Posted: Wed Sep 23, 2009 7:05 pm 
Well, I don't like monologues, but I feel compelled to write how it was 'solved'.

It isn't solved, merely circumvented. I was able to modify the regular expression so that it captures the substrings itself. Previously I extracted them with the substr function; now that this is no longer needed, the multi-byte problems are gone too.
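The plugin's actual pattern is not shown in the thread, so the following is only an illustration of the idea, with a made-up {demo} tag: a capture group returns the wanted text directly, so no offsets and no substr call are involved.

```php
<?php
// Hypothetical plugin markup; the sample text contains two multi-byte
// characters: U+00E9 (C3 A9) and U+00A0 (C2 A0).
$text = "before {demo}caf\xC3\xA9\xC2\xA0au lait{/demo} after";

// The capture group (.*?) holds the inner text of each tag pair.
// Modifier s lets "." match newlines, u treats the input as UTF-8.
preg_match_all('/\{demo\}(.*?)\{\/demo\}/su', $text, $matches);

// $matches[1] lists the captured substrings, one per match.
$inner = $matches[1];
```

Because the captured strings come back whole, it no longer matters whether offsets are counted in bytes or in characters.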

Marked as solved as I don't want to fuel the excessive posts here.

