Migrating from latin1_swedish_ci to utf8_general_ci database
Moderator: General Support Moderators
Forum rules
Forum Rules
Absolute Beginner's Guide to Joomla! <-- please read before posting, this means YOU.
Forum Post Assistant - If you are serious about wanting help, you will use this tool to help you post.
Forum Rules
Absolute Beginner's Guide to Joomla! <-- please read before posting, this means YOU.
Forum Post Assistant - If you are serious about wanting help, you will use this tool to help you post.
-
- Joomla! Apprentice
- Posts: 24
- Joined: Thu Sep 14, 2006 7:51 pm
Migrating from latin1_swedish_ci to utf8_general_ci database
I run a Vietnamese web site and am migrating from version 1.0.13 to 1.5.8. In version 1.0.13, I made a mistake by setting the database collation to latin1_swedish_gi. As a result, this database stores the Unicode characters in an unreadable format, for example, the string (readable):
Tôi không hiểu ý bạn muốn nói gì
is stored as (unreadable):
Tôi không hiểu ý bạn muốn nói gì
The old site, however, has no problem displaying all the text as readable format. When I create the new site, I set the new database collation to utf8_general_ci. With the migrating tool, I have successfully migrated and converted the text into readable format. But this tool only handles the core tables. I have the extension mXcomment and I must migrate the data manually. I need to convert the text into readable format because the new site cannot display it correctly. So I need to do one of the followings:
1) Convert the text in the database into readable format. Or:
2) Reconfigure the php files on the new site to make it display the text in the desired format.
Can someone give me some direction? Thanks.
Note: All the text entered after the migration is stored as readable format, there's no problem with that.
Tôi không hiểu ý bạn muốn nói gì
is stored as (unreadable):
Tôi không hiểu ý bạn muốn nói gì
The old site, however, has no problem displaying all the text as readable format. When I create the new site, I set the new database collation to utf8_general_ci. With the migrating tool, I have successfully migrated and converted the text into readable format. But this tool only handles the core tables. I have the extension mXcomment and I must migrate the data manually. I need to convert the text into readable format because the new site cannot display it correctly. So I need to do one of the followings:
1) Convert the text in the database into readable format. Or:
2) Reconfigure the php files on the new site to make it display the text in the desired format.
Can someone give me some direction? Thanks.
Note: All the text entered after the migration is stored as readable format, there's no problem with that.
-
- Joomla! Ace
- Posts: 1318
- Joined: Thu Aug 18, 2005 9:27 am
- Location: San Jose, CA, USA
- Contact:
Re: Migrating from latin1_swedish_ci to utf8_general_ci database
You can write a migration plugin to handle the extra tables, its not too hard to do in practice.
http://blog.joomlatools.org/2008/04/rol ... ugins.html
Alternatively you can use iconv or a similar tool to convert the SQL file from latin1 to utf8, but if you've got enough PHP experience (you don't need much) building a migrator plugin is pretty easy and will get it in the main migration SQL file which will be converted with everthing else.
http://blog.joomlatools.org/2008/04/rol ... ugins.html
Alternatively you can use iconv or a similar tool to convert the SQL file from latin1 to utf8, but if you've got enough PHP experience (you don't need much) building a migrator plugin is pretty easy and will get it in the main migration SQL file which will be converted with everthing else.
Sam Moffatt
Updater, Installer and Authentication Systems
JoomlaCode Backend Systems
Pie.
Updater, Installer and Authentication Systems
JoomlaCode Backend Systems
Pie.
- leolam
- Joomla! Master
- Posts: 20652
- Joined: Mon Aug 29, 2005 10:17 am
- Location: Netherlands/ Germany/ S'pore/Bogor/ North America
- Contact:
Re: Migrating from latin1_swedish_ci to utf8_general_ci database
the people at Phoca have made this very practical tool to change your existing site's collation in less than a minute. Works flawless and is free! http://www.phoca.cz/phoca-changing-collation/
Leo
Leo
Joomla's #1 Professional Services Provider:
#Joomla Professional Support: https://gws-desk.com -
#Joomla Specialized Hosting Solutions: https://gws-host.com -
#Joomla Professional Support: https://gws-desk.com -
#Joomla Specialized Hosting Solutions: https://gws-host.com -
- Imago
- Joomla! Enthusiast
- Posts: 113
- Joined: Fri May 12, 2006 8:37 am
- Location: Sofia
- Contact:
Re: Migrating from latin1_swedish_ci to utf8_general_ci database
Using iconv does not produce very consistent results. There were at least three symbols that remained corrupted. Instead, use the following procedure:
1. Export your DB
2. Open it with Notepad++
3. Convert it to UTF-8 w/o BOM
4. Check that your text is readable
5. Replace all your present collation with utf8_general_ci
6. Save as UTF-8
7. Copy your online DB to a new DB and change to it
8. Delete all tables of the new DB
9. Go to Operations and select utf8_general_ci for collation
10. Import the converted dump
11. Change the access in your config to point to the new DB
1. Export your DB
2. Open it with Notepad++
3. Convert it to UTF-8 w/o BOM
4. Check that your text is readable
5. Replace all your present collation with utf8_general_ci
6. Save as UTF-8
7. Copy your online DB to a new DB and change to it
8. Delete all tables of the new DB
9. Go to Operations and select utf8_general_ci for collation
10. Import the converted dump
11. Change the access in your config to point to the new DB
Prof. Dr. Plamen Gradinarov - CEO and Founder, Eurasia Ltd
- leolam
- Joomla! Master
- Posts: 20652
- Joined: Mon Aug 29, 2005 10:17 am
- Location: Netherlands/ Germany/ S'pore/Bogor/ North America
- Contact:
Re: Migrating from latin1_swedish_ci to utf8_general_ci database
Absolute no need Imago. The tool i pointed you at does exactly what you do manually with a lot of work and a lot of error chances. It is a full automated script which works very well. Why doing things manually when you can have the script do this for you?
I consider this an advise which is useful for experienced users but still not needed. You think I will spend 10 minutes on doing this when i can do it in 10 seconds automated without errors?
I consider this an advise which is useful for experienced users but still not needed. You think I will spend 10 minutes on doing this when i can do it in 10 seconds automated without errors?
You do not have the required permissions to view the files attached to this post.
Joomla's #1 Professional Services Provider:
#Joomla Professional Support: https://gws-desk.com -
#Joomla Specialized Hosting Solutions: https://gws-host.com -
#Joomla Professional Support: https://gws-desk.com -
#Joomla Specialized Hosting Solutions: https://gws-host.com -
- Imago
- Joomla! Enthusiast
- Posts: 113
- Joined: Fri May 12, 2006 8:37 am
- Location: Sofia
- Contact:
Re: Migrating from latin1_swedish_ci to utf8_general_ci database
Changing the collation of DB, tables and columns is only a small portion of the task. The most important thing is to convert the text, say, from cp1251 to utf8. Since there is no option to define the starting encoding of the text in the DB, I doubt the script does this kind of conversion. Would be happy if you prove me wrong in this disbelief.
Prof. Dr. Plamen Gradinarov - CEO and Founder, Eurasia Ltd
- leolam
- Joomla! Master
- Posts: 20652
- Joined: Mon Aug 29, 2005 10:17 am
- Location: Netherlands/ Germany/ S'pore/Bogor/ North America
- Contact:
Re: Migrating from latin1_swedish_ci to utf8_general_ci database
Load and try before commenting. So go ahead as expected, setup a default Joomla site (if convenient for you as all beginners with Fantastico) and run the script. For me the script runs well. I can give you access to one of our 25 development sites if you have yourself no options to test this?
Leo
Leo
Joomla's #1 Professional Services Provider:
#Joomla Professional Support: https://gws-desk.com -
#Joomla Specialized Hosting Solutions: https://gws-host.com -
#Joomla Professional Support: https://gws-desk.com -
#Joomla Specialized Hosting Solutions: https://gws-host.com -
- Imago
- Joomla! Enthusiast
- Posts: 113
- Joined: Fri May 12, 2006 8:37 am
- Location: Sofia
- Contact:
Re: Migrating from latin1_swedish_ci to utf8_general_ci database
As I expected, the script does NOT convert cp1251 text to utf8. Instead, it managed to ruin 120,000 entries in a Russian encyclopedia. They simply disappeared - ALL of them. So guys, if your language is not English, DO NOT use that killer script.
Prof. Dr. Plamen Gradinarov - CEO and Founder, Eurasia Ltd
-
- Joomla! Ace
- Posts: 1318
- Joined: Thu Aug 18, 2005 9:27 am
- Location: San Jose, CA, USA
- Contact:
Re: Migrating from latin1_swedish_ci to utf8_general_ci database
The topic states latin1 which isn't cp1251, so this is more than likely the reason its not working. If you find that iconv is having issues with a few characters than I'd suggest that this is potentially one of a few problems:
1) Its a bug in iconv (very unlikely, but still possible)
2) The characters you have in there aren't cp1251 - they may already be utf-8 or some other character set again. iconv can't help you where you've got dirty data, most automated tools will have issues there anyway.
If your data is dirty then getting that converted properly is a real pain where you will have to manually eye ball most entries to look for the problem. There was a blog on O'Reilly that I've pasted a few times which is one guy converting his old mysql db into utf-8 and having insane amounts of fun with different character sets and that may help you in this specific case. The problem itself is dirty data and hopefully once we're all on UTF-8 it'll disappear for the majority of users (then we end up with issues for Japanese and Chinese characters but thats another problem).
1) Its a bug in iconv (very unlikely, but still possible)
2) The characters you have in there aren't cp1251 - they may already be utf-8 or some other character set again. iconv can't help you where you've got dirty data, most automated tools will have issues there anyway.
If your data is dirty then getting that converted properly is a real pain where you will have to manually eye ball most entries to look for the problem. There was a blog on O'Reilly that I've pasted a few times which is one guy converting his old mysql db into utf-8 and having insane amounts of fun with different character sets and that may help you in this specific case. The problem itself is dirty data and hopefully once we're all on UTF-8 it'll disappear for the majority of users (then we end up with issues for Japanese and Chinese characters but thats another problem).
Sam Moffatt
Updater, Installer and Authentication Systems
JoomlaCode Backend Systems
Pie.
Updater, Installer and Authentication Systems
JoomlaCode Backend Systems
Pie.
- leolam
- Joomla! Master
- Posts: 20652
- Joined: Mon Aug 29, 2005 10:17 am
- Location: Netherlands/ Germany/ S'pore/Bogor/ North America
- Contact:
Re: Migrating from latin1_swedish_ci to utf8_general_ci database
You should be a little bit more careful in your postings. The thread reads: Migrating from latin1_swedish_ci to utf8_general_ci database The script does this excellent.Imago wrote:As I expected, the script does NOT convert cp1251 text to utf8. Instead, it managed to ruin 120,000 entries in a Russian encyclopedia. They simply disappeared - ALL of them. So guys, if your language is not English, DO NOT use that killer script.
Next I just changed the collation of a default Joomla site from utf8_general_ci into cp1251_ukrainian_ci without a hick. examples of the output:
ALTER DATABASE joomad1_jo151 COLLATE cp1251_ukrainian_ci ... OK
ALTER TABLE jos_banner COLLATE cp1251_ukrainian_ci ... OK
ALTER TABLE jos_banner MODIFY type varchar(30) CHARACTER SET binary ... OK
ALTER TABLE jos_banner MODIFY type varchar(30) COLLATE cp1251_ukrainian_ci DEFAULT 'banner' ... OK
ALTER TABLE jos_banner MODIFY name varchar(255) CHARACTER SET binary ... OK
ALTER TABLE jos_banner MODIFY name varchar(255) COLLATE cp1251_ukrainian_ci ... OK
ALTER TABLE jos_banner MODIFY alias varchar(255) CHARACTER SET binary ... OK
ALTER TABLE jos_banner MODIFY alias varchar(255) COLLATE cp1251_ukrainian_ci ... OK
ALTER TABLE jos_banner MODIFY imageurl varchar(100) CHARACTER SET binary ... OK
ALTER TABLE jos_banner MODIFY imageurl varchar(100) COLLATE cp1251_ukrainian_ci ... OK
ALTER TABLE jos_banner MODIFY clickurl varchar(200) CHARACTER SET binary ... OK
ALTER TABLE jos_banner MODIFY clickurl varchar(200) COLLATE cp1251_ukrainian_ci ... OK
ALTER TABLE jos_banner MODIFY editor varchar(50) CHARACTER SET binary ... OK
ALTER TABLE jos_banner MODIFY editor varchar(50) COLLATE cp1251_ukrainian_ci ... OK
ALTER TABLE jos_banner MODIFY custombannercode text CHARACTER SET binary ... OK
ALTER TABLE jos_banner MODIFY custombannercode text COLLATE cp1251_ukrainian_ci ... OK
ALTER TABLE jos_banner MODIFY description text CHARACTER SET binary ... OK
ALTER TABLE jos_banner MODIFY description text COLLATE cp1251_ukrainian_ci ... OK
ALTER TABLE jos_banner MODIFY tags text CHARACTER SET binary ... OK
ALTER TABLE jos_banner MODIFY tags text COLLATE cp1251_ukrainian_ci ... OK
ALTER TABLE jos_banner MODIFY params text CHARACTER SET binary ... OK
ALTER TABLE jos_banner MODIFY params text COLLATE cp1251_ukrainian_ci ... OK
proof of success is attached.
Also I can show you sites which are in German (no English) which is a language that has very special characters where this script runs very well. It also works well with Greek amongst others........Do not generalize all issues.......The script is a fabulous piece of code for the task.
You can find a specific non Joomla script which works also very, very well (and we use that for non-joomla sites where we run into issues with character sets:
http://bogdan.org.ua/2008/02/08/convert ... other.html
Where it wont work (Chinese most likely and Japanese as Sam mentioned yes we do it manually in the way you described. In most cases not needed though....
Positivism is a virtue and the opposite is destruction
Cheers
Leo
You do not have the required permissions to view the files attached to this post.
Joomla's #1 Professional Services Provider:
#Joomla Professional Support: https://gws-desk.com -
#Joomla Specialized Hosting Solutions: https://gws-host.com -
#Joomla Professional Support: https://gws-desk.com -
#Joomla Specialized Hosting Solutions: https://gws-host.com -
- leolam
- Joomla! Master
- Posts: 20652
- Joined: Mon Aug 29, 2005 10:17 am
- Location: Netherlands/ Germany/ S'pore/Bogor/ North America
- Contact:
Re: Migrating from latin1_swedish_ci to utf8_general_ci database
Please note the following:
I have entered in Joomadev1.info the following in "we are volunteers" :
Tôi không hiểu ý bạn muốn nói gì Site runs default encoding as was the fact with poster henryvuong
I have than run the script and changed to latin1_swedish_ci (with the script
see image number1 Please note date and time of update (just in case) Last Updated on Friday, 28 November 2008 08:51
I than run the script which can be found at http://joomadev1.info/changing_collation/index.php
result: See http://joomadev1.info
Leo
I have entered in Joomadev1.info the following in "we are volunteers" :
Tôi không hiểu ý bạn muốn nói gì Site runs default encoding as was the fact with poster henryvuong
I have than run the script and changed to latin1_swedish_ci (with the script
see image number1 Please note date and time of update (just in case) Last Updated on Friday, 28 November 2008 08:51
I than run the script which can be found at http://joomadev1.info/changing_collation/index.php
result: See http://joomadev1.info
Leo
You do not have the required permissions to view the files attached to this post.
Joomla's #1 Professional Services Provider:
#Joomla Professional Support: https://gws-desk.com -
#Joomla Specialized Hosting Solutions: https://gws-host.com -
#Joomla Professional Support: https://gws-desk.com -
#Joomla Specialized Hosting Solutions: https://gws-host.com -
- pe7er
- Joomla! Master
- Posts: 24986
- Joined: Thu Aug 18, 2005 8:55 pm
- Location: Nijmegen, Netherlands
- Contact:
Re: Migrating from latin1_swedish_ci to utf8_general_ci database
MOD note: We've deleted some messages in this thread and made some small edits to the current ones.
Keep all commentary civil, and be courteous at all times. Constructive criticism is welcome, but insults directed towards other users or the site admins will not be tolerated. Coarse/insulting language will not be tolerated.
Keep all commentary civil, and be courteous at all times. Constructive criticism is welcome, but insults directed towards other users or the site admins will not be tolerated. Coarse/insulting language will not be tolerated.
Kind Regards,
Peter Martin, Global Moderator
Company website: https://db8.nl/en/ - Joomla specialist, Nijmegen, Netherlands
The best website: https://the-best-website.com
Peter Martin, Global Moderator
Company website: https://db8.nl/en/ - Joomla specialist, Nijmegen, Netherlands
The best website: https://the-best-website.com
-
- Joomla! Apprentice
- Posts: 24
- Joined: Thu Sep 14, 2006 7:51 pm
Re: Migrating from latin1_swedish_ci to utf8_general_ci database
Thanks to all the advices. My baby is born on the Thanksgiving Day. I'll be busy for a while. I'll try it later.
-
- Joomla! Apprentice
- Posts: 38
- Joined: Tue Dec 25, 2007 8:58 pm
Re: Migrating from latin1_swedish_ci to utf8_general_ci database
It worked for me! thank you phoca! And thanks leolam! (using 1.5.9)
Now I have a question - why did this happen in the first place? I installed joomla using simplescripts, thats supposed to be utf-8, no? Is there a way to control the collation?
Then I imported a new DB from yootheme, could that have made the problem?
Now I have a question - why did this happen in the first place? I installed joomla using simplescripts, thats supposed to be utf-8, no? Is there a way to control the collation?
Then I imported a new DB from yootheme, could that have made the problem?
Compare hosting http://moneyonlinetestimonials.com/hosting
- leolam
- Joomla! Master
- Posts: 20652
- Joined: Mon Aug 29, 2005 10:17 am
- Location: Netherlands/ Germany/ S'pore/Bogor/ North America
- Contact:
Re: Migrating from latin1_swedish_ci to utf8_general_ci database
You can see that in the database which you received from Yootheme. Most unlikely though in case of a Joomla 1.5-DB. If you are not sure you can send that database to the email in my profile and i will have a look.mordechai wrote: Then I imported a new DB from yootheme, could that have made the problem?
I think though that you created a database with the wrong collation maybe manually?
Leo
Joomla's #1 Professional Services Provider:
#Joomla Professional Support: https://gws-desk.com -
#Joomla Specialized Hosting Solutions: https://gws-host.com -
#Joomla Professional Support: https://gws-desk.com -
#Joomla Specialized Hosting Solutions: https://gws-host.com -
-
- Joomla! Apprentice
- Posts: 34
- Joined: Sun Jan 14, 2007 1:25 pm
Re: Migrating from latin1_swedish_ci to utf8_general_ci database
thanks for this....leolam wrote:the people at Phoca have made this very practical tool to change your existing site's collation in less than a minute. Works flawless and is free! http://www.phoca.cz/phoca-changing-collation/
Leo
i have changed collation for my DB using this, it cut-off some words, but it was easier for me to correct it after then to meddle with manual converting
regards
-
- Joomla! Fledgling
- Posts: 1
- Joined: Tue Nov 01, 2011 7:45 pm
Re: Migrating from latin1_swedish_ci to utf8_general_ci data
joomla uses defaults for making tables ..
if you do this BEFORE THE INSTALLATION you will have every collation as "utf8_general_ci"
1. open notepad++ (google it for the download link)
2. crtl + f , and click "find in files" tab
3. Enter "<joomla directory>\installation" in Directory field.
3. enter "latin1" in "find what" and "utf8" in "replace with".
5. press "replace in files"
or simply replace every "latin1" with "utf8" in the <joomla directory>\installation directory.
if you do this BEFORE THE INSTALLATION you will have every collation as "utf8_general_ci"
1. open notepad++ (google it for the download link)
2. crtl + f , and click "find in files" tab
3. Enter "<joomla directory>\installation" in Directory field.
3. enter "latin1" in "find what" and "utf8" in "replace with".
5. press "replace in files"
or simply replace every "latin1" with "utf8" in the <joomla directory>\installation directory.
-
- Joomla! Apprentice
- Posts: 5
- Joined: Mon Jul 29, 2013 8:24 pm
Re: Migrating from latin1_swedish_ci to utf8_general_ci data
how can we use it? i have upload it too my root server and then what should i do?leolam wrote:the people at Phoca have made this very practical tool to change your existing site's collation in less than a minute. Works flawless and is free! http://www.phoca.cz/phoca-changing-collation/
Leo
- dpacadmin
- Joomla! Champion
- Posts: 6029
- Joined: Sat Aug 16, 2008 1:46 pm
- Location: the Bat Cave
- Contact:
Re: Migrating from latin1_swedish_ci to utf8_general_ci data
There are instructions here;
http://www.phoca.cz/documents/38-tools/ ... n-database
please note this point it makes;
"Please, run this script only if you know what you do"
http://www.phoca.cz/documents/38-tools/ ... n-database
please note this point it makes;
"Please, run this script only if you know what you do"
-
- Joomla! Apprentice
- Posts: 27
- Joined: Mon Sep 22, 2008 11:28 am
Re: Migrating from latin1_swedish_ci to utf8_general_ci database
is there such tool to convert the sql file ?