如何:转换旧式编码的MySQL数据库到UTF-8

• 本文约 817 字,阅读大致需要 2 分钟 | Development | #convert | #i18n | #MySQL | #unicode | #utf-8

最近帮公司的一个客户做了一个数据库迁移,客户声称数据是 utf-8 的,然而在使用过程中出现了许多乱码,检查发现数据并非 utf-8,而是 utf-8 编码之后的 big5,而排序方式更是混乱不堪的默认的utf8-swedian-ci

MySQL的国际化支持很差。MySQL从 4.1 版本开始大刀阔斧地进行了不兼容的改动,简单地说,这些改动让相当多的操作默认以UTF-8进行,然而这会给旧的应用程序带来很多问题。许多文献推荐使用 SET CHARACTER SET 作为 workaround,尽管这能够解决一些问题,但这种做法本质上会导致 MySQL 进行额外的转换,因此并不是十分理想。

正确的方法是把所有的东西都转换成 UTF-8。一个比较常见的错误是使用SET NAMES UTF8并直接从原始的 Big5 表中导出。这种情况下,文字会被以UTF-8编码的big5方式保存。比较简单的做法是将其直接导入数据表,并以 latin-1 导出:

mysqldump -u 用户名 -p密码 --skip-extended-insert --default-character-set=latin1 --databases 数据库名 > big5.sql

接着,使用 piconv (随 Perl 提供)来转换编码。我个人认为这个工具比 uconviconv 都要好。

piconv -f big5 -t utf8 < big5.sql > utf8.sql

接下来要把 utf8.sql 中的 latin1 等不正确的编码改为 utf8(通常,简单地sed即可)

最后重新导入。前面 skip-extended-insert 可以让发生错误时显示较具体的行号,这对于较大的数据库而言会比较方便。

⚠️ 免责声明

此 HOWTO 文档以现状方式提供,不提供任何担保。此文档为个人作弊条,本人会对部分疑问进行解释,但同样不提供任何保证。

此篇按下述授权发表:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
/*-
 * Copyright (c) 2009 Xin LI <delphij@FreeBSD.org>
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 */