PostgreSQL error: character with byte sequence 0xe9 0x94 0x99 in encoding "UTF8" has no equivalent in encoding "WIN1252"
The error
I ran into this error today and am writing it down for future reference.
2025-02-10 17:04:35.264 HKT [28816] 错误: 编码"WIN1252"的字符0x0x81在编码"UTF8"没有相对应值
2025-02-10 17:04:35.264 HKT [28816] 错误: 编码"UTF8"的字符0x0xe9 0x94 0x99在编码"WIN1252"没有相对应值
2025-02-10 17:04:35.264 HKT [28816] ERROR: character with byte sequence 0xe9 0x94 0x99 in encoding "UTF8" has no equivalent in encoding "WIN1252"
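To make the byte sequence in these messages less abstract, here is a small Python sketch (Python is used only for illustration and is not part of the original setup). It decodes 0xe9 0x94 0x99 as UTF-8 and then checks whether the result can be encoded as Windows-1252. Python's cp1252 codec is used as a stand-in for PostgreSQL's WIN1252 conversion, which is an assumption; the helper name fits_in_cp1252 is made up for this example.

```python
# Bytes reported in the error message, interpreted as UTF-8.
raw = bytes([0xE9, 0x94, 0x99])
char = raw.decode("utf-8")
print(char, hex(ord(char)))  # the decoded character and its code point


def fits_in_cp1252(text: str) -> bool:
    """Return True if every character of `text` exists in Windows-1252."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_cp1252(char))    # the Chinese character has no cp1252 slot
print(fits_in_cp1252("café"))  # Western European text does
```

The decoded character is 错 (U+9519), the first character of 错误 ("error"), the word that prefixes the Chinese log lines above.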
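Background
Some background: at my company we have a workflow that imports data from SQL files into a PostgreSQL database using psql.exe. The operating system is Windows and the database version is 12.3. The PostgreSQL instance was created with UTF8 as its encoding, and the SQL files also declare UTF8 as the client encoding with this statement: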
SET client_encoding TO 'UTF8';
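With that setup in place, a tester changed the operating system's region and language setting from Chinese (China) to English (United States). The problem appeared afterwards and caused some of the data imports to fail. For various reasons the system configuration could not be changed back, so the problem had to be solved with the setting as it was.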
After logging in to the database from cmd on Windows, psql prints this warning:
WARNING:
Console code page (437) differs from Windows code page (1252)
8-bit characters might not work correctly. See psql reference
page "Notes for Windows users" for details.
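Roughly, the warning says that the console code page is 437 while the Windows (ANSI) code page is 1252. With the system switched to English, the client encoding that psql actually uses can end up being WIN1252 (the default Windows ANSI encoding for that locale), and UTF8 characters such as Chinese text cannot be converted to WIN1252.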
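The two code pages named in the warning can be inspected directly. The following is a minimal sketch, assuming Python with ctypes is available on the Windows machine; the function name windows_code_pages is hypothetical, and this is only one way to look at the values, not a description of psql's own code. GetConsoleCP and GetACP are standard Win32 calls in kernel32.

```python
import sys


def windows_code_pages():
    """Return (console_cp, ansi_cp) on Windows, or None on other systems."""
    if sys.platform != "win32":
        return None
    import ctypes

    kernel32 = ctypes.windll.kernel32
    # GetConsoleCP: input code page of the attached console (0 if there is none).
    # GetACP: the system ANSI code page, e.g. 1252 for English (United States).
    return kernel32.GetConsoleCP(), kernel32.GetACP()


pages = windows_code_pages()
if pages is None:
    print("Not running on Windows; nothing to compare.")
else:
    console_cp, ansi_cp = pages
    print(f"console code page: {console_cp}, ANSI code page: {ansi_cp}")
    if console_cp != ansi_cp:
        print("The two differ, which is the situation the psql warning describes.")
```

Running this in the same cmd window as psql should show the same pair of numbers that appears in the warning.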
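That raises a question: if the SQL file on the client side declares UTF8 and the server is also set to UTF8, why is an import through psql.exe affected by the operating system's encoding at all?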
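The reason: although the SQL file contains SET client_encoding TO 'UTF8', some client tools (psql among them) may already have chosen a client encoding from the terminal environment (that is, from Windows itself) at the moment the connection is established. Anything exchanged before the SET statement in the file takes effect then uses that auto-detected encoding rather than UTF8.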
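As a hedged illustration of why such a mismatch hurts: if a session is treated as WIN1252 while the bytes being sent are really UTF-8, some of those bytes have no meaning in WIN1252. The sketch below again uses Python's cp1252 codec as a stand-in for the server-side conversion. The sample character 丁 is my own choice, not data from the original import; it was picked because its UTF-8 form contains 0x81, the byte named in the first log line. The helper name undefined_cp1252_bytes is hypothetical.

```python
def undefined_cp1252_bytes(text: str) -> list[int]:
    """UTF-8 encode `text`, then list the bytes that cp1252 cannot decode."""
    bad = []
    for b in text.encode("utf-8"):
        try:
            bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            bad.append(b)
    return bad


sample = "丁"  # hypothetical sample data, U+4E01
print(sample.encode("utf-8").hex(" "))
print([hex(b) for b in undefined_cp1252_bytes(sample)])
```

Bytes that do have a WIN1252 meaning are not safe either: they would be converted without an error but into the wrong characters.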
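Solution
I searched through many tutorials online, but none of them covered exactly this error, or they did not offer a fix. In the end I went to the official documentation.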
The official PostgreSQL documentation describes each tool in detail. The page for psql is: https://www.postgresql.org/docs/12/app-psql.html
The psql documentation contains this passage:
Connecting to a Database
psql is a regular PostgreSQL client application. In order to connect to a database you need to know the name of your target database, the host name and port number of the server, and what user name you want to connect as. psql can be told about those parameters via command line options, namely -d, -h, -p, and -U respectively. If an argument is found that does not belong to any option it will be interpreted as the database name (or the user name, if the database name is already given). Not all of these options are required; there are useful defaults. If you omit the host name, psql will connect via a Unix-domain socket to a server on the local host, or via TCP/IP to localhost on machines that don't have Unix-domain sockets. The default port number is determined at compile time. Since the database server uses the same default, you will not have to specify the port in most cases. The default user name is your operating-system user name, as is the default database name. Note that you cannot just connect to any database under any user name. Your database administrator should have informed you about your access rights.
When the defaults aren't quite right, you can save yourself some typing by setting the environment variables PGDATABASE, PGHOST, PGPORT and/or PGUSER to appropriate values. (For additional environment variables, see Section 33.14.) It is also convenient to have a ~/.pgpass file to avoid regularly having to type in passwords. See Section 33.15 for more information.
An alternative way to specify connection parameters is in a conninfo string or a URI, which is used instead of a database name. This mechanism give you very wide control over the connection. For example:
$ psql "service=myservice sslmode=require"
$ psql postgresql://dbmaster:5433/mydb?sslmode=require
This way you can also use LDAP for connection parameter lookup as described in Section 33.17. See Section 33.1.2 for more information on all the available connection options.
If the connection could not be made for any reason (e.g., insufficient privileges, server is not running on the targeted host, etc.), psql will return an error and terminate.
If both standard input and standard output are a terminal, then psql sets the client encoding to “auto”, which will detect the appropriate client encoding from the locale settings (LC_CTYPE environment variable on Unix systems). If this doesn't work out as expected, the client encoding can be overridden using the environment variable PGCLIENTENCODING.
The last paragraph of that passage is the key part. In plain terms: when both standard input and standard output are a terminal, psql sets the client encoding to "auto" and derives it from the locale settings (the LC_CTYPE environment variable on Unix systems). If the detected encoding is not what you need, the PGCLIENTENCODING environment variable overrides it.
That points to a fix: in the Windows cmd window, set the PGCLIENTENCODING environment variable before running psql. I tested this and it works.
How to set it:
REM Set the client encoding that psql will use
set PGCLIENTENCODING=UTF8
REM Then run the import as usual
psql -U username -d dbname -f your_file.sql
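If the import is driven from a script rather than typed into cmd, the same idea can be expressed by passing the variable in the child process environment. This is a minimal sketch, assuming psql is on PATH; the function names build_psql_import and run_import are hypothetical, and username, dbname and your_file.sql are the same placeholders as in the commands above.

```python
import os
import subprocess


def build_psql_import(sql_file, user, dbname, encoding="UTF8"):
    """Return (argv, env) for running psql with PGCLIENTENCODING set."""
    env = dict(os.environ)              # copy, so the parent process is untouched
    env["PGCLIENTENCODING"] = encoding  # the override described in the psql docs
    argv = ["psql", "-U", user, "-d", dbname, "-f", sql_file]
    return argv, env


def run_import(sql_file, user, dbname):
    """Run the import; raises CalledProcessError if psql exits non-zero."""
    argv, env = build_psql_import(sql_file, user, dbname)
    return subprocess.run(argv, env=env, check=True)


argv, env = build_psql_import("your_file.sql", "username", "dbname")
print(" ".join(argv), "| PGCLIENTENCODING =", env["PGCLIENTENCODING"])
```

Setting the variable only for the child process keeps the change local to the import and leaves the rest of the system configuration alone, which matches the constraint described in the background.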
Further reading
The official documentation has more to say on this topic (on the same psql page linked earlier). The original text:
Notes for Windows Users
psql is built as a “console application”. Since the Windows console windows use a different encoding than the rest of the system, you must take special care when using 8-bit characters within psql. If psql detects a problematic console code page, it will warn you at startup. To change the console code page, two things are necessary:
Set the code page by entering cmd.exe /c chcp 1252. (1252 is a code page that is appropriate for German; replace it with your value.) If you are using Cygwin, you can put this command in /etc/profile.
Set the console font to Lucida Console, because the raster font does not work with the ANSI code page.
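In short: psql is a console application, and the Windows console uses a different encoding from the rest of the system, so 8-bit characters need special care. psql warns at startup when it detects a problematic console code page. Changing the console code page takes two steps: run cmd.exe /c chcp 1252 (replacing 1252 with the value that fits your system), and switch the console font to Lucida Console, because the raster font does not work with the ANSI code page.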
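The documentation continues with examples after this section. I did not have time to verify them one by one; if you are interested, have a look at the official page.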