828 Huawei Cloud Essay Contest | Flexus X Cloud Server Instance: Deploying the pgvector Vector Database for Vector Search
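Contents
1. What Is the pgvector Vector Database?
2. Deploying pgvector
2.1 Install Docker
2.2 Pull the Image
2.3 Add a Security Group Rule
3. Running pgvector
3.1 Run pgvector
3.2 Connect to pgvector
3.3 Common pgvector Operations
4. Summary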
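This article deploys the pgvector vector database on a Huawei Cloud Flexus X instance to store vectors and run similarity searches over them. The Flexus X instance provides a stable, secure runtime environment for pgvector. It targets small and medium-sized businesses and developers who run medium-load workloads and want flexible resource choices, and it offers customizable specifications, stable and strong performance, and pay-as-you-go billing.
1. What Is the pgvector Vector Database?
pgvector is an open-source PostgreSQL extension for vector similarity search that lets you store vectors alongside the rest of your data. Version 0.7.0 has been released, and it brings a number of new features and performance improvements for vector similarity search workloads in PostgreSQL.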
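It supports the following:
(1) exact and approximate nearest neighbor search;
(2) single-precision, half-precision, binary, and sparse vectors;
(3) L2 distance, inner product, cosine distance, L1 distance, Hamming distance, and Jaccard distance;
(4) any language that has a Postgres client.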
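To make the distance measures in item (3) concrete, here is a minimal pure-Python sketch of what each one computes. It is an illustration only, not pgvector's implementation. In SQL, pgvector exposes these as the operators <-> (L2), <#> (negative inner product), <=> (cosine), <+> (L1), <~> (Hamming), and <%> (Jaccard). The function names below are hypothetical, and binary vectors are modeled as strings of "0" and "1" for simplicity.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 minus the cosine similarity of the two vectors."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


def l1_distance(a, b):
    """Sum of absolute coordinate differences (taxicab distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming_distance(a, b):
    """Number of positions where two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def jaccard_distance(a, b):
    """1 minus (shared set bits / bits set in either string)."""
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    # Returning 1.0 for two all-zero strings is a choice made for this sketch.
    return 1.0 - both / either if either else 1.0


print(l2_distance([1, 2, 3], [3, 1, 2]))
print(inner_product([1, 2, 3], [3, 1, 2]))
print(hamming_distance("1010", "1001"))
```

A smaller distance means a closer match for every measure except the raw inner product, which is why pgvector's <#> operator returns its negative.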
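The following sections deploy pgvector on a Flexus X instance.
2. Deploying pgvector
2.1 Install Docker
Install Docker with the following command. The docker-ce package comes from Docker's own apt repository (or a mirror of it), so this step assumes that repository has already been configured on the instance.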
root@flexusx-7305:~# sudo apt install docker-ce
Check the Docker version.
root@flexusx-7305:~# docker --version
Docker version 27.2.1, build 9e34c9b
root@flexusx-7305:~#
Finally, install docker-compose with the following command.
root@flexusx-7305:~# sudo apt install docker-compose
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
redis-server redis-tools
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
python3-cached-property python3-docker python3-dockerpty python3-docopt python3-importlib-metadata python3-jsonschema python3-more-itertools python3-pyrsistent python3-texttable python3-websocket python3-zipp
Suggested packages:
python-jsonschema-doc
Recommended packages:
docker.io
The following NEW packages will be installed:
docker-compose python3-cached-property python3-docker python3-dockerpty python3-docopt python3-importlib-metadata python3-jsonschema python3-more-itertools python3-pyrsistent python3-texttable python3-websocket python3-zipp
0 upgraded, 12 newly installed, 0 to remove and 33 not upgraded.
Need to get 412 kB of archives.
After this operation, 2,414 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://repo.huaweicloud.com/ubuntu focal/universe amd64 python3-cached-property all 1.5.1-4 [10.9 kB]
Get:2 http://repo.huaweicloud.com/ubuntu focal/universe amd64 python3-websocket all 0.53.0-2ubuntu1 [
Docker is now installed.
2.2 Pull the Image
Pull the pgvector image with the following command.
root@flexusx-7305:~# docker pull registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0
v0.7.0: Pulling from fastgpt/pgvector
Digest: sha256:27df42f0d0be8d5623ff1aea5fea7134e175af1cdef62d9df00b322a3c85edc9
Status: Image is up to date for registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0
registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0
root@flexusx-7305:~#
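In the output above, the image was already present on this machine. If it is not present on yours, the same command downloads it.
2.3 Add a Security Group Rule
pgvector listens on PostgreSQL's default port, 5432, so that port must be allowed by the instance's inbound security group rules. If you publish the container on a different host port, as section 3.1 does with 54333, allow that host port instead for remote access.
First, find the security group under Basic Information on the instance page and click it to open the security group.
Then click Configure Rules to add a rule for port 5432.
Set the priority, enter the port under Protocol & Port, and click OK.
Port 5432 now appears in the security group's rules.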
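Once the rule is in place and pgvector is running, one simple way to confirm that the port is reachable from another machine is a plain TCP connection attempt. The sketch below uses only Python's standard library. The function name port_is_open and the placeholder address in the usage comment are assumptions made for illustration and are not part of any Huawei Cloud tooling.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage (replace the placeholder with your instance's public IP):
# print(port_is_open("<instance-public-ip>", 5432))
```

A True result only shows that something accepts connections on that port. It does not verify that the listener is PostgreSQL or that the credentials work.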
3. Running pgvector
3.1 Run pgvector
First, list the local images to confirm that the pgvector image is present.
root@flexusx-7305:~# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-sandbox latest 0f26cf6654ad 2 weeks ago 315MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt v4.8.9 bc394a806301 6 weeks ago 356MB
swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/gitea/gitea 1.22.1 b3de72970178 2 months ago 167MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/one-api v0.6.6 40efbc4449c7 4 months ago 79.5MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector v0.7.0 6e0cb183450e 5 months ago 429MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/mysql 8.0.36 f5f171121fa3 6 months ago 603MB
swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/justsong/one-api v0.6.0 36bd98ce5a7c 7 months ago 48.4MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/mongo 5.0.18 021e1bd71d92 16 months ago 662MB
daocloud.io/library/mysql 8 26d0ac143221 3 years ago 546MB
daocloud.io/library/mysql latest 8457e9155715 3 years ago 546MB
root@flexusx-7305:~#
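As shown above, the pgvector image is registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0.
Next, start the container with docker run. The command below publishes the container's port 5432 on host port 54333 and sets the database user and password through environment variables. Running docker ps afterwards confirms that the container is up.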
root@flexusx-7305:~# docker run --name pgvectorface --restart=always -e POSTGRES_USER=pgvectorface -e POSTGRES_PASSWORD=pgvector -p 54333:5432 -d registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0
982a9ed7352450eb192c04ca9f7dbf31bfd9d1ccf9af4a234c85dc85d4338e41
root@flexusx-7305:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
982a9ed73524 registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0 "docker-entrypoint.s…" 11 seconds ago Up 11 seconds 0.0.0.0:54333->5432/tcp, [::]:54333->5432/tcp pgvectorface
68a1f9a73e58 registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt:v4.8.9 "sh -c 'node --max-o…" 10 days ago Up 10 days 0.0.0.0:3000->3000/tcp, :::3000->3000/tcp fastgpt
b57af8cd1b6b registry.cn-hangzhou.aliyuncs.com/fastgpt/one-api:v0.6.6 "/one-api" 10 days ago Up 10 days 0.0.0.0:3001->3000/tcp, [::]:3001->3000/tcp oneapi
2de37c379c6a registry.cn-hangzhou.aliyuncs.com/fastgpt/mysql:8.0.36 "docker-entrypoint.s…" 10 days ago Up 10 days 0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp mysql
9d7906452f26 registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-sandbox:latest "docker-entrypoint.s…" 10 days ago Up 10 days sandbox
6f9c7f088d9d registry.cn-hangzhou.aliyuncs.com/fastgpt/mongo:5.0.18 "bash -c 'openssl ra…" 10 days ago Up 10 days 0.0.0.0:27017->27017/tcp, :::27017->27017/tcp mongo
3867cf7f6df9 registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0 "docker-entrypoint.s…" 10 days ago Up 10 days 0.0.0.0:5432->5432/tcp, :::5432->5432/tcp pg
89bb9f7a3dd1 swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/justsong/one-api:v0.6.0 "/one-api" 12 days ago Up 11 days 0.0.0.0:3002->3000/tcp, [::]:3002->3000/tcp one-api
65fe1c102df6 daocloud.io/library/mysql:8 "docker-entrypoint.s…" 2 weeks ago Up 11 days 3306/tcp, 33060/tcp root_db_1
root@flexusx-7305:~#
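As shown above, the pgvectorface container is up and running.
3.2 Connect to pgvector
Install the PostgreSQL command-line client (psql), which is provided by the postgresql-client-common and postgresql-client packages, with the following commands.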
root@flexusx-7305:~# apt install postgresql-client-common
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
libcgi-fast-perl libcgi-pm-perl libencode-locale-perl libevent-core-2.1-7 libevent-pthreads-2.1-7 libfcgi-perl libhtml-parser-perl libhtml-tagset-perl libhtml-template-perl libhttp-date-perl libhttp-message-perl libio-html-perl
liblwp-mediatypes-perl libmecab2 libtimedate-perl liburi-perl mecab-ipadic mecab-ipadic-utf8 mecab-utils mysql-server-core-8.0 redis-server redis-tools
Use 'apt autoremove' to remove them.
The following NEW packages will be installed:
postgresql-client-common
0 upgraded, 1 newly installed, 0 to remove and 61 not upgraded.
Need to get 28.2 kB of archives.
After this operation, 182 kB of additional disk space will be used.
Get:1 http://repo.huaweicloud.com/ubuntu focal-updates/main amd64 postgresql-client-common all 214ubuntu0.1 [28.2 kB]
Fetched 28.2 kB in 0s (314 kB/s)
Selecting previously unselected package postgresql-client-common.
(Reading database ... 123209 files and directories currently installed.)
Preparing to unpack .../postgresql-client-common_214ubuntu0.1_all.deb ...
Unpacking postgresql-client-common (214ubuntu0.1) ...
Setting up postgresql-client-common (214ubuntu0.1) ...
Processing triggers for man-db (2.9.1-1) ...
root@flexusx-7305:~# apt-get install postgresql-client
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
libcgi-fast-perl libcgi-pm-perl libencode-locale-perl libevent-core-2.1-7 libevent-pthreads-2.1-7 libfcgi-perl libhtml-parser-perl libhtml-tagset-perl libhtml-template-perl libhttp-date-perl libhttp-message-perl libio-html-perl
liblwp-mediatypes-perl libmecab2 libtimedate-perl liburi-perl mecab-ipadic mecab-ipadic-utf8 mecab-utils mysql-server-core-8.0 redis-server redis-tools
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
libpq5 postgresql-client-12
Suggested packages:
postgresql-12 postgresql-doc-12
The following NEW packages will be installed:
libpq5 postgresql-client postgresql-client-12
0 upgraded, 3 newly installed, 0 to remove and 61 not upgraded.
Need to get 1,176 kB of archives.
After this operation, 4,303 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://repo.huaweicloud.com/ubuntu focal-updates/main amd64 libpq5 amd64 12.20-0ubuntu0.20.04.1 [117 kB]
Get:2 http://repo.huaweicloud.com/ubuntu focal-updates/main amd64 postgresql-client-12 amd64 12.20-0ubuntu0.20.04.1 [1,055 kB]
Get:3 http://repo.huaweicloud.com/ubuntu focal-updates/main amd64 postgresql-client all 12+214ubuntu0.1 [3,940 B]
Fetched 1,176 kB in 0s (6,884 kB/s)
Selecting previously unselected package libpq5:amd64.
(Reading database ... 123246 files and directories currently installed.)
Preparing to unpack .../libpq5_12.20-0ubuntu0.20.04.1_amd64.deb ...
Unpacking libpq5:amd64 (12.20-0ubuntu0.20.04.1) ...
Selecting previously unselected package postgresql-client-12.
Preparing to unpack .../postgresql-client-12_12.20-0ubuntu0.20.04.1_amd64.deb ...
Unpacking postgresql-client-12 (12.20-0ubuntu0.20.04.1) ...
Selecting previously unselected package postgresql-client.
Preparing to unpack .../postgresql-client_12+214ubuntu0.1_all.deb ...
Unpacking postgresql-client (12+214ubuntu0.1) ...
Setting up libpq5:amd64 (12.20-0ubuntu0.20.04.1) ...
Setting up postgresql-client-12 (12.20-0ubuntu0.20.04.1) ...
update-alternatives: using /usr/share/postgresql/12/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode
Setting up postgresql-client (12+214ubuntu0.1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.16) ...
root@flexusx-7305:~#
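As shown above, both packages installed successfully.
Next, connect to pgvector with the psql client, using the host port (54333) and user (pgvectorface) that were set in the docker run command. The version warning in the output appears because Ubuntu 20.04 ships the psql 12 client while the server in the image is PostgreSQL 15.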
root@flexusx-7305:~# psql -h 0.0.0.0 -p 54333 -U pgvectorface
Password for user pgvectorface:
psql (12.20 (Ubuntu 12.20-0ubuntu0.20.04.1), server 15.6 (Debian 15.6-1.pgdg120+2))
WARNING: psql major version 12, server major version 15.
Some psql features might not work.
Type "help" for help.
pgvectorface=#
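Before any vector queries can run, the vector extension has to be enabled in the database and a table with a vector column has to exist. The operations in section 3.3 query a table named items with an embedding column, so here is a minimal sketch of that setup. The SQL follows the getting-started pattern from the pgvector README. The Python helpers vector_literal and setup_statements are hypothetical names used only to print the statements, which can then be pasted into the psql prompt. The three-dimensional sample rows are made-up data.

```python
def vector_literal(values):
    """Format numbers as a pgvector text literal such as [1,2,3]."""
    return "[" + ",".join(str(v) for v in values) + "]"


def setup_statements(table="items", dim=3, rows=((1, 2, 3), (4, 5, 6))):
    """Return SQL that enables pgvector, creates a table, and inserts rows."""
    for row in rows:
        if len(row) != dim:
            raise ValueError(f"expected {dim} dimensions, got {len(row)}")
    values = ", ".join(f"('{vector_literal(row)}')" for row in rows)
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"CREATE TABLE {table} (id bigserial PRIMARY KEY, embedding vector({dim}));",
        f"INSERT INTO {table} (embedding) VALUES {values};",
    ]


# Print the statements so they can be run in psql.
for statement in setup_statements():
    print(statement)
```

Building SQL by string formatting is acceptable for a fixed demo like this one. Application code should pass vectors as query parameters through its database driver instead.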
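3.3 Common pgvector Operations
The examples in this section assume a table named items with a vector column named embedding.
Use EXPLAIN ANALYZE to debug performance.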
EXPLAIN ANALYZE SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
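When no index applies, a query of this shape is an exact nearest neighbor search: every row's distance to the query vector is computed and the closest few are kept. The following sketch mimics that behavior in plain Python to show what the ORDER BY ... LIMIT clause asks for. It is an illustration under the assumption that rows are (id, vector) pairs, not a description of PostgreSQL's executor, and the sample rows are made up.

```python
import heapq
import math


def nearest_neighbors(rows, query, k=5):
    """Return the k rows whose vectors have the smallest L2 distance to query.

    rows is an iterable of (id, vector) pairs.
    """
    return heapq.nsmallest(k, rows, key=lambda row: math.dist(row[1], query))


rows = [(1, [1, 2, 3]), (2, [4, 5, 6]), (3, [3, 1, 2])]
for row_id, vector in nearest_neighbors(rows, [3, 1, 2], k=2):
    print(row_id, vector)
```

The cost of this approach grows linearly with the number of rows, which is the motivation for the approximate indexes (IVFFlat and HNSW) that pgvector provides.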
To speed up queries that do not use an index, increase max_parallel_workers_per_gather.
SET max_parallel_workers_per_gather = 4;
If the vectors are normalized to length 1 (as OpenAI embeddings are), use the inner product for the best performance.
SELECT * FROM items ORDER BY embedding <#> '[3,1,2]' LIMIT 5;
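The reason this works is that for unit-length vectors the cosine distance equals 1 minus the inner product, so ordering by the negative inner product (which is what the <#> operator returns) gives the same ranking as ordering by cosine distance while skipping the norm computation. The sketch below checks this on a few made-up vectors. All function names are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def negative_inner_product(a, b):
    """Mirrors what pgvector's <#> operator returns."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def ranking(vectors, query, distance):
    """Indices of vectors sorted from closest to farthest."""
    return sorted(range(len(vectors)), key=lambda i: distance(vectors[i], query))


raw = [[1, 2, 3], [4, 5, 6], [3, 1, 2], [-1, 0, 2]]
units = [normalize(v) for v in raw]
query = normalize([3, 1, 2])

by_inner_product = ranking(units, query, negative_inner_product)
by_cosine = ranking(units, query, cosine_distance)
print(by_inner_product, by_cosine)
```

The equivalence holds only when both the stored vectors and the query vector are normalized. With unnormalized vectors the two rankings can differ.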
To speed up queries that use an IVFFlat index, increase the number of inverted lists (at the expense of recall).
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 1000);
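The value lists = 1000 is not universal. As a starting point, the pgvector README suggests rows / 1000 lists for tables of up to 1M rows and sqrt(rows) above that, and about sqrt(lists) probes at query time (set with SET ivfflat.probes = ...). The sketch below encodes that rule of thumb. The function names are hypothetical, and the results are starting points to tune against measured recall and latency rather than fixed answers.

```python
import math


def suggested_lists(row_count):
    """Starting value for the IVFFlat `lists` option, given the table size."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return max(1, round(math.sqrt(row_count)))


def suggested_probes(lists):
    """Starting value for ivfflat.probes: more probes raise recall, fewer raise speed."""
    return max(1, round(math.sqrt(lists)))


for rows in (50_000, 1_000_000, 4_000_000):
    lists = suggested_lists(rows)
    print(rows, lists, suggested_probes(lists))
```

By this heuristic, lists = 1000 corresponds to a table of roughly one million rows.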
Vacuuming an HNSW index can take a while. Reindex first to speed it up.
REINDEX INDEX CONCURRENTLY index_name;
VACUUM table_name;
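4. Summary
Installing the pgvector vector database on a Flexus X instance went smoothly and quickly, and the instance proved secure and stable throughout. The server is easy to use, and Flexus X also supports custom system disk types and sizes as well as multiple data disks of different types. Give it a try!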