当前位置: 首页 > article >正文

828华为云征文 | 云服务器Flexus X实例:向量数据库 pgvector 部署,实现向量检索

目录

一、什么是向量数据库 pgvector ?

二、pgvector 部署

 2.1 安装 Docker 

2.2 拉取镜像

2.3 添加规则

三、pgvector 运行

3.1 运行 pgvector

3.2 连接 pgvector

3.3 pgvector 常见操作

四、总结


本篇文章通过 云服务器Flexus X实例 部署向量数据库 pgvector,实现向量的相似性检索和存储。云服务器Flexus X实例 能够为 向量数据库 pgvector 提供稳定和安全的运行环境,并且,云服务器Flexus X实例 适用于中负载业务,且期望资源灵活选配的中小企业和开发者,具有灵活自定义规格、性能稳定强劲、按需灵活计费的优势。

一、什么是向量数据库 pgvector ?

​ Postgres 的开源向量相似性搜索,将向量与其余数据一起存储。pgvector是一个提供向量相似性搜索功能的开源 PostgreSQL 扩展,现已发布v0.7.0。此新版本包含许多新功能和性能特性,用于支持 PostgreSQL 中的向量相似性搜索工作负载。

支持如下功能:

(1)精确和近似最近邻搜索;

(2)单精度、半精度、二进制和稀疏向量;

(3)L2 距离、内积、余弦距离、L1 距离、汉明距离和杰卡德距离;

(4)任何具有 Postgres 客户端的语言。 ​

下面在 云服务器Flexus X实例 上部署 pgvector。

二、pgvector 部署

 2.1 安装 Docker 

然后,执行命令安装 docker,如下所示。

root@flexusx-7305:~# sudo apt install docker-ce

 查看 docker 版本。

root@flexusx-7305:~# docker --version
Docker version 27.2.1, build 9e34c9b
root@flexusx-7305:~#

最后,安装 docker-compose,执行如下命令。 

root@flexusx-7305:~# sudo apt install docker-compose
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  redis-server redis-tools
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  python3-cached-property python3-docker python3-dockerpty python3-docopt python3-importlib-metadata python3-jsonschema python3-more-itertools python3-pyrsistent python3-texttable python3-websocket python3-zipp
Suggested packages:
  python-jsonschema-doc
Recommended packages:
  docker.io
The following NEW packages will be installed:
  docker-compose python3-cached-property python3-docker python3-dockerpty python3-docopt python3-importlib-metadata python3-jsonschema python3-more-itertools python3-pyrsistent python3-texttable python3-websocket python3-zipp
0 upgraded, 12 newly installed, 0 to remove and 33 not upgraded.
Need to get 412 kB of archives.
After this operation, 2,414 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://repo.huaweicloud.com/ubuntu focal/universe amd64 python3-cached-property all 1.5.1-4 [10.9 kB]
Get:2 http://repo.huaweicloud.com/ubuntu focal/universe amd64 python3-websocket all 0.53.0-2ubuntu1 [

到这里 Docker 安装完成。 

2.2 拉取镜像

拉取 pgvector 镜像,执行如下命令。

root@flexusx-7305:~# docker pull registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0
v0.7.0: Pulling from fastgpt/pgvector
Digest: sha256:27df42f0d0be8d5623ff1aea5fea7134e175af1cdef62d9df00b322a3c85edc9
Status: Image is up to date for registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0
registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0
root@flexusx-7305:~#

如上所示,已经存在镜像,如果没有的话,可以通过如上方式拉取。

2.3 添加规则

pgvector 对应的端口是 5432,需要将 5432 端口加入到准入规则中。

首先,在基本信息中,找到安全组,点击进入安全组,如下所示。

 然后,点击 配置规则 配置 5432 端口,如下所示。

设置优先级,然后在协议端口中添加端口,点击确定,如下所示。 

可以看到 5432 端口已经被加入到安全规则中,如下所示。

 

三、pgvector 运行

3.1 运行 pgvector

首先,查看一下本地 pgvector 镜像,执行如下命令。

root@flexusx-7305:~# docker images
REPOSITORY                                                            TAG       IMAGE ID       CREATED         SIZE
registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-sandbox             latest    0f26cf6654ad   2 weeks ago     315MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt                     v4.8.9    bc394a806301   6 weeks ago     356MB
swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/gitea/gitea        1.22.1    b3de72970178   2 months ago    167MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/one-api                     v0.6.6    40efbc4449c7   4 months ago    79.5MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector                    v0.7.0    6e0cb183450e   5 months ago    429MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/mysql                       8.0.36    f5f171121fa3   6 months ago    603MB
swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/justsong/one-api   v0.6.0    36bd98ce5a7c   7 months ago    48.4MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/mongo                       5.0.18    021e1bd71d92   16 months ago   662MB
daocloud.io/library/mysql                                             8         26d0ac143221   3 years ago     546MB
daocloud.io/library/mysql                                             latest    8457e9155715   3 years ago     546MB
root@flexusx-7305:~# 

如上所示,pgvector 对应的镜像是 registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0。

然后,执行 docker run 命令运行容器,执行如下命令。

root@flexusx-7305:~# docker run --name pgvectorface --restart=always -e POSTGRES_USER=pgvectorface -e POSTGRES_PASSWORD=pgvector -p 54333:5432 -d registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0
982a9ed7352450eb192c04ca9f7dbf31bfd9d1ccf9af4a234c85dc85d4338e41
root@flexusx-7305:~# docker ps
CONTAINER ID   IMAGE                                                                        COMMAND                  CREATED          STATUS          PORTS                                                  NAMES
982a9ed73524   registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0                    "docker-entrypoint.s…"   11 seconds ago   Up 11 seconds   0.0.0.0:54333->5432/tcp, [::]:54333->5432/tcp          pgvectorface
68a1f9a73e58   registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt:v4.8.9                     "sh -c 'node --max-o…"   10 days ago      Up 10 days      0.0.0.0:3000->3000/tcp, :::3000->3000/tcp              fastgpt
b57af8cd1b6b   registry.cn-hangzhou.aliyuncs.com/fastgpt/one-api:v0.6.6                     "/one-api"               10 days ago      Up 10 days      0.0.0.0:3001->3000/tcp, [::]:3001->3000/tcp            oneapi
2de37c379c6a   registry.cn-hangzhou.aliyuncs.com/fastgpt/mysql:8.0.36                       "docker-entrypoint.s…"   10 days ago      Up 10 days      0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp   mysql
9d7906452f26   registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-sandbox:latest             "docker-entrypoint.s…"   10 days ago      Up 10 days                                                             sandbox
6f9c7f088d9d   registry.cn-hangzhou.aliyuncs.com/fastgpt/mongo:5.0.18                       "bash -c 'openssl ra…"   10 days ago      Up 10 days      0.0.0.0:27017->27017/tcp, :::27017->27017/tcp          mongo
3867cf7f6df9   registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0                    "docker-entrypoint.s…"   10 days ago      Up 10 days      0.0.0.0:5432->5432/tcp, :::5432->5432/tcp              pg
89bb9f7a3dd1   swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/justsong/one-api:v0.6.0   "/one-api"               12 days ago      Up 11 days      0.0.0.0:3002->3000/tcp, [::]:3002->3000/tcp            one-api
65fe1c102df6   daocloud.io/library/mysql:8                                                  "docker-entrypoint.s…"   2 weeks ago      Up 11 days      3306/tcp, 33060/tcp                                    root_db_1
root@flexusx-7305:~#

如上所示, pgvector 已经运行成功。

3.2 连接 pgvector

安装 pgvector 客户端,安装软件包 postgresql-client-common 和 postgresql-client,执行如下命令安装。

root@flexusx-7305:~# apt install postgresql-client-common
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libcgi-fast-perl libcgi-pm-perl libencode-locale-perl libevent-core-2.1-7 libevent-pthreads-2.1-7 libfcgi-perl libhtml-parser-perl libhtml-tagset-perl libhtml-template-perl libhttp-date-perl libhttp-message-perl libio-html-perl
  liblwp-mediatypes-perl libmecab2 libtimedate-perl liburi-perl mecab-ipadic mecab-ipadic-utf8 mecab-utils mysql-server-core-8.0 redis-server redis-tools
Use 'apt autoremove' to remove them.
The following NEW packages will be installed:
  postgresql-client-common
0 upgraded, 1 newly installed, 0 to remove and 61 not upgraded.
Need to get 28.2 kB of archives.
After this operation, 182 kB of additional disk space will be used.
Get:1 http://repo.huaweicloud.com/ubuntu focal-updates/main amd64 postgresql-client-common all 214ubuntu0.1 [28.2 kB]
Fetched 28.2 kB in 0s (314 kB/s)                    
Selecting previously unselected package postgresql-client-common.
(Reading database ... 123209 files and directories currently installed.)
Preparing to unpack .../postgresql-client-common_214ubuntu0.1_all.deb ...
Unpacking postgresql-client-common (214ubuntu0.1) ...
Setting up postgresql-client-common (214ubuntu0.1) ...
Processing triggers for man-db (2.9.1-1) ...
root@flexusx-7305:~# apt-get install postgresql-client
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libcgi-fast-perl libcgi-pm-perl libencode-locale-perl libevent-core-2.1-7 libevent-pthreads-2.1-7 libfcgi-perl libhtml-parser-perl libhtml-tagset-perl libhtml-template-perl libhttp-date-perl libhttp-message-perl libio-html-perl
  liblwp-mediatypes-perl libmecab2 libtimedate-perl liburi-perl mecab-ipadic mecab-ipadic-utf8 mecab-utils mysql-server-core-8.0 redis-server redis-tools
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
  libpq5 postgresql-client-12
Suggested packages:
  postgresql-12 postgresql-doc-12
The following NEW packages will be installed:
  libpq5 postgresql-client postgresql-client-12
0 upgraded, 3 newly installed, 0 to remove and 61 not upgraded.
Need to get 1,176 kB of archives.
After this operation, 4,303 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://repo.huaweicloud.com/ubuntu focal-updates/main amd64 libpq5 amd64 12.20-0ubuntu0.20.04.1 [117 kB]
Get:2 http://repo.huaweicloud.com/ubuntu focal-updates/main amd64 postgresql-client-12 amd64 12.20-0ubuntu0.20.04.1 [1,055 kB]
Get:3 http://repo.huaweicloud.com/ubuntu focal-updates/main amd64 postgresql-client all 12+214ubuntu0.1 [3,940 B]
Fetched 1,176 kB in 0s (6,884 kB/s)           
Selecting previously unselected package libpq5:amd64.
(Reading database ... 123246 files and directories currently installed.)
Preparing to unpack .../libpq5_12.20-0ubuntu0.20.04.1_amd64.deb ...
Unpacking libpq5:amd64 (12.20-0ubuntu0.20.04.1) ...
Selecting previously unselected package postgresql-client-12.
Preparing to unpack .../postgresql-client-12_12.20-0ubuntu0.20.04.1_amd64.deb ...
Unpacking postgresql-client-12 (12.20-0ubuntu0.20.04.1) ...
Selecting previously unselected package postgresql-client.
Preparing to unpack .../postgresql-client_12+214ubuntu0.1_all.deb ...
Unpacking postgresql-client (12+214ubuntu0.1) ...
Setting up libpq5:amd64 (12.20-0ubuntu0.20.04.1) ...
Setting up postgresql-client-12 (12.20-0ubuntu0.20.04.1) ...
update-alternatives: using /usr/share/postgresql/12/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode
Setting up postgresql-client (12+214ubuntu0.1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.16) ...
root@flexusx-7305:~# 

如上所示,软件包安装成功。

然后,通过 psql 客户端连接 pgvector,如下所示。

root@flexusx-7305:~# psql -h 0.0.0.0 -p 54333 -U pgvectorface
Password for user pgvectorface: 
psql (12.20 (Ubuntu 12.20-0ubuntu0.20.04.1), server 15.6 (Debian 15.6-1.pgdg120+2))
WARNING: psql major version 12, server major version 15.
         Some psql features might not work.
Type "help" for help.

pgvectorface=# 

 

3.3 pgvector 常见操作

用于 EXPLAIN ANALYZE 调试性能。

EXPLAIN ANALYZE SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;

为了加快没有索引的查询,请增加max_parallel_workers_per_gather。

SET max_parallel_workers_per_gather = 4;

如果向量标准化为长度 1(如OpenAI 嵌入),则使用内积可获得最佳性能。

SELECT * FROM items ORDER BY embedding <#> '[3,1,2]' LIMIT 5;

 为了加快使用 IVFFlat 索引的查询速度,请增加倒排列表的数量(以牺牲召回率为代价)。

CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 1000);

清理 HNSW 索引可能需要一段时间。请先重新索引以加快速度。

REINDEX INDEX CONCURRENTLY index_name;
VACUUM table_name;

四、总结

通过在 云服务器Flexus X实例 上安装向量数据库 pgvector,展现了 云服务器Flexus X实例 的安全和稳定,在部署的过程中也非常顺利,能够快速实现部署,服务器使用很方便,并且 云服务器Flexus X实例 支持自定义配置系统盘规格及容量,支持多个不同类型的数据盘,赶紧用起来吧!


http://www.kler.cn/a/325218.html

相关文章:

  • 【原创】大数据治理入门(2)《提升数据质量:质量评估与改进策略》入门必看 高赞实用
  • upload-labs靶场练习
  • 企业级NoSQL数据库Redis
  • Linux使用SSH连接GitHub指南
  • 递归40题!再见递归
  • 某国际大型超市电商销售数据分析和可视化
  • Stable Diffusion零基础学习
  • 基于SpringBoot+Vue+MySQL的体育商城系统
  • Linux,C高级——day4
  • 【AI写作】解释 RESTful API,以及如何使用它构建 web 应用程序。
  • 基于nodejs+vue的水产品销售管理系统
  • 大厂面试真题:G1比CMS好在哪?一定好吗
  • CSR、SSR、SSG
  • 【STM32开发环境搭建】-3-STM32CubeMX Project Manager配置-自动生成一个Keil(MDK-ARM) 5的工程
  • 6.数据结构与算法-线性表的链式表示和实现-单链表
  • wireshark使用要点
  • 【STM32】江科大STM32笔记汇总(已完结)
  • Google BigTable架构详解
  • 无人驾驶车联网5G车载路由器应用
  • C++ 创建型设计模式
  • 怎么获取一个文件夹下的所有文件名?
  • MATLAB读取TIF文件,并可视化
  • 基于SpringBoot+Vue+MySQL的美食点餐管理系统
  • 项目集成SpringSecurity框架
  • Python项目Flask框架整合Redis
  • 揭秘移动硬盘RAW:原因、恢复策略与预防措施