
[Python Qdrant vector database: complete example code]

A quick test of the Qdrant vector database from Python. The complete code follows:

Install the libraries

!pip install "qdrant-client>=1.1.1"  # quoted so the shell does not treat >= as a redirect
!pip install -U sentence-transformers

Imports

from qdrant_client import models, QdrantClient
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")  # use device="cpu" if no GPU is available

Prepare the test dataset

documents = [
    {
        "name": "The Time Machine",
        "description": "A man travels through time and witnesses the evolution of humanity."
        * 8,
        "author": "H.G. Wells",
        "year": 1895,
    },
    {
        "name": "Ender's Game",
        "description": "A young boy is trained to become a military leader in a war against an alien race."
        * 4,
        "author": "Orson Scott Card",
        "year": 1985,
    },
    {
        "name": "Brave New World",
        "description": "A dystopian society where people are genetically engineered and conditioned to conform to a strict social hierarchy."
        * 6,
        "author": "Aldous Huxley",
        "year": 1932,
    },
] * 50000

print(len(documents))
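
A note on how this dataset is built: multiplying a string repeats its text, and multiplying a list repeats references to the same dict objects. So the list above reports 150000 entries (3 * 50000) that are backed by only three distinct dicts. A small sketch with hypothetical toy values (not the real dataset) shows both effects:

```python
# Toy illustration of the repetition operators used to build the dataset.
description = "ab" * 3  # string repetition concatenates copies of the text
books = [{"name": "A"}, {"name": "B"}] * 4  # list repetition reuses the same dicts

total = len(books)  # number of list entries
distinct = len({id(book) for book in books})  # number of distinct dict objects
print(description, total, distinct)
```

This is fine for a load test, but mutating one entry would also change every repeated copy of it.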

Create the storage

qdrant = QdrantClient(":memory:")  # in memory
# qdrant = QdrantClient(path='./qdrant')  # persist to local disk

Create a collection in the database (similar to a storage bucket)

qdrant.recreate_collection(
    collection_name="my_books",
    vectors_config=models.VectorParams(
        size=encoder.get_sentence_embedding_dimension(),  # vector size is determined by the model
        distance=models.Distance.COSINE,
    ),
)
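
The collection is configured with Distance.COSINE, so hits are ranked by how closely the direction of a stored vector matches the query vector, regardless of vector length. The following is a minimal plain-Python sketch of that metric, not Qdrant's internal implementation; the function names are illustrative only:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Vectors pointing the same way score 1.0 even if their lengths differ.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
# Orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

For unit-length vectors the cosine similarity equals the plain dot product, which is why normalizing embeddings before upload does not change the ranking.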

Vectorize the documents

import hashlib
from tqdm import tqdm

def sha256(text):

    hash_object = hashlib.sha256()
    hash_object.update(text.encode("utf-8"))
    hash_value = hash_object.hexdigest()
    return hash_value

records = []
bs = 256
for i in tqdm(range(0, len(documents), bs)):
    docs = documents[i : i + bs]
    vectors = encoder.encode(
        [doc["description"] for doc in docs], normalize_embeddings=True
    ).tolist()

    record = [
        models.PointStruct(id=idx, vector=vec, payload=doc)  # sha256(doc['description'])
        for idx, vec, doc in zip(range(i, i + bs), vectors, docs)
    ]

    records.extend(record)
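
The loop above walks the dataset in fixed-size batches, and the final batch can be shorter than bs. It still works because zip stops at the shortest input, so the surplus ids from range(i, i + bs) are dropped. A minimal sketch of the same batching pattern, using a hypothetical helper iter_batches and toy data instead of the encoder:

```python
def iter_batches(items, batch_size):
    """Yield (start_index, chunk) pairs; the last chunk may be shorter."""
    for start in range(0, len(items), batch_size):
        yield start, items[start : start + batch_size]


items = list(range(10))
for start, chunk in iter_batches(items, 4):
    # Ids are derived from the chunk's actual length, so none are wasted.
    ids = list(range(start, start + len(chunk)))
    print(start, ids)
```

With 150000 documents and a batch size of 256, this pattern yields 586 batches, the last one holding 240 documents.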

Upload to the target collection in the vector database

qdrant.upload_points(
    collection_name="my_books", points=records, batch_size=128, parallel=12
)
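
The ids uploaded here are positional integers, while the commented-out sha256(...) call in the vectorization step hints at content-derived ids. Qdrant point ids are unsigned integers or UUIDs, so a raw hex digest would not be usable as an id directly. One possible alternative, sketched with the standard library only, is a name-based UUID; content_id is a hypothetical helper and the choice of namespace is an assumption:

```python
import uuid


def content_id(text: str) -> str:
    """Derive a stable UUID string from the text (same text, same id)."""
    # NAMESPACE_URL is an arbitrary choice here; any fixed namespace works.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, text))


first = content_id("A man travels through time.")
second = content_id("A man travels through time.")
other = content_id("A young boy is trained to become a military leader.")
print(first == second, first == other)
```

With ids like these, documents with identical descriptions would map to the same point, so the 150000 repeated entries in this test would collapse to three points. Positional integers are the better fit when the goal is to measure upload volume.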

Semantic search

query = "Aliens attack our planet"
hits = qdrant.search(
    collection_name="my_books",
    query_vector=encoder.encode(query).tolist(),
    limit=6,
)
for hit in hits:
    print(hit.payload, "score:", hit.score)

Filtered search

Search only for books published in 1980 or later

hits = qdrant.search(
    collection_name="my_books",
    query_vector=encoder.encode("Tyranic society").tolist(),
    query_filter=models.Filter(
        must=[models.FieldCondition(key="year", range=models.Range(gte=1980))]
    ),
    limit=3,
)
for hit in hits:
    print(hit.payload, "score:", hit.score)
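
The filter restricts the candidates to points whose payload satisfies year >= 1980 before they are ranked by similarity. Of the three distinct books in the test data, only one qualifies, so every returned hit is a copy of it. The condition can be sketched in plain Python as follows; matches_range is a hypothetical helper that mimics the range condition, not part of qdrant-client:

```python
books = [
    {"name": "The Time Machine", "year": 1895},
    {"name": "Ender's Game", "year": 1985},
    {"name": "Brave New World", "year": 1932},
]


def matches_range(payload, key, gte=None, lte=None):
    """Return True if payload[key] exists and lies within the given bounds."""
    value = payload.get(key)
    if value is None:
        return False
    if gte is not None and value < gte:
        return False
    if lte is not None and value > lte:
        return False
    return True


recent = [book["name"] for book in books if matches_range(book, "year", gte=1980)]
print(recent)
```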

Reference: the official GitHub and Colab examples.

