
Elasticsearch: Playing with DeepSeek R1 in Elastic to build a RAG application

article 2025/2/11 8:17:41

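This Spring Festival, DeepSeek R1 burst onto the scene like a clap of spring thunder, and now everyone is talking about it. This large language model has undoubtedly set an important milestone in the history of AI development, both in China and worldwide. So how should we combine DeepSeek R1 with Elasticsearch to implement RAG? In the earlier article "Testing DeepSeek R1 locally for RAG with Ollama and Kibana", we described in detail how to use Ollama to test DeepSeek R1 locally for RAG. In today's article, we walk through the steps in more detail.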

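Installation

Elasticsearch and Kibana

If you have not yet installed your own Elasticsearch and Kibana, you can follow these articles:

How to install Elasticsearch on Linux, MacOS and Windows
Kibana: How to install Kibana from the Elastic Stack on Linux, MacOS and Windows

Note in particular that we follow the "Elastic Stack 8.x installation" guide. In this exercise we use Elastic Stack 8.17.1, the latest release at the time of writing.

We make a note of the password generated during installation and use it in the steps below.

Also, to avoid warnings, we set xpack.encryptedSavedObjects.encryptionKey in Kibana. This setting is also required in order to use Playground. For detailed steps, see the article "Elasticsearch: Chat with your PDFs using Playground". We enter the following command in a terminal: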

bin/kibana-encryption-keys generate

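The command generates three keys (xpack.encryptedSavedObjects.encryptionKey, xpack.reporting.encryptionKey, and xpack.security.encryptionKey). We copy all three to the bottom of config/kibana.yml, save the file, and restart Kibana.

Enable the Platinum trial

To be able to create an OpenAI connector, we need to enable the Platinum trial:

With that, the Platinum trial is enabled, and we can create the OpenAI connector later on.

Install the E5 embedding model

For our search, we need an embedding model to vectorize the data. In this exercise we use E5, a model that ships with Elasticsearch. We need to configure it:

As the output shows, we have successfully deployed the .multilingual-e5-small model to our Elasticsearch.

We can check this in Kibana: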

GET _inference

We can see that an inference id named .multilingual-e5-small-elasticsearch has been created.
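The same check can be scripted. The following is a minimal Python sketch, assuming the `GET _inference` response has the shape `{"endpoints": [{"inference_id": ..., "task_type": ...}]}`. The helper name `find_endpoint` and the trimmed sample response are illustrative and not taken from the original setup.

```python
from typing import Optional


def find_endpoint(response: dict, inference_id: str) -> Optional[dict]:
    """Return the endpoint entry with the given inference_id, or None."""
    for endpoint in response.get("endpoints", []):
        if endpoint.get("inference_id") == inference_id:
            return endpoint
    return None


# Hypothetical, trimmed-down response used only for illustration.
sample_response = {
    "endpoints": [
        {"inference_id": ".elser-2-elasticsearch", "task_type": "sparse_embedding"},
        {"inference_id": ".multilingual-e5-small-elasticsearch", "task_type": "text_embedding"},
    ]
}

endpoint = find_endpoint(sample_response, ".multilingual-e5-small-elasticsearch")
print(endpoint["task_type"] if endpoint else "endpoint not found")
```

In a real script the dictionary would come from your Elasticsearch client rather than from a hard-coded sample.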

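Create an API key

The Python code at the end of this article needs to access Elasticsearch. Here we first create an API key for that code to use:

Click the copy button and save the generated API key for use in the code below.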

UkVCVDc1UUJCWFdEY29hdGhMdHc6WjRibTJZRlRTOGVZWDBQUkpPX0xRUQ==
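As a side note, the "encoded" form of an Elasticsearch API key is the Base64 encoding of `id:api_key`. The sketch below decodes the key above to show the two parts; the helper name `split_encoded_api_key` is only illustrative, and in practice you pass the encoded string to the client unchanged.

```python
import base64


def split_encoded_api_key(encoded: str) -> tuple[str, str]:
    """Decode an encoded API key into its (id, secret) parts."""
    decoded = base64.b64decode(encoded).decode("utf-8")
    key_id, _, secret = decoded.partition(":")
    return key_id, secret


encoded_key = "UkVCVDc1UUJCWFdEY29hdGhMdHc6WjRibTJZRlRTOGVZWDBQUkpPX0xRUQ=="
key_id, secret = split_encoded_api_key(encoded_key)
# Print only the lengths; the secret itself should not end up in logs.
print(f"id has {len(key_id)} characters, secret has {len(secret)} characters")
```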

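Install Ollama

In this article we deploy DeepSeek R1 ourselves. You could of course apply for a developer key from DeepSeek and use its hosted large language model instead. Ollama is a great way to quickly test a curated set of open-source models for local inference, and it is a popular tool among AI developers. You can follow the article "Elasticsearch: Build a RAG application using an LLM running on your local machine with Ollama and Langchain" to install Ollama. In this article we install it with Docker. Getting Ollama up and running in Docker is simple: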

mkdir ollama_deepseek
cd ollama_deepseek
mkdir ollama
docker run -d -v ./ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

$ mkdir ollama_deepseek
$ cd ollama_deepseek/
$ mkdir ollama
$ ls
ollama
$ docker run -d -v ./ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
Unable to find image 'ollama/ollama:latest' locally
latest: Pulling from ollama/ollama
a186900671ab: Pull complete 
b0130a66c113: Pull complete 
16dfb65baac7: Pull complete 
03856fb3ee73: Pull complete 
898a890e221d: Pull complete 
db1a326c8c34: Pull complete 
Digest: sha256:7e672211886f8bd4448a98ed577e26c816b9e8b052112860564afaa2c105800e
Status: Downloaded newer image for ollama/ollama:latest
48375d1f61740d9acee49302903153834d6bc5412ce97d28d243585f133d77a2

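Note: Docker Desktop must be running before you execute the command above.

This creates a directory named "ollama" in the current directory and mounts it into the container to store the Ollama configuration and the models. Depending on the number of parameters, models range from a few GB to tens of GB, so make sure to choose a volume with enough free space.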
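To get a feel for the numbers, here is a rough back-of-the-envelope sketch in Python. It assumes the weight file is roughly parameters times bits per weight divided by 8, and it assumes about 4.5 bits per weight for a 4-bit quantized model; both are simplifications, and real downloads are somewhat larger because of metadata and mixed-precision layers. The function names are hypothetical.

```python
import shutil


def estimate_model_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-file size in GB: parameters x bits per weight / 8."""
    return params_billion * bits_per_weight / 8


def has_room(path: str, needed_gb: float) -> bool:
    """True if the filesystem holding `path` has at least needed_gb free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb


# Assumption: a 7B-parameter model at about 4.5 bits per weight.
needed = estimate_model_gb(7, 4.5)
print(f"estimated size: {needed:.1f} GB, enough space here: {has_room('.', needed)}")
```

Treat the estimate as a lower bound and leave headroom, especially if you plan to pull more than one model size.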

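Note: if your machine happens to have an Nvidia GPU, make sure to install the Nvidia Container Toolkit and add "--gpus=all" to the docker run command above.

Installing natively on Mac, Linux, or Windows is the easiest way to take advantage of any local GPU you may have, especially for users with Apple M-series chips. Once Ollama is installed, you can download DeepSeek R1 with the following command.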

docker exec -it ollama ollama pull deepseek-r1:7b

$ docker exec -it ollama ollama pull deepseek-r1:7b
pulling manifest 
pulling 96c415656d37... 100% ▕█████████████████████████████████████████▏ 4.7 GB                         
pulling 369ca498f347... 100% ▕█████████████████████████████████████████▏  387 B                         
pulling 6e4c38e1172f... 100% ▕█████████████████████████████████████████▏ 1.1 KB                         
pulling f4d24e9138dd... 100% ▕█████████████████████████████████████████▏  148 B                         
pulling 40fb844194b2... 100% ▕█████████████████████████████████████████▏  487 B                         
verifying sha256 digest 
writing manifest 
success

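You may need to adjust the parameter size to fit your hardware. The available sizes are listed on the deepseek-r1 page of the Ollama model library.

You can chat with the model in the terminal, but the model keeps running after you exit with Ctrl+D or type "/bye". We can also download and run the model from a shell inside the container: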

docker exec -it ollama bash

$ docker exec -it ollama bash
root@48375d1f6174:/# ollama run deepseek-r1:7b

$ docker exec -it ollama bash
root@48375d1f6174:/# ollama ps
NAME    ID    SIZE    PROCESSOR    UNTIL 
root@48375d1f6174:/# ollama run deepseek-r1:7b
>>> what is Elasticsearch?
<think>

</think>

Elasticsearch is a open-source search engine built on top of the Elasticsearch framework. It is 
designed to quickly and easily discover and analyze information from structured or unstructured 
data. Elasticsearch provides a fast, scalable, and flexible platform for building search engines 
and real-time analytics systems.

Key features of Elasticsearch include:

1. **Real-time search**: Elasticsearch allows you to index documents in real-time and perform 
searches against those documents without waiting for the entire dataset to be indexed.

2. **Scalability**: Elasticsearch is designed to handle large volumes of data, both on-premises and 
in the cloud (using AWS Elasticsearch Service or Azure Elasticsearch managed service).

...

To see which models are still running, type:

ollama ps

root@48375d1f6174:/# ollama ps
NAME              ID              SIZE      PROCESSOR    UNTIL              
deepseek-r1:7b    0a8c26691023    5.5 GB    100% CPU     4 minutes from now

Of course, we can also do this from a terminal without entering the container:

docker exec -it ollama ollama ps

$ docker exec -it ollama ollama ps
NAME              ID              SIZE      PROCESSOR    UNTIL              
deepseek-r1:7b    0a8c26691023    5.5 GB    100% CPU     3 minutes from now

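The output above shows that deepseek-r1:7b was deployed successfully. We can also try out its Chinese ability:

From the output above, we can see that DeepSeek is quite capable. As some reviewers have pointed out, many answers are still mixed with English words.

Test our local inference with curl

Many developers prefer to test a deployment through its REST API, since that integrates more easily into code. To test local inference with curl, run the following command. We use stream:false so that we can easily read the JSON response: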

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:7b",
  "stream": false,
  "prompt":"Why is Elastic so cool?"
}'

In my testing, this command took quite a long time to return a response, so please be patient.
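The same request can be sent from Python using only the standard library. This is a minimal sketch rather than a reference client: the function names and the 600-second timeout are assumptions chosen because local CPU inference can be slow, and it relies on the non-streaming reply carrying the generated text in a `response` field.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "deepseek-r1:7b") -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    payload = {"model": model, "stream": False, "prompt": prompt}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout_s: float = 600.0) -> str:
    """Send the request and return the generated text (needs a running Ollama)."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        body = json.load(resp)
    return body["response"]


# Usage, with the Ollama container running:
#   print(generate("Why is Elastic so cool?"))
```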

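Test the "OpenAI-compatible" Ollama endpoint and a RAG prompt

Conveniently, Ollama also serves a REST endpoint that mimics OpenAI's behavior, for compatibility with a wide range of tools, including Kibana.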

curl http://localhost:11434/v1/chat/completions -d '{
  "model": "deepseek-r1:7b",
  "stream": false,
  "messages": [
    { 
      "role": "system", 
      "content": "You are a helpful AI Assistant that uses the following context to answer questions only use the following context. \n\nContext:  小明今天早上和妈妈一起去上学 "},
    { "role": "user", 
      "content": "小明今天和谁一起去上学的？" 
    }
  ]
}'

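Testing this more complex prompt produces content that includes a <think> section, in which the model, as it has been trained to do, reasons about the question.

Note that in the output above, \u003c is the Unicode escape for < and \u003e is the Unicode escape for >.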
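When the response is consumed from code, a JSON parser turns those escapes back into angle brackets, and the reasoning can then be separated from the final answer. The sketch below assumes the OpenAI-style response layout (`choices[0].message.content`); the raw body is a shortened, made-up example and `split_reasoning` is a hypothetical helper.

```python
import json
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer); reasoning is '' if absent."""
    match = THINK_RE.search(content)
    if not match:
        return "", content.strip()
    reasoning = match.group(1).strip()
    answer = (content[:match.start()] + content[match.end():]).strip()
    return reasoning, answer


# Made-up raw response body; note the \u003c and \u003e escapes.
raw_body = r'{"choices": [{"message": {"role": "assistant", "content": "\u003cthink\u003eThe context mentions his mother.\u003c/think\u003e\n\nHe went with his mother."}}]}'

content = json.loads(raw_body)["choices"][0]["message"]["content"]
reasoning, answer = split_reasoning(content)
print("reasoning:", reasoning)
print("answer:", answer)
```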

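Connect Ollama to Kibana

Create a connector to DeepSeek

We configure it with the following steps:

Configure the connector with the following settings

Connector name: Deepseek (Ollama)
Select an OpenAI provider: Other (OpenAI Compatible Service)
URL: http://localhost:11434/v1/chat/completions
- Adjust this to the correct path for your Ollama. If you are calling from within a container, remember to substitute host.docker.internal or an equivalent
Default model: deepseek-r1:7b
API key: make one up; an entry is required, but the value does not matter

Note that testing a custom connector to Ollama from the connector settings is currently broken in 8.17, but this has been fixed in the upcoming Kibana 8.18 release.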
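If Kibana itself runs inside a container, "localhost" refers to that container and not to the machine hosting Ollama. The following small sketch shows the substitution mentioned in the URL setting; the helper name is hypothetical, and `host.docker.internal` is the alias used by Docker Desktop, so other setups may need a different host name.

```python
from urllib.parse import urlsplit, urlunsplit


def ollama_url_for_container(url: str, host_alias: str = "host.docker.internal") -> str:
    """Replace a localhost host name with an alias reachable from a container."""
    parts = urlsplit(url)
    if parts.hostname not in ("localhost", "127.0.0.1"):
        return url  # already points at a non-local host, leave it alone
    netloc = host_alias if parts.port is None else f"{host_alias}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(ollama_url_for_container("http://localhost:11434/v1/chat/completions"))
```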

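Our connector looks like this:

We have now created an OpenAI connector named Deepseek (Ollama).

Ingest data with embeddings into Elasticsearch

If you are already familiar with Playground and have data set up, you can skip to the Playground step below. If you need some quick test data, we first need to make sure the _inference API is set up. Starting with 8.17, machine learning allocations are dynamic, so to download and start the E5 multilingual dense vector model we only need to run the following in Kibana Dev Tools.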

GET /_inference


POST /_inference/text_embedding/.multilingual-e5-small-elasticsearch
{
   "input": "are internet memes about deepseek sound investment advice?"
}

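If you have not done so already, this triggers a download of the E5 model from Elastic's model repository. Because we already deployed the E5 model in the installation section above, running the command does not trigger a download here. In the output on the right, we can see the vector representation of the text.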
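To illustrate what these vectors are used for, the sketch below pulls the embedding out of a response and compares two of them with cosine similarity. It assumes the response has the shape `{"text_embedding": [{"embedding": [...]}]}`; the three-dimensional vectors are invented for readability, whereas multilingual-e5-small returns 384 dimensions.

```python
import math


def extract_embedding(response: dict) -> list[float]:
    """Return the first embedding vector from a text_embedding response."""
    return response["text_embedding"][0]["embedding"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented toy responses; real ones come from POST /_inference/text_embedding/...
resp_a = {"text_embedding": [{"embedding": [0.1, 0.2, 0.3]}]}
resp_b = {"text_embedding": [{"embedding": [0.2, 0.4, 0.6]}]}

similarity = cosine_similarity(extract_embedding(resp_a), extract_embedding(resp_b))
print(round(similarity, 3))
```

Vectors pointing in the same direction score close to 1, which is the intuition behind the kNN retrieval used later in this article.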

Next, let's load a public-domain book as our RAG context. We download "Alice's Adventures in Wonderland" from Project Gutenberg and save it as a .txt file.

wget https://www.gutenberg.org/cache/epub/11/pg11.txt

$ pwd
/Users/liuxg/data/alice
$ wget https://www.gutenberg.org/cache/epub/11/pg11.txt
--2025-02-10 16:59:19--  https://www.gutenberg.org/cache/epub/11/pg11.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 174357 (170K) [text/plain]
Saving to: ‘pg11.txt’

pg11.txt                   100%[=====================================>] 170.27K   304KB/s    in 0.6s    

2025-02-10 16:59:21 (304 KB/s) - ‘pg11.txt’ saved [174357/174357]

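Once the file has downloaded, we can upload it to Elasticsearch as follows:

Alternatively, we can use the following approach:

I select the file we just downloaded:

Once loading and inference are complete, we can head over to Playground.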
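For readers who would rather script the upload than use the Kibana UI, here is a hedged Python sketch of one possible alternative. It is not the method used in this article: it assumes an index named book_alice with a `content` field of type `semantic_text` bound to the E5 inference endpoint, and the paragraph splitting rule and function names are hypothetical.

```python
import re


def split_paragraphs(text: str, min_chars: int = 20) -> list[str]:
    """Split plain text on blank lines and drop very short fragments."""
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n"))
    cleaned = (" ".join(block.split()) for block in blocks)
    return [block for block in cleaned if len(block) >= min_chars]


def build_actions(paragraphs: list[str], index: str = "book_alice") -> list[dict]:
    """Build bulk actions with one document per paragraph."""
    return [{"_index": index, "_source": {"content": p}} for p in paragraphs]


def load_into_elasticsearch(es_client, path: str = "pg11.txt", index: str = "book_alice"):
    """Create the index and bulk-load the book (needs a live cluster)."""
    from elasticsearch import helpers  # imported here so the helpers above run without it

    with open(path, encoding="utf-8-sig") as f:
        paragraphs = split_paragraphs(f.read())
    # Assumption: semantic_text mapping tied to the E5 endpoint deployed earlier.
    es_client.indices.create(
        index=index,
        mappings={
            "properties": {
                "content": {
                    "type": "semantic_text",
                    "inference_id": ".multilingual-e5-small-elasticsearch",
                }
            }
        },
    )
    return helpers.bulk(es_client, build_actions(paragraphs, index))


sample = "CHAPTER I.\r\n\r\nAlice was beginning to get very tired\r\nof sitting by her sister.\r\n\r\n\r\nSo she was considering in her own mind."
print(len(split_paragraphs(sample)))
```

Embedding every paragraph takes time on a laptop, so expect the bulk load to run for a while.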

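Test RAG in Playground

In Kibana, navigate to Elasticsearch > Playground.

On the Playground screen, you should see a green checkmark and "LLM Connected", indicating that a connector already exists. This is the Ollama connector we created above. A detailed guide to Playground is available in the Kibana documentation.

Click the blue Add data sources button and select the book_alice index we created earlier, or another index you have configured that uses the inference API for embeddings.

Deepseek is a chain-of-thought model with strong alignment characteristics. From a RAG perspective this is both good and bad. Chain-of-thought training may help Deepseek rationalize seemingly contradictory statements in the citations, but strong alignment to its training knowledge may make it prefer its own version of world facts over our context grounding. While well-intentioned, this strong alignment is known to make LLMs hard to steer when discussing topics that rest on our private knowledge or are not well represented in the training dataset.

In our Playground setup, we entered the following system prompt: "You are an assistant for question-answering tasks using relevant text passages from the book Alice in wonderland", and accepted the other default settings.

For the question "Who was at the tea party?", we got the answer "The March Hare, the Hatter, and the Dormouse were at the tea party. [Citation: position 1 and 2]", which is correct.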

Who was at the tea party?

The system prompt is:

You are an assistant for question-answering tasks using relevant text passages from the book Alice in wonderland

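The output above contains the answer we hoped for. Checking the original text in detail, we can find the answer at the following location:

From the <think> tags we can see that Deepseek did indeed think carefully about the content of the citations in order to answer the question.

Note: during testing, inference time may vary depending on your deployment, so be patient while waiting for the result.

Next we test in Chinese. Our test question is:

那些人在茶会上？ (Who was at the tea party?)

Very good! Deepseek gave us the Chinese answer we wanted.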

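Implement inference in code

From the demonstration above, we can see that DeepSeek has strong reasoning ability. We can also test it with the code that Playground provides. We click View code in the upper-right corner:

We click the copy button and save the code to a file, which we name alice.py: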

alice.py

## Install the required packages
## pip install -qU elasticsearch openai
import os
from elasticsearch import Elasticsearch
from openai import OpenAI
es_client = Elasticsearch(
    "<your-elasticsearch-url>",
    api_key=os.environ["ES_API_KEY"]
)
      
openai_client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
)
index_source_fields = {
    "book_alice": [
        "content"
    ]
}
def get_elasticsearch_results():
    es_query = {
        "retriever": {
            "standard": {
                "query": {
                    "nested": {
                        "path": "content.inference.chunks",
                        "query": {
                            "knn": {
                                "field": "content.inference.chunks.embeddings",
                                "query_vector_builder": {
                                    "text_embedding": {
                                        "model_id": ".multilingual-e5-small-elasticsearch",
                                        "model_text": query
                                    }
                                }
                            }
                        },
                        "inner_hits": {
                            "size": 2,
                            "name": "book_alice.content",
                            "_source": [
                                "content.inference.chunks.text"
                            ]
                        }
                    }
                }
            }
        },
        "size": 3
    }
    result = es_client.search(index="book_alice", body=es_query)
    return result["hits"]["hits"]
def create_openai_prompt(results):
    context = ""
    for hit in results:
        inner_hit_path = f"{hit['_index']}.{index_source_fields.get(hit['_index'])[0]}"
        ## For semantic_text matches, we need to extract the text from the inner_hits
        if 'inner_hits' in hit and inner_hit_path in hit['inner_hits']:
            context += '\n --- \n'.join(inner_hit['_source']['text'] for inner_hit in hit['inner_hits'][inner_hit_path]['hits']['hits'])
        else:
            source_field = index_source_fields.get(hit["_index"])[0]
            hit_context = hit["_source"][source_field]
            context += f"{hit_context}\n"
    prompt = f"""
  Instructions:
  
  - You are an assistant for question-answering tasks using relevant text passages from the book Alice in wonderland
  - Answer questions truthfully and factually using only the context presented.
  - If you don't know the answer, just say that you don't know, don't make up an answer.
  - You must always cite the document where the answer was extracted using inline academic citation style [], using the position.
  - Use markdown format for code examples.
  - You are correct, factual, precise, and reliable.
  
  Context:
  {context}
  
  """
    return prompt
def generate_openai_completion(user_prompt, question):
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": user_prompt},
            {"role": "user", "content": question},
        ]
    )
    return response.choices[0].message.content
if __name__ == "__main__":
    question = "my question"
    elasticsearch_results = get_elasticsearch_results()
    context_prompt = create_openai_prompt(elasticsearch_results)
    openai_completion = generate_openai_completion(context_prompt, question)
    print(openai_completion)

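To make this code work, we need to make some changes. Before doing so, it may help to read my earlier article "Elasticsearch: Everything you need to know about using Elasticsearch in Python - 8.x". We create a file named .env in the current directory: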

.env

$ pwd
/Users/liuxg/data/alice
$ code alice.py
$ touch .env

Its content is as follows:

ELASTICSEARCH_URL="https://localhost:9200"
OPENAI_API_KEY="Anything"
ES_API_KEY="UkVCVDc1UUJCWFdEY29hdGhMdHc6WjRibTJZRlRTOGVZWDBQUkpPX0xRUQ=="
DEEPSEEK_URL="http://localhost:11434/v1"

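Adjust the settings above to match your own installation. Note: because our DeepSeek is deployed locally and no developer key is set, the key in the configuration above can be any value, as long as it is not empty; an empty value causes an error.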
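A quick sanity check can catch an empty or missing setting before any request is made. The following is a minimal sketch; the helper `missing_settings` is hypothetical, and the example dictionary stands in for `os.environ` after the .env file has been loaded.

```python
REQUIRED_KEYS = ("ELASTICSEARCH_URL", "OPENAI_API_KEY", "ES_API_KEY", "DEEPSEEK_URL")


def missing_settings(env) -> list[str]:
    """Return the required keys that are absent or blank in `env`."""
    return [key for key in REQUIRED_KEYS if not (env.get(key) or "").strip()]


# Stand-in for os.environ; the blank OPENAI_API_KEY is deliberate.
example_env = {
    "ELASTICSEARCH_URL": "https://localhost:9200",
    "OPENAI_API_KEY": "",
    "ES_API_KEY": "placeholder-key",
    "DEEPSEEK_URL": "http://localhost:11434/v1",
}

print(missing_settings(example_env))
```

In the real script you could call `missing_settings(os.environ)` right after `load_dotenv()` and stop early if the list is not empty.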

In addition, for our Python code to access the locally deployed Elasticsearch cluster, we must copy the cluster's certificate file into the current directory:

cp ~/elastic/elasticsearch-8.17.1/config/certs/http_ca.crt .

$ pwd
/Users/liuxg/data/alice
$ cp ~/elastic/elasticsearch-8.17.1/config/certs/http_ca.crt .
$ ls -al
total 368
drwxr-xr-x  6 liuxg  staff     192 Feb 10 18:17 .
drwxr-xr-x  5 liuxg  staff     160 Feb 10 16:54 ..
-rw-r--r--  1 liuxg  staff     134 Feb 10 18:15 .env
-rw-r--r--  1 liuxg  staff    3495 Feb 10 18:12 alice.py
-rw-r-----  1 liuxg  staff    1915 Feb 10 18:17 http_ca.crt
-rw-r--r--  1 liuxg  staff  174357 Feb  1 16:32 pg11.txt

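Next, we install the Python packages our code depends on:

pip3 install elasticsearch python-dotenv openai

Following the referenced article and link, we first check that our OpenAI-compatible endpoint works. We create the following test application: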

test_ds_completion.py

from openai import OpenAI

client = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1")

response = client.chat.completions.create(
    model="deepseek-r1:7b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Tell me about DeepSeek"},
    ],
    stream=False
)

print(response.choices[0].message.content)

We run the code above:

$ python test_ds_completion.py 
<think>

</think>

DeepSeek Artificial Intelligence Co., Ltd. (referred to as "DeepSeek" or "深度求索") , founded in 2023, is a Chinese company dedicated to making AGI a reality.

Clearly, our test succeeded.

Next, we modify the code copied from Playground as follows:

alice.py

## Install the required packages
## pip install -qU elasticsearch openai python-dotenv
import os
from dotenv import load_dotenv
from elasticsearch import Elasticsearch
from openai import OpenAI

load_dotenv()

ELASTICSEARCH_URL = os.getenv('ELASTICSEARCH_URL')
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
ES_API_KEY = os.getenv("ES_API_KEY")
DEEPSEEK_URL = os.getenv("DEEPSEEK_URL")

es_client = Elasticsearch(
    ELASTICSEARCH_URL,
    ca_certs="./http_ca.crt",
    api_key=ES_API_KEY,
    verify_certs = True
)

# resp = es_client.info()
# print(resp)
      
openai_client = OpenAI(
   api_key=OPENAI_API_KEY,
   base_url=DEEPSEEK_URL
)

index_source_fields = {
    "book_alice": [
        "content"
    ]
}

def get_elasticsearch_results(query):
    es_query = {
        "retriever": {
            "standard": {
                "query": {
                    "nested": {
                        "path": "content.inference.chunks",
                        "query": {
                            "knn": {
                                "field": "content.inference.chunks.embeddings",
                                "query_vector_builder": {
                                    "text_embedding": {
                                        "model_id": ".multilingual-e5-small-elasticsearch",
                                        "model_text": query
                                    }
                                }
                            }
                        },
                        "inner_hits": {
                            "size": 2,
                            "name": "book_alice.content",
                            "_source": [
                                "content.inference.chunks.text"
                            ]
                        }
                    }
                }
            }
        },
        "size": 3
    }
    result = es_client.search(index="book_alice", body=es_query)
    return result["hits"]["hits"]

def create_openai_prompt(results):
    context = ""
    for hit in results:
        inner_hit_path = f"{hit['_index']}.{index_source_fields.get(hit['_index'])[0]}"
        ## For semantic_text matches, we need to extract the text from the inner_hits
        if 'inner_hits' in hit and inner_hit_path in hit['inner_hits']:
            context += '\n --- \n'.join(inner_hit['_source']['text'] for inner_hit in hit['inner_hits'][inner_hit_path]['hits']['hits'])
        else:
            source_field = index_source_fields.get(hit["_index"])[0]
            hit_context = hit["_source"][source_field]
            context += f"{hit_context}\n"
    prompt = f"""
  Instructions:
  
  - You are an assistant for question-answering tasks using relevant text passages from the book Alice in wonderland
  - Answer questions truthfully and factually using only the context presented.
  - If you don't know the answer, just say that you don't know, don't make up an answer.
  - You must always cite the document where the answer was extracted using inline academic citation style [], using the position.
  - Use markdown format for code examples.
  - You are correct, factual, precise, and reliable.
  
  Context:
  {context}
  
  """
    return prompt

# def generate_openai_completion(user_prompt, question):
#     response = openai_client.chat.completions.create(
#         model="gpt-3.5-turbo",
#         messages=[
#             {"role": "system", "content": user_prompt},
#             {"role": "user", "content": question},
#         ]
#     )
#     return response.choices[0].message.content

def generate_openai_completion(user_prompt, question):
    response = openai_client.chat.completions.create(
        model="deepseek-r1:7b",
        messages=[
            {"role": "system", "content": user_prompt},
            {"role": "user", "content": question},
        ],
        stream=False
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    question = "Who was at the tea party?"
    elasticsearch_results = get_elasticsearch_results(question)
    context_prompt = create_openai_prompt(elasticsearch_results)
    openai_completion = generate_openai_completion(context_prompt, question)
    print(openai_completion)

We run the code above: