当前位置：首页 > article >正文

深入浅出LangChain:从模型调用到Agents开发的全流程指南

article 2025/3/11 15:51:39

2024最新LangChain全面解析:从基础组件到AI应用构建

LangChain、LangGraph、LangSmith:打造完整AI解决方案的利器

本文将对于LangChain的基本组件、用途、用法进行介绍。

LangChain、LangGraph以及LangSmith的组合，极大的简化了开发者构建AI应用、Agents、Tools的工作量，抹平了各个AI厂家间的调用差异，适配了大量了中间件及组件，形成了一个完整的解决方案。

通过本文的阅读，可以帮助大家加深对于AI产品、功能点下底层原理及实现的理解。

本文将从最基本的models调用开始介绍，涵盖memory、chain、RAG，最终以tools的定义及agents的调用结束。

注：LangChain发展很快，本文截止于2024年8月23日，基于此时最新版本V0.2编写。

一、Chat Models & Memory demo

Langchain的第一个优势是对于各大API供应商、开源模型进行了适配，将杂乱的调用，整合为一个统一的标准，最终的效果就是Just Invoke It。

0.2版本支持的models如下：

https://python.langchain.com/v0.2/docs/integrations/chat/

支持大厂：

也支持小厂：

如下是一个实例创建的代码，建立之后可供调用的方法是统一的，当然不同API支持的方法的数量不同，其中支持最全面的还是openai。

#最简单的方式，将API key放到环境变量``from langchain_openai import ChatOpenAI``from dotenv import load_dotenv``# Load environment variables from .env file``load_dotenv()``llm = ChatOpenAI(model="gpt-4o-mini")``   ``# 定义一个构建函数，然后再调用，Gemini的案例。``from dotenv import load_dotenv``from langchain_google_genai import ChatGoogleGenerativeAI``# Load environment variables from .env``load_dotenv()``# Create a Gemini model``def google_model_init(model):`    `model = ChatGoogleGenerativeAI(`        `model=model,`        `temperature=0,`        `max_tokens=None,`        `timeout=None,`        `max_retries=2,`        `# other params...`    `)`    `return model``   ``import model_init``model = model_init.google_model_init("gemini-1.5-flash"）

同时还有一些本地部署的LLM不在支持的列表，只要将接口标准转化为openai的标准，也可以直接定义：

from langchain_openai import ChatOpenAI``   ``llm = ChatOpenAI(`    `model_name="your-model-name",  # 根据你的模型名称进行修改`    `openai_api_key="your-api-key",  # 你的API密钥`    `openai_api_base="https://your-custom-openai-api.com/v1",  # 你的自定义API基础URL`    `temperature=0.7,`    `max_tokens=100``)

memory部分

LangChain帮助大家完成了很多事项的开发，例如聊天记录的持久化，你无需花时间去进行各个不同中间件的适配，LangChain已经全部做过了。

支持的中间件非常多：

https://python.langchain.com/v0.2/docs/integrations/memory/

本文以mangodb为例，写了个demo：

https://python.langchain.com/v0.2/docs/integrations/memory/mongodb_chat_message_history/

使用docker快速拉起一个mangodb：

docker run -d --name mongodb -p 27017:27017 -e MONGO_INITDB_ROOT_USERNAME=langchain -e MONGO_INITDB_ROOT_PASSWORD=langchain mongo

**利用LangChain将聊天记录持久化至mangodb：**

model = model_init.google_model_init("gemini-1.5-flash")``   ``DATABASE_NAME = "langchain"``SESSION_ID = "user_session_new"  # 用户ID，此处写死``COLLECTION_NAME = "chat_history"``   ``# Initialize MongoDB Chat Message History``print("Initializing MongoDB Chat Message History...")``   ``chat_history = MongoDBChatMessageHistory(`    `session_id=SESSION_ID,`    `connection_string="mongodb://langchain:langchain@192.168.137.3:27017", #上边本地部署的连接，结合实际修改。`    `database_name=DATABASE_NAME,`    `collection_name=COLLECTION_NAME,``)``   ``print("Chat History Initialized.")``print("Current Chat History:", chat_history.messages)``   ``print("Start chatting with the AI. Type 'exit' to quit.")``   ``while True:`    `human_input = input("User: ")`    `if human_input.lower() == "exit":`        `break``   `    `chat_history.add_user_message(human_input)`    `ai_response = model.invoke(chat_history.messages)`    `chat_history.add_ai_message(ai_response.content)``   `    `print(f"AI: {ai_response.content}")

chat过程的记录按照LangChain的标准存放于mongodb，相当的省心。

二、Prompt Templates

Prompt 的质量直接影响到 LLM 反馈结果的优劣。因此，将关键的 Prompt 进行结构化处理，并将可变部分留给程序填充，是确保 Prompt 质量的有效方法。

目前（以及未来），LLM 对英语的支持，将领先于所有其他语言。观察一些国内的项目，可以发现其核心 Prompt 也是以英语编写。因此，建议大家在未来逐步习惯使用英语的 Prompt。

Prompt Templates的定义很简单，此处将几种常见的情况列出，熟悉使用即可。

#单个替换``template = "Tell me a joke about {topic}."``prompt_template = ChatPromptTemplate.from_template(template)``prompt = prompt_template.invoke({"topic": "cats"})``   ``#多个替换``template_multiple = """You are a helpful assistant.``Human: Tell me a {adjective} short story about a {animal}.``Assistant:"""``prompt_multiple = ChatPromptTemplate.from_template(template_multiple)``prompt = prompt_multiple.invoke({"adjective": "funny", "animal": "panda"})``   ``#多行下Tuple的替换``messages = [`    `("system", "You are a comedian who tells jokes about {topic}."),`    `("human", "Tell me {joke_count} jokes."),``]``prompt_template = ChatPromptTemplate.from_messages(messages)``prompt = prompt_template.invoke({"topic": "lawyers", "joke_count": 3})``三、How to use Chains``LangChain定义了一种LangChain Expression Language(LCEL)，来简化书写，完成多个LLM的任务的执行，简洁 且 优雅。``3.1 LCEL``chain = prompt | model``result = chain.invoke({"key":"value"})``# Define prompt templates (no need for separate Runnable chains)``prompt_template = ChatPromptTemplate.from_messages(`    `[`        `("system", "You are a comedian who tells jokes about {topic}."),`        `("human", "Tell me {joke_count} jokes."),`    `]``)``   ``# Create the combined chain using LangChain Expression Language (LCEL)``chain = prompt_template | model | StrOutputParser()``# chain = prompt_template | model``   ``# Run the chain``result = chain.invoke({"topic": "cars", "joke_count": 3})``

3.2 how chains work,under the hood

对于chain而言，我们大致了解其底层实现，3 main things：

runnables：Runnables是LangChain中的基本构建块。它们是可以执行某些操作的对象，通常接受输入并产生输出。Runnables可以是简单的函数、复杂的模型或其他任何可以处理数据的组件。它们的关键特征是可以被"运行"，即给定输入后能够产生输出。
runnable lambdas：Runnable Lambdas是一种特殊类型的Runnable，它们通常是简单的、匿名的函数。在LangChain中，你可以使用lambda函数来快速定义简单的操作，这些操作可以轻松地集成到更大的处理流程中。Runnable Lambdas提供了一种灵活且简洁的方式来定义自定义的数据处理步骤。
runnable sequences：Runnable Sequences是将多个Runnables组合在一起的方式。它允许你创建一个处理流水线，其中一个Runnable的输出可以作为下一个Runnable的输入。这种序列化的方法使得创建复杂的处理链变得简单，每个步骤都可以独立定义和测试，然后组合成一个完整的工作流程。

如下是一个事例，实际我们使用中，直接使用LCEL就可以了。如下的代码仅仅用于参考，理解背后的原理：

# Create individual runnables (steps in the chain)``# **x是Python中的解包操作符，用于字典。它的作用是将字典x中的所有键值对作为单独的关键字参数传递给函数。``format_prompt = RunnableLambda(lambda x: prompt_template.format_prompt(**x))``invoke_model = RunnableLambda(lambda x: model.invoke(x.to_messages()))``parse_output = RunnableLambda(lambda x: x.content)``# Create the RunnableSequence (equivalent to the LCEL chain)``chain = RunnableSequence(first=format_prompt, middle=[invoke_model], last=parse_output)``

3.2 Chain的三种运行模式：

Extended：串连执行
Parallel：并行执行
Branching：判断执行

chain module/extended

我们可以自己利用lambda函数，写一些runnable lambda，随后可以按照LCEL加入到chain中进行串联运行。这个的好处在于，你可以把你任何想做的事情，嵌入到一个lambda函数中，例如，嵌入一个API call。

# Define prompt templates``prompt_template = ChatPromptTemplate.from_messages(`    `[`        `("system", "You are a comedian who tells jokes about {topic}."),`        `("human", "Tell me {joke_count} jokes."),`    `]``)``   ``# Define additional processing steps using RunnableLambda``uppercase_output = RunnableLambda(lambda x: x.upper())``count_words = RunnableLambda(lambda x: f"Word count: {len(x.split())}\n{x}")``   ``# Create the combined chain using LangChain Expression Language (LCEL)``chain = prompt_template | model | StrOutputParser() | uppercase_output | count_words``   ``# Run the chain``result = chain.invoke({"topic": "lawyers", "joke_count": 3})

chain module/parallel

Langchian提供了并行运行的功能，可以在LCEL进行调用。之所以需要并行，我的理解是，LLM本身就存在token生成的过程，相比传统的数据库查询，慢了不是一个数量级。因此，当多个任务进行串行时，等待的时间势必会进一步拉长，因此在这种场景下，parallel就是很刚需的了。

以下是一个调用方法的简单的示例：

# Simplify branches with LCEL``pros_branch_chain = (`    `RunnableLambda(lambda x: analyze_pros(x)) | model | StrOutputParser()``)``   ``cons_branch_chain = (`    `RunnableLambda(lambda x: analyze_cons(x)) | model | StrOutputParser()``)``   ``# Create the combined chain using LangChain Expression Language (LCEL)``chain = (`    `prompt_template`    `| model`    `| StrOutputParser()`    `| RunnableParallel(branches={"pros": pros_branch_chain, "cons": cons_branch_chain})`    `| RunnableLambda(lambda x: combine_pros_cons(x["branches"]["pros"], x["branches"]["cons"]))``)

chain module/branching

这个就是Langchian的if语句，按照不同的结果，执行不同的branch。

最常见的案例就是，对于客户的评价进行分类，按照不同的分类使用不同的prompt进行处理。

案例：

# Define the feedback classification template``classification_template = ChatPromptTemplate.from_messages(`    `[`        `("system", "You are a helpful assistant."),`        `("human","Classify the sentiment of this feedback as positive, negative, neutral, or escalate: {feedback}."),`    `]``)``   ``# Define the runnable branches for handling feedback``branches = RunnableBranch(`    `(`        `lambda x: "positive" in x,`        `positive_feedback_template | model | StrOutputParser()  # 正面反馈chain`    `),`    `(`        `lambda x: "negative" in x,`        `negative_feedback_template | model | StrOutputParser()  # 负面反馈chain`    `),`    `(`        `lambda x: "neutral" in x,`        `neutral_feedback_template | model | StrOutputParser()  # 普通反馈chain`    `),`    `escalate_feedback_template | model | StrOutputParser()     # 搞不定的chain``)``   ``# Create the classification chain``classification_chain = classification_template | model | StrOutputParser()``   ``# Combine classification and response generation into one chain``chain = classification_chain | branches

四、RAG (Retrieval-Augmented Generation)

4.1 What’s RAG

RAG（检索增强生成）技术目前使用范围非常广泛，一般的介绍强调RAG为LLM增加了获取数据的能力，实现了LLM与现实世界（如Web和API）的连接，并增强了LLM获取私有知识库的能力。我倒是觉得，RAG更像是一种高级的检索方式，实现了基于语义理解的搜索。

传统搜索引擎主要基于关键词匹配，而RAG利用LLM的语义理解能力，将检索从单纯的词语匹配提升到了语义层面的相似性搜索。这种转变使得检索结果更加精确和相关。并且有了多模态的加持之后，可以进一步实现图片、语音、视频的检索。

4.2 文字RAG的原理

当我们使用RAG时，本质其实是在原有问题的基础上，进一步的加上我们根据语义所检索回来的信息，最终拼凑出来一个完整的prompt给LLM。因此对于一个知识库而言，是必须提前做切分的，做成小块的chunks，再对于chunks做embedding，最终再存放至向量数据库。

不同的模型，最大的上下文窗口的大小差异较大，我们最常用的GPT-4，实际上只有8000左右的token数。实际的chunk切分过程中，可以切分成1000-2000 tokens的chunks，最终3个chunk加原始的问题，基本上就足够了。对于其他的模型而言，可以结合实际情况测试验证。

文本分块：将长文本切分成较小的chunk（通常约1000-2000tokens），以适应LLM的输入窗口限制（如ChatGPT约8000 tokens）。
文本嵌入：使用嵌入模型将文本转换为向量表示。语义相近的词汇在向量空间中距离较近，这使得基于相似性的搜索成为可能。
问题嵌入：用户的问题也会被转换为向量表示。
相似度检索：通过比较问题向量和文本chunk向量的相似度，找出最相关的内容。
向量存储：使用如Chroma等向量数据库存储和检索这些向量。

常见厂家模型的窗口大小：

公司	模型名称	最大上下文窗口大小 (tokens)
OpenAI	GPT-3 (davinci)	4,096
OpenAI	GPT-3.5-turbo	4,096
OpenAI	GPT-4 (8K context)	8,192
Google	Gemini Pro	32,768
Google	Gemini Ultra	1,048,576
Anthropic	Claude	9,000
Anthropic	Claude Instant	9,000