
Local LLM Programming in Practice (04): Automatically Tagging Text

article 2025/1/31 0:58:46

Contents

- Preparation
- Instantiating the local model
- Sentiment analysis
- Finer control
- Summary
- Code

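A local large language model can tag text according to your needs. This article shows how to tag text with langchain and a locally deployed model.

We use llama3.1 as the local model. Its performance is somewhat below that of closed-source models, but once the prompts are adjusted it can largely meet our requirements.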

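Preparation

Before writing any code, we need to set up the programming environment.

Computer
All code in this article can run in an environment without GPU memory, that is, on CPU only. The machine I used:
- CPU: Intel i5-8400 2.80GHz
- Memory: 16GB
Visual Studio Code and venv
Visual Studio Code is a popular development tool, and the code in this series can be developed and debugged in it. We use Python's venv to create the virtual environment. For details, see:
Configuring venv in Visual Studio Code.
Ollama
Deploying local models on the Ollama platform is very convenient. With it, langchain can use llama3.1, qwen2.5, and other local models. For details, see:
Using a locally deployed llama3.1 model in langchain.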

Instantiating the local model

from langchain_ollama import ChatOllama
llm = ChatOllama(model="llama3.1",temperature=0.2,verbose=True)
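
ChatOllama only works if the Ollama server is running and the model has already been pulled. As a minimal sketch, the check below asks Ollama's REST endpoint /api/tags which models are installed. The address http://localhost:11434 is assumed to be the default local address, and the helper names installed_models, fetch_tags, and model_is_available are hypothetical, not part of langchain or Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address


def installed_models(tags_payload: dict) -> set[str]:
    """Return model names, without the ':tag' suffix, from an /api/tags response."""
    return {m["name"].split(":")[0] for m in tags_payload.get("models", [])}


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Ask the local Ollama server which models have been pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def model_is_available(model: str, base_url: str = OLLAMA_URL) -> bool:
    """True if the model is installed; False if it is missing or the server is unreachable."""
    try:
        return model in installed_models(fetch_tags(base_url))
    except OSError:  # covers connection errors raised by urllib
        return False
```

Calling model_is_available("llama3.1") before creating ChatOllama gives an early hint that `ollama pull llama3.1` may still be needed.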

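Sentiment analysis

The code below defines a class, Classification, that constrains the format of the tags the model returns. The model must give each text the following three tags:

sentiment: positive or negative
aggressiveness: a number from 1 to 10
language: the language the text is written in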

from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

def simple_control(s):

    tagging_prompt = ChatPromptTemplate.from_template(
        """
    Extract the desired information from the following passage.

    Only extract the properties mentioned in the 'Classification' function.

    Passage:
    {input}
    """
    )

    # Use a Pydantic model to constrain the format of the returned content
    class Classification(BaseModel):
        sentiment: str = Field(description="The sentiment of the text")
        aggressiveness: int = Field(
            description="How aggressive the text is on a scale from 1 to 10"
        )
        language: str = Field(description="The language the text is written in")


    llm_structured = llm.with_structured_output(Classification)
    prompt = tagging_prompt.invoke({"input": s})
    response = llm_structured.invoke(prompt)

    return response.model_dump()

Let's test it:

s = "I'm incredibly glad I met you! I think we'll be great friends!"
result = simple_control(s)
print(f'result:\n{result}')

{'sentiment': 'positive', 'aggressiveness': 1, 'language': 'English'}

s = "Estoy muy enojado con vos! Te voy a dar tu merecido!"
result = simple_control(s)
print(f'result:\n{result}')

{'sentiment': 'negative', 'aggressiveness': 10, 'language': 'Spanish'}
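
Tagging becomes more useful when it is applied to many texts at once. The following is a minimal sketch of that idea: tag_batch and sentiment_counts are hypothetical helpers, and fake_tagger is a stand-in that only imitates the two results above so the example runs without a model.

```python
from collections import Counter
from typing import Callable, Iterable


def tag_batch(texts: Iterable[str], tagger: Callable[[str], dict]) -> list[dict]:
    """Tag every text and keep the original text next to its labels."""
    return [{"text": t, **tagger(t)} for t in texts]


def sentiment_counts(rows: list[dict]) -> Counter:
    """Count how many texts received each sentiment label."""
    return Counter(row["sentiment"] for row in rows)


# Hypothetical stand-in for simple_control; it does not call any model.
def fake_tagger(text: str) -> dict:
    angry = "enojado" in text
    return {
        "sentiment": "negative" if angry else "positive",
        "aggressiveness": 10 if angry else 1,
        "language": "Spanish" if angry else "English",
    }


rows = tag_batch(
    [
        "I'm incredibly glad I met you! I think we'll be great friends!",
        "Estoy muy enojado con vos! Te voy a dar tu merecido!",
    ],
    fake_tagger,
)
print(sentiment_counts(rows))
```

Passing simple_control instead of fake_tagger would run the same loop against the local model, one call per text.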

Finer control

Next we try to control the tagging results more precisely:

sentiment: one of happy, neutral, sad
aggressiveness: a number from 1 to 10
language: one of English, Spanish, Chinese

The prompt does not need to change; we only modify Classification.

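def finer_control(s):
    """
    The official tutorial uses OpenAI; we use a local model.
    Using the official code as-is works poorly: sentiment is not tagged as happy, neutral, or sad as expected and still only comes out as positive or negative, and aggressiveness is always 0.
    """

    # Use a Pydantic model to constrain the format of the returned content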
    class Classification(BaseModel):
        sentiment: str = Field(description="The sentiment of the text,it must be one of happy,neutral,sad")
        aggressiveness: int = Field(description="The aggressive of the text,it must be one of 1,2,3,4,5,6,7,8,9,10,the higher the number the more aggressive")
        language: str = Field(description="The language the text is written in,it must be one of English,Spanish,Chinese")


    tagging_prompt = ChatPromptTemplate.from_template(
        """
        Extract the desired information from the following passage.

        Only extract the properties mentioned in the 'Classification' function.

        Passage:
        {input}
        """
    )


    llm_structured = llm.with_structured_output(Classification)

    prompt = tagging_prompt.invoke({"input": s})
    response = llm_structured.invoke(prompt)
    return response.model_dump()

Let's test it:

s = "I'm incredibly glad I met you! I think we'll be great friends!"
result = finer_control(s)
print(f'finer_control result:\n{result}')

{'sentiment': 'happy', 'aggressiveness': 1, 'language': 'English'}

s = "Weather is ok here, I can go outside without much more than a coat"
result = finer_control(s)
print(f'finer_control result:\n{result}')

{'sentiment': 'neutral', 'aggressiveness': 5, 'language': 'English'}

s="今天的天气糟透了，我什么都不想干！"
result = finer_control(s)
print(f'finer_control result:\n{result}')

{'sentiment': 'sad', 'aggressiveness': 10, 'language': 'Chinese'}
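
In finer_control the allowed values live only in the field descriptions, so nothing rejects a reply such as sentiment 'positive' or aggressiveness 0, the two failures mentioned in its docstring. One possible complement, sketched below, is to express the same limits as Pydantic types and validate the returned dict. StrictClassification and check_tags are hypothetical names, and whether passing such a class to with_structured_output changes what llama3.1 generates was not tested here.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class StrictClassification(BaseModel):
    """The same three tags, with the allowed values expressed as types."""

    sentiment: Literal["happy", "neutral", "sad"]
    aggressiveness: int = Field(ge=1, le=10)
    language: Literal["English", "Spanish", "Chinese"]


def check_tags(tags: dict) -> tuple[bool, str]:
    """Return (True, "") for valid tags, or (False, name of the first bad field)."""
    try:
        StrictClassification.model_validate(tags)
        return True, ""
    except ValidationError as exc:
        return False, str(exc.errors()[0]["loc"][0])


print(check_tags({"sentiment": "happy", "aggressiveness": 1, "language": "English"}))
print(check_tags({"sentiment": "positive", "aggressiveness": 1, "language": "English"}))
print(check_tags({"sentiment": "sad", "aggressiveness": 0, "language": "Chinese"}))
```

The first call passes, while the other two are flagged on sentiment and aggressiveness respectively, which makes out-of-range tags visible instead of silently accepted.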

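Summary

As the results show, a locally deployed llama3.1 tags text reasonably well. I believe this kind of local deployment can handle common text-tagging tasks such as sentiment analysis.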
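
Because the model runs with temperature=0.2, repeated calls on the same text can give different tags, so one way to make the approach more dependable is to validate each reply and retry when it is invalid. The sketch below assumes this pattern rather than describing anything in langchain: tag_with_retry and sentiment_ok are hypothetical helpers, and flaky_tagger is a stub that is wrong once and then correct.

```python
from typing import Callable


def tag_with_retry(
    text: str,
    tagger: Callable[[str], dict],
    validate: Callable[[dict], bool],
    attempts: int = 3,
) -> dict:
    """Call the tagger until validate() accepts its reply or attempts run out."""
    for _ in range(attempts):
        tags = tagger(text)
        if validate(tags):
            return tags
    raise ValueError(f"no valid tags after {attempts} attempts")


def sentiment_ok(tags: dict) -> bool:
    """Accept only the three sentiment labels used in this article."""
    return tags.get("sentiment") in {"happy", "neutral", "sad"}


# Hypothetical flaky tagger: wrong on the first call, correct afterwards.
_replies = iter(
    [
        {"sentiment": "positive", "aggressiveness": 0, "language": "English"},
        {"sentiment": "happy", "aggressiveness": 1, "language": "English"},
    ]
)


def flaky_tagger(text: str) -> dict:
    return next(_replies)


result = tag_with_retry("I'm incredibly glad I met you!", flaky_tagger, sentiment_ok)
print(result)
```

With finer_control as the tagger, each retry costs one more model call, so a small attempts value keeps the run time bounded on a CPU-only machine.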

Code

All code and related resources for this article have been shared. See:

github
gitee

References:

Classify Text into Labels

🪐 Good luck 🪐
