当前位置: 首页 > article >正文

【大语言模型_1】VLLM部署Qwen模型

1、模型下载:

              魔塔社区:魔搭社区

              huggingface:https://huggingface.co/Qwen

2、安装python环境

             1、python官网安装python 【推荐要3.8以上版本】

             2、安装vllm模块

3、启动模型

      

CUDA_VISIBLE_DEVICES=0,1 /root/vendor/Python3.10.12/bin/python3.10 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 25010 --served-model-name mymodel --model //root/qwen2.5/qwen2.5-coder-7b-instruct/ --tensor-parallel-size 2 --max-model-len 8096

出现以下内容代表运行成功

INFO 09-20 15:22:59 model_runner.py:1335] Graph capturing finished in 11 secs.
(VllmWorkerProcess pid=101403) INFO 09-20 15:22:59 model_runner.py:1335] Graph capturing finished in 11 secs.
INFO 09-20 15:22:59 api_server.py:224] vLLM to use /tmp/tmplc42ak3s as PROMETHEUS_MULTIPROC_DIR
WARNING 09-20 15:22:59 serving_embedding.py:190] embedding_mode is False. Embedding API will not work.
INFO 09-20 15:22:59 launcher.py:20] Available routes are:
INFO 09-20 15:22:59 launcher.py:28] Route: /openapi.json, Methods: HEAD, GET
INFO 09-20 15:22:59 launcher.py:28] Route: /docs, Methods: HEAD, GET
INFO 09-20 15:22:59 launcher.py:28] Route: /docs/oauth2-redirect, Methods: HEAD, GET
INFO 09-20 15:22:59 launcher.py:28] Route: /redoc, Methods: HEAD, GET
INFO 09-20 15:22:59 launcher.py:28] Route: /health, Methods: GET
INFO 09-20 15:22:59 launcher.py:28] Route: /tokenize, Methods: POST
INFO 09-20 15:22:59 launcher.py:28] Route: /detokenize, Methods: POST
INFO 09-20 15:22:59 launcher.py:28] Route: /v1/models, Methods: GET
INFO 09-20 15:22:59 launcher.py:28] Route: /version, Methods: GET
INFO 09-20 15:22:59 launcher.py:28] Route: /v1/chat/completions, Methods: POST
INFO 09-20 15:22:59 launcher.py:28] Route: /v1/completions, Methods: POST
INFO 09-20 15:22:59 launcher.py:28] Route: /v1/embeddings, Methods: POST
INFO 09-20 15:22:59 launcher.py:33] Launching Uvicorn with --limit_concurrency 32765. To avoid this limit at the expense of performance run with --disable-frontend-multiprocessing
INFO:     Started server process [101179]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:25010

4、利用python脚本调用测试

from openai import OpenAI

# 初始化客户端
client = OpenAI(base_url="http://localhost:25010/v1", api_key="EMPTY")

print("欢迎使用Qwen智能问答机器人!输入'退出'以结束对话。")

while True:
    # 获取用户输入
    print("您: ", end='', flush=True)
    user_input = input()
    if user_input.lower() in ['退出', '再见', '拜拜']:
        print("qwen: 再见!期待下次与您交谈。")
        break

    # 构造消息列表
    messages = [
        {"role": "system", "content": "你的角色是名为“qwen”的智能问答机器人"},
        {"role": "user", "content": user_input}
    ]

    try:
        # 发送请求并获取回复
        chat_completion = client.chat.completions.create(
            model="mymodel",
            messages=messages,
            #stop=[ "。"],
            stop=["<|endoftext|>", "<|im_end|>", "<|im_start|>"],
            stream = False,
        )

        # 打印模型回复
        print("qwen:", chat_completion.choices[0].message.content)

    except Exception as e:
        print("出现错误: {e}",e)
        print("请稍后再试或检查您的网络连接及API配置。")


http://www.kler.cn/a/318172.html

相关文章:

  • Linux服务器定时执行jar重启命令
  • uni-app表格带分页,后端处理过每页显示多少条
  • LeetCode面试经典150题C++实现,更新中
  • MaxKB
  • 鸿蒙之多选框(Checkbox)
  • idea 弹窗 delete remote branch origin/develop-deploy
  • 【速成Redis】04 Redis 概念扫盲:事务、持久化、主从复制、哨兵模式
  • 2-102基于matlab的蒙特卡洛仿真
  • C语言——文件操作
  • [数据结构]动态顺序表的实现与应用
  • 第二证券:“产业+科技” 中国并购重组市场持续升温
  • 【微服务即时通讯系统】——etcd一致性键值存储系统,etcd的介绍,etcd的安装,etcd使用和功能测试
  • Scikit-learn 识别手写数字
  • Qt:NULL与nullptr的区别(手写nullptr)
  • 数据处理与统计分析篇-day10-Matplotlib数据可视化
  • Leetcode 每日一题:Diameter of Binary Tree
  • DataWhale X 南瓜书学习笔记 task03笔记
  • vue3+Element-plus el-input 输入框组件二次封装(支持金额、整数、电话、小数、身份证、小数点位数控制,金额显示中文提示等功能)
  • rust属性宏
  • HTML段落,换行,水平线标签与其属性
  • c/c++八股文
  • MySQL 生产环境性能优化
  • 使用分布式调度框架时需要考虑的问题——详解
  • python 实现 P-Series algorithm算法
  • Seamless:Facebook推出的跨语言语音识别/翻译/合成大模型
  • 计算总体方差statistics.pvariance()