Writing a Large-Model Client with ollama
flyfish
import ollama
import os
import base64
from PIL import Image
from io import BytesIO
def image_to_base64(image_path):
    """
    Convert the image file at the given path to a base64-encoded string.

    Args:
        image_path: path to the image file

    Returns:
        base64_str: the base64-encoded string of the image, or None on failure
    """
    try:
        # Make sure the image file exists
        check_file_exists(image_path)
        # Open the image file
        with Image.open(image_path) as img:
            # JPEG cannot store an alpha channel, so normalize to RGB first
            img = img.convert("RGB")
            # Save the image into an in-memory byte stream as JPEG
            buffered = BytesIO()
            img.save(buffered, format="JPEG")
        # Base64-encode the JPEG bytes
        base64_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
        return base64_str
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

def check_file_exists(file_path):
    """Raise FileNotFoundError if the file does not exist."""
    if not os.path.isfile(file_path):
        raise FileNotFoundError(f"File not found: {file_path}")
    return file_path

def chat_with_ollama(image_path, question="Describe the image?"):
    """Chat via ollama and print the model's description of the image."""
    try:
        # Make sure the image file exists
        check_file_exists(image_path)
        # Convert the image to a base64-encoded string
        base64_str = image_to_base64(image_path)
        if not base64_str:
            print("Failed to convert image to base64 string.")
            return
        response = ollama.chat(
            model="llama3.2-vision:90b",
            messages=[
                {
                    "role": "user",
                    "content": question,
                    # Note: `images` must be a list, and it should hold the
                    # base64-encoded string, not the file path
                    "images": [base64_str],
                }
            ],
        )
        # print(response)
        # Parse and print the response content
        if response and hasattr(response, "message"):
            message = response.message
            if message and hasattr(message, "content"):
                print("Model Response:", message.content)
            else:
                print("No content found in the model's response.")
        else:
            print("Unexpected response format from the server.")
    except ollama.ResponseError as e:  # the ollama client raises ResponseError
        print(f"Ollama API error occurred: {e}")
    except FileNotFoundError as e:
        print(e)
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

if __name__ == "__main__":
    # Define the question string
    question = """
    There is an existing image, and the relevant information about its size is as follows: the width is 1920 pixels, the height is 1080 pixels, and the color depth is 3.
    Please conduct a comprehensive object detection on this image, detect all the objects within the image,
    and strictly return the bounding box information corresponding to each object in the following format:
    [
        {
            "object": "object_name",
            "bboxes": [[xmin, ymin, xmax, ymax], [xmin, ymin, xmax, ymax],...]
        },
        ...
    ].
    You should determine the coordinates of the bounding boxes as accurately as possible based on the given image size and other details presented by the objects in the image.
    If there are any situations that may affect the accuracy of the bounding boxes, please explain them in detail when responding.
    """
    # Replace with the actual image path
    image_path = "/home/sss/tool/test/demo.jpg"
    chat_with_ollama(image_path, question=question)
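The JPEG re-encode plus base64 step in `image_to_base64` can be sanity-checked without a running ollama server. A minimal round-trip sketch using only Pillow and the standard library (the in-memory test image is a stand-in for a real file on disk):

```python
import base64
from io import BytesIO
from PIL import Image

# Build a tiny in-memory image instead of reading one from disk
img = Image.new("RGB", (8, 8), color="red")
buffered = BytesIO()
img.save(buffered, format="JPEG")
raw = buffered.getvalue()

# Encode the JPEG bytes exactly as image_to_base64 does
b64 = base64.b64encode(raw).decode("utf-8")

# Decoding must give back the same JPEG bytes
assert base64.b64decode(b64) == raw
print(len(raw), len(b64))
```

If the round trip fails, the problem is in the encoding step rather than in the model or the server.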
Prompt:
Describe the image?
Output:
Model Response: The image shows a young woman and a dog on a beach at sunset.
The woman is wearing a white shirt with rolled-up sleeves, dark pants, and brown boots. She has long brown hair and is sitting on a yellow towel or blanket. The dog is a medium-sized breed with floppy ears and a long snout. It is sitting next to the woman, facing her, and appears to be a golden retriever or similar breed.
The background of the image is out of focus, but it appears to be a beach at sunset. The sky is orange and pink, and there are palm trees and other vegetation in the distance.
Prompt:
question = """
There is an existing image, and the relevant information about its size is as follows: the width is 1920 pixels, the height is 1080 pixels, and the color depth is 3.
Please conduct a comprehensive object detection on this image, detect all the objects within the image,
and strictly return the bounding box information corresponding to each object in the following format:
[
{
"object": "object_name",
"bboxes": [[xmin, ymin, xmax, ymax], [xmin, ymin, xmax, ymax],...]
},
...
].
You should determine the coordinates of the bounding boxes as accurately as possible based on the given image size and other details presented by the objects in the image.
If there are any situations that may affect the accuracy of the bounding boxes, please explain them in detail when responding.
"""
Output:
Model Response: **Step 1: Analyze the Image**
The image depicts a serene beach scene with a woman and her dog sitting together. The woman is wearing a blue shirt and has long brown hair. She is facing the camera, smiling, and holding a tennis ball in her right hand. Her left arm is around the dog's neck.
**Step 2: Identify Objects**
Based on the image, I have identified the following objects:
* Woman
* Dog
* Tennis Ball
**Step 3: Determine Bounding Boxes**
To determine the bounding boxes for each object, I will use the pixel coordinates of the top-left and bottom-right corners of each object.
* **Woman**: The woman's bounding box is approximately [300, 200, 600, 400]. This means that her image starts at x-coordinate 300 and y-coordinate 200 (top-left corner) and ends at x-coordinate 600 and y-coordinate 400 (bottom-right corner).
* **Dog**: The dog's bounding box is approximately [700, 500, 1000, 800]. This means that his image starts at x-coordinate 700 and y-coordinate 500 (top-left corner) and ends at x-coordinate 1000 and y-coordinate 800 (bottom-right corner).
* **Tennis Ball**: The tennis ball's bounding box is approximately [400, 300, 500, 400]. This means that its image starts at x-coordinate 400 and y-coordinate 300 (top-left corner) and ends at x-coordinate 500 and y-coordinate 400 (bottom-right corner).
**Step 4: Format Output**
Here is the formatted output:
[
{
"object": "woman",
"bboxes": [[300, 200, 600, 400]]
},
{
"object": "dog",
"bboxes": [[700, 500, 1000, 800]]
},
{
"object": "tennis ball",
"bboxes": [[400, 300, 500, 400]]
}
]
**Step 5: Considerations**
Please note that the accuracy of the bounding boxes may be affected by various factors such as:
* Image resolution
* Object occlusion
* Lighting conditions
* Background noise
In this case, the image is clear and well-lit, with minimal background noise. However, the dog's fur may cause some slight inaccuracies in its bounding box due to texture variations.
**Answer**:
[
{
"object": "woman",
"bboxes": [[300, 200, 600, 400]]
},
{
"object": "dog",
"bboxes": [[700, 500, 1000, 800]]
},
{
"object": "tennis ball",
"bboxes": [[400, 300, 500, 400]]
}
]
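A reply like the one above is only useful downstream if the coordinates actually fit the stated 1920x1080 frame. A minimal validator sketch (`validate_bboxes` is a hypothetical helper, not part of ollama; the sample JSON mirrors the model's answer):

```python
import json

def validate_bboxes(reply_json, width=1920, height=1080):
    """Parse a detection reply and keep only boxes that fit inside the frame."""
    detections = json.loads(reply_json)
    valid = []
    for det in detections:
        for box in det["bboxes"]:
            xmin, ymin, xmax, ymax = box
            # A usable box is ordered and lies within the image bounds
            if 0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height:
                valid.append((det["object"], box))
    return valid

reply = """[
  {"object": "woman", "bboxes": [[300, 200, 600, 400]]},
  {"object": "dog", "bboxes": [[700, 500, 1000, 800]]},
  {"object": "tennis ball", "bboxes": [[400, 300, 500, 400]]}
]"""
print(validate_bboxes(reply))
```

In practice the model often wraps the JSON in explanatory prose (as in the "Step 1..." output above), so the JSON block must be extracted before parsing.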
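One practical improvement for the detection prompt: the ollama Python client's `chat` accepts a `format` argument, and passing `"json"` asks the server to constrain the model to valid JSON, which avoids the surrounding "**Step 1...**" prose. A sketch that only assembles the request payload so it can be inspected without a running server (`build_detection_request` is a hypothetical helper name):

```python
def build_detection_request(question, base64_image, model="llama3.2-vision:90b"):
    """Assemble keyword arguments for ollama.chat, requesting JSON output."""
    return {
        "model": model,
        "format": "json",  # supported by recent ollama servers and clients
        "messages": [
            {"role": "user", "content": question, "images": [base64_image]}
        ],
    }

# The call would then be: ollama.chat(**build_detection_request(q, b64))
req = build_detection_request("Detect all objects.", "aGVsbG8=")
print(req["format"], len(req["messages"]))
```

With `format="json"` the reply can be handed straight to `json.loads` instead of being scraped out of a step-by-step explanation.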