Question multiple PDFs using OpenAI, Pinecone, LangChain

Background:

I am trying to ask questions against multiple PDFs using Pinecone and OpenAI, but I don't know how to.

The code below works for asking questions against one document, but I would like to ask questions against multiple documents:

# process_message.py
import configparser

import pinecone
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Pinecone


def process_message():
    
    # Create a ConfigParser object and read the config.ini file
    config = configparser.ConfigParser()
    config.read('config.ini')
    # Retrieve the value of OPENAI_API_KEY
    openai_key = config.get('openai', 'OPENAI_API_KEY')
    pinecone_env_key = config.get('pinecone', 'PINECONE_ENVIRONMENT')
    pinecone_api_key = config.get('pinecone', 'PINECONE_API_KEY')


    loader = PyPDFLoader("docs/ops.pdf")
    data = loader.load()
    # data = body['data'][1]['name']
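    # Print information about the loaded data (PyPDFLoader returns one document per page)
    print(f"You have {len(data)} page(s) in your data")
    # Sum over all pages instead of indexing data[30], which fails for PDFs shorter than 31 pages
    total_chars = sum(len(page.page_content) for page in data)
    print(f"There are {total_chars} characters in your document")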

    # Chunk your data up into smaller documents
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
    texts = text_splitter.split_documents(data)
   

    embeddings = OpenAIEmbeddings(openai_api_key=openai_key)

    pinecone.init(api_key=pinecone_api_key, environment=pinecone_env_key)
    index_name = "pdf-chatbot"  # Put in the name of your Pinecone index here

    docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)
    # Query those docs to get your answer back
    llm = OpenAI(temperature=0, openai_api_key=openai_key)
    chain = load_qa_chain(llm, chain_type="stuff")

    query = "Are there any other documents listed in this document?"
    docs = docsearch.similarity_search(query)
    answer = chain.run(input_documents=docs, question=query)
    print(answer)

    return answer

I added as many comments as I could there. I got this information from https://www.youtube.com/watch?v=h0DHDp1FbmQ

I tried to look at other Stack Overflow questions about this but could not find anything similar.

Solution:

You can load multiple PDFs with PyPDFDirectoryLoader, which loads every PDF file found in a directory.
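A minimal sketch of how that could slot into the question's code, assuming the PDFs sit in a local folder such as docs/ and that pinecone.init(...) has already been called as in the question. The function names load_pdf_directory, pages_per_source and index_pdf_directory are hypothetical helpers for illustration, not part of LangChain. The chunk settings and index name are taken from the question.

```python
def load_pdf_directory(directory):
    """Load every PDF in `directory`; the loader returns one Document per page."""
    # Imported inside the function so this file can be imported without LangChain installed.
    from langchain.document_loaders import PyPDFDirectoryLoader

    return PyPDFDirectoryLoader(directory).load()


def pages_per_source(documents):
    """Count loaded pages per PDF, using the "source" path in each Document's metadata."""
    counts = {}
    for doc in documents:
        source = doc.metadata.get("source", "unknown")
        counts[source] = counts.get(source, 0) + 1
    return counts


def index_pdf_directory(directory, embeddings, index_name):
    """Split all PDFs in `directory` into chunks and store them in an existing Pinecone index."""
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.vectorstores import Pinecone

    data = load_pdf_directory(directory)
    # Quick sanity check: how many pages were read from each file.
    print(pages_per_source(data))

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
    texts = text_splitter.split_documents(data)

    # from_documents keeps each chunk's metadata, so a retrieved chunk can be
    # traced back to the PDF it came from.
    return Pinecone.from_documents(texts, embeddings, index_name=index_name)
```

With this in place, the PyPDFLoader lines and the Pinecone.from_texts call in process_message could be replaced by docsearch = index_pdf_directory("docs/", embeddings, index_name). The similarity_search and load_qa_chain steps should work unchanged, since they only depend on docsearch.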

