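How to Improve Overall RAG System Performance: Index Construction, Query Understanding, Hybrid Search + Semantic Ranking, and System Evaluation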

article 2025/3/1 11:58:31

1. Query understanding

1.1 Building the data index

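A major problem is that the query embedding and the document embedding are not aligned in vector space: a short question and the long passage that answers it are phrased very differently. A common way to mitigate this is to extract information from each document and use it to answer questions. Concretely, you can extract key information from a document, summarize it, and generate the potential questions it could answer, so that incoming queries have closer embedding matches.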

For example, given a document, you can try to:

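Extract keywords and topics
Generate HyDE-style hypothetical questions: produce them at the level of each chunk, along with key sentences, i.e. distilled "abstract" highlight sentences
Generate a summary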

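from typing import List

from pydantic import BaseModel, Field


class Extraction(BaseModel):
    """Structured metadata extracted from one document chunk."""
    topic: str
    summary: str
    hypothetical_questions: List[str] = Field(
        default_factory=list,
        description="Hypothetical questions that this document could answer",
    )
    keywords: List[str] = Field(
        # The source listing is cut off after `default_`; default_factory=list is
        # assumed here to mirror the field above. Any description is not shown.
        default_factory=list,
    )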
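
To make the idea concrete, here is a minimal sketch of how the extracted fields might be used at indexing time. It is not the author's implementation. `ChunkMetadata`, `toy_embed`, `build_index`, and `search` are hypothetical names introduced for illustration, and the bag-of-words "embedding" is only a toy stand-in for a real embedding model so that the example runs without external services. Each chunk is indexed under several text surfaces (raw text, summary, keywords, hypothetical questions), and every surface points back to the same chunk.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class ChunkMetadata:
    """Hypothetical container mirroring the Extraction fields above."""
    chunk_id: str
    text: str
    summary: str
    hypothetical_questions: List[str]
    keywords: List[str]


def build_index(chunks: List[ChunkMetadata]) -> List[Tuple[str, Counter]]:
    """Index the raw text plus every extracted field, all keyed by chunk_id."""
    entries: List[Tuple[str, Counter]] = []
    for c in chunks:
        surfaces = [c.text, c.summary, " ".join(c.keywords), *c.hypothetical_questions]
        entries.extend((c.chunk_id, toy_embed(s)) for s in surfaces if s)
    return entries


def search(
    index: List[Tuple[str, Counter]], query: str, top_k: int = 1
) -> List[Tuple[str, float]]:
    """Score each chunk by its best-matching surface and return the top_k chunks."""
    q = toy_embed(query)
    best: Dict[str, float] = {}
    for chunk_id, vec in index:
        best[chunk_id] = max(best.get(chunk_id, 0.0), cosine(q, vec))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Made-up example data, for illustration only.
chunks = [
    ChunkMetadata(
        chunk_id="refund",
        text="Purchases may be returned within 30 days with a receipt.",
        summary="Return policy for purchases.",
        hypothetical_questions=["How do I get my money back?"],
        keywords=["refund", "returns"],
    ),
    ChunkMetadata(
        chunk_id="shipping",
        text="Orders ship within two business days via ground courier.",
        summary="Shipping times for orders.",
        hypothetical_questions=["How long does delivery take?"],
        keywords=["shipping", "delivery"],
    ),
]
index = build_index(chunks)
print(search(index, "How can I get my money back?"))
```

In this toy data, the query shares no words with the raw text of the "refund" chunk, so an index built from raw text alone would give it a score of zero. The generated hypothetical question is what brings the query and the chunk together. With a real embedding model the gap is less extreme, but the same design applies: store several question-like or summary-like surfaces per chunk and map any hit back to the original chunk.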
