
Building Your First AI Agent: From Requirements Analysis to Architecture Design

article 2025/1/23 5:42:58

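In the previous articles, we covered the concept of AI Agents and how to choose a tech stack. Today I want to share how to build an AI Agent from scratch, using a real project to walk through the complete development process.

Project Background

It started a month ago. I was tidying up my note library when I ran into a pain point: my notes are scattered across several tools (Notion, Feishu (Lark), local Markdown), and finding a single piece of knowledge often means digging through several places.

So I wondered: could I build an AI assistant to manage and query these notes for me? The requirements look roughly like this:

Search notes from multiple sources in one place
Understand my questions and give accurate answers
Help me summarize and organize knowledge points
Ideally, help me discover gaps in my knowledge

The requirements don't look complicated, but doing them well is not easy.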

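Requirements Analysis

First, I wrote up a requirements list, sorted by priority:

Must-have features (MVP):

Unified search
- Support Markdown files
- Support the Notion API
- Support Feishu (Lark) cloud documents
Intelligent Q&A
- Understand natural-language questions
- Extract answers from relevant notes
- Cite the sources
Knowledge summarization
- Aggregate related notes by topic
- Generate a knowledge graph
- Provide study suggestions

Advanced features (later optimization):

Knowledge updates
- Sync all sources periodically
- Detect updates to knowledge points
- Remind me to review stale content
Personalized recommendations
- Track learning progress
- Analyze knowledge gaps
- Give personalized study suggestions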

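Architecture Design

With the requirements list in hand, the next step is designing the system architecture. Here is my thinking:

1. Overall architecture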

┌─────────────────┐
│    Web UI       │
└────────┬────────┘
         │
┌────────┴────────┐
│   API Server    │
└────────┬────────┘
         │
    ┌────┴────┐
    │  Core   │
    └────┬────┘
         │
┌────────┴────────┐
│  Knowledge Base │
└────────┬────────┘
         │
┌────────┴────────┐
│  Data Sources   │
└─────────────────┘

2. Core modules

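class KnowledgeAgent:
    def __init__(self, notion_token: str, lark_token: str):
        # Initialize the knowledge base with all note sources
        self.knowledge_base = KnowledgeBase([
            MarkdownSource("./notes"),
            NotionSource(notion_token),
            LarkSource(lark_token)
        ])

        # Initialize the LLM
        self.llm = ChatOpenAI(
            model="gpt-4",
            temperature=0.7
        )

        # Initialize the vector database
        self.vector_store = QdrantClient()

        # Interaction history consumed by summarize() (assumed to start empty)
        self.user_history = []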

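    async def query(self, question: str) -> Answer:
        # 1. Vector search for relevant content
        #    (assumes the store embeds the question text itself)
        relevant_docs = await self.vector_store.search(
            question,
            limit=5
        )

        # 2. Generate the answer
        #    (generate_answer is assumed to be a project-level wrapper around the LLM)
        answer = await self.llm.generate_answer(
            question=question,
            context=relevant_docs
        )

        # 3. Attach the source references
        answer.add_references(relevant_docs)

        return answer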

    async def summarize(
        self,
        topic: str
    ) -> KnowledgeGraph:
        # 1. Retrieve content related to the topic
        docs = await self.knowledge_base.search(topic)

        # 2. Generate the knowledge graph
        graph = await self.llm.generate_knowledge_graph(
            docs=docs,
            topic=topic
        )

        # 3. Add study suggestions
        suggestions = await self.generate_suggestions(
            graph=graph,
            user_history=self.user_history
        )

        return KnowledgeGraph(
            nodes=graph.nodes,
            edges=graph.edges,
            suggestions=suggestions
        )
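
To make the retrieve, generate, cite flow of `query` concrete, here is a minimal self-contained sketch that swaps the real components for in-memory stand-ins. `Doc`, `FakeRetriever`, `FakeLLM`, and the word-overlap scoring are my own placeholders for illustration; a real build would use the vector store and LLM from the class above.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List


@dataclass
class Doc:
    id: str
    content: str


@dataclass
class Answer:
    content: str
    references: List[str] = field(default_factory=list)


class FakeRetriever:
    """Stand-in for the vector store: ranks docs by shared words."""

    def __init__(self, docs: List[Doc]):
        self.docs = docs

    async def search(self, question: str, limit: int = 5) -> List[Doc]:
        words = set(question.lower().split())
        scored = [
            (len(words & set(doc.content.lower().split())), doc)
            for doc in self.docs
        ]
        # Keep only docs that share at least one word, best match first
        hits = [pair for pair in scored if pair[0] > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in hits[:limit]]


class FakeLLM:
    """Stand-in for the LLM: simply echoes the retrieved context."""

    async def generate_answer(self, question: str, context: List[Doc]) -> Answer:
        if not context:
            return Answer(content="No relevant notes found.")
        joined = " ".join(doc.content for doc in context)
        return Answer(content=f"Based on your notes: {joined}")


async def query(retriever: FakeRetriever, llm: FakeLLM, question: str) -> Answer:
    docs = await retriever.search(question, limit=5)    # 1. retrieve
    answer = await llm.generate_answer(question, docs)  # 2. generate
    answer.references = [doc.id for doc in docs]        # 3. cite sources
    return answer


if __name__ == "__main__":
    notes = [
        Doc("notes/asyncio.md", "asyncio gather runs coroutines concurrently"),
        Doc("notes/qdrant.md", "qdrant stores embedding vectors"),
    ]
    result = asyncio.run(
        query(FakeRetriever(notes), FakeLLM(), "what does gather do")
    )
    print(result.content)
    print(result.references)
```

Because the retriever and the LLM are passed in, the same `query` function can be exercised in tests without network access or API keys.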

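3. Data models

from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class Note:
    id: str
    content: str
    source: str
    last_updated: datetime
    # Optional per-source extras; defaults to an empty dict
    metadata: dict = field(default_factory=dict)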

@dataclass
class Answer:
    content: str
    confidence: float
    references: List[Note]

@dataclass
class KnowledgeNode:
    id: str
    title: str
    content: str
    importance: float
    mastery: float

@dataclass
class KnowledgeEdge:
    source: str
    target: str
    relation: str
    strength: float

@dataclass
class KnowledgeGraph:
    nodes: List[KnowledgeNode]
    edges: List[KnowledgeEdge]
    suggestions: List[str]
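
As a rough illustration of how these models might support the knowledge-gap requirement, the sketch below ranks nodes by `importance * (1 - mastery)`. The scoring formula and the `find_gaps` helper are my own assumptions, not part of the design above, and the sketch assumes both fields fall in the range 0 to 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KnowledgeNode:
    id: str
    title: str
    content: str
    importance: float  # assumed range: 0.0 to 1.0
    mastery: float     # assumed range: 0.0 to 1.0


def find_gaps(nodes: List[KnowledgeNode], top_n: int = 3) -> List[KnowledgeNode]:
    """Return the nodes that are important but poorly mastered."""
    return sorted(
        nodes,
        key=lambda n: n.importance * (1.0 - n.mastery),
        reverse=True,
    )[:top_n]


if __name__ == "__main__":
    nodes = [
        KnowledgeNode("n1", "Embeddings", "vector representations of text",
                      importance=0.9, mastery=0.2),
        KnowledgeNode("n2", "Prompting", "writing instructions for an LLM",
                      importance=0.8, mastery=0.9),
        KnowledgeNode("n3", "Chunking", "splitting notes before indexing",
                      importance=0.5, mastery=0.1),
    ]
    for node in find_gaps(nodes, top_n=2):
        print(node.title)
```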

Key Design Decisions

I made several important decisions during the design:

1. Modular design

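import glob
import os
from datetime import datetime
from typing import List, Protocol

# Data source interface
class DataSource(Protocol):
    async def fetch(self) -> List[Note]: ...
    async def update(self, note: Note): ...

# Markdown data source implementation
class MarkdownSource:
    def __init__(self, directory: str):
        self.directory = directory

    async def fetch(self) -> List[Note]:
        notes = []
        # recursive=True is required for "**" to match nested folders
        pattern = os.path.join(self.directory, "**", "*.md")
        for file in glob.glob(pattern, recursive=True):
            with open(file, encoding="utf-8") as f:
                content = f.read()
            notes.append(Note(
                id=file,
                content=content,
                source="markdown",
                # getmtime returns a Unix timestamp; convert it to datetime
                last_updated=datetime.fromtimestamp(os.path.getmtime(file)),
                metadata={}
            ))
        return notes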

# Notion data source implementation
class NotionSource:
    def __init__(self, token: str):
        self.client = NotionClient(token)

    async def fetch(self) -> List[Note]:
        pages = await self.client.search()
        return [
            Note(
                id=page.id,
                content=page.content,
                source="notion",
                last_updated=page.last_edited,
                metadata={}
            )
            for page in pages
        ]

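The benefits of this design are:

New data sources are easy to add
Every source feeds the same data-processing pipeline
Unit testing is easier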
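
The testability point can be illustrated with a fake source. Because `DataSource` is a `Protocol`, any class with matching `fetch` and `update` methods qualifies without inheriting from it. `InMemorySource` and `collect_notes` below are hypothetical names of mine, and `Note` is trimmed to the fields the example needs.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class Note:
    id: str
    content: str
    source: str


class DataSource(Protocol):
    async def fetch(self) -> List[Note]: ...
    async def update(self, note: Note) -> None: ...


class InMemorySource:
    """Test double: holds notes in a dict instead of calling an API."""

    def __init__(self, notes: List[Note]):
        self._notes: Dict[str, Note] = {n.id: n for n in notes}

    async def fetch(self) -> List[Note]:
        return list(self._notes.values())

    async def update(self, note: Note) -> None:
        self._notes[note.id] = note


async def collect_notes(sources: List[DataSource]) -> List[Note]:
    """Fetch from every source and flatten the results into one list."""
    batches = await asyncio.gather(*(s.fetch() for s in sources))
    return [note for batch in batches for note in batch]


if __name__ == "__main__":
    a = InMemorySource([Note("1", "first note", "memory-a")])
    b = InMemorySource([Note("2", "second note", "memory-b")])
    notes = asyncio.run(collect_notes([a, b]))
    print([n.id for n in notes])
```

A test for the knowledge base can then pass in `InMemorySource` objects and never touch Notion or Lark.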

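2. Asynchronous processing

import asyncio

class KnowledgeBase:
    def __init__(self, sources, vector_store=None):
        self.sources = sources
        # Fall back to the project's VectorStore when none is injected
        self.vector_store = vector_store or VectorStore()

    async def update(self):
        # Fetch from all data sources concurrently
        tasks = [
            source.fetch()
            for source in self.sources
        ]
        results = await asyncio.gather(*tasks)

        # Update the vector database
        for notes in results:
            await self.vector_store.update(notes)

    async def search(
        self,
        query: str,
        limit: int = 5
    ) -> List[Note]:
        # 1. Vector search
        vectors = await self.vector_store.search(
            query,
            limit=limit
        )

        # 2. Load the original content (load_note is not shown here)
        notes = await asyncio.gather(*[
            self.load_note(vector.id)
            for vector in vectors
        ])

        return notes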

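Asynchronous processing can:

Improve response time by overlapping I/O waits across sources
Handle concurrent requests more gracefully
Reduce resource usage compared with one thread per request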
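
A small timing experiment shows where the speedup comes from. `fake_fetch` simulates a slow network call with `asyncio.sleep`, and the delays are made-up numbers. Run sequentially, the waits add up; with `asyncio.gather`, the total is roughly the longest single wait.

```python
import asyncio
import time
from typing import List, Sequence, Tuple


async def fake_fetch(name: str, delay: float) -> str:
    """Pretend to call a remote API that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return name


async def fetch_sequential(delays: Sequence[float]) -> List[str]:
    # Each call waits for the previous one to finish
    return [await fake_fetch(f"source-{i}", d) for i, d in enumerate(delays)]


async def fetch_concurrent(delays: Sequence[float]) -> List[str]:
    # All calls are started together; gather preserves input order
    tasks = [fake_fetch(f"source-{i}", d) for i, d in enumerate(delays)]
    return list(await asyncio.gather(*tasks))


def timed(coro) -> Tuple[List[str], float]:
    """Run a coroutine to completion and report the elapsed seconds."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.2, 0.3, 0.1]
    _, seq_time = timed(fetch_sequential(delays))
    _, con_time = timed(fetch_concurrent(delays))
    print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

The gain only applies to I/O-bound work such as API calls; CPU-heavy steps inside a coroutine still block the event loop.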

3. Incremental updates

from datetime import datetime
from typing import Dict, List

class VectorStore:
    async def update(
        self,
        notes: List[Note]
    ):
        # 1. Get the notes that are already indexed
        existing = await self.get_existing_notes()

        # 2. Find the notes that need updating
        to_update = [
            note for note in notes
            if self.needs_update(note, existing)
        ]

        # 3. Update the vectors in one batch
        if to_update:
            vectors = await self.compute_vectors(
                to_update
            )
            await self.store.upsert(vectors)

    def needs_update(
        self,
        note: Note,
        existing: Dict[str, datetime]
    ) -> bool:
        # A note needs updating if it is new or newer than the indexed copy
        if note.id not in existing:
            return True

        return note.last_updated > existing[note.id]

Incremental updates can:

Save compute by re-embedding only new or changed notes
Speed up each sync
Reduce API calls
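
Here is a runnable version of the same timestamp comparison, with a plain dict standing in for the vector store's index. `select_stale` is a hypothetical helper name. Note that this approach only detects new or edited notes; notes deleted at the source would need separate handling.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class Note:
    id: str
    content: str
    last_updated: datetime


def needs_update(note: Note, existing: Dict[str, datetime]) -> bool:
    """True if the note is new or newer than the indexed copy."""
    previous = existing.get(note.id)
    return previous is None or note.last_updated > previous


def select_stale(notes: List[Note], existing: Dict[str, datetime]) -> List[Note]:
    """Filter the fetched notes down to the ones worth re-embedding."""
    return [note for note in notes if needs_update(note, existing)]


if __name__ == "__main__":
    # Maps note id -> timestamp of the version already in the index
    existing = {
        "a": datetime(2025, 1, 1),
        "b": datetime(2025, 1, 10),
    }
    fetched = [
        Note("a", "unchanged", datetime(2025, 1, 1)),
        Note("b", "edited", datetime(2025, 1, 20)),
        Note("c", "brand new", datetime(2025, 1, 5)),
    ]
    print([note.id for note in select_stale(fetched, existing)])
```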

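Setting Up the Development Environment

With the design in place, the next step is setting up the development environment. I created a new project: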

# 1. Create the project directory
mkdir knowledge-agent
cd knowledge-agent

# 2. Initialize the Python project
python -m venv venv
source venv/bin/activate
pip install poetry

# 3. Configure dependencies
poetry init
poetry add fastapi uvicorn
poetry add openai langchain
poetry add qdrant-client
poetry add python-notion
poetry add lark-sdk
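
The Notion and Lark sources need API tokens, and it is safer to keep them out of the source tree. One possible approach is to read them from environment variables at startup. The variable names `NOTION_TOKEN`, `LARK_TOKEN`, and `NOTES_DIR`, as well as the `Settings` class, are my own placeholders rather than anything the SDKs require.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    notion_token: str
    lark_token: str
    notes_dir: str = "./notes"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast if any are missing."""
    required = ("NOTION_TOKEN", "LARK_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing environment variables: {', '.join(missing)}"
        )
    return Settings(
        notion_token=env["NOTION_TOKEN"],
        lark_token=env["LARK_TOKEN"],
        notes_dir=env.get("NOTES_DIR", "./notes"),
    )


if __name__ == "__main__":
    # Demo values only; in practice these come from the real environment
    demo_env = {"NOTION_TOKEN": "demo-notion", "LARK_TOKEN": "demo-lark"}
    print(load_settings(demo_env))
```

Passing the mapping as a parameter keeps the loader easy to test without touching the real environment.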

Project structure:

knowledge-agent/
├── pyproject.toml
├── README.md
├── src/
│   ├── api/
│   │   ├── __init__.py
│   │   └── routes.py
│   ├── core/
│   │   ├── __init__.py
│   │   ├── agent.py
│   │   └── knowledge_base.py
│   ├── models/
│   │   ├── __init__.py
│   │   └── schemas.py
│   └── sources/
│       ├── __init__.py
│       ├── markdown.py
│       ├── notion.py
│       └── lark.py
└── tests/
    ├── __init__.py
    ├── test_agent.py
    └── test_sources.py