AutoGen Study Notes Series (11) Advanced - Magentic-One
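This post covers the Magentic-One chapter in the Advanced section of AutoGen's official tutorial docs. Magentic-One is a powerful sub-library that AutoGen added recently (after November 2024). At its core it provides a Team that comes closer to being fully autonomous. Its main features are: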
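- It ships with tools for accessing the open web, so it can search web content fairly easily;
- It provides ready-to-use Agents for file analysis, code execution and web search;
- It includes an orchestrator that breaks the task down and controls the whole task workflow;
- Because this Team has more freedom than the ones covered so far, it raises some data and system security concerns;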
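[Note]: From this chapter on, tasks get more complex and API timeouts become more likely. If you have confirmed that your API KEY is correct and still see API-related errors, try running the script a few more times.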
- Official docs: https://microsoft.github.io/autogen/stable/user-guide/agentchat-user-guide/magentic-one.html ;
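Because "run it a few more times" is the suggested fix for API timeouts, here is a minimal sketch that does the retrying automatically. `retry_async` is a hypothetical helper of mine, not part of AutoGen. It assumes a failure surfaces as an exception and retries with exponential backoff.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_async(
    make_call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Await make_call(), retrying with exponential backoff when it raises.

    make_call must be a factory (a function returning a fresh awaitable),
    because a coroutine object can only be awaited once.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return await make_call()
        except Exception as exc:  # in real code, catch the specific timeout/API errors
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ... by default
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay:.1f}s")
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")
```

A Team keeps its state between runs. It is therefore probably wise to define a small `async def attempt()` that calls `await team.reset()` before `return await team.run(task=...)`, and pass `attempt` to `retry_async`. Keep in mind that every retry spends tokens again.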
Magentic-One
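The earlier demos already do quite a lot, but they tend to follow one pattern: you settle on a rough task, and your hand-written tools and Agents then talk to the LLM on your behalf to produce a result. If you want the LLM to be more capable, and in particular to work with the open web and with files, the Magentic-One sub-library is the tool for the job.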
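Magentic-One (in code, MagenticOneGroupChat) is a Team-type object similar to RoundRobinGroupChat, and it supports all the standard Agents used so far. Likewise, the Magentic-One agents MultimodalWebSurfer, FileSurfer and MagenticOneCoderAgent can also be used in other Teams;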
Microsoft has published a detailed blog post and a technical report on Magentic-One for anyone who wants to dig deeper:
- Blog:https://www.microsoft.com/en-us/research/articles/magentic-one-a-generalist-multi-agent-system-for-solving-complex-tasks/
- Technical Report:https://arxiv.org/abs/2411.04468 ;
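A figure in the official docs shows how Magentic-One works through a complex task from the GAIA benchmark. The Orchestrator's main job is to track task progress and revise the plan dynamically. It delegates to the FileSurfer Agent to read and handle files, to the WebSurfer Agent to operate a web browser, or to the Coder or Computer Terminal Agent to write or execute code.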
Because Magentic-One performs interactive operations while it runs, Microsoft lists some warnings and recommendations:
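- Use containers: run it in Docker so it cannot change your system directly;
- Use a virtual environment: run it in a virtual environment so it cannot reach sensitive data;
- Monitor logs: watch the logs during and after execution so you can detect and stop risky behavior;
- Human oversight: this also reduces risk, and it is especially important for catching the LLM or the Team stuck in an infinite loop;
- Limit access: restrict which network and local resources each Agent may or may not access, to prevent unauthorized actions;
- Protect data: do not give Agents access to sensitive data. Agents sometimes take risky actions on their own, such as accepting a website's Cookie prompt, and some websites can attack Agents through (prompt) injection;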
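Those are the risks Microsoft highlights. The more freedom an Agent or Team has, the more likely it is to do something risky, and one of Magentic-One's core features is accessing the open web.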
Getting started
Install the following dependencies:
$ pip install "autogen-agentchat" "autogen-ext[magentic-one,openai]"
$ playwright install --with-deps chromium
$ pip install olefile
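To confirm the installation worked, here is a quick sketch that only checks whether the top-level Python modules can be found. The module names are my assumption of what the packages above install: `autogen_agentchat`, `autogen_ext`, `playwright` and `olefile`. It does not check that Playwright's Chromium download succeeded.

```python
import importlib.util

# Assumed top-level module names (module names, not pip package names).
REQUIRED_MODULES = ["autogen_agentchat", "autogen_ext", "playwright", "olefile"]


def check_modules(names: list) -> dict:
    """Map each top-level module name to whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_modules(REQUIRED_MODULES).items():
        print(f"{name:20s} {'OK' if ok else 'MISSING'}")
```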
MagenticOneGroupChat example
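The official demo easily falls into an infinite loop, even with the full online gpt-4o. That is not surprising. The more freedom an Agent has, the more likely it is to overlook the termination signal or fail to recognize it, because it has no clearly defined stopping condition. Add the empty-task re-query mechanism of AutoGen Agents mentioned in earlier notes, and infinite loops become even more likely. For example, the Team may already have received the keyword TERMINATE. If no keyword termination condition is set, the Team does not stop on its own, and after a timeout it asks the previous Task again.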
[Note]: I added a text-mention termination condition here to avoid infinite loops.
import os, asyncio
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import MagenticOneGroupChat
from autogen_agentchat.ui import Console
from autogen_agentchat.conditions import TextMentionTermination

os.environ["OPENAI_API_KEY"] = "your OpenAI API Key"

async def main() -> None:
    #---------------------------------------------------------#
    # Step1. Set the termination condition
    termination = TextMentionTermination("TERMINATE")
    #---------------------------------------------------------#
    # Step2. Choose the model
    model_client = OpenAIChatCompletionClient(model="gpt-4o")
    #---------------------------------------------------------#
    # Step3. Define the Agent and the Team
    assistant = AssistantAgent(
        "Assistant",
        model_client=model_client,
    )
    team = MagenticOneGroupChat(
        [assistant],
        model_client=model_client,
        termination_condition=termination,
    )
    #---------------------------------------------------------#
    # Step4. Run the task: provide a different proof of Fermat's Last Theorem
    await Console(team.run_stream(task="Provide a different proof for Fermat's Last Theorem"))

asyncio.run(main())
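Run it:
$ python demo.py
This output is worth studying closely, so I have pasted it in full below, with my own annotations in the separator lines. The overall flow is: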
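- user passes in the task description;
- MagenticOneOrchestrator receives the task and calls the LLM to generate a prompt (a fact sheet plus a plan);
- MagenticOneOrchestrator passes the instruction to the only Agent in the Team, the AssistantAgent named Assistant;
- Assistant queries the LLM with that prompt; after its analysis, the LLM proposes a literature search;
- MagenticOneOrchestrator instructs the Assistant to perform the literature search;
- Assistant passes MagenticOneOrchestrator's instruction back to the LLM. This Assistant has no search tools, so the "literature search" is answered from the model's own knowledge;
- the keyword TERMINATE is detected in the LLM's reply, and the task ends (AssistantAgent's default system message asks it to reply with TERMINATE when the task is done);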
(LLM) ~/Desktop/AutoGen $ python demo.py
---------- Step1. user passes in the task description ----------
Provide a different proof for Fermat's Last Theorem
---------- Step2. MagenticOneOrchestrator receives the task and calls the LLM to generate a prompt ----------
We are working to address the following user request:
Provide a different proof for Fermat's Last Theorem
To answer this request we have assembled the following team:
Assistant: An agent that provides assistance with ability to use tools.
Here is an initial fact sheet to consider:
1. GIVEN OR VERIFIED FACTS
- The request is to provide a different proof for Fermat's Last Theorem.
2. FACTS TO LOOK UP
- A comprehensive understanding of the original proof provided by Andrew Wiles, which can be found in mathematical journals or books on number theory.
- Existing literature on any alternative approaches or attempted proofs of Fermat's Last Theorem.
3. FACTS TO DERIVE
- Identify any requirements or methodologies that would be necessary to construct a new proof.
- Any logical deductions or computations needed to test potential pathways for an alternative proof.
4. EDUCATED GUESSES
- The problem is likely complex due to the historical difficulty in proving Fermat’s Last Theorem, suggesting any new proof will require advanced mathematics such as algebraic geometry or modular forms.
- It may be insightful to explore connections with known results in number theory or related fields that have not yet been fully leveraged.
Here is the plan to follow as best as possible:
- Begin by reviewing the original proof of Fermat's Last Theorem by Andrew Wiles to understand its framework and identify any potential areas for alternative approaches.
- Conduct a comprehensive literature search for any existing alternative proofs or methodologies that have been proposed for Fermat's Last Theorem.
- Evaluate the mathematical domains that intersect with Wiles’ proof, such as algebraic geometry and modular forms, to explore potential avenues for a new proof.
- Use logical deduction to analyze possible new approaches, possibly by reformulating the theorem within different mathematical contexts or frameworks.
- Explore the utility of computational tools (if available) to simulate potential proof scenarios or verify any new approach.
- If applicable, leverage collaboration with experts or additional resources in number theory for insights or peer review on proposed approaches.
- Compile findings and attempt to construct a coherent alternative proof, ensuring that the approach aligns with rigorous mathematical standards.
---------- Step3. MagenticOneOrchestrator passes the prompt to the only Agent in the Team ----------
Start by reviewing the original proof of Fermat's Last Theorem by Andrew Wiles to understand its framework and to identify potential areas for alternative approaches. Simultaneously, conduct a literature search for any existing alternative proofs or methodologies that have been proposed for Fermat's Last Theorem.
---------- Step4. Assistant queries the LLM with the prompt ----------
To begin with, I'll provide a brief overview of Andrew Wiles' original proof of Fermat's Last Theorem and the mathematical framework it uses.
### Overview of Wiles' Proof:
- **Fermat's Last Theorem (FLT):** States that there are no whole number solutions to the equation \(x^n + y^n = z^n\) for \(n > 2\).
- **Strategy:** Wiles proved FLT by demonstrating a special case of the Taniyama-Shimura-Weil conjecture, which links elliptic curves and modular forms.
- **Key Components:**
- **Elliptic Curves:** Algebraic structures defined by cubic equations in two variables.
- **Modular Forms:** Complex analysis objects with transformation properties under modular transformations.
- **Ribet's Theorem:** Proved that if the Taniyama-Shimura conjecture were true for semistable elliptic curves, then FLT would be true.
- **Framework:** Wiles showed that every semistable elliptic curve over the rational numbers is modular, thereby proving FLT by connection.
### Literature Search for Alternative Proofs:
[Note]: here the LLM proposes a literature search
I will now conduct a literature search to identify any known alternative attempts or proofs related to Fermat's Last Theorem.
---------- Step5. MagenticOneOrchestrator instructs the Assistant to perform the literature search ----------
Perform a literature search to identify any notable alternative proofs or methodologies related to Fermat's Last Theorem, and summarize the findings.
---------- Step6. Assistant passes MagenticOneOrchestrator's instruction back to the LLM ----------
To conduct a literature search for alternative proofs or methodologies related to Fermat's Last Theorem, I will summarize key findings from mathematical texts, research papers, and any notable efforts made beyond Wiles' work.
### Findings from Literature Search:
1. **Wiles' Extension and Verification:**
- After the initial proof by Andrew Wiles in 1994, an extension and verification process followed, including contributions by Richard Taylor.
- Their work remains the only complete and accepted proof for Fermat's Last Theorem within the mathematical community.
2. **Ribet's Theorem and Taniyama-Shimura-Weil Conjecture:**
- No new alternative proof exists independently of the modularity theorem (formerly known as the Taniyama-Shimura conjecture), which was crucial to Wiles' approach.
3. **Attempts at Alternative Approaches:**
- Various mathematicians have explored different mathematical concepts, such as the Langlands program and connections between Galois representations and modular forms.
- These efforts have primarily reinforced existing theories rather than introducing completely new methods for proving FLT.
4. **Literature on Non-traditional Methods:**
- Some speculative or less conventional approaches have emerged in mathematical literature; however, these lack the rigorous peer-reviewed foundation found in Wiles' proof and are not recognized by the mainstream mathematical community.
5. **Publications and Discussions:**
- Numerous papers discuss the implications of Wiles' proof and explore theoretical extensions rather than alternatives to FLT's proof.
- Interest persists in educational and historical context rather than in providing a new logical framework for proof.
Overall, while academic interest in Fermat's Last Theorem remains robust, concrete developments beyond Wiles' work fall within discussions of related mathematical theories or conjectures rather than forging completely distinct pathways for proving FLT.
With the current understanding, Wiles' proof remains unparalleled in its acceptance and rigor in solving this centuries-old theorem.
---------- Step7. The keyword is detected in the LLM's reply and the task ends ----------
TERMINATE
(LLM) ~/Desktop/AutoGen $
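To see why the extra termination condition matters, here is a toy model of the guard logic. It is plain Python, not AutoGen's implementation. The run stops as soon as a reply contains the keyword, and it also stops after a fixed number of turns, so an Agent that never says TERMINATE still cannot run forever. In AutoGen itself, the equivalent is combining conditions with `|`, for example `TextMentionTermination("TERMINATE") | MaxMessageTermination(20)` from `autogen_agentchat.conditions`. `MagenticOneGroupChat` also accepts a `max_turns` argument.

```python
from typing import Iterable, Tuple


def run_with_guards(replies: Iterable[str], keyword: str = "TERMINATE",
                    max_turns: int = 20) -> Tuple[int, str]:
    """Consume replies until one mentions keyword or max_turns is reached.

    Returns (turns_used, reason), where reason is "keyword", "max_turns",
    or "exhausted" (the replies simply ran out).
    """
    turns = 0
    for reply in replies:
        turns += 1
        if keyword in reply:      # like a text-mention check: a substring match
            return turns, "keyword"
        if turns >= max_turns:    # like a turn/message cap: the safety net
            return turns, "max_turns"
    return turns, "exhausted"
```

For example, `run_with_guards(itertools.repeat("still working"), max_turns=5)` returns `(5, "max_turns")`. Without the cap, the same endless stream of replies would never end the loop.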
MultimodalWebSurfer example
The demo below is the simplest chat-style use of the Team. The key part is using MultimodalWebSurfer to search for information:
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_agentchat.teams import MagenticOneGroupChat
from autogen_agentchat.ui import Console
from autogen_ext.agents.web_surfer import MultimodalWebSurfer
import os, asyncio

os.environ["OPENAI_API_KEY"] = "your OpenAI API Key"

async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o")
    surfer = MultimodalWebSurfer(
        "WebSurfer",
        model_client=model_client,
    )
    team = MagenticOneGroupChat([surfer], model_client=model_client)
    await Console(team.run_stream(task="What is the UV index in Melbourne today?"))

asyncio.run(main())
Run it:
$ python demo.py
FileSurfer & MagenticOneCoderAgent & CodeExecutorAgent
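This demo may execute code returned by the LLM, so it carries some risk, and running it in a virtual environment is recommended. However, the official task is very simple, so in most cases the LLM only has the Team call MultimodalWebSurfer to search the web and return the result: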
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_agentchat.teams import MagenticOneGroupChat
from autogen_agentchat.ui import Console
from autogen_ext.agents.web_surfer import MultimodalWebSurfer
from autogen_ext.agents.file_surfer import FileSurfer
from autogen_ext.agents.magentic_one import MagenticOneCoderAgent
from autogen_agentchat.agents import CodeExecutorAgent
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor
import os, asyncio

os.environ["OPENAI_API_KEY"] = "your OpenAI API Key"

async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o")
    surfer = MultimodalWebSurfer(
        "WebSurfer",
        model_client=model_client,
    )
    file_surfer = FileSurfer("FileSurfer", model_client=model_client)
    coder = MagenticOneCoderAgent("Coder", model_client=model_client)
    # LocalCommandLineCodeExecutor runs the generated code directly on this machine
    terminal = CodeExecutorAgent("ComputerTerminal", code_executor=LocalCommandLineCodeExecutor())
    team = MagenticOneGroupChat([surfer, file_surfer, coder, terminal], model_client=model_client)
    await Console(team.run_stream(task="What is the UV index in Melbourne today?"))

asyncio.run(main())
Run it:
$ python demo.py
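The risk in the demo above comes from `LocalCommandLineCodeExecutor` running LLM-generated code on your own machine. The sketch below is only a rough illustration of what "run the code locally" comes down to. It shows how a separate process, a throwaway working directory and a timeout limit the damage a little. It is not how `LocalCommandLineCodeExecutor` is implemented, and it is not a sandbox. For real isolation, in line with Microsoft's container advice, `autogen_ext` also ships a `DockerCommandLineCodeExecutor` (in `autogen_ext.code_executors.docker`). `run_snippet` is a hypothetical name.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_snippet(code: str, timeout: float = 10.0) -> tuple:
    """Write code to a temp directory and run it in a separate Python process.

    Returns (exit_code, stdout, stderr); a timeout raises subprocess.TimeoutExpired.
    NOT a sandbox: the code still runs with your user's permissions.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        proc = subprocess.run(
            [sys.executable, str(script)],
            cwd=workdir,            # relative file writes land in the throwaway dir
            capture_output=True,
            text=True,
            timeout=timeout,        # the child is killed if it runs too long
        )
    return proc.returncode, proc.stdout, proc.stderr
```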
MagenticOne
The demo below consumes a lot of tokens because it sends web page screenshots back to the LLM, and it also falls into infinite loops easily:
import asyncio, os
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.teams.magentic_one import MagenticOne
from autogen_agentchat.ui import Console

os.environ["OPENAI_API_KEY"] = "your OpenAI API Key"

async def example_usage():
    client = OpenAIChatCompletionClient(model="gpt-4o")
    m1 = MagenticOne(client=client)
    task = "Write a Python script to fetch data from an API."
    result = await Console(m1.run_stream(task=task))
    print(result)

asyncio.run(example_usage())
Run it:
$ python demo.py
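To put a rough number on "a lot of tokens": OpenAI has published a tile-based rule for how many input tokens an image costs GPT-4o in high-detail mode. The sketch below encodes that rule as I understand it: 85 base tokens plus 170 per 512px tile, after downscaling. Treat the constants as assumptions, since they can change and differ between models. I also assume the screenshots are about 1440x900; I have not confirmed what MultimodalWebSurfer actually sends. `gpt4o_image_tokens` is a hypothetical name.

```python
import math


def gpt4o_image_tokens(width: int, height: int) -> int:
    """Estimate input tokens for one image in high-detail mode (assumed GPT-4o rule)."""
    # 1. Downscale (never upscale) to fit within a 2048 x 2048 square.
    scale = min(1.0, 2048 / width, 2048 / height)
    w, h = width * scale, height * scale
    # 2. Downscale again so the shortest side is at most 768px.
    shortest = min(w, h)
    if shortest > 768:
        w, h = w * 768 / shortest, h * 768 / shortest
    # 3. Count 512px tiles: 170 tokens each, plus a fixed 85.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles
```

Under those assumptions, `gpt4o_image_tokens(1440, 900)` gives 1105. Twenty steps that each send one screenshot therefore add roughly 22,100 input tokens before any text is counted.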
Architecture
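The rest of this post introduces the overall design of Magentic-One. The official docs illustrate its overall structure with an architecture diagram.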
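Magentic-One is built on a multi-agent architecture. Its main points are:
- The lead agent, the Orchestrator, handles high-level planning, directs the other Agents and tracks overall task progress. The Orchestrator first creates a plan for the task and collects the necessary known facts and educated guesses in a Task Ledger that it maintains;
- At each step, the Orchestrator creates a Progress Ledger in which it reviews progress and checks whether the task is complete. If the task is not yet complete, it assigns a subtask to one of Magentic-One's other Agents;
- After that Agent finishes its subtask, the Orchestrator updates the Progress Ledger, and this repeats until the whole task is complete;
- If the Orchestrator finds that no progress has been made for several steps, it updates the Task Ledger and creates a new plan;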
In short, the Orchestrator's responsibility is to maintain the outer loop (the Task Ledger) and the inner loop (the Progress Ledger).
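The two-loop idea can be sketched in plain Python. This is a heavily simplified model under my own assumptions. The names `TaskLedger`, `ProgressLedger`, `StepResult` and `orchestrate` are hypothetical. The plan is a fixed list of (agent name, instruction) pairs, and agents report their own progress. The real Orchestrator instead asks the LLM to fill in the Progress Ledger and pick the next speaker at every step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class StepResult:
    reply: str
    progressed: bool        # did this step move the task forward?
    finished: bool = False  # is the whole task now complete?


@dataclass
class TaskLedger:           # outer-loop state: facts and the current plan
    task: str
    facts: List[str] = field(default_factory=list)
    plan: List[Tuple[str, str]] = field(default_factory=list)  # (agent, instruction)


@dataclass
class ProgressLedger:       # inner-loop state: step count, stalls, transcript
    steps: int = 0
    stalls: int = 0
    finished: bool = False
    history: List[str] = field(default_factory=list)


Planner = Callable[[str, List[str]], List[Tuple[str, str]]]
Worker = Callable[[str], StepResult]


def orchestrate(task: str, make_plan: Planner, workers: Dict[str, Worker],
                max_steps: int = 20, max_stalls: int = 3) -> Tuple[TaskLedger, ProgressLedger]:
    ledger = TaskLedger(task=task, plan=make_plan(task, []))
    progress = ProgressLedger()
    while progress.steps < max_steps and ledger.plan:
        # Inner loop: hand out subtasks one at a time and record progress.
        for agent_name, instruction in ledger.plan:
            if progress.steps >= max_steps:
                break
            progress.steps += 1
            result = workers[agent_name](instruction)
            progress.history.append(f"{agent_name}: {result.reply}")
            if result.finished:
                progress.finished = True
                return ledger, progress
            progress.stalls = 0 if result.progressed else progress.stalls + 1
            if progress.stalls >= max_stalls:
                break  # stuck: fall back to the outer loop
        if progress.steps >= max_steps:
            break
        # Outer loop: update the Task Ledger and draw up a new plan.
        ledger.facts.append(f"re-planned after {progress.steps} steps")
        ledger.plan = make_plan(task, progress.history)
        progress.stalls = 0
    return ledger, progress
```

The key design element is the stall counter. The inner loop keeps handing out subtasks while progress is being made. Only after `max_stalls` unproductive steps does control fall back to the outer loop, which updates the Task Ledger and re-plans.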
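Magentic-One therefore consists of the following Agents:
- Orchestrator: decomposes the task and plans it, directs the other Agents as they carry out subtasks, and takes corrective action when needed;
- WebSurfer (browser operator): controls a Chromium-based web browser and runs commands such as visiting a URL, running a search or summarizing content to get page content; each action returns the new state of the web page;
- FileSurfer (file handler): reads most types of local files and returns their content as markdown, and can also list directory contents;
- Coder (code writer): asks the LLM for answers in code form; it is specialized mainly through its prompt, which constrains what the LLM replies with;
- ComputerTerminal (code execution terminal): essentially a SHELL that runs the given code or commands;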
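As a small illustration of FileSurfer-style "list a directory and return markdown" behavior, here is a hypothetical helper. `list_dir_markdown` is my own name, and this is neither FileSurfer's implementation nor its output format. It renders one directory level as a markdown bullet list, directories first.

```python
from pathlib import Path
from typing import Union


def list_dir_markdown(path: Union[str, Path]) -> str:
    """Render one directory level as a markdown bullet list, directories first."""
    root = Path(path)
    # Sort directories before files, then case-insensitively by name.
    entries = sorted(root.iterdir(), key=lambda p: (not p.is_dir(), p.name.lower()))
    lines = [f"# Contents of {root.name}"]
    for entry in entries:
        if entry.is_dir():
            lines.append(f"- {entry.name}/")
        else:
            lines.append(f"- {entry.name} ({entry.stat().st_size} bytes)")
    return "\n".join(lines)
```

Calling `print(list_dir_markdown("."))` prints the current directory in the same format. Markdown is a convenient choice because the result can be passed straight back to an LLM as text.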
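To sum up, Magentic-One's Agents give the Orchestrator the tools and capabilities it needs to solve a wide range of open-ended problems. They also give it the ability to adapt autonomously to dynamic, constantly changing web and file-system environments. Microsoft recommends choosing a strong model such as GPT-4o as the LLM.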