crawl4ai: a web crawler framework that outputs LLM-friendly formats
Reference:
https://github.com/unclecode/crawl4ai
Under the hood it drives pages with Playwright, Microsoft's browser automation framework.
1. Installation
# Install the package
pip install -U crawl4ai
# Run post-installation setup
crawl4ai-setup
2. Usage
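import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    # The context manager starts the browser and closes it on exit
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://sj.qq.com/appdetail/com.xingin.xhs",
        )
        # Print the page content converted to Markdown
        print(result.markdown)

if __name__ == "__main__":
    asyncio.run(main())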
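
The script above only prints the Markdown. If the goal is to feed the text to an LLM later, it may be convenient to write it to disk instead. The following is a minimal sketch using only the standard library; `url_to_filename`, `save_markdown`, and the `crawl_output` directory are hypothetical names introduced here for illustration and are not part of crawl4ai.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def url_to_filename(url: str) -> str:
    """Turn a URL into a filesystem-safe Markdown filename (hypothetical helper)."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}"
    # Replace every run of characters other than letters, digits, "." and "-" with "_"
    safe = re.sub(r"[^A-Za-z0-9.-]+", "_", raw).strip("_")
    return f"{safe or 'page'}.md"


def save_markdown(markdown: str, url: str, out_dir: str = "crawl_output") -> Path:
    """Write Markdown text to out_dir, named after the URL (hypothetical helper)."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / url_to_filename(url)
    target.write_text(markdown, encoding="utf-8")
    return target
```

Inside `main()`, the `print(result.markdown)` line could then be replaced with something like `save_markdown(str(result.markdown), result.url)`. The `str(...)` conversion is a precaution, on the assumption that `result.markdown` may not be a plain string in every crawl4ai version.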