litellm - 简化大模型 API 调用的工具
更多AI开源软件:
AI开源 - 小众AIhttps://www.aiinn.cn/sources
11000 Stars 1300 Forks 445 Issues 275 贡献者 MIT License Python 语言
代码: GitHub - BerriAI/litellm: Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
主页: https://docs.litellm.ai/
litellm该项目能够将各种 AI 大模型和服务的接口,统一转换成 OpenAI 的格式,简化了在不同 AI 服务/大模型切换和管理的工作。此外,它还支持设置预算、限制请求频率、管理 API Key 和配置 OpenAI 代理服务器等功能。
主要功能
使用相同的输入输出格式调用100多个大型语言模型(LLMs)
- 将输入转换为服务提供商的完成、嵌入和图像生成等端点所需的格式
- 确保输出的一致性,文本响应始终位于[‘choices’][0][‘message’][‘content’]路径下
- 在多个部署(如Azure、OpenAI等)之间实现重试/回退逻辑 - 路由器
- 跟踪支出并为每个项目设置预算 - OpenAI代理服务器
使用说明
LiteLLM v1.0.0 now requires openai>=1.0.0. Migration guide here
LiteLLM v1.40.14+ now requires pydantic>=2.0.0. No changes required.
pip install litellm
from litellm import completion
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["COHERE_API_KEY"] = "your-cohere-key"
messages = [{ "content": "Hello, how are you?","role": "user"}]
# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)
# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)
Call any model supported by a provider, with model=<provider_name>/<model_name>. There might be provider-specific details here, so refer to provider docs for more information
异步
from litellm import acompletion
import asyncio
async def test_get_response():
user_message = "Hello, how are you?"
messages = [{"content": user_message, "role": "user"}]
response = await acompletion(model="gpt-3.5-turbo", messages=messages)
return response
response = asyncio.run(test_get_response())
print(response)
流
liteLLM supports streaming the model response back, pass stream=True to get a streaming iterator in response.
Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.)
from litellm import completion
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for part in response:
print(part.choices[0].delta.content or "")
# claude 2
response = completion('claude-2', messages, stream=True)
for part in response:
print(part.choices[0].delta.content or "")
日志
LiteLLM exposes pre defined callbacks to send data to Lunary, Langfuse, DynamoDB, s3 Buckets, Helicone, Promptlayer, Traceloop, Athina, Slack
from litellm import completion
## set env variables for logging tools
os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key"
os.environ["HELICONE_API_KEY"] = "your-helicone-auth-key"
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["ATHINA_API_KEY"] = "your-athina-api-key"
os.environ["OPENAI_API_KEY"]
# set callbacks
litellm.success_callback = ["lunary", "langfuse", "athina", "helicone"] # log input/output to lunary, langfuse, supabase, athina, helicone etc
#openai call
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])