OpenAI
LiteLLM supports OpenAI Chat + Embedding calls.
Required API Keys
import os
os.environ["OPENAI_API_KEY"] = "your-api-key"
Usage
import os
from litellm import completion
os.environ["OPENAI_API_KEY"] = "your-api-key"
# openai call
response = completion(
model = "gpt-4o",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)
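Embeddings use the same pattern via `embedding()`. A minimal sketch of an OpenAI embedding call (the embedding model name here is just an example):
import os
from litellm import embedding
os.environ["OPENAI_API_KEY"] = "your-api-key"
# openai embedding call
response = embedding(
    model="text-embedding-3-small",  # example model; any OpenAI embedding model works
    input=["Hello, how are you?"]
)
print(response)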
Usage - LiteLLM Proxy Server
Here's how to call OpenAI models with the LiteLLM Proxy Server
1. Save key in your environment
export OPENAI_API_KEY=""
2. Start the proxy
- config.yaml
- config.yaml - proxy all OpenAI models
- CLI
model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: openai/gpt-3.5-turbo # The `openai/` prefix will call openai.chat.completions.create
api_key: os.environ/OPENAI_API_KEY
- model_name: gpt-3.5-turbo-instruct
litellm_params:
model: text-completion-openai/gpt-3.5-turbo-instruct # The `text-completion-openai/` prefix will call openai.completions.create
api_key: os.environ/OPENAI_API_KEY
Use this to add all OpenAI models with one API key. WARNING: This will not do any load balancing. This means requests to gpt-4, gpt-3.5-turbo, and gpt-4-turbo-preview will all go through this route.
model_list:
- model_name: "*" # all requests where model not in your config go to this deployment
litellm_params:
model: openai/* # set `openai/` to use the openai route
api_key: os.environ/OPENAI_API_KEY
$ litellm --model gpt-3.5-turbo
# Server running on http://0.0.0.0:4000
3. Test it
- Curl Request
- OpenAI v1.0.0+
- Langchain
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}
'
import openai
client = openai.OpenAI(
api_key="anything",
base_url="http://0.0.0.0:4000"
)
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
{
"role": "user",
"content": "this is a test request, write a short poem"
}
])
print(response)
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
ChatPromptTemplate,
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage
chat = ChatOpenAI(
openai_api_base="http://0.0.0.0:4000", # set openai_api_base to the LiteLLM Proxy
model = "gpt-3.5-turbo",
temperature=0.1
)
messages = [
SystemMessage(
content="You are a helpful assistant that im using to make a test request to."
),
HumanMessage(
content="test from litellm. tell me why it's amazing in 1 sentence"
),
]
response = chat(messages)
print(response)
Optional Keys - OpenAI Organization, OpenAI API Base
import os
os.environ["OPENAI_ORGANIZATION"] = "your-org-id" # OPTIONAL
os.environ["OPENAI_BASE_URL"] = "https://your_host/v1" # OPTIONAL
OpenAI Chat Completion Models
Model Name | Function Call |
---|---|
gpt-4.1 | response = completion(model="gpt-4.1", messages=messages) |
gpt-4.1-mini | response = completion(model="gpt-4.1-mini", messages=messages) |
gpt-4.1-nano | response = completion(model="gpt-4.1-nano", messages=messages) |
o4-mini | response = completion(model="o4-mini", messages=messages) |
o3-mini | response = completion(model="o3-mini", messages=messages) |
o3 | response = completion(model="o3", messages=messages) |
o1-mini | response = completion(model="o1-mini", messages=messages) |
o1-preview | response = completion(model="o1-preview", messages=messages) |
gpt-4o-mini | response = completion(model="gpt-4o-mini", messages=messages) |
gpt-4o-mini-2024-07-18 | response = completion(model="gpt-4o-mini-2024-07-18", messages=messages) |
gpt-4o | response = completion(model="gpt-4o", messages=messages) |
gpt-4o-2024-08-06 | response = completion(model="gpt-4o-2024-08-06", messages=messages) |
gpt-4o-2024-05-13 | response = completion(model="gpt-4o-2024-05-13", messages=messages) |
gpt-4-turbo | response = completion(model="gpt-4-turbo", messages=messages) |
gpt-4-turbo-preview | response = completion(model="gpt-4-turbo-preview", messages=messages) |
gpt-4-0125-preview | response = completion(model="gpt-4-0125-preview", messages=messages) |
gpt-4-1106-preview | response = completion(model="gpt-4-1106-preview", messages=messages) |
gpt-3.5-turbo-1106 | response = completion(model="gpt-3.5-turbo-1106", messages=messages) |
gpt-3.5-turbo | response = completion(model="gpt-3.5-turbo", messages=messages) |
gpt-3.5-turbo-0301 | response = completion(model="gpt-3.5-turbo-0301", messages=messages) |
gpt-3.5-turbo-0613 | response = completion(model="gpt-3.5-turbo-0613", messages=messages) |
gpt-3.5-turbo-16k | response = completion(model="gpt-3.5-turbo-16k", messages=messages) |
gpt-3.5-turbo-16k-0613 | response = completion(model="gpt-3.5-turbo-16k-0613", messages=messages) |
gpt-4 | response = completion(model="gpt-4", messages=messages) |
gpt-4-0314 | response = completion(model="gpt-4-0314", messages=messages) |
gpt-4-0613 | response = completion(model="gpt-4-0613", messages=messages) |
gpt-4-32k | response = completion(model="gpt-4-32k", messages=messages) |
gpt-4-32k-0314 | response = completion(model="gpt-4-32k-0314", messages=messages) |
gpt-4-32k-0613 | response = completion(model="gpt-4-32k-0613", messages=messages) |
These models also support the OPENAI_BASE_URL environment variable, which can be used to specify a custom API endpoint.
OpenAI Vision Models
Model Name | Function Call |
---|---|
gpt-4o | response = completion(model="gpt-4o", messages=messages) |
gpt-4-turbo | response = completion(model="gpt-4-turbo", messages=messages) |
gpt-4-vision-preview | response = completion(model="gpt-4-vision-preview", messages=messages) |
Usage
import os
from litellm import completion
os.environ["OPENAI_API_KEY"] = "your-api-key"
# openai call
response = completion(
model = "gpt-4-vision-preview",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "Whatโs in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
}
}
]
}
],
)
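To send a local image instead of a URL, you can pass it as a base64 data URL in the same image_url field - a minimal sketch (the file path is a placeholder):
import base64
import os
from litellm import completion
os.environ["OPENAI_API_KEY"] = "your-api-key"
with open("local_image.jpg", "rb") as f:  # placeholder path
    base64_image = base64.b64encode(f.read()).decode("utf-8")
response = completion(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)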
PDF File Parsing
OpenAI has a new `file` message type that allows you to pass in a PDF file and have it parsed into a structured output. Read more
- SDK
- PROXY
import base64
from litellm import completion
with open("draconomicon.pdf", "rb") as f:
data = f.read()
base64_string = base64.b64encode(data).decode("utf-8")
response = completion(
model="gpt-4o",
messages=[
{
"role": "user",
"content": [
{
"type": "file",
"file": {
"filename": "draconomicon.pdf",
"file_data": f"data:application/pdf;base64,{base64_string}",
}
},
{
"type": "text",
"text": "What is the first dragon in the book?",
}
],
},
],
)
print(response.choices[0].message.content)
- Setup config.yaml
model_list:
- model_name: openai-model
litellm_params:
model: gpt-4o
api_key: os.environ/OPENAI_API_KEY
- Start the proxy
litellm --config config.yaml
- Test it!
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "openai-model",
"messages": [
{"role": "user", "content": [
{
"type": "file",
"file": {
"filename": "draconomicon.pdf",
"file_data": f"data:application/pdf;base64,{base64_string}",
}
}
]}
]
}'
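The curl above shows the request shape; since the base64 string has to be built in code anyway, you can also send the same request to the proxy with the OpenAI SDK. A minimal sketch, assuming the proxy from the config above is running on port 4000 with the key sk-1234:
import base64
import openai
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")
with open("draconomicon.pdf", "rb") as f:
    base64_string = base64.b64encode(f.read()).decode("utf-8")
response = client.chat.completions.create(
    model="openai-model",  # the model_name from config.yaml
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "file",
                    "file": {
                        "filename": "draconomicon.pdf",
                        "file_data": f"data:application/pdf;base64,{base64_string}",
                    },
                },
                {"type": "text", "text": "What is the first dragon in the book?"},
            ],
        },
    ],
)
print(response.choices[0].message.content)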
OpenAI Fine Tuned Models
Model Name | Function Call |
---|---|
fine tuned gpt-4-0613 | response = completion(model="ft:gpt-4-0613", messages=messages) |
fine tuned gpt-4o-2024-05-13 | response = completion(model="ft:gpt-4o-2024-05-13", messages=messages) |
fine tuned gpt-3.5-turbo-0125 | response = completion(model="ft:gpt-3.5-turbo-0125", messages=messages) |
fine tuned gpt-3.5-turbo-1106 | response = completion(model="ft:gpt-3.5-turbo-1106", messages=messages) |
fine tuned gpt-3.5-turbo-0613 | response = completion(model="ft:gpt-3.5-turbo-0613", messages=messages) |
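Note that a deployed fine-tuned model has a longer ID than the base names above (something like ft:gpt-3.5-turbo-0125:my-org::abc123 - the org and suffix here are placeholders). Pass that full ID as the model name:
import os
from litellm import completion
os.environ["OPENAI_API_KEY"] = "your-api-key"
# the model ID below is a placeholder - use the ID returned by your fine-tuning job
response = completion(
    model="ft:gpt-3.5-turbo-0125:my-org::abc123",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
)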
OpenAI Audio Transcription
LiteLLM supports the OpenAI Audio Transcription endpoint.
Supported models:
Model Name | Function Call |
---|---|
whisper-1 | response = transcription(model="whisper-1", file=audio_file) |
gpt-4o-transcribe | response = transcription(model="gpt-4o-transcribe", file=audio_file) |
gpt-4o-mini-transcribe | response = transcription(model="gpt-4o-mini-transcribe", file=audio_file) |
- SDK
- PROXY
from litellm import transcription
import os
# set api keys
os.environ["OPENAI_API_KEY"] = ""
audio_file = open("/path/to/audio.mp3", "rb")
response = transcription(model="gpt-4o-transcribe", file=audio_file)
print(f"response: {response}")
- Setup config.yaml
model_list:
- model_name: gpt-4o-transcribe
litellm_params:
model: gpt-4o-transcribe
api_key: os.environ/OPENAI_API_KEY
model_info:
mode: audio_transcription
general_settings:
master_key: sk-1234
- Start the proxy
litellm --config config.yaml
- Test it!
curl --location 'http://0.0.0.0:4000/v1/audio/transcriptions' \
--header 'Authorization: Bearer sk-1234' \
--form 'file=@"/Users/krrishdholakia/Downloads/gettysburg.wav"' \
--form 'model="gpt-4o-transcribe"'
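You can also hit the proxy's transcription endpoint through the OpenAI SDK - a minimal sketch, assuming the proxy above is running on port 4000 with the master key sk-1234 and a local audio file (path is a placeholder):
import openai
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")
with open("gettysburg.wav", "rb") as audio_file:  # placeholder path
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # the model_name from config.yaml
        file=audio_file,
    )
print(transcript.text)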
Advanced
Getting OpenAI API Response Headers
Set `litellm.return_response_headers = True` to get raw response headers from OpenAI. You can expect to always get the `_response_headers` field from the `litellm.completion()` and `litellm.embedding()` functions.
- litellm.completion
- litellm.completion + stream
- litellm.embedding
import litellm
from litellm import completion
litellm.return_response_headers = True
# /chat/completion
response = completion(
model="gpt-4o-mini",
messages=[
{
"role": "user",
"content": "hi",
}
],
)
print(f"response: {response}")
print("_response_headers=", response._response_headers)
import litellm
from litellm import completion
litellm.return_response_headers = True
# /chat/completion
response = completion(
model="gpt-4o-mini",
stream=True,
messages=[
{
"role": "user",
"content": "hi",
}
],
)
print(f"response: {response}")
print("response_headers=", response._response_headers)
for chunk in response:
print(chunk)
import litellm
litellm.return_response_headers = True
# embedding
embedding_response = litellm.embedding(
model="text-embedding-ada-002",
input="hello",
)
embedding_response_headers = embedding_response._response_headers
print("embedding_response_headers=", embedding_response_headers)
Expected Response Headers from OpenAI
{
"date": "Sat, 20 Jul 2024 22:05:23 GMT",
"content-type": "application/json",
"transfer-encoding": "chunked",
"connection": "keep-alive",
"access-control-allow-origin": "*",
"openai-model": "text-embedding-ada-002",
"openai-organization": "*****",
"openai-processing-ms": "20",
"openai-version": "2020-10-01",
"strict-transport-security": "max-age=15552000; includeSubDomains; preload",
"x-ratelimit-limit-requests": "5000",
"x-ratelimit-limit-tokens": "5000000",
"x-ratelimit-remaining-requests": "4999",
"x-ratelimit-remaining-tokens": "4999999",
"x-ratelimit-reset-requests": "12ms",
"x-ratelimit-reset-tokens": "0s",
"x-request-id": "req_cc37487bfd336358231a17034bcfb4d9",
"cf-cache-status": "DYNAMIC",
"set-cookie": "__cf_bm=E_FJY8fdAIMBzBE2RZI2.OkMIO3lf8Hz.ydBQJ9m3q8-1721513123-1.0.1.1-6OK0zXvtd5s9Jgqfz66cU9gzQYpcuh_RLaUZ9dOgxR9Qeq4oJlu.04C09hOTCFn7Hg.k.2tiKLOX24szUE2shw; path=/; expires=Sat, 20-Jul-24 22:35:23 GMT; domain=.api.openai.com; HttpOnly; Secure; SameSite=None, *cfuvid=SDndIImxiO3U0aBcVtoy1TBQqYeQtVDo1L6*Nlpp7EU-1721513123215-0.0.1.1-604800000; path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None",
"x-content-type-options": "nosniff",
"server": "cloudflare",
"cf-ray": "8a66409b4f8acee9-SJC",
"content-encoding": "br",
"alt-svc": "h3=\":443\"; ma=86400"
}
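One practical use of these headers is client-side rate-limit awareness - a small sketch reading the x-ratelimit-* values shown above (the parsing logic is illustrative):
import litellm
from litellm import completion
litellm.return_response_headers = True
response = completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hi"}],
)
# _response_headers is a dict of the raw OpenAI response headers
headers = response._response_headers or {}
print("requests remaining:", headers.get("x-ratelimit-remaining-requests"))
print("tokens remaining:", headers.get("x-ratelimit-remaining-tokens"))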
Parallel Function calling
See a detailed walkthrough of parallel function calling with litellm here
import litellm
import json
# set openai api key
import os
os.environ['OPENAI_API_KEY'] = "" # litellm reads OPENAI_API_KEY from .env and sends the request
# Example dummy function hard coded to return the same weather
# In production, this could be your backend API or an external API
def get_current_weather(location, unit="fahrenheit"):
"""Get the current weather in a given location"""
if "tokyo" in location.lower():
return json.dumps({"location": "Tokyo", "temperature": "10", "unit": "celsius"})
elif "san francisco" in location.lower():
return json.dumps({"location": "San Francisco", "temperature": "72", "unit": "fahrenheit"})
elif "paris" in location.lower():
return json.dumps({"location": "Paris", "temperature": "22", "unit": "celsius"})
else:
return json.dumps({"location": location, "temperature": "unknown"})
messages = [{"role": "user", "content": "What's the weather like in San Francisco, Tokyo, and Paris?"}]
tools = [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
},
"required": ["location"],
},
},
}
]
response = litellm.completion(
model="gpt-3.5-turbo-1106",
messages=messages,
tools=tools,
tool_choice="auto", # auto is default, but we'll be explicit
)
print("\nLLM Response1:\n", response)
response_message = response.choices[0].message
tool_calls = response.choices[0].message.tool_calls
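The example above stops after the first response. To complete the parallel tool-calling loop, you would typically execute each tool call and send the results back in a second request - a sketch continuing from the variables above (the second call mirrors the first):
# Step 2 (sketch): run each tool call locally and send the results back
if tool_calls:
    available_functions = {"get_current_weather": get_current_weather}
    messages.append(response_message)  # extend the conversation with the assistant's reply
    for tool_call in tool_calls:
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)
        function_response = available_functions[function_name](
            location=function_args.get("location"),
            unit=function_args.get("unit"),
        )
        messages.append(
            {
                "tool_call_id": tool_call.id,
                "role": "tool",
                "name": function_name,
                "content": function_response,
            }
        )
    second_response = litellm.completion(
        model="gpt-3.5-turbo-1106",
        messages=messages,
    )
    print("\nLLM Response2:\n", second_response)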
Setting extra_headers for completion calls
import os
from litellm import completion
os.environ["OPENAI_API_KEY"] = "your-api-key"
response = completion(
model = "gpt-3.5-turbo",
messages=[{ "content": "Hello, how are you?","role": "user"}],
extra_headers={"AI-Resource Group": "ishaan-resource"}
)
Setting Organization-ID for completion calls
This can be set in one of the following ways:
- Environment variable: OPENAI_ORGANIZATION
- Param to litellm.completion: litellm.completion(model=model, organization="your-organization-id")
- Set litellm.organization="your-organization-id"
import os
from litellm import completion
os.environ["OPENAI_API_KEY"] = "your-api-key"
os.environ["OPENAI_ORGANIZATION"] = "your-org-id" # OPTIONAL
response = completion(
model = "gpt-3.5-turbo",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)
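The other two options listed above work the same way - a short sketch passing the organization per call, or setting it once on the module:
import os
import litellm
from litellm import completion
os.environ["OPENAI_API_KEY"] = "your-api-key"
# option 2: pass the organization per call
response = completion(
    model="gpt-3.5-turbo",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    organization="your-organization-id",
)
# option 3: set it once for all subsequent calls
litellm.organization = "your-organization-id"
response = completion(
    model="gpt-3.5-turbo",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)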
Set ssl_verify=False
This is done by setting your own httpx.Client:
- For litellm.completion, set litellm.client_session=httpx.Client(verify=False)
- For litellm.acompletion, set litellm.aclient_session=httpx.AsyncClient(verify=False)
import litellm, httpx
messages = [{"content": "Hello, how are you?", "role": "user"}]
# for completion
litellm.client_session = httpx.Client(verify=False)
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=messages,
)
# for acompletion (async - await the call inside an async function)
litellm.aclient_session = httpx.AsyncClient(verify=False)
response = litellm.acompletion(
    model="gpt-3.5-turbo",
    messages=messages,
)
If you are using OpenAI models together with models from other providers (e.g. Anthropic), we recommend setting this parameter along with the other parameter mentioned on this page.
Using HTTP/HTTPS Proxy with LiteLLM
This is done by setting your own httpx.Client:
- For litellm.completion, set litellm.client_session=httpx.Client(proxy="http://proxy.com")
- For litellm.acompletion, set litellm.aclient_session=httpx.AsyncClient(proxy="http://proxy.com")
import litellm, httpx
messages = [{"content": "Hello, how are you?", "role": "user"}]
# for completion
litellm.client_session = httpx.Client(proxy="http://proxy.com")
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=messages,
)
# for acompletion (async - await the call inside an async function)
litellm.aclient_session = httpx.AsyncClient(proxy="http://proxy.com")
response = litellm.acompletion(
    model="gpt-3.5-turbo",
    messages=messages,
)
If you are using OpenAI models together with models from other providers (e.g. Anthropic), we recommend setting this parameter along with the other parameter mentioned on this page.
Using OpenAI Proxy with LiteLLM
import os
import litellm
from litellm import completion
os.environ["OPENAI_API_KEY"] = ""
# set custom api base to your proxy
# either set .env or litellm.api_base
# os.environ["OPENAI_BASE_URL"] = "https://your_host/v1"
litellm.api_base = "https://your_host/v1"
messages = [{ "content": "Hello, how are you?","role": "user"}]
# openai call
response = completion("openai/your-model-name", messages)
If you need to set api_base dynamically, just pass it in the completion call instead - completion(..., api_base="your-proxy-api-base")
For more, check out setting API Base/Keys.
Forwarding Org ID for Proxy requests
Forward OpenAI org IDs from the client to OpenAI with the forward_openai_org_id param.
- Setup config.yaml
model_list:
- model_name: "gpt-3.5-turbo"
litellm_params:
model: gpt-3.5-turbo
api_key: os.environ/OPENAI_API_KEY
general_settings:
forward_openai_org_id: true # 👈 KEY CHANGE
- Start Proxy
litellm --config config.yaml --detailed_debug
# RUNNING on http://0.0.0.0:4000
- Make OpenAI call
from openai import OpenAI
client = OpenAI(
api_key="sk-1234",
organization="my-special-org",
base_url="http://0.0.0.0:4000"
)
client.chat.completions.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hello world"}])
In your logs you should see the forwarded org ID:
LiteLLM:DEBUG: utils.py:255 - Request to litellm:
LiteLLM:DEBUG: utils.py:255 - litellm.acompletion(... organization='my-special-org',)