
Client-side token counting and price estimation for LLM apps and AI agents.
<a href="https://pypi.org/project/tokencost/" target="_blank">
<img alt="Python" src="https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54" />
<img alt="Version" src="https://img.shields.io/pypi/v/tokencost?style=for-the-badge&color=3670A0">
</a>
🐦 Twitter • 📢 Discord • 🖇️ AgentOps
Tokencost helps calculate the USD cost of using major Large Language Model (LLM) APIs by estimating the cost of prompts and completions.
Building AI agents? Check out AgentOps
from tokencost import calculate_prompt_cost, calculate_completion_cost
model = "gpt-3.5-turbo"
prompt = [{ "role": "user", "content": "Hello world"}]
completion = "How may I assist you today?"
prompt_cost = calculate_prompt_cost(prompt, model)
completion_cost = calculate_completion_cost(completion, model)
print(f"{prompt_cost} + {completion_cost} = {prompt_cost + completion_cost}")
# 0.0000135 + 0.000014 = 0.0000275
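The figures in the comment above are consistent with a simple rule: token count multiplied by the per-token price. The sketch below reproduces them by hand, assuming the gpt-3.5-turbo rates from the pricing table ($1.5 per 1M prompt tokens, $2 per 1M completion tokens), 9 prompt tokens, and 7 completion tokens (the completion count is inferred from the printed cost, not measured). `estimate_cost` is a hypothetical helper written for illustration and is not part of Tokencost.

```python
from decimal import Decimal

# Rates copied from the gpt-3.5-turbo row of the pricing table (USD per 1M tokens).
PROMPT_USD_PER_1M = Decimal("1.5")
COMPLETION_USD_PER_1M = Decimal("2")


def estimate_cost(num_tokens: int, usd_per_million: Decimal) -> Decimal:
    """Hypothetical helper: cost = tokens * (price per 1M tokens / 1,000,000)."""
    return Decimal(num_tokens) * usd_per_million / Decimal(1_000_000)


# Token counts are assumptions: 9 for the "Hello world" message prompt,
# 7 for the completion (inferred from the printed completion cost).
prompt_cost = estimate_cost(9, PROMPT_USD_PER_1M)
completion_cost = estimate_cost(7, COMPLETION_USD_PER_1M)
total_cost = prompt_cost + completion_cost

print(f"{prompt_cost} + {completion_cost} = {total_cost}")
```

`Decimal` is used here so that very small per-token amounts are not distorted by binary floating-point rounding.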
pip install tokencost
Calculating the cost of prompts and completions from OpenAI requests
from openai import OpenAI
from tokencost import calculate_prompt_cost, calculate_completion_cost
client = OpenAI()
model = "gpt-3.5-turbo"
prompt = [{ "role": "user", "content": "Say this is a test"}]
chat_completion = client.chat.completions.create(
messages=prompt, model=model
)
completion = chat_completion.choices[0].message.content
# "This is a test."
prompt_cost = calculate_prompt_cost(prompt, model)
completion_cost = calculate_completion_cost(completion, model)
print(f"{prompt_cost} + {completion_cost} = {prompt_cost + completion_cost}")
# 0.0000180 + 0.000010 = 0.0000280
Calculating cost using string prompts instead of messages:
from tokencost import calculate_prompt_cost
prompt_string = "Hello world"
response = "How may I assist you today?"
model = "gpt-3.5-turbo"
prompt_cost = calculate_prompt_cost(prompt_string, model)
print(f"Cost: ${prompt_cost}")
# Cost: $3e-06
Counting tokens
from tokencost import count_message_tokens, count_string_tokens
message_prompt = [{ "role": "user", "content": "Hello world"}]
# Counting tokens in prompts formatted as message lists
print(count_message_tokens(message_prompt, model="gpt-3.5-turbo"))
# 9
# Alternatively, counting tokens in string prompts
print(count_string_tokens(prompt="Hello world", model="gpt-3.5-turbo"))
# 2
Under the hood, strings and ChatML messages are tokenized using Tiktoken, OpenAI's official tokenizer. Tiktoken splits text into tokens (which can be parts of words or individual characters) and handles both raw strings and message formats with additional tokens for message formatting and roles.
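To illustrate why a message list costs more tokens than its raw text, here is a minimal sketch of the per-message overhead. It assumes the overhead constants from OpenAI's published counting recipe for gpt-3.5-turbo and gpt-4 style chat models; whether Tokencost uses exactly these values is not confirmed here. `count_chat_tokens` and `whitespace_encode` are hypothetical names, and the whitespace splitter is only a stand-in so the example runs without extra dependencies.

```python
from typing import Callable, Dict, List, Sequence

# Assumed overheads from OpenAI's counting recipe for gpt-3.5-turbo / gpt-4
# style chat models; not confirmed to be the values Tokencost uses.
TOKENS_PER_MESSAGE = 3  # framing tokens added around every message
TOKENS_PER_NAME = 1     # extra token when a message carries a "name" field
REPLY_PRIMING = 3       # tokens that prime the assistant's reply


def count_chat_tokens(
    messages: List[Dict[str, str]],
    encode: Callable[[str], Sequence],
) -> int:
    """Count tokens for a message list: field tokens plus formatting overhead."""
    total = REPLY_PRIMING
    for message in messages:
        total += TOKENS_PER_MESSAGE
        for key, value in message.items():
            total += len(encode(value))
            if key == "name":
                total += TOKENS_PER_NAME
    return total


def whitespace_encode(text: str) -> List[str]:
    """Stand-in tokenizer that splits on whitespace; NOT a real BPE tokenizer."""
    return text.split()


messages = [{"role": "user", "content": "Hello world"}]
print(count_chat_tokens(messages, whitespace_encode))
```

For real counts, pass an actual tokenizer instead of the stand-in, for example `tiktoken.encoding_for_model("gpt-3.5-turbo").encode`. The whitespace splitter matches Tiktoken only by coincidence on short inputs like this one.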
For Anthropic models from version 3 onward (e.g. Sonnet 3.5, Haiku 3.5, and Opus 3), we use the Anthropic beta token counting API to ensure accurate token counts. For older Claude models, we approximate using Tiktoken with the cl100k_base encoding.
Prices are denominated in USD. All prices can be located here.
| Model Name | Prompt Cost (USD) per 1M tokens | Completion Cost (USD) per 1M tokens | Max Prompt Tokens | Max Output Tokens |
|---|---|---|---|---|
| gpt-4 | $30 | $60 | 8,192 | 4096 |
| gpt-4o | $2.5 | $10 | 128,000 | 16384 |
| gpt-4o-audio-preview | $2.5 | $10 | 128,000 | 16384 |
| gpt-4o-audio-preview-2024-10-01 | $2.5 | $10 | 128,000 | 16384 |
| gpt-4o-mini | $0.15 | $0.6 | 128,000 | 16384 |
| gpt-4o-mini-2024-07-18 | $0.15 | $0.6 | 128,000 | 16384 |
| o1-mini | $1.1 | $4.4 | 128,000 | 65536 |
| o1-mini-2024-09-12 | $3 | $12 | 128,000 | 65536 |
| o1-preview | $15 | $60 | 128,000 | 32768 |
| o1-preview-2024-09-12 | $15 | $60 | 128,000 | 32768 |
| chatgpt-4o-latest | $5 | $15 | 128,000 | 4096 |
| gpt-4o-2024-05-13 | $5 | $15 | 128,000 | 4096 |
| gpt-4o-2024-08-06 | $2.5 | $10 | 128,000 | 16384 |
| gpt-4-turbo-preview | $10 | $30 | 128,000 | 4096 |
| gpt-4-0314 | $30 | $60 | 8,192 | 4096 |
| gpt-4-0613 | $30 | $60 | 8,192 | 4096 |
| gpt-4-32k | $60 | $120 | 32,768 | 4096 |
| gpt-4-32k-0314 | $60 | $120 | 32,768 | 4096 |
| gpt-4-32k-0613 | $60 | $120 | 32,768 | 4096 |
| gpt-4-turbo | $10 | $30 | 128,000 | 4096 |
| gpt-4-turbo-2024-04-09 | $10 | $30 | 128,000 | 4096 |
| gpt-4-1106-preview | $10 | $30 | 128,000 | 4096 |
| gpt-4-0125-preview | $10 | $30 | 128,000 | 4096 |
| gpt-4-vision-preview | $10 | $30 | 128,000 | 4096 |
| gpt-4-1106-vision-preview | $10 | $30 | 128,000 | 4096 |
| gpt-3.5-turbo | $1.5 | $2 | 16,385 | 4096 |
| gpt-3.5-turbo-0301 | $1.5 | $2 | 4,097 | 4096 |
| gpt-3.5-turbo-0613 | $1.5 | $2 | 4,097 | 4096 |
| gpt-3.5-turbo-1106 | $1 | $2 | 16,385 | 4096 |
| gpt-3.5-turbo-0125 | $0.5 | $1.5 | 16,385 | 4096 |
| gpt-3.5-turbo-16k | $3 | $4 | 16,385 | 4096 |
| gpt-3.5-turbo-16k-0613 | $3 | $4 | 16,385 | 4096 |
| ft:gpt-3.5-turbo | $3 | $6 | 16,385 | 4096 |
| ft:gpt-3.5-turbo-0125 | $3 | $6 | 16,385 | 4096 |
| ft:gpt-3.5-turbo-1106 | $3 | $6 | 16,385 | 4096 |
| ft:gpt-3.5-turbo-0613 | $3 | $6 | 4,096 | 4096 |
| ft:gpt-4-0613 | $30 | $60 | 8,192 | 4096 |
| ft:gpt-4o-2024-08-06 | $3.75 | $15 | 128,000 | 16384 |
| ft:gpt-4o-mini-2024-07-18 | $0.3 | $1.2 | 128,000 | 16384 |
| ft:davinci-002 | $2 | $2 | 16,384 | 4096 |
| ft:babbage-002 | $0.4 | $0.4 | 16,384 | 4096 |
| text-embedding-3-large | $0.13 | $0 | 8,191 | nan |
| text-embedding-3-small | $0.02 | $0 | 8,191 | nan |
| text-embedding-ada-002 | $0.1 | $0 | 8,191 | nan |
| text-embedding-ada-002-v2 | $0.1 | $0 | 8,191 | nan |
| text-moderation-stable | $0 | $0 | 32,768 | 0 |
| text-moderation-007 | $0 | $0 | 32,768 | 0 |
| text-moderation-latest |
$ claude mcp add tokencost \
-- python -m otcore.mcp_server <graph>