
📢 Check out our detailed Berkeley Function Calling Leaderboard changelog for the latest dataset / model updates to the Berkeley Function Calling Leaderboard!
🎯 [10/04/2024] Introducing the Agent Arena by Gorilla X LMSYS Chatbot Arena! Compare different agents in tasks like search, finance, RAG, and beyond. Explore which models and tools work best for specific tasks through our novel ranking system and community-driven prompt hub. [Blog] [Arena] [Leaderboard] [Dataset] [Tweet]
📣 [09/21/2024] Announcing BFCL V3 - Evaluating multi-turn and multi-step function calling capabilities! New state-based evaluation system tests models on handling complex workflows, sequential functions, and service states. [Blog] [Leaderboard] [Code] [Tweet]
🚀 [08/20/2024] Released BFCL V2 • Live! The Berkeley Function-Calling Leaderboard now features enterprise-contributed data and real-world scenarios. [Blog] [Live Leaderboard] [V2 Categories Leaderboard] [Tweet]
⚡️ [04/12/2024] Excited to release GoEx - a runtime for LLM-generated actions like code, API calls, and more. Featuring "post-facto validation" for assessing LLM actions after execution, "undo" and "damage confinement" abstractions to manage unintended actions & risks. This paves the way for fully autonomous LLM agents, enhancing interaction between apps & services with human-out-of-loop. [Blog] [Code] [Paper] [Tweet]
⏰ [04/01/2024] Introducing cost and latency metrics into the Berkeley Function Calling Leaderboard!
Gorilla enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically and syntactically correct API to invoke.
With Gorilla, we are the first to demonstrate how to use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. This repository contains inference code for running Gorilla finetuned models, evaluation code for reproducing results from our paper, and APIBench, the largest collection of APIs, curated and easy to train on!
Since our initial release, we've served ~500k requests and witnessed incredible adoption by developers worldwide. The project has expanded to include tools, evaluations, leaderboard, end-to-end finetuning recipes, infrastructure components, and the Gorilla API Store:
| Project | Type | Description |
|---|---|---|
| Gorilla Paper | 🤖 Model<br>📝 Fine-tuning<br>📚 Dataset<br>📊 Evaluation<br>🔧 Infra | Large Language Model Connected with Massive APIs<br>• Novel finetuning approach for API invocation<br>• Evaluation on 1,600+ APIs (APIBench)<br>• Retrieval-augmented training for test-time adaptation |
| Gorilla OpenFunctions-V2 | 🤖 Model | Drop-in alternative for function calling, supporting multiple complex data types and parallel execution<br>• Multiple & parallel function execution with OpenAI-compatible endpoints<br>• Native support for Python, Java, JavaScript, and REST APIs with expanded data types<br>• Function relevance detection to reduce hallucinations<br>• Enhanced RESTful API formatting capabilities<br>• State-of-the-art performance among open-source models |
| Berkeley Function Calling Leaderboard (BFCL) | 📊 Evaluation<br>🏆 Leaderboard<br>🔧 Function Calling Infra<br>📚 Dataset | Comprehensive evaluation of function-calling capabilities<br>• V1: Expert-curated dataset for evaluating single-turn function calling<br>• V2: Enterprise-contributed data for real-world scenarios<br>• V3: Multi-turn & multi-step function calling evaluation<br>• Cost and latency metrics for all models<br>• Interactive API explorer for testing<br>• Community-driven benchmarking platform |
| Agent Arena | 📊 Evaluation<br>🏆 Leaderboard | Compare LLM agents across models, tools, and frameworks<br>• Head-to-head agent comparisons with ELO rating system<br>• Framework compatibility testing (LangChain, AutoGPT)<br>• Community-driven evaluation platform<br>• Real-world task performance metrics |
| Gorilla Execution Engine (GoEx) | 🔧 Infra | Runtime for executing LLM-generated actions with safety guarantees<br>• Post-facto validation for verifying LLM actions after execution<br>• Undo capabilities and damage confinement for risk mitigation<br>• OAuth2 and API key authentication for multiple services<br>• Support for RESTful APIs, databases, and filesystem operations<br>• Docker-based sandboxed execution environment |
| Retrieval-Augmented Fine-tuning (RAFT) | 📝 Fine-tuning<br>🤖 Model | Fine-tuning LLMs for robust domain-specific retrieval<br>• Novel fine-tuning recipe for domain-specific RAG<br>• Chain-of-thought answers with direct document quotes<br>• Training with oracle and distractor documents<br>• Improved performance on PubMed, HotpotQA, and Gorilla benchmarks<br>• Efficient adaptation of smaller models for domain QA |
| Gorilla CLI | 🤖 Model<br>🔧 Local CLI Infra | LLMs for your command-line interface<br>• User-friendly CLI tool supporting ~1500 APIs (Kubernetes, AWS, GCP, etc.)<br>• Natural language command generation with multi-LLM fusion<br>• Privacy-focused with explicit execution approval<br>• Command history and interactive selection interface |
| Gorilla API Zoo | 📚 Dataset | A community-maintained repository of up-to-date API documentation<br>• Centralized, searchable index of APIs across domains<br>• Structured documentation format with arguments, versioning, and examples<br>• Community-driven updates to keep pace with API changes<br>• Rich data source for model training and fine-tuning<br>• Enables retrieval-augmented training and inference<br>• Reduces hallucination through up-to-date documentation |
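For readers unfamiliar with the Elo ratings mentioned in the Agent Arena row, the textbook update rule can be sketched as below. This is the standard Elo formula only; whether Agent Arena uses exactly this rule is not stated here, and the function names and the K-factor of 32 are illustrative assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one head-to-head comparison.

    score_a is 1.0 if A wins, 0.5 for a tie, 0.0 if A loses.
    k is an assumed K-factor; real systems tune it.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - exp_b)
    return new_a, new_b


if __name__ == "__main__":
    # Two equally rated agents; agent A wins the comparison.
    print(update_elo(1500.0, 1500.0, 1.0))
```

The total rating in the pair is conserved by this rule, so one agent's gain is exactly the other's loss.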
Try Gorilla in your browser:
- 🚀 Gorilla Colab Demo: Try the base Gorilla model
- 🌐 Gorilla Gradio Demo: Interactive web interface
- 🔥 OpenFunctions Colab Demo: Try the latest OpenFunctions model
- 🎯 OpenFunctions Website Demo: Experiment with function calling
- 📊 Berkeley Function Calling Leaderboard: Compare function calling capabilities
pip install gorilla-cli
gorilla generate 100 random characters into a file called test.txt
Learn more about Gorilla CLI →
git clone https://github.com/ShishirPatil/gorilla.git
cd gorilla/inference
Detailed local setup instructions →
# Note: this snippet uses the legacy (pre-1.0) interface of the openai Python package
import openai

# The hosted endpoint does not check the key, so a placeholder is used
openai.api_key = "EMPTY"
openai.api_base = "http://luigi.millennium.berkeley.edu:8000/v1"

# Define your functions
functions = [{
    "name": "get_current_weather",
    "description": "Get weather in a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
        },
        "required": ["location"]
    }
}]

# Make API call
completion = openai.ChatCompletion.create(
    model="gorilla-openfunctions-v2",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    functions=functions
)
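Once the model returns a function call, your application still has to run it. The sketch below shows one way to dispatch such a call to local Python code. It assumes the call arrives as a dict with a `name` and JSON-encoded `arguments` (the OpenAI-style shape); the exact response format of the Gorilla endpoint is not shown above, so treat the `example_call` payload, `dispatch_function_call`, and the stub `get_current_weather` as hypothetical.

```python
import json


def get_current_weather(location, unit="fahrenheit"):
    """Hypothetical stand-in; a real implementation would query a weather service."""
    return {"location": location, "unit": unit, "forecast": "unknown"}


# Registry of functions the model is allowed to trigger
AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}


def dispatch_function_call(function_call):
    """Look up the requested function by name and call it with the given arguments."""
    name = function_call["name"]
    arguments = function_call.get("arguments", {})
    # Arguments are assumed to arrive as a JSON string; accept a dict as well
    if isinstance(arguments, str):
        arguments = json.loads(arguments)
    if name not in AVAILABLE_FUNCTIONS:
        raise ValueError(f"Model requested an unknown function: {name}")
    return AVAILABLE_FUNCTIONS[name](**arguments)


# Assumed payload shape, written by hand for illustration (not real model output)
example_call = {
    "name": "get_current_weather",
    "arguments": '{"location": "San Francisco", "unit": "celsius"}',
}

result = dispatch_function_call(example_call)
print(result)
```

Keeping an explicit registry means the model can only trigger functions you have deliberately exposed, rather than arbitrary names it might produce.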
Gorilla Paper Evaluation Scripts: Run your own evaluations
🛠️ Development Tools
Yes! We now have models that you can use commercially without any obligations.
Absolutely! You've highlighted a great aspect of our tools. Gorilla is an end-to-end model, specifically tailored to serve correct API calls (tools) without requiring any additional coding. It's designed to work as part of a wider ecosystem and can be flexibly integrated within agentic frameworks and other tools.
LangChain is a versatile developer tool. Its "agents" can efficiently swap in any LLM, Gorilla included, making it a highly adaptable solution for various needs.
The beauty of these tools truly shines when they collaborate, complementing each other's strengths and capabilities to create an even more powerful and comprehensive solution. This is where your contribution can make a difference. We enthusiastically welcome any inputs to further refine and enhance these tools.
Check out our blog on How to Use Gorilla: A Step-by-Step Walkthrough to see all the different ways you can integrate Gorilla in your projects.
In the immediate future, we plan to release the following:
$ claude mcp add gorilla \
-- python -m otcore.mcp_server <graph>