
📃 LangChain-Chatchat (formerly Langchain-ChatGLM)
An open-source, offline-deployable RAG and Agent application project based on large language models like ChatGLM and application frameworks like Langchain.
🤖️ A question-answering application based on local knowledge bases using the langchain concept. The goal is to create a friendly and offline-operable knowledge base Q&A solution that supports Chinese scenarios and open-source models.
💡 Inspired by GanymedeNil's project document.ai and AlexZhangji' s ChatGLM-6B Pull Request, this project aims to establish a local knowledge base Q&A application fully utilizing open-source models. The latest version of the project uses FastChat to integrate models like Vicuna, Alpaca, LLaMA, Koala, and RWKV, leveraging the langchain framework to support API calls provided by FastAPI or operations using a WebUI based on Streamlit.

✅ This project supports mainstream open-source LLMs, embedding models, and vector databases, allowing full open-source ** model offline private deployment**. Additionally, the project supports OpenAI GPT API calls and will continue to expand access to various models and model APIs.
⛓️ The implementation principle of this project is as shown below, including loading files -> reading text -> text
segmentation -> text vectorization -> question vectorization -> matching the top k most similar text vectors with the
question vector -> adding the matched text as context along with the question to the prompt -> submitting to the LLM
for generating answers.

From the document processing perspective, the implementation process is as follows:

🚩 This project does not involve fine-tuning or training processes but can utilize fine-tuning or training to optimize the project's performance.
🌐 The 0.3.0 version code used in
the AutoDL Mirror has been updated
to version v0.3.0 of this project.
🐳 Docker images will be updated soon.
🧑💻 If you want to contribute to this project, please refer to the Developer Guide for more information on development and deployment.
| Features | 0.2.x | 0.3.x |
|---|---|---|
| Model Integration | Local: fastchat |
Online: XXXModelWorker | Local: model_provider, supports most mainstream model loading frameworks
Online: oneapi
All model integrations are compatible with the openai sdk | | Agent | ❌ Unstable | ✅ Optimized for ChatGLM3 and QWen, significantly enhanced Agent capabilities || | LLM Conversations | ✅ | ✅ || | Knowledge Base Conversations | ✅ | ✅ || | Search Engine Conversations | ✅ | ✅ || | File Conversations | ✅ Only vector search | ✅ Unified as File RAG feature, supports BM25+KNN and other retrieval methods || | Database Conversations | ❌ | ✅ || | ARXIV Document Conversations | ❌ | ✅ || | Wolfram Conversations | ❌ | ✅ || | Text-to-Image | ❌ | ✅ || | Local Knowledge Base Management | ✅ | ✅ || | WEBUI | ✅ | ✅ Better multi-session support, custom system prompts... |
The core functionality of 0.3.x is implemented by Agent, but users can also manually perform tool calls: |Operation Method|Function Implemented|Applicable Scenario| |----------------|--------------------|-------------------| |Select "Enable Agent", choose multiple tools|Automatic tool calls by LLM|Using models with Agent capabilities like ChatGLM3/Qwen or online APIs| |Select "Enable Agent", choose a single tool|LLM only parses tool parameters|Using models with general Agent capabilities, unable to choose tools well
Want to manually select functions| |Do not select "Enable Agent", choose a single tool|Manually fill in parameters for tool calls without using Agent function|Using models without Agent capabilities|
More features and updates can be experienced in the actual deployment.
This project already supports mainstream models on the market, such as GLM-4-Chat and Qwen2-Instruct, among the latest open-source large language models and embedding models. Users need to start the model deployment framework and load the required models by modifying the configuration information. The supported local model deployment frameworks in this project are as follows:
| Model Deployment Framework | Xinference | LocalAI | Ollama | FastChat |
|---|---|---|---|---|
| Aligned with OpenAI API | ✅ | ✅ | ✅ | ✅ |
| Accelerated Inference Engine | GPTQ, GGML, vLLM, TensorRT | GPTQ, GGML, vLLM, TensorRT | GGUF, GGML | vLLM |
| Model Types Supported | LLM, Embedding, Rerank, Text-to-Image, Vision, Audio | LLM, Embedding, Rerank, Text-to-Image, Vision, Audio | LLM, Text-to-Image, Vision | LLM, Vision |
| Function Call | ✅ | ✅ | ✅ | / |
| More Platform Support (CPU, Metal) | ✅ | ✅ | ✅ | ✅ |
| Heterogeneous | ✅ | ✅ | / | / |
| Cluster | ✅ | ✅ | / | / |
| Documentation Link | Xinference Documentation | LocalAI Documentation | Ollama Documentation | FastChat Documentation |
| Available Models | Xinference Supported Models | LocalAI Supported Models | Ollama Supported Models | FastChat Supported Models |
In addition to the above local model loading frameworks, the project also supports the One API framework for integrating online APIs, supporting commonly used online APIs such as OpenAI ChatGPT, Azure OpenAI API, Anthropic Claude, Zhipu Qingyan, and Baichuan.
[!Note] Regarding Xinference loading local models: Xinference built-in models will automatically download. To load locally downloaded models, you can execute
streamlit run xinference_manager.pyin the tools/model_loaders directory of the project after starting the Xinference service and set the local path for the specified model as prompted on the page.
💡 On the software side, this project supports Python 3.8-3.11 environments and has been tested on Windows, macOS, and Linux operating systems.
💻 On the hardware side, as version 0.3.0 has been modified to support integration with different model deployment frameworks, it can be used under various hardware conditions such as CPU, GPU, NPU, and MPS.
Starting from version 0.3.0, Langchain-Chatchat provides installation in the form of a Python library. Execute the follow
$ claude mcp add Langchain-Chatchat \
-- python -m otcore.mcp_server <graph>