
github.com/SaiAkhil066/CORTEX-AI-SUPER-RAG @main sqlite

21 symbols · 91 edges · 5 files · 11 documented (52%)
README

Cortex RAG — Agentic Retrieval Engine 2026

Cortex RAG in action

↑   what actually happens every time you send a message

         

You upload a PDF. You ask a question. Cortex RAG retrieves, cross-checks, reasons, and cites, entirely on your machine.

No API key  ·  No cloud upload  ·  No subscription

Open for Enterprise. We build custom, production-grade RAG systems for organizations — same 9-layer pipeline, tuned to your data, your permissions, your stack. Deployed in days, not months. At a fraction of what closed-source vendors charge. → See what we can build for you


✦   Nine Techniques   ✦

🧬   Contextual Retrieval
An LLM prepends situating context to every chunk before indexing. Each vector carries the full document story, not just a fragment.

🔀   RAG-Fusion + RRF
Generates N query variants, retrieves independently for each, then merges all ranked lists via Reciprocal Rank Fusion for better recall.

🕸️   GraphRAG
Builds a NetworkX knowledge graph over document entities. Retrieves relational context that a pure vector search would miss entirely.

✅   Corrective RAG (CRAG)
An LLM grades every retrieved chunk for relevance. Noise is silently dropped before the answer is generated. The model only sees what matters.

⚡   Neural Reranking
A Cross-Encoder (ms-marco-MiniLM) reorders all retrieval candidates by true query–passage relevance score, not just embedding similarity.

🔭   HyDE
Generates a hypothetical answer first to expand sparse queries into a richer dense embedding space before the actual retrieval step.

🧠   Live Reasoning Panel
Streams the model's <think> chain-of-thought in real time. Watch it reason through your documents before the answer appears.

💾   Semantic Cache
Cosine-similarity cache at threshold 0.92 on query embeddings. Repeat questions skip retrieval and generation entirely; the answer is instant.

💬   Chat Memory
Full multi-turn conversation history flows into every generation call. Ask follow-ups naturally; the model remembers what you discussed.
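
To make the RAG-Fusion merge step above more concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is not taken from the Cortex RAG source: the function name rrf_merge, the document IDs, and the constant k=60 (a commonly used default) are assumptions for illustration only.

```python
from collections import defaultdict


def rrf_merge(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc_id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical retrieval results for three variants of the same question.
variant_results = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
fused = rrf_merge(variant_results)
print(fused)
```

A document that ranks well across several variants (doc_b here) rises to the top even if no single list puts everything in the same order, which is the recall benefit the technique is aiming for.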

⟁   How a query flows through the system

Upload (PDF / DOCX / TXT)
 │
 ├── Chunk documents
 │
 └── [Contextual Retrieval ON]──► LLM enriches each chunk with surrounding context
                                         │
                                         ▼
                     ┌──────────────────────────────────┐
                     │   BM25   ·   FAISS   ·   Graph   │  ← three indexes built
                     └──────────────────────────────────┘
                                         │
                                   Query arrives
                                         │
                         ┌───────────────┴───────────────┐
                         ▼                               ▼
                  💾 Semantic Cache?              🔀 RAG-Fusion
                  ┌── HIT → return instantly       multi-query expansion
                  │   MISS ↓                            │
                  │                              RRF merge of results
                  │                            + GraphRAG entity boost
                  │                                     │
                  │                            ⚡ Neural Rerank (CrossEncoder)
                  │                                     │
                  │                            ✅ CRAG: grade each chunk
                  │                               drop irrelevant ones
                  │                                     │
                  │                            🧠 LLM stream
                  │                               <think> panel live
                  │                                     │
                  └─────────────────────────────► Answer + Source cards
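
The cache branch in the diagram can be sketched roughly as follows. This is an illustrative stand-in, not the repository's check_semantic_cache: the SemanticCache class, its store and lookup methods, and the toy 3-dimensional vectors are hypothetical. Only the 0.92 cosine threshold comes from the README; real embeddings would come from the embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory cache keyed by query embeddings."""

    def __init__(self, threshold=0.92):
        self.threshold = threshold
        self.entries = []  # list of (query_embedding, answer) pairs

    def store(self, embedding, answer):
        self.entries.append((list(embedding), answer))

    def lookup(self, embedding):
        """Return the best cached answer at or above the threshold, else None."""
        best_answer, best_score = None, self.threshold
        for cached_embedding, answer in self.entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score >= best_score:
                best_answer, best_score = answer, score
        return best_answer


cache = SemanticCache()
cache.store([1.0, 0.0, 0.0], "cached answer")
hit = cache.lookup([0.99, 0.05, 0.0])  # near-duplicate query: HIT
miss = cache.lookup([0.0, 1.0, 0.0])   # unrelated query: MISS, run the pipeline
```

On a MISS the caller would run the full retrieval and generation path, then store the new query embedding and answer for next time.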

⚡   Quick Start

Before you begin:

  • [ ] Ollama installed and running
  • [ ] Python 3.10 or higher available

1  —  Clone

git clone https://github.com/SaiAkhil066/CORTEX-AI-SUPER-RAG.git
cd CORTEX-AI-SUPER-RAG

2  —  Install

pip install -r requirements.txt

Windows only: if you get a c10.dll DLL error on first run, pin PyTorch to the stable CPU build:

pip uninstall torch -y
pip install "torch==2.1.2" --index-url https://download.pytorch.org/whl/cpu

3  —  Pull models

ollama pull llama3.1:8b          # LLM  (swap for any model you prefer)
ollama pull nomic-embed-text     # Embeddings  (required)

4  —  Run

python -m streamlit run app.py

Open http://localhost:8501

Use python -m streamlit run (not bare streamlit run) to ensure the correct Python environment is picked up.


🤖   Works with any Ollama model

The model selector in the sidebar auto-populates from your locally installed Ollama models. Swap freely — no config change needed.

Model              Params   Speed    Notes
llama3.1:8b        8B       ⚡⚡⚡      Default · best all-round balance
qwen2.5:7b         7B       ⚡⚡⚡      Strong on multilingual documents
mistral:7b         7B       ⚡⚡⚡      Fast, great for long documents
llama3.1:70b       70B               Best quality when speed isn't a priority
qwen2.5-coder:7b   7B       ⚡⚡⚡      Best for code / technical docs
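
For readers curious how a selector could discover locally installed models, here is a small sketch that queries Ollama's /api/tags endpoint, which lists local models. It is not the repository's get_ollama_models; the helper names parse_model_names and list_ollama_models and the default base URL are assumptions for illustration.

```python
import json
import urllib.request


def parse_model_names(payload):
    """Extract sorted model names from a decoded /api/tags response body."""
    return sorted(m["name"] for m in payload.get("models", []))


def list_ollama_models(base_url="http://localhost:11434", timeout=5):
    """Ask a running Ollama server which models are installed locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.load(resp))


# Offline example with a payload shaped like Ollama's response,
# so the parsing can be checked without a server running.
sample = {"models": [{"name": "nomic-embed-text:latest"}, {"name": "llama3.1:8b"}]}
names = parse_model_names(sample)
print(names)
```

With Ollama running, list_ollama_models() would return whatever you have pulled, which is why no config change is needed after an ollama pull.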

🐳   Docker setup

Option A — Ollama on host (recommended)

docker-compose build && docker-compose up

Ollama runs natively; the container connects via the host network.

Option B — Everything in Docker

version: "3.8"
services:
  ollama:
    image: ghcr.io/jmorganca/ollama:latest
    ports:
      - "11434:11434"

  cortex-rag-service:
    build: .
    ports:
      - "8501:8501"
    environment:
      - OLLAMA_API_URL=http://ollama:11434
      - MODEL=llama3.1:8b
      - EMBEDDINGS_MODEL=nomic-embed-text:latest
      - CROSS_ENCODER_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
    depends_on:
      - ollama

Then start both services:

docker-compose up
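
The environment variables in the compose file could be consumed on the Python side along these lines. This is only a sketch: the Settings dataclass, the load_settings helper, and the fallback defaults are hypothetical, and the README does not say how app.py actually reads them.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables set in docker-compose."""
    ollama_api_url: str
    model: str
    embeddings_model: str
    cross_encoder_model: str


def load_settings(env=None):
    """Read settings from a mapping (defaults to os.environ).

    The fallback values are assumptions chosen to mirror the compose file.
    """
    env = os.environ if env is None else env
    return Settings(
        ollama_api_url=env.get("OLLAMA_API_URL", "http://localhost:11434"),
        model=env.get("MODEL", "llama3.1:8b"),
        embeddings_model=env.get("EMBEDDINGS_MODEL", "nomic-embed-text:latest"),
        cross_encoder_model=env.get(
            "CROSS_ENCODER_MODEL", "cross-encoder/ms-marco-MiniLM-L-6-v2"
        ),
    )


# Inside the container, OLLAMA_API_URL points at the "ollama" service name.
settings = load_settings({"OLLAMA_API_URL": "http://ollama:11434"})
print(settings.ollama_api_url, settings.model)
```

Passing the mapping explicitly keeps the helper easy to test; in the running app you would call load_settings() with no arguments.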

🔩   Tech Stack

UI                  Streamlit 1.30
LLM inference       Ollama (local)
Vector store        FAISS
Sparse retrieval    BM25 (rank-bm25)
Knowledge graph     NetworkX
Neural reranker     sentence-transformers CrossEncoder
Embeddings          nomic-embed-text via Ollama
RAG orchestration   LangChain + langchain-classic
Document loading    PyMuPDF · Docx2txt · TextLoader
Supported files     PDF · DOCX · TXT · MD

Built with curiosity  ·  runs on your machine  ·  owned by you


The future of retrieval-augmented AI is local — no internet required.


If Cortex RAG saved you time, consider buying us a coffee ☕

Every contribution keeps this project free and open-source.

Core symbols most depended-on inside this repo

Symbol                  Called by   File
_thinking_html          4           app.py
_ollama_generate        3           utils/advanced_rag.py
_source_html            2           app.py
load_reranker           1           app.py
load_cache_embeddings   1           app.py
get_ollama_models       1           app.py
generate_response       1           app.py
check_semantic_cache    1           app.py

Shape

Function 21

Languages

Python 100%

Modules by API surface

app.py                        8 symbols
utils/advanced_rag.py         6 symbols
utils/doc_handler.py          3 symbols
utils/retriever_pipeline.py   2 symbols
utils/build_graph.py          2 symbols

Dependencies from manifests, versioned

langchain 0.3 · 1×

For agents

$ claude mcp add CORTEX-AI-SUPER-RAG \
  -- python -m otcore.mcp_server <graph>
