github.com/SaiAkhil066/CORTEX-AI-SUPER-RAG @main sqlite

21 symbols 91 edges 5 files 11 documented · 52%

README

[Image: Cortex RAG in action, showing what actually happens every time you send a message]

You upload a PDF. You ask a question. Cortex RAG retrieves, cross-checks, reasons, and cites, entirely on your machine.

No API key · No cloud upload · No subscription

Open for Enterprise. We build custom, production-grade RAG systems for organizations: the same 9-layer pipeline, tuned to your data, your permissions, and your stack, deployed in days rather than months and at a fraction of what closed-source vendors charge.

✦ Nine Techniques ✦

🧬 Contextual Retrieval: LLM prepends situating context to every chunk before indexing. Each vector carries the full document story, not just a fragment.
🔀 RAG-Fusion + RRF: Generates N query variants, retrieves independently for each, then merges all ranked lists via Reciprocal Rank Fusion for better recall.
🕸️ GraphRAG: Builds a NetworkX knowledge graph over document entities. Retrieves relational context that a pure vector search would miss entirely.
✅ Corrective RAG (CRAG): LLM grades every retrieved chunk for relevance. Noise is silently dropped before the answer is generated. The model only sees what matters.
⚡ Neural Reranking: A Cross-Encoder (ms-marco-MiniLM) reorders all retrieval candidates by true query–passage relevance score, not just embedding similarity.
🔭 HyDE: Generates a hypothetical answer first to expand sparse queries into a richer dense embedding space before the actual retrieval step.
🧠 Live Reasoning Panel: Streams the model's <think> chain-of-thought in real time. Watch it reason through your documents before the answer appears.
💾 Semantic Cache: Cosine-similarity cache at threshold 0.92 on query embeddings. Repeat questions skip retrieval and generation entirely, so the answer is instant.
💬 Chat Memory: Full multi-turn conversation history flows into every generation call. Ask follow-ups naturally; the model remembers what you discussed.
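
To make the RAG-Fusion merge step concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is an illustration under stated assumptions, not the repository's code: the function name, the chunk IDs, and the constant k=60 (a value commonly used for RRF) are hypothetical, and the README does not say which value Cortex RAG uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of chunk IDs into one ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used constant.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical retrieval results for three query variants.
variant_results = [
    ["chunk_a", "chunk_b", "chunk_c"],
    ["chunk_b", "chunk_a", "chunk_d"],
    ["chunk_b", "chunk_c", "chunk_a"],
]
fused = reciprocal_rank_fusion(variant_results)
print(fused)
```

A chunk that ranks well across several variants (here `chunk_b`) rises above one that tops only a single list, which is why the merge tends to improve recall.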

⟁ How a query flows through the system

Upload (PDF / DOCX / TXT)
 │
 ├── Chunk documents
 │
 └── [Contextual Retrieval ON]──► LLM enriches each chunk with surrounding context
                                         │
                                         ▼
                     ┌──────────────────────────────────┐
                     │   BM25   ·   FAISS   ·   Graph   │  ← three indexes built
                     └──────────────────────────────────┘
                                         │
                                   Query arrives
                                         │
                         ┌───────────────┴───────────────┐
                         ▼                               ▼
                  💾 Semantic Cache?              🔀 RAG-Fusion
                  ┌── HIT → return instantly       multi-query expansion
                  │   MISS ↓                            │
                  │                              RRF merge of results
                  │                            + GraphRAG entity boost
                  │                                     │
                  │                            ⚡ Neural Rerank (CrossEncoder)
                  │                                     │
                  │                            ✅ CRAG: grade each chunk
                  │                               drop irrelevant ones
                  │                                     │
                  │                            🧠 LLM stream
                  │                               <think> panel live
                  │                                     │
                  └─────────────────────────────► Answer + Source cards
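
The cache branch of the diagram can be sketched as a similarity lookup over stored query embeddings. This is a simplified illustration, assuming embeddings are plain lists of floats; the `SemanticCache` class and its method names are hypothetical, and the repository's own implementation may be organized differently. Only the 0.92 threshold comes from the README.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical in-memory cache keyed by query embedding."""

    def __init__(self, threshold=0.92):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def lookup(self, query_embedding):
        """Return the best cached answer at or above the threshold, else None."""
        best_score, best_answer = 0.0, None
        for embedding, answer in self.entries:
            score = cosine_similarity(query_embedding, embedding)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def store(self, query_embedding, answer):
        self.entries.append((list(query_embedding), answer))


cache = SemanticCache()
cache.store([1.0, 0.0, 0.0], "cached answer")
hit = cache.lookup([0.98, 0.1, 0.0])   # near-duplicate query: similarity ~0.99
miss = cache.lookup([0.5, 0.5, 0.5])   # unrelated query: similarity ~0.58
```

On a miss the caller would continue down the retrieval path and store the final answer afterwards, so the next near-identical question is served from the cache.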

⚡ Quick Start

Before you begin:

[ ] Ollama installed and running
[ ] Python 3.10 or higher available

1 — Clone

git clone https://github.com/SaiAkhil066/CORTEX-AI-SUPER-RAG.git
cd CORTEX-AI-SUPER-RAG

2 — Install

pip install -r requirements.txt

Windows only: if you get a c10.dll DLL error on first run, pin PyTorch to the stable CPU build:

pip uninstall torch -y
pip install "torch==2.1.2" --index-url https://download.pytorch.org/whl/cpu

3 — Pull models

ollama pull llama3.1:8b          # LLM  (swap for any model you prefer)
ollama pull nomic-embed-text     # Embeddings  (required)

4 — Run

python -m streamlit run app.py

Open http://localhost:8501

Use python -m streamlit run (not bare streamlit run) to ensure the correct Python environment is picked up.

🤖 Works with any Ollama model

The model selector in the sidebar auto-populates from your locally installed Ollama models. Swap freely — no config change needed.

Model	Params	Speed	Notes
`llama3.1:8b`	8B	⚡⚡⚡	Default · best all-round balance
`qwen2.5:7b`	7B	⚡⚡⚡	Strong on multilingual documents
`mistral:7b`	7B	⚡⚡⚡	Fast, great for long documents
`llama3.1:70b`	70B	⚡	Best quality when speed isn't a priority
`qwen2.5-coder:7b`	7B	⚡⚡⚡	Best for code / technical docs
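
One way a selector like this could discover installed models is Ollama's `GET /api/tags` endpoint, which lists local models. The sketch below is an assumption about the mechanism, not a description of how `app.py` does it; the helper names are hypothetical and the URL is Ollama's default address.

```python
import json
from urllib.request import urlopen

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address


def parse_model_names(tags_payload):
    """Extract sorted model names from an /api/tags response body."""
    return sorted(m["name"] for m in tags_payload.get("models", []))


def list_local_models(base_url=OLLAMA_URL, timeout=5):
    """Query a running Ollama server for its locally installed models."""
    with urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.load(resp))


# Offline example using a trimmed, hypothetical response payload.
sample = {"models": [{"name": "nomic-embed-text:latest"}, {"name": "llama3.1:8b"}]}
names = parse_model_names(sample)
print(names)
```

Calling `list_local_models()` requires Ollama to be running; the parsing step is kept separate so it can be exercised without a server.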

🐳 Docker setup

Option A — Ollama on host (recommended)

docker-compose build && docker-compose up

Ollama runs natively; the container connects via the host network.

Option B — Everything in Docker

version: "3.8"
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"

  cortex-rag-service:
    build: .
    ports:
      - "8501:8501"
    environment:
      - OLLAMA_API_URL=http://ollama:11434
      - MODEL=llama3.1:8b
      - EMBEDDINGS_MODEL=nomic-embed-text:latest
      - CROSS_ENCODER_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
    depends_on:
      - ollama

docker-compose up
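
The compose file above passes four environment variables to the app container. As a rough sketch of how an application might read them, the snippet below collects them into one settings object. The `Settings` class, the `load_settings` helper, and every fallback default are assumptions made for illustration; the README does not show how the app consumes these variables.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    ollama_api_url: str
    model: str
    embeddings_model: str
    cross_encoder_model: str


def load_settings(env=os.environ):
    """Read the variables named in the compose file, with assumed defaults."""
    return Settings(
        ollama_api_url=env.get("OLLAMA_API_URL", "http://localhost:11434"),
        model=env.get("MODEL", "llama3.1:8b"),
        embeddings_model=env.get("EMBEDDINGS_MODEL", "nomic-embed-text:latest"),
        cross_encoder_model=env.get(
            "CROSS_ENCODER_MODEL", "cross-encoder/ms-marco-MiniLM-L-6-v2"
        ),
    )


# Simulate the container environment from Option B with a plain dict.
settings = load_settings({"OLLAMA_API_URL": "http://ollama:11434"})
print(settings.ollama_api_url, settings.model)
```

Inside the compose network the hostname `ollama` resolves to the Ollama service, which is why the URL differs from the localhost address used when Ollama runs on the host.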

🔩 Tech Stack

Component	Technology
UI	Streamlit 1.30
LLM inference	Ollama (local)
Vector store	FAISS
Sparse retrieval	BM25 (rank-bm25)
Knowledge graph	NetworkX
Neural reranker	sentence-transformers CrossEncoder
Embeddings	nomic-embed-text via Ollama
RAG orchestration	LangChain + langchain-classic
Document loading	PyMuPDF · Docx2txt · TextLoader
Supported files	PDF · DOCX · TXT · MD

Built with curiosity · runs on your machine · owned by you

Reddit · Issues · Pull Requests

The future of retrieval-augmented AI is local: no internet required.

If Cortex RAG saved you time, consider buying us a coffee ☕

Every contribution keeps this project free and open-source.

Core symbols most depended-on inside this repo

utils/advanced_rag.py

load_cache_embeddings

Shape

Function 21

Languages

Python 100%

Modules by API surface

app.py: 8 symbols
utils/advanced_rag.py: 6 symbols
utils/doc_handler.py: 3 symbols
utils/retriever_pipeline.py: 2 symbols
utils/build_graph.py: 2 symbols

Dependencies from manifests, versioned

langchain 0.3 · 1×

For agents

$ claude mcp add CORTEX-AI-SUPER-RAG \
  -- python -m otcore.mcp_server <graph>

