GOAT-RAG

Autonomous multi-agent RAG platform — hybrid retrieval, knowledge graphs, and real-time streaming over any document corpus.



Overview

GOAT-RAG is a production-grade retrieval-augmented generation system built around a three-layer hybrid search stack: dense vector search (FAISS), sparse keyword matching (BM25), and a spaCy-powered knowledge graph (NetworkX). A depth-aware multi-agent orchestrator sits on top of that stack for autonomous research tasks. All LLM inference runs through the Groq free tier and embeddings are computed locally, so the platform can be run without paid API usage.


Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│  Next.js 14 Frontend  (port 3000)                                       │
│  SSE streaming chat · D3.js knowledge graph · document drag-and-drop    │
└──────────────────────────────────┬──────────────────────────────────────┘
                                   │  HTTP / Server-Sent Events
┌──────────────────────────────────▼──────────────────────────────────────┐
│  FastAPI Backend  (port 8000)                                           │
│                                                                         │
│  ┌─────────────────────────────────────────────────────┐               │
│  │  Multi-Agent Orchestrator                           │               │
│  │  depth-aware query decomposition (scale 1–5)        │               │
│  │  sub-question routing · answer synthesis            │               │
│  └────────────────────┬────────────────────────────────┘               │
│                       │                                                 │
│  ┌────────────────────▼────────────────────────────────┐               │
│  │  RAG Pipeline                                       │               │
│  │  HyDE → multi-query expansion → hybrid retrieval   │               │
│  │  cross-encoder reranking → LLM generation          │               │
│  └────────────────────┬────────────────────────────────┘               │
│                       │                                                 │
│  ┌──────────┬─────────┴──────────┬───────────────────┐                  │
│  │ FAISS    │  BM25 Keyword      │ NetworkX          │                  │
│  │ (dense)  │  (sparse)          │ Knowledge Graph   │                  │
│  └──────────┴────────────────────┴───────────────────┘                  │
└─────────────────────────────────────────────────────────────────────────┘
                                   │
                    ┌──────────────▼───────────────┐
                    │  Groq API  (free tier)       │
                    │  llama-3.3-70b · deepseek-r1 │
                    │  llama-3.1-8b · gemma2-9b    │
                    └──────────────────────────────┘

Features

Retrieval

  • Three-way hybrid search: FAISS dense vectors + BM25 keyword ranking + knowledge graph traversal
  • HyDE (Hypothetical Document Embeddings) query expansion, which helps cold-start queries whose wording differs from the corpus
  • Multi-query generation — single question rewritten into N diverse sub-queries
  • Cross-encoder reranking with sentence-transformers for precision at top-k
  • Configurable retrieval weights per query type
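
As a rough illustration of how HyDE and multi-query expansion can feed one candidate pool, here is a minimal sketch. The `generate` and `search` callables, the prompt wording, and the function name are assumptions made for this example; they are not taken from `advanced_retrieval.py`.

```python
from typing import Callable

# Hypothetical signatures: a real setup would wire these to an LLM and a vector index.
Generate = Callable[[str], str]             # prompt -> LLM text
Search = Callable[[str, int], list[str]]    # (text, k) -> ranked chunk ids


def expand_and_retrieve(query: str, generate: Generate, search: Search,
                        n_variants: int = 3, k: int = 20) -> list[str]:
    """Union the candidates found via a HyDE passage and N query rewrites."""
    # HyDE: search with a hypothetical answer instead of only the raw question.
    hypothetical = generate(f"Write a short passage that answers: {query}")
    # Multi-query: ask for one rewrite per line and drop blank lines.
    rewrites = generate(f"Rewrite this question {n_variants} ways, one per line: {query}")
    variants = [line.strip() for line in rewrites.splitlines() if line.strip()]

    seen: dict[str, None] = {}  # dict keeps first-seen order while deduplicating
    for text in [hypothetical, query, *variants[:n_variants]]:
        for chunk_id in search(text, k):
            seen.setdefault(chunk_id, None)
    return list(seen)
```

Passing the LLM and the index in as plain callables keeps the sketch testable without network access; the candidates it returns would still need scoring and reranking.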

Agents

  • Depth-aware multi-agent orchestrator (research depth 1–5)
  • Automatic question decomposition and sub-agent routing
  • Confidence scoring per answer segment with source attribution
  • Verification pass before final synthesis

Ingestion

  • PDF, DOCX, TXT, HTML, and web URLs (BeautifulSoup scraper)
  • YouTube and audio transcription via yt-dlp
  • Automatic deduplication, semantic chunking, and keyword tagging
  • Document-level and chunk-level embeddings (sentence-transformers, local, no API)
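
The deduplication and chunking steps can be pictured with the sketch below. It is a simplified stand-in: it splits on sentence boundaries rather than doing true semantic chunking, and the function names and the 500-character default are assumptions for illustration, not values from `ingestion/pipeline.py`.

```python
import hashlib
import re


def chunk_sentences(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack whole sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole rather than split.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)   # close the current chunk and start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def dedupe(chunks: list[str]) -> list[str]:
    """Drop chunks whose case- and whitespace-normalised text was already seen."""
    seen: set[str] = set()
    unique: list[str] = []
    for chunk in chunks:
        normalised = " ".join(chunk.lower().split())
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique
```

Hashing normalised text only catches exact repeats; near-duplicate detection would need a similarity measure on top.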

Knowledge Graph

  • spaCy NER extraction on ingested documents
  • NetworkX graph with entity → concept → document edges
  • D3.js force-directed visualization in the browser
  • Graph-aware retrieval for relationship-heavy queries
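
To make graph-aware retrieval concrete, the sketch below links entities to the documents that mention them and measures proximity in hops. It is deliberately dependency-free (a plain adjacency dict instead of NetworkX), it omits the intermediate concept nodes, and the class and method names are hypothetical.

```python
from collections import defaultdict, deque


class EntityGraph:
    """Undirected graph linking entity nodes to the documents that mention them."""

    def __init__(self) -> None:
        self.adj: dict[str, set[str]] = defaultdict(set)

    def add_mention(self, entity: str, doc_id: str) -> None:
        # Prefix node names so an entity and a document can never collide.
        e, d = f"ent:{entity.lower()}", f"doc:{doc_id}"
        self.adj[e].add(d)
        self.adj[d].add(e)

    def docs_near(self, entity: str, max_hops: int = 3) -> dict[str, int]:
        """Breadth-first search returning document id -> hop distance from the entity."""
        start = f"ent:{entity.lower()}"
        if start not in self.adj:
            return {}
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if dist[node] == max_hops:
                continue  # do not expand beyond the hop limit
            for nxt in self.adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return {n[4:]: h for n, h in dist.items() if n.startswith("doc:")}
```

A document one hop away mentions the entity directly; a document three hops away shares an entity with a document that does. A proximity score could then be derived from the hop count, for example `1 / hops`.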

Interface

  • Server-Sent Events streaming — tokens appear as they generate
  • Per-request model selection from 5 available Groq models
  • Multi-agent toggle and fast-mode switch
  • Session management with conversation history

Supported LLM Models

All models run on Groq's free tier — no billing required.

| Model | Speed | Reasoning | Context |
|---|---|---|---|
| llama-3.3-70b-versatile | Fast | High | 128k |
| deepseek-r1-distill-llama-70b | Medium | Very high (chain-of-thought) | 128k |
| llama-3.1-8b-instant | Very fast | Medium | 128k |
| gemma2-9b-it | Fast | Medium | 8k |
| llama-3.2-90b-vision-preview | Medium | High + vision | 128k |

Quick Start

Option A — Docker (recommended)

git clone https://github.com/NAVTEJJ/goat-rag.git
cd goat-rag

# Set your Groq API key (free at https://console.groq.com)
cp .env.example .env
# Edit .env and add your GROQ_API_KEY

docker-compose up --build

Open http://localhost:3000.

Option B — Local development

Backend:

cd backend
python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate
pip install -r requirements.txt
python -m spacy download en_core_web_sm

cp ../.env.example .env
# Add your GROQ_API_KEY to backend/.env

uvicorn main:app --reload --port 8000

Frontend:

cd frontend
npm install
echo "NEXT_PUBLIC_API_URL=http://localhost:8000" > .env.local
npm run dev

API Reference

Query (streaming)

curl -N -X POST http://localhost:8000/query/stream \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the main themes in the uploaded documents?",
    "session_id": "sess_001",
    "model": "llama-3.3-70b-versatile",
    "multi_agent": false
  }'
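
On the client side, the streamed response has to be split into events. The sketch below parses the generic Server-Sent Events wire format (`data:` lines terminated by a blank line). What the backend actually puts in each `data:` payload is not documented here, so treating the payload as opaque text is an assumption.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event in a stream of text lines."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                      # a blank line terminates one event
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[5:]
            # The format allows one optional space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and ":" comment lines are ignored here.
```

The function accepts any iterable of decoded lines, such as the line iterator of a streaming HTTP response, so it can be exercised with a plain list in tests.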

Query (single response)

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Summarise the key findings",
    "session_id": "sess_001"
  }'

Upload a document

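# The form field name "file" and the path are placeholders; see api/routes.py for the exact field
curl -X POST http://localhost:8000/upload \
  -F "file=@/path/to/document.pdf"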

Autonomous research

curl -X POST http://localhost:8000/research \
  -H "Content-Type: application/json" \
  -d '{
    "topic": "Impact of transformer architectures on NLP benchmarks",
    "depth": 3
  }'

Knowledge graph

curl http://localhost:8000/knowledge_graph

Available models

curl http://localhost:8000/models

RAG Pipeline Deep Dive

User query
    │
    ▼
HyDE expansion ──── generate a hypothetical document that would answer the query
    │                then embed that document for retrieval (improves cold-start)
    ▼
Multi-query ──────── rewrite the query into 3–5 diverse variants
    │                retrieve for each, union the candidate set
    ▼
Hybrid retrieval ─── FAISS cosine similarity   (weight: 0.5)
    │                BM25 keyword score         (weight: 0.3)
    │                knowledge graph proximity  (weight: 0.2)
    ▼
Cross-encoder ──────  rerank top-20 candidates → keep top-5
    │
    ▼
LLM generation ──── prompt = system + context chunks + conversation history
    │               streamed token-by-token via SSE
    ▼
Verification ─────── confidence check pass, source attribution
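
The fusion and reranking steps in the diagram above can be sketched as follows. The weights (0.5, 0.3, 0.2) and the top-20 to top-5 cut come from the diagram; the min-max normalisation, the function names, and the injected `score_pair` callable (standing in for a cross-encoder) are assumptions, since the README does not say how raw scores are made comparable.

```python
from typing import Callable

# Weights taken from the pipeline diagram; everything else is illustrative.
WEIGHTS = {"dense": 0.5, "bm25": 0.3, "graph": 0.2}


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale raw scores to [0, 1] so the three retrievers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse(per_retriever: dict[str, dict[str, float]]) -> dict[str, float]:
    """Weighted sum of normalised scores; a chunk missing from a retriever adds 0 there."""
    fused: dict[str, float] = {}
    for name, scores in per_retriever.items():
        for chunk_id, s in min_max(scores).items():
            fused[chunk_id] = fused.get(chunk_id, 0.0) + WEIGHTS[name] * s
    return fused


def rerank(query: str, fused: dict[str, float], texts: dict[str, str],
           score_pair: Callable[[str, str], float],
           pool: int = 20, keep: int = 5) -> list[str]:
    """Take the top `pool` fused candidates, rescore each (query, text) pair, keep `keep`."""
    candidates = sorted(fused, key=fused.get, reverse=True)[:pool]
    candidates.sort(key=lambda cid: score_pair(query, texts[cid]), reverse=True)
    return candidates[:keep]
```

Normalising before weighting matters because BM25 scores are unbounded while cosine similarities are not; without it, the 0.3 weight on BM25 could dominate the sum.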

Project Structure

GOAT-RAG/
├── backend/
│   ├── main.py                     # FastAPI entry point
│   ├── requirements.txt
│   ├── agents/
│   │   └── orchestrator.py         # Depth-aware multi-agent research
│   ├── api/
│   │   └── routes.py               # REST + SSE endpoints
│   ├── database/
│   │   ├── vector_store/store.py   # FAISS index management
│   │   └── metadata_store/         # BM25 keyword index
│   ├── ingestion/
│   │   └── pipeline.py             # Document processing chain
│   ├── knowledge_graph/
│   │   └── graph_manager.py        # spaCy NER + NetworkX graph
│   ├── memory/
│   │   └── memory_manager.py       # Conversation history
│   ├── rag/
│   │   ├── pipeline.py             # Core RAG orchestration
│   │   ├── advanced_retrieval.py   # HyDE + multi-query + reranking
│   │   ├── retriever.py            # Hybrid search fusion
│   │   └── embeddings.py           # Sentence-transformer wrapper
│   ├── utils/
│   │   ├── llm_client.py           # Groq + OpenAI dual-provider
│   │   └── config.py               # Settings management
│   └── verification/
│       └── verifier.py             # Answer confidence scoring
├── frontend/
│   ├── src/
│   │   ├── pages/index.tsx         # Main chat UI (SSE, model picker)
│   │   └── components/
│   │       ├── KnowledgeGraphViewer.tsx  # D3.js force graph
│   │       ├── DocumentUploader.tsx      # Drag-and-drop ingestion
│   │       ├── ResearchPanel.tsx         # Autonomous research UI
│   │       ├── MessageBubble.tsx         # Markdown message renderer
│   │       └── SourcesPanel.tsx          # Retrieved sources display
│   └── package.json
├── evaluation/
│   ├── rag_metrics.py              # Faithfulness, relevance, context recall
│   └── benchmark_tests.py          # End-to-end retrieval benchmarks
├── deployment/
│   ├── Dockerfile.backend
│   └── Dockerfile.frontend
├── docker-compose.yml
├── .env.example
└── start.sh / start.bat

Evaluation

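The evaluation/ module measures retrieval and answer quality. Faithfulness, Context Relevance, and Answer Relevance need no labelled data; Context Recall compares the retrieved set against a reference answer: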

| Metric | Description |
|---|---|
| Faithfulness | Fraction of answer claims grounded in retrieved context |
| Context Relevance | Semantic similarity of retrieved chunks to the query |
| Context Recall | Coverage of reference answer concepts in the retrieved set |
| Answer Relevance | Semantic alignment of the generated answer to the query |
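
To show the shape of a faithfulness score, the sketch below counts the fraction of answer sentences that are mostly covered by a single retrieved chunk. Token overlap is a crude stand-in chosen to keep the example dependency-free; the 0.6 threshold and the function name are assumptions, and `rag_metrics.py` may well use embeddings or an LLM judge instead.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def faithfulness(answer: str, contexts: list[str], threshold: float = 0.6) -> float:
    """Fraction of answer sentences whose tokens mostly appear in one retrieved chunk."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    context_tokens = [_tokens(c) for c in contexts]
    grounded = 0
    for sentence in sentences:
        words = _tokens(sentence)
        if not words:
            continue
        # Best overlap of this sentence with any single chunk.
        best = max((len(words & ctx) / len(words) for ctx in context_tokens), default=0.0)
        if best >= threshold:
            grounded += 1
    return grounded / len(sentences)
```

Scoring per sentence rather than per answer makes it possible to report which specific claims lack support.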

Run benchmarks:

cd evaluation
python benchmark_tests.py

Configuration

Edit .env (copy from .env.example):

| Variable | Default | Description |
|---|---|---|
| GROQ_API_KEY | (none) | Required. Get free at console.groq.com |
| OPENAI_API_KEY | (blank) | Optional fallback |
| LLM_MODEL | llama-3.3-70b-versatile | Default model for all queries |
| EMBEDDING_MODEL | all-MiniLM-L6-v2 | Local sentence-transformer model |
| APP_ENV | development | development or production |
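
One way these variables could be read and validated at startup is sketched below. The variable names and defaults come from the table; the `Settings` dataclass and `load_settings` function are hypothetical and are not a copy of `utils/config.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    openai_api_key: Optional[str]
    llm_model: str
    embedding_model: str
    app_env: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, applying the documented defaults."""
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GROQ_API_KEY is required")
    app_env = env.get("APP_ENV", "development")
    if app_env not in {"development", "production"}:
        raise ValueError(f"APP_ENV must be development or production, got {app_env!r}")
    return Settings(
        groq_api_key=key,
        openai_api_key=env.get("OPENAI_API_KEY") or None,  # blank means no fallback
        llm_model=env.get("LLM_MODEL", "llama-3.3-70b-versatile"),
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        app_env=app_env,
    )
```

Failing fast on a missing key surfaces misconfiguration at startup instead of on the first query.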

License

MIT — see LICENSE.


Acknowledgements

  • Groq — LPU inference engine (free tier)
  • FAISS — Facebook AI similarity search
  • sentence-transformers — Local embedding models
  • spaCy — Named entity recognition for knowledge graph construction
  • rank-bm25 — BM25 sparse retrieval
