Autonomous multi-agent RAG platform — hybrid retrieval, knowledge graphs, and real-time streaming over any document corpus.
GOAT-RAG is a production-grade retrieval-augmented generation system built around a three-layer hybrid search stack — dense vector search (FAISS), sparse keyword matching (BM25), and a spaCy-powered knowledge graph (NetworkX) — with a depth-aware multi-agent orchestrator for autonomous research tasks. All inference runs through the Groq free tier, making the entire platform zero-cost to operate.
```text
┌───────────────────────────────────────────────────────────────────────┐
│  Next.js 14 Frontend (port 3000)                                      │
│  SSE streaming chat · D3.js knowledge graph · document drag-and-drop  │
└───────────────────────────────────┬───────────────────────────────────┘
                                    │  HTTP / Server-Sent Events
┌───────────────────────────────────▼───────────────────────────────────┐
│  FastAPI Backend (port 8000)                                          │
│                                                                       │
│    ┌─────────────────────────────────────────────────────────────┐    │
│    │  Multi-Agent Orchestrator                                   │    │
│    │  depth-aware query decomposition (scale 1–5)                │    │
│    │  sub-question routing · answer synthesis                    │    │
│    └──────────────────────────────┬──────────────────────────────┘    │
│                                   │                                   │
│    ┌──────────────────────────────▼──────────────────────────────┐    │
│    │  RAG Pipeline                                               │    │
│    │  HyDE → multi-query expansion → hybrid retrieval            │    │
│    │  cross-encoder reranking → LLM generation                   │    │
│    └──────────────────────────────┬──────────────────────────────┘    │
│                                   │                                   │
│    ┌──────────────┬───────────────▼──────┬───────────────────────┐    │
│    │  FAISS       │  BM25 keyword        │  NetworkX             │    │
│    │  (dense)     │  (sparse)            │  knowledge graph      │    │
│    └──────────────┴──────────────────────┴───────────────────────┘    │
└───────────────────────────────────┬───────────────────────────────────┘
                                    │
                    ┌───────────────▼───────────────┐
                    │  Groq API (free tier)         │
                    │  llama-3.3-70b · deepseek-r1  │
                    │  llama-3.1-8b · gemma2-9b     │
                    └───────────────────────────────┘
```
Retrieval
- Three-way hybrid search: FAISS dense vectors + BM25 keyword ranking + knowledge graph traversal
- HyDE (Hypothetical Document Embeddings) query expansion for cold-start improvement
- Multi-query generation — single question rewritten into N diverse sub-queries
- Cross-encoder reranking with sentence-transformers for precision at top-k
- Configurable retrieval weights per query type
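A minimal sketch of how HyDE and multi-query expansion can feed a single candidate pool. It is not the code in `rag/advanced_retrieval.py`: the `llm` and `retrieve` callables, the prompt wording, and every function name here are hypothetical placeholders (in GOAT-RAG the LLM would be a Groq model and `retrieve` the hybrid retriever).

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: llm(prompt) -> generated text,
# retrieve(text, k) -> up to k chunk IDs ranked best-first.
LLM = Callable[[str], str]
Retriever = Callable[[str, int], List[str]]


def hyde_text(query: str, llm: LLM) -> str:
    """HyDE: have the LLM write a passage that *would* answer the query.

    The passage, not the raw query, is what gets embedded for dense search.
    """
    return llm(f"Write a short passage that answers: {query}")


def rewrite_queries(query: str, llm: LLM, n: int = 3) -> List[str]:
    """Multi-query: ask for n rewrites (one per line) and keep the original too."""
    raw = llm(f"Rewrite this question {n} different ways, one per line: {query}")
    variants = [line.strip() for line in raw.splitlines() if line.strip()]
    return [query] + variants[:n]


def expanded_candidates(query: str, llm: LLM, retrieve: Retriever, k: int = 10) -> List[str]:
    """Retrieve for the HyDE passage and every rewrite, then union the hits.

    A dict keeps first-seen order, so the union is deduplicated but stable.
    """
    texts = [hyde_text(query, llm)] + rewrite_queries(query, llm)
    pool: Dict[str, None] = {}
    for text in texts:
        for chunk_id in retrieve(text, k):
            pool.setdefault(chunk_id, None)
    return list(pool)
```

Keeping the original query alongside its rewrites means expansion can only add candidates, never lose the literal match; the reranking stage later trims the enlarged pool.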
Agents
- Depth-aware multi-agent orchestrator (research depth 1–5)
- Automatic question decomposition and sub-agent routing
- Confidence scoring per answer segment with source attribution
- Verification pass before final synthesis
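The orchestration flow above could look roughly like this. Everything here is an assumption for illustration: `research`, `Segment`, the `decompose` and `answer` callables, the rule that depth *d* yields *d* + 1 sub-questions, and the 0.5 confidence cut-off are not taken from `agents/orchestrator.py`, and real synthesis would be a further LLM call rather than string joining.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Segment:
    """One sub-answer with its (hypothetical) verifier confidence and sources."""
    question: str
    answer: str
    confidence: float  # assumed to be in [0.0, 1.0]
    sources: List[str] = field(default_factory=list)


def research(
    topic: str,
    depth: int,
    decompose: Callable[[str, int], List[str]],
    answer: Callable[[str], Segment],
    min_confidence: float = 0.5,
) -> str:
    if not 1 <= depth <= 5:
        raise ValueError("depth must be between 1 and 5")
    # Assumed mapping: a deeper research request asks more sub-questions.
    sub_questions = decompose(topic, depth + 1)
    # Route each sub-question to a sub-agent (here: a single callable).
    segments = [answer(q) for q in sub_questions]
    # Verification pass: drop segments the verifier is not confident about.
    verified = [s for s in segments if s.confidence >= min_confidence]
    # Synthesis stand-in: one line per verified segment, with source attribution.
    return "\n".join(f"{s.answer} [{', '.join(s.sources)}]" for s in verified)
```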
Ingestion
- PDF, DOCX, TXT, HTML, and web URLs (BeautifulSoup scraper)
- YouTube and audio transcription (media fetched with yt-dlp)
- Automatic deduplication, semantic chunking, and keyword tagging
- Document-level and chunk-level embeddings (sentence-transformers, local, no API)
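The deduplication and chunking steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not `ingestion/pipeline.py`: deduplication here catches only exact duplicates after case and whitespace normalisation, and the chunker packs whole sentences up to a character budget. That is a simpler stand-in for semantic chunking, which would typically split where embedding similarity between neighbouring sentences drops.

```python
import hashlib
import re
from typing import Iterable, List


def fingerprint(text: str) -> str:
    """SHA-256 of case- and whitespace-normalised text; equal hashes = duplicates."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def dedupe(docs: Iterable[str]) -> List[str]:
    """Keep the first copy of each document, dropping normalised repeats."""
    seen = set()
    unique = []
    for doc in docs:
        fp = fingerprint(doc)
        if fp not in seen:
            seen.add(fp)
            unique.append(doc)
    return unique


def chunk_sentences(text: str, max_chars: int = 500, overlap: int = 1) -> List[str]:
    """Greedily pack whole sentences into chunks of about max_chars characters.

    The last `overlap` sentences of a chunk are repeated at the start of the
    next one (only if the chunk has more sentences than that), so a chunk can
    exceed max_chars when the carried-over and new sentences are long.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence])) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap:] if 0 < overlap < len(current) else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

With `overlap=1` the boundary sentence appears in both neighbouring chunks, so a fact that straddles a boundary is still retrievable from either side.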
Knowledge Graph
- spaCy NER extraction on ingested documents
- NetworkX graph with entity → concept → document edges
- D3.js force-directed visualization in the browser
- Graph-aware retrieval for relationship-heavy queries
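A stdlib-only sketch of graph-aware scoring. It assumes NER has already produced a list of entity strings per document (spaCy's job in GOAT-RAG) and uses a plain adjacency dict in place of NetworkX. It also collapses the entity → concept → document chain into direct entity–document edges, and the 1 / (1 + hops) score is an illustrative choice, not the project's formula.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def build_graph(doc_entities: Dict[str, List[str]]) -> Graph:
    """Undirected adjacency map linking each document to the entities found in it.

    Node names get a "doc:" or "ent:" prefix so an entity can never collide
    with a document ID; entities are lower-cased so "GPT-4" and "gpt-4" merge.
    """
    graph: Graph = {}
    for doc_id, entities in doc_entities.items():
        doc_node = f"doc:{doc_id}"
        graph.setdefault(doc_node, set())
        for ent in entities:
            ent_node = f"ent:{ent.lower()}"
            graph.setdefault(ent_node, set())
            graph[doc_node].add(ent_node)
            graph[ent_node].add(doc_node)
    return graph


def graph_proximity(graph: Graph, query_entities: List[str], max_hops: int = 4) -> Dict[str, float]:
    """Score each document 1 / (1 + hops) from the nearest query entity (multi-source BFS)."""
    starts = [n for n in (f"ent:{e.lower()}" for e in query_entities) if n in graph]
    dist = {node: 0 for node in starts}
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        if dist[node] >= max_hops:  # do not expand beyond the hop limit
            continue
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return {n[len("doc:"):]: 1.0 / (1 + d) for n, d in dist.items() if n.startswith("doc:")}
```

With NetworkX, `nx.single_source_shortest_path_length(G, source, cutoff=max_hops)` returns the same hop distances for a single source node.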
Interface
- Server-Sent Events streaming: tokens appear as they are generated
- Per-request model selection from 5 available Groq models
- Multi-agent toggle and fast-mode switch
- Session management with conversation history
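On the client side, SSE framing is simple enough to parse by hand. This sketch follows the framing rules of the SSE specification: a blank line ends an event, several `data:` lines in one event are joined with newlines, and lines beginning with `:` are comments. What each event's payload contains in GOAT-RAG (a raw token or a JSON object) is not documented here, so the parser just returns the payload string. `stream_request` is a hypothetical helper whose JSON body mirrors the `/query/stream` request fields used elsewhere in this README.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List


def stream_request(
    query: str,
    session_id: str,
    model: str = "llama-3.3-70b-versatile",
    multi_agent: bool = False,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build (but do not send) a POST to the streaming query endpoint."""
    body = json.dumps(
        {"query": query, "session_id": session_id, "model": model, "multi_agent": multi_agent}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query/stream",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )


def sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # blank line: dispatch the pending event
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):  # comment / keep-alive
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):  # one optional space after the colon
            value = value[1:]
        if field == "data":  # other fields (event, id, retry) are ignored here
            buffer.append(value)
    # An event not terminated by a blank line is discarded, as the spec requires.
```

Against a running backend, `urllib.request.urlopen(stream_request("Summarise the key findings", "sess_001"))` returns a response whose lines can be decoded and passed straight to `sse_data`.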
All models run on Groq's free tier — no billing required.
| Model | Speed | Reasoning | Context window |
|---|---|---|---|
| `llama-3.3-70b-versatile` | Fast | High | 128k |
| `deepseek-r1-distill-llama-70b` | Medium | Very high (chain-of-thought) | 128k |
| `llama-3.1-8b-instant` | Very fast | Medium | 128k |
| `gemma2-9b-it` | Fast | Medium | 8k |
| `llama-3.2-90b-vision-preview` | Medium | High + vision | 128k |
```bash
git clone https://github.com/NAVTEJJ/goat-rag.git
cd goat-rag

# Set your Groq API key (free at https://console.groq.com)
cp .env.example .env
# Edit .env and add your GROQ_API_KEY

docker-compose up --build
```

Open http://localhost:3000.
Backend:

```bash
cd backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
python -m spacy download en_core_web_sm
cp ../.env.example .env
# Add your GROQ_API_KEY to backend/.env
uvicorn main:app --reload --port 8000
```

Frontend:
```bash
cd frontend
npm install
echo "NEXT_PUBLIC_API_URL=http://localhost:8000" > .env.local
npm run dev
```

Streaming query over SSE (`-N` turns off curl's output buffering so tokens print as they arrive):

```bash
curl -N -X POST http://localhost:8000/query/stream \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the main themes in the uploaded documents?",
    "session_id": "sess_001",
    "model": "llama-3.3-70b-versatile",
    "multi_agent": false
  }'
```

Standard (non-streaming) query:

```bash
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Summarise the key findings",
    "session_id": "sess_001"
  }'
```

Upload a document:

```bash
# The multipart field name "file" is an assumption; check api/routes.py.
curl -X POST http://localhost:8000/upload \
  -F "file=@path/to/document.pdf"
```

Autonomous research (depth 1–5):

```bash
curl -X POST http://localhost:8000/research \
  -H "Content-Type: application/json" \
  -d '{
    "topic": "Impact of transformer architectures on NLP benchmarks",
    "depth": 3
  }'
```

Knowledge graph and available models:

```bash
curl http://localhost:8000/knowledge_graph
curl http://localhost:8000/models
```

Query pipeline:

```text
User query
    │
    ▼
HyDE expansion ───── generate a hypothetical document that would answer the query
    │                then embed that document for retrieval (improves cold-start)
    ▼
Multi-query ──────── rewrite the query into 3–5 diverse variants
    │                retrieve for each, union the candidate set
    ▼
Hybrid retrieval ─── FAISS cosine similarity (weight: 0.5)
    │                BM25 keyword score (weight: 0.3)
    │                knowledge graph proximity (weight: 0.2)
    ▼
Cross-encoder ────── rerank top-20 candidates → keep top-5
    │
    ▼
LLM generation ───── prompt = system + context chunks + conversation history
    │                streamed token-by-token via SSE
    ▼
Verification ─────── confidence check pass, source attribution
```
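The hybrid-retrieval and reranking steps can be sketched as a weighted score fusion followed by a rerank of the fused pool. The 0.5 / 0.3 / 0.2 weights and the top-20 → top-5 cut come from the pipeline diagram; the rest is an assumption rather than `rag/retriever.py`: raw scores are min-max normalised per source (cosine and BM25 live on different scales), a chunk missing from a source contributes 0 for it, and `rerank_score` is a placeholder for a cross-encoder call.

```python
from typing import Callable, Dict, List

# Weights from the pipeline diagram.
WEIGHTS = {"dense": 0.5, "bm25": 0.3, "graph": 0.2}


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one source's scores to [0, 1] so sources become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # a single score (or all equal): treat every hit as a full match
        return {chunk_id: 1.0 for chunk_id in scores}
    return {chunk_id: (s - lo) / (hi - lo) for chunk_id, s in scores.items()}


def fuse(per_source: Dict[str, Dict[str, float]], weights: Dict[str, float] = WEIGHTS) -> List[str]:
    """Weighted sum of normalised scores; absent from a source means 0 there."""
    total: Dict[str, float] = {}
    for source, weight in weights.items():
        for chunk_id, score in min_max(per_source.get(source, {})).items():
            total[chunk_id] = total.get(chunk_id, 0.0) + weight * score
    # Highest fused score first; chunk ID breaks ties so output is deterministic.
    return sorted(total, key=lambda cid: (-total[cid], cid))


def retrieve_and_rerank(
    per_source: Dict[str, Dict[str, float]],
    rerank_score: Callable[[str], float],
    pool: int = 20,
    keep: int = 5,
) -> List[str]:
    """Take the top `pool` fused candidates, rerank them, keep the best `keep`."""
    candidates = fuse(per_source)[:pool]
    return sorted(candidates, key=rerank_score, reverse=True)[:keep]
```

Min-max normalisation is sensitive to a single outlier score; reciprocal rank fusion, which uses only each chunk's rank in each list, is a common alternative when score scales are unreliable.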
```text
GOAT-RAG/
├── backend/
│   ├── main.py                     # FastAPI entry point
│   ├── requirements.txt
│   ├── agents/
│   │   └── orchestrator.py         # Depth-aware multi-agent research
│   ├── api/
│   │   └── routes.py               # REST + SSE endpoints
│   ├── database/
│   │   ├── vector_store/store.py   # FAISS index management
│   │   └── metadata_store/         # BM25 keyword index
│   ├── ingestion/
│   │   └── pipeline.py             # Document processing chain
│   ├── knowledge_graph/
│   │   └── graph_manager.py        # spaCy NER + NetworkX graph
│   ├── memory/
│   │   └── memory_manager.py       # Conversation history
│   ├── rag/
│   │   ├── pipeline.py             # Core RAG orchestration
│   │   ├── advanced_retrieval.py   # HyDE + multi-query + reranking
│   │   ├── retriever.py            # Hybrid search fusion
│   │   └── embeddings.py           # Sentence-transformer wrapper
│   ├── utils/
│   │   ├── llm_client.py           # Groq + OpenAI dual-provider
│   │   └── config.py               # Settings management
│   └── verification/
│       └── verifier.py             # Answer confidence scoring
├── frontend/
│   ├── src/
│   │   ├── pages/index.tsx         # Main chat UI (SSE, model picker)
│   │   └── components/
│   │       ├── KnowledgeGraphViewer.tsx  # D3.js force graph
│   │       ├── DocumentUploader.tsx      # Drag-and-drop ingestion
│   │       ├── ResearchPanel.tsx         # Autonomous research UI
│   │       ├── MessageBubble.tsx         # Markdown message renderer
│   │       └── SourcesPanel.tsx          # Retrieved sources display
│   └── package.json
├── evaluation/
│   ├── rag_metrics.py              # Faithfulness, relevance, context recall
│   └── benchmark_tests.py          # End-to-end retrieval benchmarks
├── deployment/
│   ├── Dockerfile.backend
│   └── Dockerfile.frontend
├── docker-compose.yml
├── .env.example
└── start.sh / start.bat
```
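The `evaluation/` module measures retrieval and answer quality. Faithfulness, Context Relevance, and Answer Relevance need no labelled data; Context Recall compares against a reference answer, so it needs one for each test question: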
| Metric | Description |
|---|---|
| Faithfulness | Fraction of answer claims grounded in retrieved context |
| Context Relevance | Semantic similarity of retrieved chunks to the query |
| Context Recall | Coverage of reference answer concepts in the retrieved set |
| Answer Relevance | Semantic alignment of the generated answer to the query |
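Two of these metrics can be approximated without any model. In this sketch, term overlap stands in for the semantic similarity the table describes, a "claim" is simply a sentence, and the 0.5 grounding threshold is arbitrary; `evaluation/rag_metrics.py` presumably uses embeddings or an LLM judge instead. The function names are hypothetical.

```python
import re
from typing import List, Set

SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def _terms(text: str) -> Set[str]:
    """Lower-cased alphanumeric tokens of 3+ characters: a crude proxy for concepts."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if len(t) >= 3}


def context_recall(reference: str, contexts: List[str]) -> float:
    """Share of the reference answer's terms that appear somewhere in the retrieved set."""
    ref = _terms(reference)
    if not ref:
        return 0.0
    covered = set().union(*(_terms(c) for c in contexts))
    return len(ref & covered) / len(ref)


def faithfulness(answer: str, contexts: List[str], threshold: float = 0.5) -> float:
    """Share of answer sentences whose terms are at least `threshold` covered by context."""
    context_terms = set().union(*(_terms(c) for c in contexts))
    claims = [s for s in SENTENCE_SPLIT.split(answer.strip()) if _terms(s)]
    if not claims:
        return 0.0
    grounded = sum(
        1 for s in claims if len(_terms(s) & context_terms) / len(_terms(s)) >= threshold
    )
    return grounded / len(claims)
```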
Run benchmarks:
```bash
cd evaluation
python benchmark_tests.py
```

Edit `.env` (copy from `.env.example`):
| Variable | Default | Description |
|---|---|---|
| `GROQ_API_KEY` | (none) | Required. Get one free at console.groq.com |
| `OPENAI_API_KEY` | (blank) | Optional fallback |
| `LLM_MODEL` | `llama-3.3-70b-versatile` | Default model for all queries |
| `EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Local sentence-transformer model |
| `APP_ENV` | `development` | `development` or `production` |
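A sketch of loading these variables with the defaults from the table. `Settings` and `load_settings` are hypothetical names, not the contents of `utils/config.py`. The sketch reads an already-populated environment mapping and does not parse the `.env` file itself; a loader such as python-dotenv, or Docker Compose's `env_file`, would handle that step.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    openai_api_key: Optional[str]
    llm_model: str
    embedding_model: str
    app_env: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables from the configuration table, applying the same defaults."""
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GROQ_API_KEY is required (free key at console.groq.com)")
    app_env = env.get("APP_ENV", "development")
    if app_env not in ("development", "production"):
        raise ValueError(f"APP_ENV must be 'development' or 'production', got {app_env!r}")
    return Settings(
        groq_api_key=key,
        # A blank OPENAI_API_KEY means "no fallback provider".
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        llm_model=env.get("LLM_MODEL", "llama-3.3-70b-versatile"),
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        app_env=app_env,
    )
```

Failing fast on a missing `GROQ_API_KEY` surfaces the one required setting at startup instead of on the first query.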
MIT — see LICENSE.
- Groq — LPU inference engine (free tier)
- FAISS — Facebook AI similarity search
- sentence-transformers — Local embedding models
- spaCy — Named entity recognition for knowledge graph construction
- rank-bm25 — BM25 sparse retrieval