GOAT-RAG

Autonomous multi-agent RAG platform — hybrid retrieval, knowledge graphs, and real-time streaming over any document corpus.

Overview

GOAT-RAG is a production-grade retrieval-augmented generation system built around a three-layer hybrid search stack: dense vector search (FAISS), sparse keyword matching (BM25), and a knowledge graph (NetworkX) populated by spaCy entity extraction. A depth-aware multi-agent orchestrator sits on top of this stack for autonomous research tasks. LLM inference runs through the Groq free tier by default and embeddings are computed locally, so the platform can be operated at no cost.

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│  Next.js 14 Frontend  (port 3000)                                       │
│  SSE streaming chat · D3.js knowledge graph · document drag-and-drop    │
└──────────────────────────────────┬──────────────────────────────────────┘
                                   │  HTTP / Server-Sent Events
┌──────────────────────────────────▼──────────────────────────────────────┐
│  FastAPI Backend  (port 8000)                                           │
│                                                                         │
│  ┌─────────────────────────────────────────────────────┐               │
│  │  Multi-Agent Orchestrator                           │               │
│  │  depth-aware query decomposition (scale 1–5)        │               │
│  │  sub-question routing · answer synthesis            │               │
│  └────────────────────┬────────────────────────────────┘               │
│                       │                                                 │
│  ┌────────────────────▼────────────────────────────────┐               │
│  │  RAG Pipeline                                       │               │
│  │  HyDE → multi-query expansion → hybrid retrieval   │               │
│  │  cross-encoder reranking → LLM generation          │               │
│  └────────────────────┬────────────────────────────────┘               │
│                       │                                                 │
│  ┌──────────┬─────────┴──────────┬──────────────────┐                  │
│  │ FAISS    │  BM25 Keyword      │ NetworkX          │                  │
│  │ (dense)  │  (sparse)          │ Knowledge Graph   │                  │
│  └──────────┴────────────────────┴───────────────────┘                  │
└─────────────────────────────────────────────────────────────────────────┘
                                   │
                    ┌──────────────▼──────────────┐
                    │  Groq API  (free tier)       │
                    │  llama-3.3-70b · deepseek-r1 │
                    │  llama-3.1-8b · gemma2-9b    │
                    └──────────────────────────────┘

Features

Retrieval

Three-way hybrid search: FAISS dense vectors + BM25 keyword ranking + knowledge graph traversal
HyDE (Hypothetical Document Embeddings) query expansion for cold-start improvement
Multi-query generation — single question rewritten into N diverse sub-queries
Cross-encoder reranking with sentence-transformers for precision at top-k
Configurable retrieval weights per query type
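
The sparse leg of the hybrid search ranks chunks with BM25. The project lists rank-bm25 as its dependency; the block below is only a from-scratch illustration of the standard Okapi BM25 formula, with hypothetical names (`bm25_scores`, `corpus`) and commonly used parameter values (`k1=1.5`, `b=0.75`) that are assumptions rather than the project's settings.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenised document against a tokenised query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalisation penalises documents longer than average.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus, already split into lowercase tokens (hypothetical data).
corpus = [
    "faiss builds dense vector indexes".split(),
    "bm25 ranks documents by keyword overlap".split(),
    "networkx stores the knowledge graph".split(),
]
scores = bm25_scores("keyword ranking with bm25".split(), corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
```

Only the second document shares terms with the query, so it is the only one with a non-zero score. Exact-term matching of this kind complements dense retrieval, which can miss rare identifiers and names.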

Agents

Depth-aware multi-agent orchestrator (research depth 1–5)
Automatic question decomposition and sub-agent routing
Confidence scoring per answer segment with source attribution
Verification pass before final synthesis

Ingestion

PDF, DOCX, TXT, HTML, and web URLs (BeautifulSoup scraper)
YouTube and audio transcription via yt-dlp
Automatic deduplication, semantic chunking, and keyword tagging
Document-level and chunk-level embeddings (sentence-transformers, local, no API)
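
As a rough illustration of the deduplication and chunking steps, the sketch below hashes normalised text to drop exact duplicates and splits documents into overlapping word windows. It deliberately swaps in fixed-size windowing for the semantic chunking named above, and the function names, window size, and overlap are all assumptions, not values taken from the ingestion pipeline.

```python
import hashlib


def dedupe(texts):
    """Drop exact duplicates (ignoring case and whitespace runs), keeping first occurrences."""
    seen, unique = set(), []
    for text in texts:
        normalised = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


def chunk_words(text, size=50, overlap=10):
    """Split text into windows of `size` words that share `overlap` words with the previous window."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


documents = dedupe(["Hello  world", "hello world", "Another document"])
chunks = chunk_words("a b c d e f g h i j", size=4, overlap=1)
```

The overlap keeps a sentence that straddles a window boundary retrievable from either neighbouring chunk, at the cost of some duplicated embedding work.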

Knowledge Graph

spaCy NER extraction on ingested documents
NetworkX graph with entity → concept → document edges
D3.js force-directed visualization in the browser
Graph-aware retrieval for relationship-heavy queries
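
One plausible way to turn the entity, concept, and document edges into a retrieval signal is to score each document by its hop distance from the entities mentioned in the query. The sketch below uses a plain adjacency map and breadth-first search instead of NetworkX so that it runs without dependencies. The scoring rule `1 / (1 + hops)`, the undirected treatment of edges, and all names are assumptions for illustration only.

```python
from collections import deque


def graph_proximity(edges, seeds, documents, max_hops=3):
    """Score documents by BFS distance from seed entities: 1 / (1 + hops)."""
    # Build an undirected adjacency map from (node, node) pairs.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    distance = {seed: 0 for seed in seeds if seed in adjacency}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] >= max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)

    # Documents that were never reached get no score at all.
    return {doc: 1.0 / (1 + distance[doc]) for doc in documents if doc in distance}


# Hypothetical graph: entities link to concepts, concepts link to documents.
edges = [
    ("Groq", "inference"),
    ("inference", "doc_a"),
    ("FAISS", "vector search"),
    ("vector search", "doc_b"),
    ("inference", "vector search"),
]
scores = graph_proximity(edges, seeds=["Groq"], documents=["doc_a", "doc_b", "doc_c"])
```

Here `doc_a` is two hops from the seed entity and `doc_b` is three, so `doc_a` scores higher, while the unconnected `doc_c` is left out.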

Interface

Server-Sent Events streaming — tokens appear as they generate
Per-request model selection from 5 available Groq models
Multi-agent toggle and fast-mode switch
Session management with conversation history

Supported LLM Models

All models run on Groq's free tier — no billing required.

| Model | Speed | Reasoning | Context |
|---|---|---|---|
| `llama-3.3-70b-versatile` | Fast | High | 128k |
| `deepseek-r1-distill-llama-70b` | Medium | Very high (chain-of-thought) | 128k |
| `llama-3.1-8b-instant` | Very fast | Medium | 128k |
| `gemma2-9b-it` | Fast | Medium | 8k |
| `llama-3.2-90b-vision-preview` | Medium | High + vision | 128k |

Quick Start

Option A — Docker (recommended)

git clone https://github.com/NAVTEJJ/goat-rag.git
cd goat-rag

# Set your Groq API key (free at https://console.groq.com)
cp .env.example .env
# Edit .env and add your GROQ_API_KEY

docker-compose up --build

Open http://localhost:3000.

Option B — Local development

Backend:

cd backend
python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate
pip install -r requirements.txt
python -m spacy download en_core_web_sm

cp ../.env.example .env
# Add your GROQ_API_KEY to backend/.env

uvicorn main:app --reload --port 8000

Frontend:

cd frontend
npm install
echo "NEXT_PUBLIC_API_URL=http://localhost:8000" > .env.local
npm run dev

API Reference

Query (streaming)

curl -N -X POST http://localhost:8000/query/stream \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the main themes in the uploaded documents?",
    "session_id": "sess_001",
    "model": "llama-3.3-70b-versatile",
    "multi_agent": false
  }'
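
On the client side, the streamed response has to be split back into events. The sketch below parses the `data:` fields of the standard Server-Sent Events text format from any iterable of lines. The JSON payload shape with a `token` key is purely an assumption for the example, since the event schema is not described here, and `iter_sse_data` is a hypothetical helper name.

```python
import json


def iter_sse_data(lines):
    """Yield the data payload of each complete Server-Sent Event in an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[5:]
            # One optional space after the colon is not part of the payload.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (starting with ":") and other fields are ignored in this sketch.
    # An event without a terminating blank line is incomplete and is dropped.


# Simulated stream; the {"token": ...} shape is an assumed payload format.
sample = [
    'data: {"token": "Hello"}\n',
    "\n",
    ": keep-alive comment\n",
    'data: {"token": " world"}\n',
    "\n",
]
tokens = [json.loads(payload)["token"] for payload in iter_sse_data(sample)]
answer = "".join(tokens)
```

In practice the lines would come from a streaming HTTP response rather than a list; the parser itself does not care where they originate.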

Query (single response)

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Summarise the key findings",
    "session_id": "sess_001"
  }'
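
The same call can be made from Python using only the standard library. The sketch below builds the request object without sending it, so it can be inspected offline. `build_query_request` is a hypothetical helper, and the response schema is not documented here, so the commented-out part only decodes whatever JSON comes back.

```python
import json
import urllib.request


def build_query_request(query, session_id, base_url="http://localhost:8000"):
    """Build (but do not send) a JSON POST request for the /query endpoint."""
    payload = {"query": query, "session_id": session_id}
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("Summarise the key findings", "sess_001")

# Sending requires a running backend:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```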

Upload a document

curl -X POST http://localhost:8000/upload \
  -F "[email protected]"

Autonomous research

curl -X POST http://localhost:8000/research \
  -H "Content-Type: application/json" \
  -d '{
    "topic": "Impact of transformer architectures on NLP benchmarks",
    "depth": 3
  }'

Knowledge graph

curl http://localhost:8000/knowledge_graph

Available models

curl http://localhost:8000/models

RAG Pipeline Deep Dive

User query
    │
    ▼
HyDE expansion ──── generate a hypothetical document that would answer the query
    │                then embed that document for retrieval (improves cold-start)
    ▼
Multi-query ──────── rewrite the query into 3–5 diverse variants
    │                retrieve for each, union the candidate set
    ▼
Hybrid retrieval ─── FAISS cosine similarity   (weight: 0.5)
    │                BM25 keyword score         (weight: 0.3)
    │                knowledge graph proximity  (weight: 0.2)
    ▼
Cross-encoder ────── rerank top-20 candidates → keep top-5
    │
    ▼
LLM generation ──── prompt = system + context chunks + conversation history
    │               streamed token-by-token via SSE
    ▼
Verification ─────── confidence check pass, source attribution
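
The hybrid retrieval step combines three score lists with the weights 0.5, 0.3, and 0.2. Because cosine similarity, BM25, and graph proximity live on different scales, some normalisation is needed before they can be added. The sketch below assumes min-max normalisation per retriever followed by a weighted sum. Only the weights come from the diagram; the normalisation choice, the function names, and the sample scores are assumptions.

```python
WEIGHTS = {"dense": 0.5, "bm25": 0.3, "graph": 0.2}


def min_max(scores):
    """Rescale a {chunk_id: score} map to [0, 1]; a constant map becomes all 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {key: 1.0 for key in scores}
    return {key: (value - low) / (high - low) for key, value in scores.items()}


def fuse(per_retriever, weights=WEIGHTS, top_k=5):
    """Weighted sum of normalised scores; a chunk missing from a retriever contributes 0."""
    fused = {}
    for name, scores in per_retriever.items():
        for chunk_id, value in min_max(scores).items():
            fused[chunk_id] = fused.get(chunk_id, 0.0) + weights[name] * value
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical raw scores from the three retrievers for chunks c1..c3.
ranked = fuse({
    "dense": {"c1": 0.82, "c2": 0.40, "c3": 0.10},
    "bm25": {"c2": 7.1, "c3": 2.3},
    "graph": {"c1": 0.5, "c3": 0.5},
})
```

With these inputs `c1` ranks first because it leads the dense list and is also reachable in the graph. The fused list would then be handed to the cross-encoder for reranking.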

Project Structure

GOAT-RAG/
├── backend/
│   ├── main.py                     # FastAPI entry point
│   ├── requirements.txt
│   ├── agents/
│   │   └── orchestrator.py         # Depth-aware multi-agent research
│   ├── api/
│   │   └── routes.py               # REST + SSE endpoints
│   ├── database/
│   │   ├── vector_store/store.py   # FAISS index management
│   │   └── metadata_store/         # BM25 keyword index
│   ├── ingestion/
│   │   └── pipeline.py             # Document processing chain
│   ├── knowledge_graph/
│   │   └── graph_manager.py        # spaCy NER + NetworkX graph
│   ├── memory/
│   │   └── memory_manager.py       # Conversation history
│   ├── rag/
│   │   ├── pipeline.py             # Core RAG orchestration
│   │   ├── advanced_retrieval.py   # HyDE + multi-query + reranking
│   │   ├── retriever.py            # Hybrid search fusion
│   │   └── embeddings.py           # Sentence-transformer wrapper
│   ├── utils/
│   │   ├── llm_client.py           # Groq + OpenAI dual-provider
│   │   └── config.py               # Settings management
│   └── verification/
│       └── verifier.py             # Answer confidence scoring
├── frontend/
│   ├── src/
│   │   ├── pages/index.tsx         # Main chat UI (SSE, model picker)
│   │   └── components/
│   │       ├── KnowledgeGraphViewer.tsx  # D3.js force graph
│   │       ├── DocumentUploader.tsx      # Drag-and-drop ingestion
│   │       ├── ResearchPanel.tsx         # Autonomous research UI
│   │       ├── MessageBubble.tsx         # Markdown message renderer
│   │       └── SourcesPanel.tsx          # Retrieved sources display
│   └── package.json
├── evaluation/
│   ├── rag_metrics.py              # Faithfulness, relevance, context recall
│   └── benchmark_tests.py          # End-to-end retrieval benchmarks
├── deployment/
│   ├── Dockerfile.backend
│   └── Dockerfile.frontend
├── docker-compose.yml
├── .env.example
└── start.sh / start.bat

Evaluation

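The evaluation/ module measures retrieval and answer quality. Faithfulness and the two relevance metrics need no labelled data; Context Recall compares the retrieved set against a reference answer.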

| Metric | Description |
|---|---|
| Faithfulness | Fraction of answer claims grounded in retrieved context |
| Context Relevance | Semantic similarity of retrieved chunks to the query |
| Context Recall | Coverage of reference answer concepts in the retrieved set |
| Answer Relevance | Semantic alignment of the generated answer to the query |
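
To make the faithfulness definition concrete, here is a deliberately crude proxy: each answer sentence is treated as one claim, and a claim counts as grounded when most of its words appear in a single retrieved chunk. This is a lexical stand-in and not the method used in rag_metrics.py. The function name and the 0.6 threshold are assumptions.

```python
import re


def _tokens(text):
    """Lowercase alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def faithfulness(answer, contexts, threshold=0.6):
    """Fraction of answer sentences whose words are mostly covered by one context chunk."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    context_tokens = [_tokens(c) for c in contexts]
    grounded = 0
    for sentence in sentences:
        words = _tokens(sentence)
        if not words:
            continue
        # Best coverage of this sentence by any single chunk.
        coverage = max((len(words & ctx) / len(words) for ctx in context_tokens), default=0.0)
        if coverage >= threshold:
            grounded += 1
    return grounded / len(sentences)


contexts = ["FAISS performs dense vector search over chunk embeddings."]
score = faithfulness("FAISS performs dense vector search. The moon is made of cheese.", contexts)
```

One of the two sentences is supported by the context, so the score is 0.5. A semantic or LLM-based claim checker would be more robust to paraphrase than this word-overlap test.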

Run benchmarks:

cd evaluation
python benchmark_tests.py

Configuration

Edit .env (copy from .env.example):

| Variable | Default | Description |
|---|---|---|
| `GROQ_API_KEY` | (none) | Required. Get free at console.groq.com |
| `OPENAI_API_KEY` | (blank) | Optional fallback |
| `LLM_MODEL` | `llama-3.3-70b-versatile` | Default model for all queries |
| `EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Local sentence-transformer model |
| `APP_ENV` | `development` | `development` or `production` |
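
A minimal sketch of how these variables could be read with their documented defaults is shown below. It is not the contents of utils/config.py; the `Settings` class and `load_settings` function are hypothetical names, and the validation rules are assumptions derived from the table.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    openai_api_key: str = ""
    llm_model: str = "llama-3.3-70b-versatile"
    embedding_model: str = "all-MiniLM-L6-v2"
    app_env: str = "development"


def load_settings(env=None):
    """Build Settings from an environment mapping, applying the documented defaults."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise ValueError("GROQ_API_KEY is required")
    app_env = env.get("APP_ENV", "development")
    if app_env not in ("development", "production"):
        raise ValueError(f"APP_ENV must be 'development' or 'production', got {app_env!r}")
    return Settings(
        groq_api_key=key,
        openai_api_key=env.get("OPENAI_API_KEY", ""),
        llm_model=env.get("LLM_MODEL", "llama-3.3-70b-versatile"),
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        app_env=app_env,
    )


# Passing a dict instead of os.environ keeps the example independent of the shell.
settings = load_settings({"GROQ_API_KEY": "example-key"})
```

Failing fast on a missing key surfaces the problem at startup instead of on the first query.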

License

MIT — see LICENSE.

Acknowledgements

Groq — LPU inference engine (free tier)
FAISS — Facebook AI similarity search
sentence-transformers — Local embedding models
spaCy — Named entity recognition for knowledge graph construction
rank-bm25 — BM25 sparse retrieval
