I build differential-geometric and information-geometric frameworks to answer one question: What actually changes inside a model when we align it? I discovered that DPO/RLHF alignment creates geometrically localised "brake layers" — specific transformer depths where torsion is selectively suppressed. These are detectable via forward passes alone, reproducible across architectures, and directly patchable. Published at NeurIPS 2025 and SpringerNature. Background: Self-driven independent researcher. Lecturer in CS at Nalhati Government Polytechnic (West Bengal). All research built before/after office hours — no lab, no GPU cluster, no supervisor. 12 years of industry experience across Infosys, IIT Kanpur, J&J R&D, Wipro. Four active research threads:
|
|
| Year | Venue | Title | Links |
|---|---|---|---|
| 2025 | NeurIPS 2025 Workshop | Prompting Away Stereotypes? Evaluating Bias in Text-to-Image Models for Occupations | arXiv |
| 2025 | Journal | Enhancing Human Empathy in Conversations Using Transformer-Based Models | DOI |
| 2024 | SpringerNature ICOMP'24 | Collaborative Federated Learning Cloud Based System | Paper |
| 2026 | [Under Review — NeurIPS 2026] | MENTIS: Multi-Scale Latent Torsion in Language Models | Repo |
| 2026 | [Preprint] | GRAFT: Geometric Representations of Alignment's Fingerprint in Transformer Belief Trajectories | Repo |
Mechanistic Interpretability |
graft-belief-geometry
GRAFT is a post-hoc, gradient-free mechanistic audit toolkit that characterises preference alignment via three torsion probes (𝒯, T1, T2) and an ERA depth profiler — requiring only forward passes through publicly available checkpoints.
| Result | Value |
|---|---|
| T2 concept discriminability | CV = 0.64 vs CKA = 0.08 → 8× better |
| T2 classification AUC | 0.89 [0.85, 0.93] vs CKA 0.61 |
| Normative torsion amplification | 20–46× larger than factual concepts |
| Alignment depth address | ℓ★ ∈ {14, 20, 29–30} — architecture-specific, falsifiable patching targets |
| Safe-prompt paradox | Safe prompts drive larger Δτ than unsafe (p < 10⁻³³ OLMo; cross-dataset replicated) |
| Low-rank alignment signature | DPO operates in 2–3 dim subspace vs RLHF's 4–5 |
| Benchmark | LITMUS · 20,439 prompts · 7 value axioms · 3 null-baseline controls |
| Models | OLMo-2-7B · Llama-3-8B · Mistral-7B · Qwen-2.5-7B (IT→PA pairs) |
📌 Three pre-registered falsifiable hypotheses — all confirmed
H1 (Concept selectivity): Alignment torsion Δ_f is larger for normative concepts than factual ones; T2 spectral anisotropy is the dominant mechanistic signature (CV > 0.50; AUC > 0.85). ✅ Confirmed — CV = 0.64, AUC = 0.89, normative 20–46× > factual; 3 null-baseline controls hold
H2 (Depth address): Alignment concentrates at an architecture-specific depth ℓ★, determined by architecture family — providing a surgical patching target. ✅ Confirmed — ℓ★ reproducible per architecture family, compatible with ROME/ACDC patching
H3 (Safe-prompt paradox): Safe prompts produce larger alignment torsion Δτ than unsafe ones. ✅ Confirmed — p < 10⁻³³ (OLMo), p < 10⁻⁴ (Llama); cross-dataset replication on SafetyBench + WildGuard (n=150 each)
[Under Review — NeurIPS 2026] |
torsional-belief-vector-fieldTeam: Partha Pratim Saha · Samarth Raina · Mayur Parvatikar · Amit Dhanda · Vinija Jain · Aman Chadha · Amitava Das
MENTIS · Headline Metrics
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
DPO torsion suppression (Mistral, Layer 27)
████████████████████████ 44.4% Cohen's d = 0.741 ***
Bonferroni-corrected p-value 7.7 × 10⁻¹³
H1 normative amplification 1500× (vs factual concepts)
Entropy–torsion bridge (Mistral) ρ = −0.387 p = 5.43 × 10⁻³⁰
17 geometric metrics · 4 model pairs · 20,439 prompts
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AgriTalk· Conformal prediction + HITL safety guarantees
5-layer safety stack: Input Sanitiser → Staleness Verifier → Conformal Predictor (RAPS) → Attribution Gate → Non-bypassable HITL for ABORT / EMERGENCY_STOP
Epistemic inheritance in merged LLMs via Fisher-Rao geometry · 8 cultural axes
| Award | Year |
|---|---|
| 🛡️ BlueDot Impact Scholar — AGI Strategy + Technical AI Safety | 2025–2026 |
| 🔬 LASR Labs — Mechanistic interpretability research programme | 2025 |
| 📋 NeurIPS 2025 Reviewer — MTI-LLM Workshop | 2025 |
| 🖥️ 5× Google Colab Pro A100/H100 — Neuromatch AI Safety grant | 2025 |
| ☁️ AWS AI & ML Scholar | 2025 |
| 🎓 Armenian LLM Summer School — 90% scholarship | 2025 |
| 🌐 Duke ML Summer School | 2025 |
| 🌐 Cohere Summer School | 2025 |
| 🏙️ University of Chicago DSI — Eric & Wendy Schmidt Postdoctoral Fellowship | 2024 |
| 🎓 MLx Generative AI Fellowship — Oxford ML Summer School | 2024–2025 |
| 🧠 diiP Summer School Paris — Deep Learning & Interpretability | 2024 |
| 🤖 Neuromatch Academy — Deep Learning Summer School | 2023 |
| 🗽 NYU AI Summer School | 2022 |
|
🔬 Geometric & Mathematical ML |
🤖 LLMs & Alignment |
⚙️ Stack PyTorch · Transformers (HF)
OLMo · Llama · Mistral
LangChain · LlamaIndex
NumPy · SciPy · Plotly
Docker · Azure · AWS
LaTeX · scikit-learn |
| Repository | Description | Status |
|---|---|---|
graft-belief-geometry |
GRAFT: post-hoc geometric audit of preference alignment | 🟢 Active |
torsional-belief-vector-field |
MENTIS — Riemannian torsion framework for alignment auditing | 🟢 Active |
AgriTalk |
Calibrated NLU for agricultural robots — conformal prediction | 🟢 Active |
nDNA |
Cultural epistemic inheritance in merged LLMs | 🟢 Active |
pps121.github.io |
Full academic portfolio | 🟢 Live |
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🎓 Lecturer in CS & Nalhati Govt. Polytechnic 2021–present
Independent Researcher BITS Pilani M.Tech · GPA 9.08 · Top 5%
NeurIPS 2025 author · 4 active research projects
All research self-funded, pre/post office hours
📚 Teaching Assistant BITS Pilani (M.Tech Programme) 2021–2023
NLP · Deep Learning · Deep RL
Honorarium: USD $2,513.11
🏭 Lead Data Scientist Wipro Limited, Bangalore 2021
Conversational AI · IBM Watson · 0.3M users
🔬 Senior Data Scientist BirlaSoft / Johnson & Johnson R&D 2017–2019
Medical search engine · 0.1M+ users
🛡️ Project Engineer IIT Kanpur 2016–2017
Threat intelligence · cybersecurity research
🧬 Senior Systems Engg. Infosys Technologies, Chennai 2011–2015
DNA alignment · Multiple Myeloma genomics
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
| Metric | Value |
|---|---|
| 🔬 Research Repos | graft-belief-geometry · torsional-belief-vector-field · AgriTalk · nDNA |
| 📄 Publications | NeurIPS 2025 Workshop · SpringerNature ICOMP'24 · 3× under review / preprint |
| 🏆 Fellowships | BlueDot · LASR Labs · AWS Scholar · Duke ML · Oxford MLx · 14× competitive selections |
| 🌟 Highlights | NeurIPS 2025 Reviewer · 20,439-prompt benchmark · 4 model families probed |
I am actively seeking fully-funded PhD positions starting 2026/2027 and open to:
- 🤝 Research collaborations in mechanistic interpretability, geometric ML, or AI safety
- 🎓 PhD discussions with faculty at world-class AI safety & interpretability groups
- 🔭 Partnerships validating geometric torsion findings against circuit-level causal analysis
If you work at Anthropic · Redwood Research · ARC Evals · Oxford · Cambridge · MIT · CMU · or any world-class AI safety group — I would love to talk.

