AutoResearcher

Compresses multi-hour literature reviews into a single query

Stack: Python, FastAPI, React, FAISS, NVIDIA NIM API

Overview

Academic literature reviews traditionally require days of manual searching, reading, highlighting, and synthesizing. AutoResearcher condenses this into a 3-minute autonomous pipeline by fetching papers, reading PDFs, and extracting cross-paper insights automatically via a multi-agent LLM workflow.

Frontend Architecture

Built with React, Vite, TypeScript, and TailwindCSS.

State Management: Uses the React Context API and sessionStorage to hold the active report. localStorage persists historical reports across browser sessions, so the MVP needs no database.
Streaming UI: Consumes Server-Sent Events (SSE) from the backend for real-time pipeline progress updates, using conditional rendering and skeleton loaders.
Client-Side Processing: PDF generation is handled entirely client-side using html2pdf.js, saving backend compute.

React UI and Client-Side rendering

Backend Architecture & Multi-Agent System

FastAPI serves modular routes (search, research, chat) and relies on asyncio for I/O-bound calls to the arXiv and NVIDIA APIs.

Map-Reduce Agent Pattern: Five full PDFs generally do not fit in a single LLM call, so the pipeline divides the work: PlannerAgent defines queries, SearchAgent hits APIs, PDFAgent downloads the PDFs and extracts their text with PyMuPDF, AnalysisAgent summarizes each paper concurrently, and WriterAgent synthesizes the final report.
Async Execution: The AnalysisAgent and PDFAgent use asyncio.gather() to process multiple PDFs concurrently. Because the work is I/O-bound, analyzing 5 papers takes roughly as long as the slowest single paper, subject to API rate limits.
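The concurrency claim above can be illustrated with a minimal sketch. It assumes each analysis call is network-bound; analyze_paper, analyze_all, and run_demo are hypothetical names, and asyncio.sleep stands in for the real LLM request.

```python
import asyncio
import time


async def analyze_paper(paper_id: str, delay: float) -> str:
    """Hypothetical stand-in for one AnalysisAgent call (network-bound)."""
    await asyncio.sleep(delay)  # simulates waiting on an LLM API response
    return f"summary of {paper_id}"


async def analyze_all(papers: dict[str, float]) -> list[str]:
    # gather() runs all coroutines concurrently and returns results
    # in the same order as the inputs.
    tasks = [analyze_paper(pid, delay) for pid, delay in papers.items()]
    return await asyncio.gather(*tasks)


def run_demo() -> tuple[list[str], float]:
    # Three simulated papers that each take 0.2 s to analyze.
    papers = {"paper-1": 0.2, "paper-2": 0.2, "paper-3": 0.2}
    start = time.perf_counter()
    summaries = asyncio.run(analyze_all(papers))
    elapsed = time.perf_counter() - start
    return summaries, elapsed


if __name__ == "__main__":
    summaries, elapsed = run_demo()
    print(summaries, f"{elapsed:.2f}s")
```

Run sequentially, the three simulated calls would take about 0.6 s in total; with gather() the wall-clock time stays close to that of a single call, as long as the upstream API does not throttle concurrent requests.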

FAISS Vector Database: Uses IndexFlatIP (Inner Product). Because the embeddings from NVIDIA are L2-normalized, the inner product of two embeddings equals their cosine similarity.
Embedding Optimization: Uses the NVIDIA NIM API (nvidia/nv-embedqa-e5-v5), migrating away from heavy local PyTorch dependencies to prevent Out-of-Memory (OOM) crashes on Render's 512MB free tier.
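The equivalence between inner product and cosine similarity for unit-length vectors can be checked with a small sketch. This is plain Python rather than FAISS, and the helper names and example vectors are illustrative only.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


if __name__ == "__main__":
    a, b = [3.0, 4.0], [4.0, 3.0]
    # After normalization, the plain inner product matches the cosine.
    print(inner_product(l2_normalize(a), l2_normalize(b)))
    print(cosine_similarity(a, b))
```

An inner-product index therefore ranks normalized vectors exactly as a cosine-similarity search would, without a separate normalization step at query time.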

RAG Pipeline & Hallucination Reduction:

Text is chunked using RecursiveCharacterTextSplitter (e.g., 512 tokens with a 64-token overlap) so that chunks break at natural boundaries and context carries across adjacent chunks.
Embedding requests must set the input_type parameter ("passage" for chunks, "query" for user questions) because the E5 embedding model is asymmetric: it encodes documents and queries differently.
System prompts enforce strict grounding in the retrieved chunks to reduce hallucinations, with a reported 40% fewer hallucinations than standard models.
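The chunking and input_type steps above can be sketched as follows. This is a simplified illustration, not the project's code: chunk_words is a hypothetical whitespace-based sliding window standing in for RecursiveCharacterTextSplitter, and build_embedding_payload assumes an OpenAI-style request body with an extra input_type field. No network call is made.

```python
EMBED_MODEL = "nvidia/nv-embedqa-e5-v5"


def chunk_words(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping windows of whitespace-separated words.

    Words are used as a rough proxy for tokens (an assumption of this sketch).
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def build_embedding_payload(texts: list[str], input_type: str) -> dict:
    """Build a request body; the field layout is assumed, not taken from the project."""
    if input_type not in ("passage", "query"):
        raise ValueError("input_type must be 'passage' or 'query'")
    return {"model": EMBED_MODEL, "input": texts, "input_type": input_type}


if __name__ == "__main__":
    document = " ".join(f"w{i}" for i in range(10))
    chunks = chunk_words(document, chunk_size=4, overlap=1)
    print(build_embedding_payload(chunks, "passage"))          # indexing side
    print(build_embedding_payload(["what is w5?"], "query"))   # question side
```

Keeping the input_type check in one helper makes it harder to index chunks as queries by accident, which would quietly degrade retrieval quality with an asymmetric model.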

Infrastructure & Ephemeral Storage

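The backend returns a StreamingResponse (Server-Sent Events) that yields JSON chunks as the internal pipeline advances, so the client sees progress as each stage completes instead of waiting for the full report. SSE is one-directional and runs over plain HTTP, which makes it simpler to operate than WebSockets for this server-to-client stream.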
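A minimal sketch of the event stream, assuming each pipeline stage emits one JSON object. The stage names and the event schema are hypothetical, and asyncio.sleep(0) stands in for the real agent work.

```python
import asyncio
import json
from typing import AsyncIterator


def format_sse(payload: dict) -> str:
    """Serialize one event in SSE wire format: a 'data:' line plus a blank line."""
    return f"data: {json.dumps(payload)}\n\n"


async def pipeline_events() -> AsyncIterator[str]:
    # Hypothetical stage names; the real event schema is not shown in the write-up.
    for stage in ("planning", "searching", "analyzing", "writing"):
        await asyncio.sleep(0)  # placeholder for the actual agent work
        yield format_sse({"stage": stage, "status": "done"})


async def collect() -> list[str]:
    """Drain the generator, as an HTTP client would receive the events in order."""
    return [event async for event in pipeline_events()]


if __name__ == "__main__":
    for event in asyncio.run(collect()):
        print(event, end="")
```

In FastAPI, a generator like this would typically be passed to StreamingResponse with media_type="text/event-stream", and the browser would read it through EventSource or a streaming fetch.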

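Stateless REST: The frontend generates a UUID session_id and sends it with each request. The backend uses it as a dictionary key to isolate per-session in-memory FAISS indices, which avoids a persistent database for the MVP. The indices are ephemeral and are lost whenever the backend process restarts.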
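The session-keyed dictionary can be sketched as below. SessionIndexStore is a hypothetical name, and a plain list stands in for each FAISS index so the example stays dependency-free.

```python
import uuid


class SessionIndexStore:
    """Maps session_id to a per-session index held only in process memory."""

    def __init__(self) -> None:
        # A list of chunks stands in for a FAISS index in this sketch.
        self._indices: dict[str, list[str]] = {}

    def get_or_create(self, session_id: str) -> list[str]:
        uuid.UUID(session_id)  # raises ValueError for malformed IDs
        return self._indices.setdefault(session_id, [])

    def drop(self, session_id: str) -> None:
        """Free a session's index, e.g. when the report is finished."""
        self._indices.pop(session_id, None)


if __name__ == "__main__":
    store = SessionIndexStore()
    session_a = str(uuid.uuid4())  # the frontend would generate this value
    store.get_or_create(session_a).append("chunk from paper 1")
    print(store.get_or_create(session_a))
```

Everything lives in one process, so a restart or a second backend instance starts with an empty dictionary. That is the trade-off accepted for the MVP.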

Deployment: The frontend is deployed on Vercel's Edge Network, and the backend runs on Render under Uvicorn.

Performance Bottleneck Visualizations

Demo

What I'd Improve

Move from the in-memory FAISS dictionary to a managed vector database like Pinecone or Qdrant, so that index state is shared across multiple backend containers instead of living in one process.
Move long-running pipeline tasks to a task queue such as Celery with a RabbitMQ broker, so the FastAPI web workers aren't tied up.
Restrict CORS to the frontend's origin so that other sites cannot call the backend from a browser and spend paid NVIDIA API credits.

RAG pipelines and multi-agent systems engineer. I approach AI through a builder's lens — interested in both the architecture and the outcome.

Version

2024 © Edition
