Saar AI

Ingests bills, legal notices, medical PDFs — extracts what matters

GitHub | Live Demo

Stack: FastAPI, React, LangGraph, PostgreSQL, ChromaDB

Overview

Organizations drown in unstructured data: PDFs, invoices, contracts, and scanned documents. Finding specific clauses or extracting structured data by hand is slow and error-prone. Saar AI automates this by turning static documents into interactive, queryable data, using a cyclical state machine built with LangGraph.

System diagram

Frontend & Ingestion Flow

React SPA built for real-time document interaction.

Upload UI: Supports multipart file uploads with drag-and-drop. Crucially, provides real-time progress while backend OCR runs.
Chat Interface: Maintains multi-turn conversation arrays with citations, managing streaming text state from Server-Sent Events (SSE) and applying optimistic UI updates.
Handling Noisy Text: The backend text-extraction step (Tesseract / PyMuPDF) detects whether a PDF is native or scanned and, when needed, triggers vision models to extract text from multi-column layouts and tables before the text is passed to ChromaDB.
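
The streaming side of the chat interface can be sketched on the server as follows. This is a minimal illustration of the Server-Sent Events wire format rather than the project's actual code: the function names and the `token` / `done` event names are assumptions made for the example.

```python
from typing import Iterable, Iterator, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Encode one message as a Server-Sent Events frame."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload needs its own "data:" field.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


def stream_answer(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE frame per token, then a final 'done' event.

    The event names are hypothetical; the client only needs to agree on them.
    """
    for token in tokens:
        yield format_sse(token, event="token")
    yield format_sse("[DONE]", event="done")
```

On the React side, each `token` frame would be appended to the message currently being streamed, and the `done` frame would mark it as complete.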

React UI and Document Processing

LangGraph Multi-Agent Architecture

A single LLM prompt for RAG is brittle. LangGraph models the pipeline as a state machine with explicit checks and retry loops, which reduces hallucinations compared to a linear chain.

Agent Workflow: A Router Agent determines if the question requires document retrieval. The Retriever Agent fetches from ChromaDB. The Grader Agent explicitly checks if retrieved documents answer the question—if not, it rewrites the query and loops back.
State Management: LangGraph maintains a state object across nodes (messages, retrieved_docs, current_query) to pass context smoothly without bloating the context window.
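
The route, retrieve, grade, and rewrite loop can be sketched without any dependencies. This is a plain-Python illustration of the control flow only: in the project these steps would be LangGraph nodes joined by conditional edges. The state keys `messages`, `retrieved_docs`, and `current_query` come from the description above, while `attempts`, `MAX_ATTEMPTS`, and the four callables are assumptions added for the example.

```python
from typing import Callable, List, TypedDict


class AgentState(TypedDict):
    messages: List[str]
    retrieved_docs: List[str]
    current_query: str
    attempts: int  # hypothetical: guards against endless rewrite loops


MAX_ATTEMPTS = 3  # hypothetical retry limit


def run_graph(
    state: AgentState,
    needs_retrieval: Callable[[str], bool],
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, List[str]], bool],
    rewrite: Callable[[str], str],
) -> AgentState:
    # Router: skip retrieval for questions that do not need documents.
    if not needs_retrieval(state["current_query"]):
        return state
    for _ in range(MAX_ATTEMPTS):
        state["attempts"] += 1
        # Retriever: fetch candidate chunks for the current query.
        state["retrieved_docs"] = retrieve(state["current_query"])
        # Grader: stop as soon as the documents answer the question.
        if grade(state["current_query"], state["retrieved_docs"]):
            return state
        # Otherwise rewrite the query and loop back to retrieval.
        state["current_query"] = rewrite(state["current_query"])
    return state  # gave up; the caller decides how to respond


# Usage with stub agents standing in for the LLM and ChromaDB calls.
corpus = {"late fee clause": ["Clause 7: a late fee of 2% applies."]}
demo_state = run_graph(
    {"messages": [], "retrieved_docs": [], "current_query": "fee?", "attempts": 0},
    needs_retrieval=lambda q: True,
    retrieve=lambda q: corpus.get(q, []),
    grade=lambda q, docs: bool(docs),
    rewrite=lambda q: "late fee clause",
)
```

In the stub run, the first retrieval returns nothing, the grader rejects it, and the rewritten query succeeds on the second pass.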

ChromaDB Retrieval: Text chunks are embedded as high-dimensional vectors for fast cosine similarity search. Every query is filtered by user_id so one tenant never sees another tenant's documents.
Contextual Memory: To support multi-turn retrieval referencing previous answers, an LLM rephrases the user's query using chat history before querying ChromaDB.
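
To make the retrieval step concrete, here is a small sketch of tenant-filtered cosine similarity search in plain Python. ChromaDB does this work internally, so none of this is its API: the `Chunk` class, the `search` function, and the two-dimensional toy embeddings are assumptions chosen to keep the example self-contained.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Chunk:
    user_id: str
    text: str
    embedding: Sequence[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    chunks: List[Chunk], query_embedding: Sequence[float], user_id: str, k: int = 3
) -> List[Chunk]:
    # Tenant isolation: drop other users' chunks before any ranking happens.
    own = [c for c in chunks if c.user_id == user_id]
    # Rank the remaining chunks by similarity to the query, best first.
    own.sort(
        key=lambda c: cosine_similarity(c.embedding, query_embedding),
        reverse=True,
    )
    return own[:k]
```

Filtering before ranking means a chunk from another tenant can never appear in the results, no matter how similar its embedding is to the query.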

Backend & PostgreSQL Implementation

FastAPI handles async routes. OCR runs in BackgroundTasks so the API returns 202 Accepted immediately instead of blocking.
PostgreSQL handles relational metadata (users, documents, conversations, messages), while ChromaDB acts solely as the specialized Vector DB.
This combined architecture enables sub-8s end-to-end processing while supporting 20+ simultaneous document sessions.
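
The "accept now, process later" pattern behind the 202 response can be sketched with asyncio alone. This is a framework-free stand-in for FastAPI's BackgroundTasks, not the project's route code: `JOBS`, `run_ocr`, `accept_upload`, and the status strings are all hypothetical names for the example.

```python
import asyncio
import uuid
from typing import Dict, Set, Tuple

# Hypothetical in-memory job store: job_id -> "processing" | "done".
JOBS: Dict[str, str] = {}
# Keep references so pending tasks are not garbage collected.
_TASKS: Set[asyncio.Task] = set()


async def run_ocr(job_id: str, pdf_bytes: bytes) -> None:
    await asyncio.sleep(0.01)  # stand-in for slow OCR work
    JOBS[job_id] = "done"


async def accept_upload(pdf_bytes: bytes) -> Tuple[int, Dict[str, str]]:
    """Register the job, schedule OCR, and return 202 without waiting."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = "processing"
    task = asyncio.create_task(run_ocr(job_id, pdf_bytes))
    _TASKS.add(task)
    task.add_done_callback(_TASKS.discard)
    return 202, {"job_id": job_id}


async def demo() -> Tuple[int, str, str]:
    status, body = await accept_upload(b"%PDF-")
    before = JOBS[body["job_id"]]  # OCR has not run yet
    await asyncio.gather(*list(_TASKS))  # wait for the background work
    after = JOBS[body["job_id"]]
    return status, before, after
```

The client receives the job id straight away and can poll or subscribe for progress, which is what lets the upload UI show status while OCR is still running.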

Demo

What I'd Improve

Migrate OCR from FastAPI BackgroundTasks to a dedicated task queue (Celery with a RabbitMQ broker) so CPU-bound OCR runs in separate worker processes and does not block the API process under heavy load.
Implement advanced layout-aware parsers (like Unstructured.io) to handle multi-column layouts and complex tables without breaking.
Expand the classifier to support more niche document types like tax filings or financial statements.

