— CASE STUDY / AI PLATFORM

A multi-tenant RAG document intelligence platform, with per-user data isolation.

DocSense is the document chat platform I'm building as founder. Upload documents, ask questions, get answers with sources — with per-user isolation enforced in the database, not the application, so two tenants' embeddings never share a query.

Founder & lead architect · 2026–present · Beta · Early access

View live demo →

Hero screenshot — to be added.

— Section B

Context

Multi-tenant RAG has one failure mode that beats every other: a vector similarity search that forgets to filter by user and silently returns another customer’s chunks. The query succeeds. The answer is plausible. Nobody notices until they do, and by then it is too late. Most production RAG systems handle this with careful application-level filtering — which means safety depends on every developer remembering to add the filter every time.

DocSense is built on the inverse premise: isolation belongs in the database, not the application. Every chunks row carries a user_id foreign key to auth.users(id). PostgreSQL’s row-level security policies use auth.uid() to filter at query time. The similarity-search function takes p_user_id as a required parameter. Even if a backend endpoint forgets to scope the search, the database refuses to leak.

On top of that foundation: per-call usage metering, pluggable parsers and chunkers, multilingual document support including Gujarati embeddings, JWT-authenticated APIs, and a deployment that runs on a single Lightsail box. Built as the second product under Kashi — the same architectural patterns I deploy with clients, applied to a problem I wanted solved myself.

Vector search returns rows. Rows leak data. The defence is structural, not careful.

— Section C

My role

I’m the founder, sole architect, and lead engineer. I designed the data model, the ingestion pipeline, the retrieval pipeline, the auth and usage phases, and the deployment shape. I used modern AI coding tools — Cursor, Claude — under my direction to accelerate implementation. The architecture decisions, the trade-offs, the phase-gated rollout sequence: all mine. Every line that ships, I review and own.

— Section D

The hard parts

Five architectural decisions worth surfacing. Each one is the kind of choice people regret not making earlier.

Tenant isolation in vector search

Vector similarity search returns rows. Rows leak data. A single forgotten WHERE clause on a 10K-document index returns chunks from every customer at once — and the answer the LLM writes on top will be confidently grounded in someone else’s documents.

DocSense pushes isolation down to PostgreSQL. Every chunks row has a user_id foreign key to auth.users(id). Supabase row-level security policies enforce auth.uid() at query time. The match_chunks RPC takes p_user_id as a required filter parameter. The default state is safe: a backend developer (or an AI agent) cannot write a cross-tenant leak without first explicitly bypassing both the RLS policy and the RPC’s filter argument. The dangerous path requires intent.

Phase-gated architecture

The user accounts rollout was planned and executed in five phases — A (inventory), B (Supabase Auth + profiles), C (document and chunk ownership), D (usage tracking), E (API JWT middleware). Each phase had its own runbook, its own checklist, its own stable end state. The next phase didn’t start until the previous one was verified live.

Phase folders for billing, analytics, and auth were committed empty from day one — load-bearing signposts, not dead code. When the eventual code lands, the home is obvious and the import paths are stable. The same approach made the migration sequence work: 001 through 006 in strict order, with a dedicated repair migration (005) for any legacy database that had drifted off the spec.

Pluggable parsers and chunkers

File ingestion is where RAG systems accumulate technical debt fastest. Every new format becomes a special case, every chunking strategy a fork in the codebase, every change a regression risk.

DocSense uses abstract base classes for parsers (PDF, DOCX, TXT, OCR for images) and for chunking strategies. Adding a new format is a new subclass, not a touched switch statement. Repository-pattern data access keeps every query mockable. The ingestion pipeline reads as one straight line — parse, chunk, embed, store — and stays that way no matter how many formats land.

Usage metering as a database concern

Per-user billing requires reliable usage events. The easy mistake is to record them asynchronously — a separate queue, a separate worker, a separate source of truth — and then spend forever reconciling the drift.

DocSense records every billable API call through a single record_usage Postgres RPC: one usage_events row written, one usage_monthly row upserted, inside the same transaction as the request that consumed the tokens. No second system, no batch reconciler, no async drift. Daily caps live in the same table — a single COUNT query returns a 429 when the user is over their limit. Billing becomes a SQL question, not a distributed systems problem.

Multilingual retrieval (Gujarati, mixed-script, fallback)

Most RAG systems assume English. DocSense was built to handle non-Latin scripts — Gujarati in particular — without falling back to English-only embeddings. Migration 004 added a documents.language column with values gujarati, mixed, and other, surfaced in the UI as a small badge and used as retrieval metadata.

Embedding model selection is environment-configurable: text-embedding-3-small by default; switch to text-embedding-3-large with the dimensions parameter pinned at 1536, which keeps existing pgvector columns valid and avoids a database migration. Language metadata flows into retrieval, not just display — so a Gujarati query retrieves Gujarati chunks first, with mixed-script and English-fallback chunks behind them.

— Section E

Architecture

File upload → parse → chunk → embed → store in pgvector.

Question → embed → match_chunks(p_user_id) → prompt assembly → GPT-4o-mini → streamed answer. The match_chunks call is where tenant isolation enforces.

— Section F

Stack

Backend: FastAPI · Python 3.11 · Supabase (Auth + PostgreSQL + pgvector + RLS) · OpenAI (GPT-4o-mini, text-embedding-3-small / -3-large)
Frontend: React · Vite · TypeScript · Tailwind CSS · Supabase JS client
Infrastructure: AWS Lightsail (backend + nginx) · Vercel (frontend) · Docker · GitHub Actions

— Section G

What shipped

Live at docsense.co.in. The product runs end-to-end on production infrastructure and accepts real document uploads from real users in early access.

— Document upload across PDF, DOCX, TXT, and image OCR — parsed, chunked, embedded, stored.
— Document chat with grounded answers and source citations streamed to the browser.
— Multi-tenant data isolation enforced by Supabase RLS + match_chunks RPC filter — not application-level checks.
— JWT-authenticated APIs (RS256 via Supabase JWT Signing Keys, with legacy HS256 fallback) across /upload, /chat, /documents.
— Per-user usage metering via record_usage RPC: events + monthly rollup written transactionally with the consuming request.
— Daily token caps with HTTP 429 enforcement at the API layer using usage data.
— Multilingual document support: Gujarati, mixed-script, and other-language documents tagged at ingest time and used as retrieval metadata.
— Five-phase user accounts rollout (A inventory → E JWT middleware) shipped and stable.
— Backend on AWS Lightsail with nginx; frontend on Vercel. Single deploy pipeline via GitHub Actions.

Not yet shipped: paid plan billing UX, formal analytics dashboard, mobile client, batch ingestion API, second-language UI translation. The architecture supports all of these — they’re sequencing decisions, not unsolved problems.

Building something with this shape — RAG document intelligence, multi-tenant SaaS, strict data isolation, real production discipline?

Let’s talk.

Book a call →