
Shreyash Kadam
Developer by trade, creative by nature.
Overview
Graduate Research Assistant @ University of Illinois, Chicago
MS CS Student @ University of Illinois, Chicago
Chicago, Illinois, United States
Social Links
About
Hello! I'm Shreyash, a software developer with a passion for building robust and scalable applications.
With over five years of coding experience (internships and non-professional work), I specialize in building optimized full-stack solutions with Next.js and TypeScript. My curiosity has also led me deep into distributed systems, where I use Go to engineer projects that tackle the challenges of concurrency and resilience.
Beyond software development, I have a strong interest in data science and machine learning, where I primarily use Python to build applications and continuously explore the latest advancements in generative AI.
When I'm not coding, I trade my keyboard for a different kind of rhythm, playing percussion instruments like the Tabla and Cajon. I also have a passion for capturing moments through videography and photography.
Let's connect and collaborate!
Stack
Education
- GPA: 3.85 / 4.0
- Courses:
- Computer Algorithms
- Introduction to Data Science
- Object Oriented Languages and Environments
- Responsible Data Science and Algorithmic Fairness
- User Experience Research Methods
- Visual Data Science
- Information Retrieval
- Representation in Algorithm Design
- Security and Privacy in Networked and Distributed Systems
- Achievements:
- Received a full tuition waiver as part of a Research Assistantship position.
- Accepted to the KDD 2025 Undergraduate and Masters Consortium (KDD-UMC 2025) in Canada.
Experience
University of Illinois Chicago
Current Employer
- Architected and deployed a cloud-native data analytics platform on AWS, replacing legacy Microsoft Access workflows. The new system automates 20+ manual reports, reducing data processing time from 48 hours to under 15 minutes.
- Implemented all infrastructure using Terraform (IaC) and built a full CI/CD pipeline with GitHub Actions for automated container builds, testing (PyTest), and deployment to AWS ECS on Fargate, achieving 99.9% service uptime for over 500 caseworkers.
- Initiated and developed a proof-of-concept predictive model using XGBoost to identify individuals at high risk of service disruption; backtesting on historical data projects a potential 30% reduction in missed critical appointments.
- TypeScript
- Next.js
- Python
- SQLAlchemy
- Pandas
- OpenPyXL
- FastAPI
- Jira
- Docker
- ETL
- WebSockets
8kSec LLC
- Developed a multi-tenant threat intelligence dashboard using Next.js, Prisma, and a Python (FastAPI) backend, serving real-time security alerts from multiple data sources to corporate clients.
- Optimized PostgreSQL query performance by introducing indexing strategies and connection pooling, reducing average query latency by 40% and contributing to a 25% reduction in database downtime.
- Engineered a WebSocket-based notification service for real-time threat updates, decreasing information delivery delay from over 20 seconds to under 500ms.
- TypeScript
- Next.js
- Python
- SQL
- AWS
- Material UI
- Agile
- Prisma
- Jira
- Docker
- Cybersecurity
- Machine Learning
- SCM
Sortwind Pvt. Ltd.
- Led the migration of a legacy React SPA to a server-side rendered Next.js application, improving the Lighthouse performance score by 35 points and achieving a 30% faster initial page load speed.
- Designed and implemented RESTful APIs in Node.js/Express, deployed as containerized services behind an NGINX reverse proxy for load balancing, increasing peak traffic capacity by 1.5x.
- TypeScript
- Next.js
- React
- MongoDB
- Stripe
- Tailwind CSS
- Firebase
- GCP
- NGINX
- Node
- Express
- Docker
- Load Balancing
- Distributed Systems
Projects (12)
A production-grade, evaluation-driven Retrieval-Augmented Generation (RAG) system that turns the arXiv quantitative finance (q-fin) literature into an interactive expert knowledge base. Unlike standard “single-pass” RAG, abstRAG uses a hierarchical retrieval strategy (Abstract → Full Body) to aggressively filter noise and keep answers grounded in real papers.
- Two-Step Semantic Retrieval (Core Innovation): Runs semantic search over paper abstracts to select Top-K candidate papers, then performs passage-level semantic search only inside those candidates to return the most relevant Top-N chunks (reducing off-topic retrieval and downstream hallucination).
- Vector Search at Scale: Stores 768-d embeddings in PostgreSQL using pgvector, with separate abstract and full-body embedding tables and fast approximate nearest-neighbor search via HNSW indexes.
- High-Quality Ingestion Pipeline: Fetches papers via the arXiv API, prefers HTML sources (post-Dec 2023) for better structure, cleans and converts to Markdown (markdownify), then chunks with LangChain’s MarkdownTextSplitter (500 tokens, 50 overlap) to preserve section-aware context.
- Grounded Answer Generation: Uses Llama 3.1 (via Groq) with a strict context-only prompt template to ensure responses are derived from retrieved passages; the UI streams responses and surfaces the referenced paper links.
- Benchmarking & Research Rigor: Includes an evaluation framework comparing BM25, single-step RAG, and the 2-step method using Precision@k, Recall/Hit Rate, MRR, nDCG, plus answer-quality scoring (LLM-as-a-judge) and latency analysis; internal benchmarks show large gains (e.g., Precision@5 0.85 / Hit Rate 0.94 for 2-step vs. baselines).
- Product UX + Monitoring: Modern Streamlit chat UI with retrieval progress/status, retrieval-quality + latency breakdown, caching and rate limiting, thumbs up/down feedback persisted to the database, and a separate feedback monitoring dashboard for analysis over time.
- Python
- Streamlit
- PostgreSQL
- pgvector
- Docker
- LangChain
- Sentence-Transformers
- Groq API
- Llama 3.1
- HNSW (ANN Vector Indexing)
- BM25
- Jinja2
- Pandas
- Altair
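The two-step retrieval described for abstRAG above (abstract-level search to pick Top-K papers, then passage-level search only inside those papers) can be illustrated with a minimal sketch. This is not the project's code: the real system stores 768-d embeddings in pgvector and searches with HNSW indexes, whereas this toy version assumes tiny in-memory vectors and exact cosine similarity. The function names, data layout, and sample vectors are all hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_step_retrieve(query_vec, abstracts, chunks, top_k=2, top_n=3):
    """Hypothetical two-step retrieval.

    abstracts: {paper_id: abstract_vector}
    chunks:    list of (paper_id, text, chunk_vector)
    """
    # Step 1: rank papers by abstract similarity and keep the Top-K.
    ranked = sorted(
        abstracts,
        key=lambda pid: cosine(query_vec, abstracts[pid]),
        reverse=True,
    )
    candidates = set(ranked[:top_k])

    # Step 2: rank passages, but only those belonging to candidate papers.
    pool = [c for c in chunks if c[0] in candidates]
    pool.sort(key=lambda c: cosine(query_vec, c[2]), reverse=True)
    return pool[:top_n]


# Toy data (assumed): 2-d vectors stand in for real embeddings.
ABSTRACTS = {
    "paper_a": [1.0, 0.1],
    "paper_b": [0.9, 0.3],
    "paper_c": [0.0, 1.0],
}
CHUNKS = [
    ("paper_c", "stray passage", [1.0, 0.0]),  # similar chunk, off-topic paper
    ("paper_a", "passage a1", [0.9, 0.1]),
    ("paper_b", "passage b1", [0.8, 0.2]),
    ("paper_a", "passage a2", [0.1, 1.0]),
]

hits = two_step_retrieve([1.0, 0.0], ABSTRACTS, CHUNKS, top_k=2, top_n=2)
for paper_id, text, _ in hits:
    print(paper_id, text)
```

In this toy data, the passage from paper_c matches the query perfectly but is never considered, because its paper was filtered out at the abstract step. That is the noise-filtering effect the hierarchical strategy aims for.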
A production-grade, fault-tolerant distributed key-value store built in Go. This project provides a horizontally scalable storage solution that ensures data consistency and high availability using the Raft consensus algorithm. It features a modern Svelte UI for real-time cluster management and data exploration.
- Distributed CRUD Operations: Simple PUT, GET, and DELETE operations distributed across a multi-node cluster.
- Strong Consistency: Guarantees data consistency across all nodes using the Raft consensus protocol. All writes are committed by a leader and replicated to a majority of nodes.
- High Availability & Fault Tolerance: The system can tolerate node failures. If a leader node fails, the cluster automatically elects a new leader with no data loss.
- Horizontal Scalability: Easily scale the cluster by adding new nodes. The system is designed to handle new nodes joining a live cluster.
- Persistent Storage: Utilizes BoltDB for durable, on-disk storage with ACID guarantees, ensuring data survives node restarts.
- Live Management Dashboard: A modern, real-time web UI built with Svelte allows you to:
- View the status of all nodes (leader, follower, online/offline).
- Add new nodes to the cluster dynamically.
- Stop, restart, and decommission nodes.
- Explore and manage key-value data directly.
- Go
- HashiCorp Raft
- Gin
- BoltDB
- HashiCorp Memberlist
- SvelteKit
- Tailwind CSS
- Vite
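The majority rule behind the key-value store's consistency and fault-tolerance claims above comes down to simple arithmetic. The sketch below is in Python for illustration only (the project itself is written in Go and relies on HashiCorp Raft), and the helper names are hypothetical. It shows the quorum size for a cluster and how many node failures a cluster of a given size can absorb while still committing writes.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Nodes that can fail while a majority remains reachable."""
    return cluster_size - quorum_size(cluster_size)


def is_committed(ack_count: int, cluster_size: int) -> bool:
    """A log entry counts as committed once a majority has stored it.

    ack_count includes the leader's own copy.
    """
    return ack_count >= quorum_size(cluster_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Note that a four-node cluster tolerates no more failures than a three-node one, which is why odd cluster sizes are the usual choice.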
- Investigated the impact of popularity bias on recommendation fairness through a comparative analysis of the Music (Last.fm) and Movie (MovieLens 1M) domains.
- Evaluated the critical role of different evaluation strategies (UserTest vs. TrainItems), confirming that the choice of strategy profoundly influences the measurement of bias and accuracy.
- Designed and validated a novel user grouping method, NicheConsumptionRate, to effectively identify users with niche tastes based on their consumption of the least popular items.
- Implemented and assessed a post-processing mitigation technique (multiplicative damping, α=0.5), successfully reducing the magnitude and disparity of popularity bias across user groups.
- Quantified the trade-off between fairness and accuracy, demonstrating that while the mitigation strategy improved fairness, it generally decreased recommendation accuracy (NDCG@10).
- Python
- Cornac
- Pandas
- NumPy
- SciPy
- Matplotlib
- Seaborn
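One way to picture the post-processing mitigation mentioned above (multiplicative damping with α=0.5) is a re-ranking step that scales each predicted score down in proportion to the item's popularity. The exact formula used in the study is not stated here, so the form below, `score * (1 - alpha * normalized_popularity)`, is an assumption, as are the function names and the sample data.

```python
def damp_scores(scores, popularity, alpha=0.5):
    """Scale each score by (1 - alpha * popularity / max popularity).

    Assumed damping form, not necessarily the one used in the study.
    scores:     {item: predicted relevance score}
    popularity: {item: interaction count in the training data}
    """
    max_pop = max(popularity.values()) or 1
    return {
        item: score * (1.0 - alpha * popularity[item] / max_pop)
        for item, score in scores.items()
    }


def rerank(scores, popularity, alpha=0.5, k=10):
    """Return the top-k items after popularity damping."""
    damped = damp_scores(scores, popularity, alpha)
    return sorted(damped, key=damped.get, reverse=True)[:k]


# Hypothetical example: one blockbuster, one mid-tail, one niche item.
scores = {"hit": 0.9, "mid": 0.7, "niche": 0.6}
popularity = {"hit": 1000, "mid": 500, "niche": 10}

print(rerank(scores, popularity, alpha=0.0))  # original ranking
print(rerank(scores, popularity, alpha=0.5))  # damped ranking
```

With α=0 the ranking is unchanged. Raising α pushes popular items down the list, which matches the fairness-versus-accuracy trade-off described above: the list becomes less popularity-driven, but the items the model scored highest may no longer appear first.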
A production-quality visual analytics dashboard built for Indiana’s Overdose Fatality Review (OFR) teams to turn “touchpoint” data into actionable intervention strategy. It solves the “apples-to-oranges” comparison problem in public health by letting users find statistical peer counties based on multivariate similarity (not geography), then drill down into where and when the last intervention opportunities occur.
- What it does: Helps OFR teams and state officials compare Indiana’s 92 counties across the full touchpoint pipeline (ED visits, EMS interactions, jail bookings, prison releases, Rx dispensations), identify peer counties with similar overdose pathways, and prioritize resource allocation using interpretable, coordinated visual evidence.
- How it works (end-to-end pipeline):
- Data ingestion: Loads de-identified Indiana touchpoint data (county → year → touchpoint type → metrics) from CSV and county boundaries from GeoJSON.
- Metrics model: Computes and visualizes three core intervention lenses:
- Prevalence: % of decedents with each touchpoint in the 12 months prior to death
- Frequency: average utilization count over time
- Recency: average days between last touchpoint and death (the “intervention window”)
- Similarity + clustering space: A Python preprocessing pipeline generates year-specific 2D projections for each county using feature vectors over touchpoint metrics and t-SNE (scikit-learn), persisted as JSON for fast client rendering.
- Real-time interaction: All filtering and coordinated updates happen client-side; selecting a county via any view updates all linked views instantly.
- Signature visual analytics features:
- Touchpoint Fingerprint Glyphs (map overlay): A custom “flower” glyph per county where petal length encodes prevalence and petal opacity encodes recency, enabling fast pre-attentive scanning for “high urgency + high volume” patterns.
- Coordinated Multiple Views (CMV): Map + clustering scatterplot + prevalence distribution + frequency trends + recency lollipop chart are tightly linked (overview → zoom/filter → details-on-demand).
- Feature-switchable clustering: Users can explore peer counties by toggling similarity space across Prevalence (volume), Frequency (counts), or Recency (timing).
- High-precision selection: Voronoi interaction layers (Delaunay/Voronoi) make dense scatterplot points easy to click; density contours reveal cluster structure without clutter.
- Statistical context everywhere: State-level baseline shown by default; benchmark overlays and distribution context help users interpret whether a county is truly an outlier.
- Engineering + quality:
- Clean React–D3 integration via refs/effects to keep D3 DOM performance without fighting React’s lifecycle.
- Responsive visualization layout (ResizeObserver), semantic zooming for glyph readability, and defensive data handling for sparse counties/years.
- Evaluation & impact:
- Usability-tested with think-aloud protocol; achieved 100% task success on peer-finding and outlier identification tasks with strong ease-of-use and coordination ratings.
- Links:
- Live Demo: https://cs-529-final-project.vercel.app
- Source Code: https://github.com/sajontahsen/CS529-project
- JavaScript
- React
- D3.js
- Geospatial Visualization (GeoJSON)
- Coordinated Multiple Views (CMV)
- Multivariate Glyph Design
- Semantic Zooming
- t-SNE
- Delaunay/Voronoi Interaction
- Density Contours
- Python
- Pandas
- NumPy
- Scikit-learn
- Vercel
- UX Research (Think-Aloud Testing)
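The idea of finding statistical peer counties by multivariate similarity rather than geography, described for the dashboard above, can be sketched as a nearest-neighbour lookup over standardized metric vectors. This is an illustration only: the dashboard itself projects counties with t-SNE (scikit-learn) and lets users explore the resulting 2D space, whereas this sketch assumes plain Euclidean distance on z-scored features. County names, feature values, and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def peer_counties(features, target, k=2):
    """Return the k counties closest to `target` in standardized space.

    features: {county: [prevalence_pct, frequency, recency_days]}
    """
    names = list(features)
    scaled = dict(zip(names, zscore_columns([features[n] for n in names])))

    def dist(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    others = [n for n in names if n != target]
    return sorted(others, key=lambda n: dist(scaled[n], scaled[target]))[:k]


# Hypothetical metrics for one touchpoint type in one year.
FEATURES = {
    "County A": [40.0, 2.0, 30.0],
    "County B": [42.0, 2.1, 28.0],
    "County C": [10.0, 0.5, 120.0],
    "County D": [38.0, 1.9, 35.0],
}

print(peer_counties(FEATURES, "County A", k=2))
```

Standardizing first keeps recency (measured in days) from dominating prevalence (a percentage) and frequency (a count) purely because of its larger numeric range.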





