AI Systems, Retrieval, and Data Infrastructure

Noshitha Juttu

Former Deloitte data engineer and MSCS candidate building AI systems, LLM infrastructure, retrieval, and trustworthy applied AI products.

I bring production data engineering experience from Deloitte USI into AI systems work at UMass Amherst, with a current focus on LLM infrastructure, retrieval, verification, and inference optimization.

Based in

San Francisco, California

Open to AI systems, applied AI, and data platform roles

Core Areas

AI systems · Retrieval pipelines · LLM inference optimization · NLP · Multi-agent systems · Data infrastructure · Data Engineering

About

Former Deloitte data engineer, MSCS candidate, AI systems builder.

I am a former Deloitte USI data engineer and M.S. Computer Science candidate at UMass Amherst, now focused on AI systems, LLM infrastructure, retrieval, and trustworthy applied AI.

At Deloitte, I spent nearly 2.5 years building production data platforms, pipelines, and analytics systems across utility, healthcare, and pricing domains, with that work recognized through three Deloitte awards.

At UMass Amherst and through an Adobe externship, I have worked on on-device model optimization, agentic systems, and clinical reasoning. I also build rapid AI prototypes at Bay Area hackathons and community events.

Experience

3+ Years

Research

2 Publications + 1 Under Review

Background

Ex-Deloitte

Experience

Research, applied AI, and production data systems.

My work spans research labs, model optimization, and large-scale enterprise data platforms, with a focus on systems that are measurable, deployable, and reliable.

AI Researcher

Current

UMass BioNLP Lab

Advisor: Prof. Hong Yu

Built a training-free multi-agent framework for predicting social determinants of health (SDOH) from clinical text, focused on inference-time refinement, reasoning stability, and limited-supervision settings.

  • Designed a multi-agent pipeline that ranks candidate outputs for correctness and consistency without fine-tuning.
  • Implemented a lightweight memory module to reuse high-reward reasoning patterns across predictions.

Period

Sep 2025 – Jan 2026

Initiative

Training-Free Multi-Agent Clinical Reasoning

Applied AI Extern

Adobe x UMass

Advisors: Prof. Andrew McCallum & Franck Dernoncourt

Engineered an on-device inference optimization pipeline for MarianMT-based neural machine translation, balancing model size, decoding speed, and quality for edge deployment.

  • Built a PyTorch-to-CoreML / ONNX Runtime export workflow supporting INT8 and FP16 models for cross-platform inference.
  • Reduced model size from 75M to 23M parameters and improved decoding throughput by about 20%.

Period

Jan 2025 – May 2025

Initiative

On-Device MarianMT Inference Optimization

Consulting Client Work

Deloitte USI — AI & Data Engineering Analyst

Delivered production data engineering, analytics, and ML work across multiple client domains within a single Deloitte role.

Period

Sep 2021 – Jan 2024

Client

Public Utility

Nov 2022 – Jan 2024

Utility Customer Data Platform

Architected and maintained secure ingestion and transformation pipelines for utility billing, consumption, and daily customer activity data powering customer-facing digital experiences.

  • Supported data serving 15M+ customers across utility-service experiences and leadership-facing dashboards.
  • Developed a custom Python-based NiFi processor that reduced batch load times from 3 hours to 30 minutes.

Client

Fortune 100 Energy Utility

Apr 2022 – Nov 2022

Enterprise ETL Migration to Databricks

Led migration of legacy Informatica BDM workflows to Databricks-based PySpark pipelines for a high-scale enterprise utility environment.

  • Reduced batch runtimes by 25-30% through Databricks migration, incremental processing, and better transformation design.
  • Received a Spot Award for ownership and impact during the migration effort.

Client

Healthcare Provider

Nov 2021 – Apr 2022

Healthcare ELT & Patient Risk Analytics

Designed and automated ELT pipelines across multi-source healthcare systems to centralize patient and operational data for downstream analytics.

  • Built a three-layer architecture (raw, transformed, reporting) with schema validation, deduplication, and standardized transformations.
  • Supported analytics workflows focused on identifying high-priority and critical-care patient signals from consolidated records.

Client

Used Car Dealers

Sep 2021 – Nov 2021

Used Car Pricing Intelligence

Analyzed used-car market data to identify price-driving factors across vehicle attributes, market trends, and resale patterns.

  • Built predictive modeling workflows to estimate used-car pricing and support pricing-related recommendations.
  • Improved interpretability by surfacing the most important feature drivers behind pricing outputs.

Data Scientist Intern

Innodatatics

Worked on airline churn analysis using statistical testing and classical machine learning to identify customer retention drivers.

  • Performed EDA and feature engineering on 100K+ airline customer records.
  • Built and validated a decision-tree churn model with 93.5% accuracy.

Period

Jun 2019 – Aug 2019

Initiative

Airline Customer Churn Analytics

Projects

Selected systems and applied research work.

A mix of retrieval, inference, embedded ML, and analytical systems built across coursework, research, and applied engineering work.

Flagship Project · Legal AI Verification

BriefCheck

Built a verification layer for AI-drafted legal briefs that checks whether cited cases are real, still good law, support the argument, and fit the right jurisdiction. The project combines retrieval, LLM orchestration, domain reasoning, and product judgment around trust and practical AI safety.

Legal AI · Verification · Retrieval · LLM Orchestration · MCP

Retrieval · NLP Systems

RAG-based Research Copilot

Built modular retrieval and indexing pipelines using LangGraph, Hugging Face, and semantic search to automate literature ingestion, search, and topic discovery.

LangGraph · RAG · Sentence-Transformers · Semantic Search

Data Systems · Analytical Ranking

Automated SQL View Generation & Entropy-Based Ranking Engine

Engineered KL-divergence-based ranking, in-memory caching, and pruning to prioritize analytical views, improve throughput, and reduce query runtime from 10s to under 2s.

SQL · KL Divergence · SQLite · Optimization
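
The KL-divergence ranking idea behind this project can be illustrated with a minimal sketch. This is not the project's code: the function names, the smoothing constant, and the shape of the input (each candidate view as a pair of target and reference aggregate distributions) are assumptions made for illustration. A view scores higher when its target distribution deviates more from the reference.

```python
import math
from typing import Dict, List, Tuple

Distribution = Dict[str, float]


def kl_divergence(p: Distribution, q: Distribution, eps: float = 1e-9) -> float:
    """KL(P || Q) over the union of keys, after smoothing and normalizing.

    eps is a hypothetical smoothing constant that avoids log(0) when a
    group is present in one distribution but not in the other.
    """
    keys = set(p) | set(q)
    p_total = sum(p.get(k, 0.0) + eps for k in keys)
    q_total = sum(q.get(k, 0.0) + eps for k in keys)
    score = 0.0
    for k in keys:
        pk = (p.get(k, 0.0) + eps) / p_total
        qk = (q.get(k, 0.0) + eps) / q_total
        score += pk * math.log(pk / qk)
    return score


def rank_views(
    views: Dict[str, Tuple[Distribution, Distribution]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Score each (target, reference) pair and return the top_k most divergent views."""
    scored = [
        (name, kl_divergence(target, reference))
        for name, (target, reference) in views.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical example: two candidate views over the same grouping column.
views = {
    "flat": ({"a": 5, "b": 5}, {"a": 5, "b": 5}),
    "skewed": ({"a": 9, "b": 1}, {"a": 5, "b": 5}),
}
ranking = rank_views(views)
```

Here the "skewed" view ranks first because its target distribution differs most from its reference. Caching and pruning would sit around a scoring loop like this one.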

Embedded ML · Real-time Systems

Hand Gesture Controlled UAV / IMU-Based Gesture Recognition

Built a gesture recognition system using ESP32-S3 and IMU sensor data with FFT-based preprocessing for motion-driven control and low-latency command execution.

ESP32-S3 · IMU · FFT · Embedded ML

Hackathons & Rapid Prototypes

Fast builds that explore ideas quickly.

A space for hackathon systems, rapid prototypes, and experimental builds designed and shipped under tight time constraints.

Rapid Prototype

Prototype · Graph + Agents

12 hours

KDIGO Guideline-Aware Clinical Graph

Theme

Clinical Decision Support with Graph + Agents

Designed a graph-centered prototype using Neo4j and agent orchestration to reason over patient records, guideline rules, contraindications, and treatment thresholds.

Neo4j · Agents · Clinical AI · Prototype

Google DeepMind × Cactus (YC S25) AI Hackathon

Hackathon · Inference Routing

Hackathon Build

Hybrid Edge-Cloud Routing for Tool-Calling AI

Theme

Hybrid inference, tool routing, and edge AI systems

Built a hybrid edge-cloud routing system for tool-calling AI that decides when a small language model (FunctionGemma-270M) is enough and when to escalate to Gemini for stronger reasoning. The project focused on practical inference trade-offs across speed, accuracy, on-device execution, and recovery behavior, showing how smaller models and cloud models can work together instead of competing.

Edge AI · Gemini · Tool Routing · Systems
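
The escalation decision at the heart of this build can be sketched roughly as follows. This is an illustrative sketch, not the hackathon code: the `ToolCall` type, the confidence score, the threshold value, and the model callables are all hypothetical stand-ins, and no real FunctionGemma or Gemini API is called.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set, Tuple


@dataclass
class ToolCall:
    name: str
    arguments: dict = field(default_factory=dict)
    confidence: float = 0.0  # hypothetical score in [0, 1]


# A model is any callable that maps a prompt to a tool call, or None on failure.
Model = Callable[[str], Optional[ToolCall]]


def route(
    prompt: str,
    local_model: Model,
    cloud_model: Model,
    known_tools: Set[str],
    threshold: float = 0.8,  # assumed cutoff, would need tuning
) -> Tuple[str, Optional[ToolCall]]:
    """Try the on-device model first and escalate to the cloud model when
    the local call is missing, names an unknown tool, or is low-confidence."""
    call = local_model(prompt)
    if call is not None and call.name in known_tools and call.confidence >= threshold:
        return "edge", call
    return "cloud", cloud_model(prompt)


# Stub models standing in for the small on-device model and the cloud model.
def confident_local(prompt: str) -> Optional[ToolCall]:
    return ToolCall("get_weather", {"city": "Amherst"}, confidence=0.95)


def unsure_local(prompt: str) -> Optional[ToolCall]:
    return ToolCall("get_weather", {"city": "Amherst"}, confidence=0.40)


def cloud(prompt: str) -> Optional[ToolCall]:
    return ToolCall("get_weather", {"city": "Amherst"}, confidence=0.99)


tools = {"get_weather"}
fast_path = route("Weather in Amherst?", confident_local, cloud, tools)
escalated = route("Weather in Amherst?", unsure_local, cloud, tools)
```

The first call stays on the edge path and the second escalates. Validating the tool name before trusting the local result is one simple form of the recovery behavior described above.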

Publications

Papers, preprints, and research outputs.

Research work spanning multi-agent language models, legal NLP, and early deep learning systems.

When Consensus Becomes Compliance: Measuring Sycophancy in Multi-Agent Language Model Interactions

2026

ACL 2026 Student Research Workshop · Under Review

Introduced the Conditional Infection metric to quantify interaction-driven epistemic regression in multi-agent LLM debates.

Text to Trust: Evaluating Fine-Tuning and LoRA Trade-offs in Language Models for Unfair Terms of Service Detection

2025

arXiv preprint (arXiv:2510.22531)

Systematic evaluation of full fine-tuning and parameter-efficient LoRA adaptations for clause-level classification and risk flagging in legal contracts.

Development of an AI-Based Chatbot Using Deep Neural Networks

2021

International Conference on Intelligent Vision and Computing 2021

Speech-enabled chatbot development using Bag of Words, DNNs, and batch gradient descent; recognized for societal impact and integrated into a city municipal website.

Tech Stack

Tools I use to build and evaluate systems.

From model optimization and retrieval to orchestration, warehousing, and infrastructure.

inference

vLLM · CUDA · ONNX Runtime · CoreML · PTQ/QAT · INT8/FP16

agents

LangChain · LangGraph · ReAct · Multi-agent orchestration · GRPO/DPO

retrieval

RAG · LlamaParse · LlamaIndex · FAISS · Milvus · Sentence-Transformers · Semantic retrieval · Neo4j

data

Spark · Databricks · Airflow · Redshift · Athena · Snowflake · BigQuery · dbt

infrastructure

AWS · SageMaker · Docker · Kubernetes · Terraform

tooling

MCP · DBeaver · GitHub · Python · TypeScript

Contact

Let’s build AI systems that hold up beyond the demo.

I’m open to applied AI, AI systems, retrieval, ML infrastructure, and data platform roles, especially work that sits between research ideas and production systems.