Ciao, I'm Swathi.

Applied AI engineer building agentic systems and autonomous tooling.

Research Fellow @ Microsoft Research


Experience

Agentic system combining AST-based static analysis with LLM-driven classification for Rust code repair. 80-95% recall across 150+ crates and 300k+ LoC.

Pointer ownership inference for C→Rust using call graphs, topological propagation, and RL-based translation with backtracking. Reduced manual annotation ~80%.

Implemented on Model Context Protocol (MCP), integrated with GitHub Copilot and Claude Code.

Featured in Rust@M365 newsletter.

Invoice understanding with PaddleOCR + LLM extraction (92%+ accuracy). SVM email classifier at 99.6% accuracy.

Hybrid RAG architecture with SQL reasoning, reaching 95%+ answer accuracy for an e-commerce social media platform.

Built recommendation engine for commerce-focused social platform; improved engagement by 40% and CTR by 25%.

Django backend serving 2M+ company records for finance chatbot. Deployed ML pipelines via Docker + AWS.

Vue.js + Flask application with 30% scheduling accuracy improvement, saving 15+ hours per week.

Streamlined API testing with Postman, reducing system errors by 40% and increasing uptime 20%.

Projects

Autonomous Code Repair

Multi-agent system that detects and fixes coding guideline violations in Rust codebases. Combines AST-based static analysis to extract structural code features with LLM-driven classification.

Designed a multi-agent architecture with an orchestrator delegating to specialized subagents for violation detection, classification, and fix generation, operating in an autonomous loop with build/test validation. Achieved 80–95% recall (vs. 10–50% with a vanilla LLM) and <10% missed violations; deployed across 150+ Rust crates and 300k+ LoC.

Agentic AI · Rust · MCP · Multi-Agent · AST
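
To make the detect, fix, validate loop described above concrete, here is a minimal Python sketch. It is not the production system: the `Violation` type, the `repair_loop` function, and the toy `no-unwrap` rule are all hypothetical stand-ins, and the detector, fixer, and validator are plain callables where the real system would use subagents and an actual build/test run.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Violation:
    rule: str   # hypothetical rule identifier
    line: int   # 0-based line index in the source text


def repair_loop(
    source: str,
    detect: Callable[[str], List[Violation]],
    fix: Callable[[str, Violation], str],
    validate: Callable[[str], bool],
    max_rounds: int = 5,
) -> str:
    """Detect violations, try a fix, and keep it only if validation passes."""
    for _ in range(max_rounds):
        violations = detect(source)
        if not violations:
            break
        progressed = False
        for violation in violations:
            candidate = fix(source, violation)
            # Accept a candidate only when it changes the code and still validates.
            if candidate != source and validate(candidate):
                source = candidate
                progressed = True
                break  # re-run detection, since the text has changed
        if not progressed:
            break  # no fix survived validation; stop instead of looping forever
    return source


# Toy stand-ins for the subagents (assumptions, for illustration only).
def detect_unwrap(source: str) -> List[Violation]:
    return [
        Violation("no-unwrap", i)
        for i, line in enumerate(source.splitlines())
        if ".unwrap()" in line
    ]


def fix_unwrap(source: str, violation: Violation) -> str:
    lines = source.splitlines()
    lines[violation.line] = lines[violation.line].replace(".unwrap()", "?", 1)
    return "\n".join(lines)


def always_builds(source: str) -> bool:
    return True  # placeholder for a real build/test step
```

Re-running detection after every accepted fix keeps line numbers honest, and the `progressed` flag stops the loop when every candidate fails validation.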

C → Rust Ownership Inference

Pointer ownership inference system analyzing semantics (singleton vs. array vs. null-terminated) by constructing call graphs and propagating annotations in reverse topological order.

Built type summarization pipeline extracting field-level ownership patterns from C structs via usage-site analysis. Implemented RL-based translation with backtracking: exploration stack of competing approaches, LLM-as-judge scoring, and reward-based approach selection. Reduced manual type annotation effort ~80%.

Type Systems · Graph Algorithms · RL · AST
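
The reverse-topological propagation step can be sketched in a few lines of Python. This is a simplified illustration under loud assumptions: each function forwards a single pointer argument to its callees, the function names and the `"conflict"` label are invented for the example, and the real system works on C code rather than a hand-written dictionary.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def propagate(call_graph, seed):
    """Propagate pointer-kind annotations from callees up to callers.

    call_graph maps a caller to the set of callees it forwards its pointer to.
    seed holds annotations that are already known (e.g. for leaf functions).
    """
    annotations = dict(seed)
    # TopologicalSorter treats the mapped values as predecessors, so passing
    # caller -> callees yields callees first: reverse topological order of
    # the call graph.
    for fn in TopologicalSorter(call_graph).static_order():
        if fn in annotations:
            continue
        kinds = {
            annotations[callee]
            for callee in call_graph.get(fn, ())
            if callee in annotations
        }
        if len(kinds) == 1:
            annotations[fn] = kinds.pop()
        elif len(kinds) > 1:
            annotations[fn] = "conflict"  # hypothetical marker for disagreement
    return annotations


call_graph = {
    "main": {"parse", "log"},
    "parse": {"strlen_like"},
    "log": {"strlen_like"},
    "sum": {"read_n"},
    "mixed": {"strlen_like", "read_n"},
}
seed = {"strlen_like": "null_terminated", "read_n": "array"}
result = propagate(call_graph, seed)
```

Visiting callees first means every caller sees finished annotations for everything it calls. Recursive call graphs would make `static_order()` raise `graphlib.CycleError`, so a fuller version would need to collapse cycles first.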

Invoice Understanding System

Production ML pipeline for automated invoice understanding using OCR and LLM-based structured extraction.

Built an end-to-end invoice processing system combining PaddleOCR with LLM-based field extraction, achieving 92%+ extraction accuracy. Trained and optimized an SVM (RBF kernel) classifier for invoice email detection with 99.6% accuracy and 99%+ precision/recall. Designed scalable ML inference pipelines and deployed them via Docker and AWS for production use.

PaddleOCR · Scikit-learn · LLMs · AWS · Docker

Financial Intelligence Backend

Scalable backend serving financial intelligence queries across millions of company records.

Architected and deployed a Django-based backend serving financial data across 2M+ company records. Implemented optimized database querying with PostgreSQL and integrated secure authentication using Auth0. Designed APIs enabling LLM-powered financial query responses over structured databases.

Django · PostgreSQL · Auth0 · APIs · AWS

LLM GMeet Assistant

Chrome extension that captures live meeting transcripts and generates summaries and action items.

Built a browser extension using TypeScript, React, and Chrome APIs to capture live meeting transcripts and process them with LLMs for summarization and action-item extraction. Implemented scenario-aware prompting to generate context-specific outputs for meetings.

TypeScript · React · Chrome APIs · LLMs

Distributed File Sharding

Client-server architecture with consistent hashing for fault-tolerant file partitioning and retrieval.

Achieved 98% sharding efficiency by evenly distributing files across 10 servers using MD5 hash-based sharding. Reduced data retrieval time by 75% compared to a non-sharded system, with an average retrieval time of 3 ms per file.

Distributed Systems · Consistent Hashing · Python
View on GitHub →
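
For readers unfamiliar with the technique, here is a minimal sketch of MD5-based consistent hashing in Python. It is an illustration, not the project's code: the `HashRing` class, the `server-N` names, and the choice of 100 virtual nodes per server are assumptions made for the example.

```python
import bisect
import hashlib


def md5_int(key: str) -> int:
    """Map a string to a large integer using its MD5 digest."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, servers, vnodes=100):
        if not servers:
            raise ValueError("at least one server is required")
        # Each server is placed on the ring at `vnodes` pseudo-random points,
        # which smooths out the share of keys each server receives.
        self._ring = sorted(
            (md5_int(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, filename: str) -> str:
        # Walk clockwise to the first ring point past the file's hash,
        # wrapping around to the start of the ring when needed.
        idx = bisect.bisect_right(self._points, md5_int(filename))
        return self._ring[idx % len(self._ring)][1]


servers = [f"server-{i}" for i in range(10)]
ring = HashRing(servers)
placement = {name: ring.server_for(name) for name in ("a.txt", "b.txt", "c.txt")}
```

Unlike plain `hash % n` sharding, removing a server from the ring only relocates the files that lived on that server; every other file keeps its placement.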

Skills

Languages

Python
Rust
C/C++
C#
TypeScript

ML & AI

LLMs
Agentic AI
Multi-Agent Systems
RAG
AST Analysis

Frameworks

LangChain
FAISS
PaddleOCR
Django
FastAPI

Infrastructure

Model Context Protocol
PostgreSQL
Docker
AWS

Extracurriculars

Running: signing up for marathons to remember pain is temporary
Public speaking & debates
Netflix: competitive long-form content consumption
Beaches
True crime documentaries
Learning to surf
Learning to tuft, including the slightly unhinged duck rug

Education

Manipal Institute of Technology

B.Tech CS & Engineering (Minor: Data Science) · CGPA 9.26 · 2020–2024

Computer Networks, Database Management, Distributed Systems, Data Science, Compiler Design, Operating System, Computer Vision, Data Structures and Algorithms