I am a graduate researcher in Computer Science at Stanford University, working on deep generative models with applications in language modeling, speech understanding and generation, and more. I am part of the Stanford NLP Group and the Stanford AI Lab, advised by Professor Chris Manning.
I've led the development of state-of-the-art retrieval augmentation systems, including RAPTOR (980+ stars on GitHub), and built production RAG systems during my research internship at Contextual.ai. I've also worked on commercial text-to-speech systems and speech LMs at Gan.ai. As part of Stanford NLP's Chirpy Cardinal team, I helped develop award-winning conversational AI systems, winning first place ($250K award) in the Amazon Alexa Socialbot Grand Challenge 5 Science Prize.
psarthi@stanford.edu
August 2024: Released the Myna-Base text-to-speech model (the first TTS model to support all 22 official Indian languages, with code-mixing)
March 2024: Released RAPTOR code (980+ stars on GitHub)
February 2024: RAPTOR accepted to ICLR 2024
Reviewer for NeurIPS (2023), EMNLP (2023), ICLR (2024), and ICML (2024)
A+ in CS107 (Computer Systems), CS111 (Operating Systems), CS161 (Algorithms), CS197 (CS Research), and CS236 (Deep Generative Models), among others.
Additional coursework: CS229 (Machine Learning), CS224N (Deep Learning for NLP), CS149 (Parallel Computing), CS221 (AI Principles & Techniques), CS255 (Cryptography), Math 53 (Differential Equations with Linear Algebra and Fourier Methods)