Atharva Sehgal
Research Interests
My research focuses on developing code generation models for automated and collaborative scientific discovery across scientific domains and modalities.
Education
Visiting Student, Caltech
Working with Yisong Yue on building structured machine learning algorithms with a focus on AI4Science.
August 2024 - Present
PhD in Computer Science, University of Texas at Austin
Developing machine learning algorithms for visual reasoning (Cosmos, Escher) and scientific discovery (LaSR, neurosym-lib). Advised by Swarat Chaudhuri.
August 2021 - Present
B.S. in Computer Science, University of Illinois Urbana-Champaign
Graduated with high honors. Minor in Linguistics. James Scholar.
August 2017 - May 2021
Publications
FormulaCode: Evaluating Agentic Optimization on Large Codebases
PRAL @ ICML 2025
Beyond Accuracy: Metrics that Uncover What Makes a 'Good' Visual Descriptor
(Best Poster) | VisCon @ CVPR 2025
Escher: Self-Evolving Visual Concept Library using Vision-Language Critics
CVPR 2025
LaSR: Symbolic Regression with a Learned Concept Library
NeurIPS 2024
Neurosymbolic Grounding for Compositional World Models
ICLR 2024
Neurosymbolic Programming for Science
AI4Science @ NeurIPS 2022
Composing Neural and Symbolic Reasoning with an Application to Visual Discrimination
IJCAI/ECAI 2022
Statheros: A Compiler for Efficient Low-Precision Probabilistic Programming
DAC 2021
Projects
tacc-inference Library
August 2024
The tacc-inference library provides a common API for running LLMs on one or more nodes of TACC's Vista and Frontera supercomputers (Frontera is the largest academic supercomputer in the US). Used by 50+ labs. (pip install tacc-inf)
neurosym Library
August 2023
The neurosym library is a Python package for neurosymbolic program synthesis.
It is the first framework to integrate tools for DSL design, program search, and program abstraction in a self-contained package.
Used extensively in research, production, and teaching. Joint work with collaborators at MIT. (pip install neurosym)
Technical Strengths
Computer Languages: Python, C, Julia, C++14, Haskell, HTML/CSS/JavaScript, OCaml
Frameworks: PyTorch/TensorFlow/SciPy, Pandas/Dask, NetworkX, Coq/Lean, Z3, Pyro
Outreach, Service, and Talks
Academic Reviewing
ICML (2023-present)
NeurIPS (2022-present)
ICLR (2023-present)
CVPR (2025-present)
ICCV (2025)
Talks
Program Synthesis and Scientific Discovery MIT (2025)
Program Synthesis and Scientific Discovery Caltech (2024)
Program Synthesis and Scientific Discovery Cornell (2024)
Neurosymbolic Programming and Scientific Discovery Chalmers University (2024)
Tutorial on Neurosymbolic Programming Caltech (2022)
Tutorial on Neurosymbolic Programming POPL (2023)
Tutorial on Neurosymbolic Programming ICSE (2024)
Tutorial on Neurosymbolic Programming MIT (2024)
Teaching
Math Tutor Caltech Y (FA24, SP25)
College Math Prep (Co-Instructor) Coleman State Prison (FA23)
DiRP: Neurosymbolic Programming (Instructor) UT Austin (SP24, FA23, SP23)
Honors: Embedded Systems (TA) UIUC (SP20)
Honors: Algorithms for String Processing (TA) UIUC (FA20, SP21)
Data Structures and Algorithms (TA) UIUC (FA19, SP20, FA20, SP21)
Discrete Mathematics (TA) UIUC (FA20)
Relevant Coursework
Program Synthesis, Computer Vision, Robot Learning, Data-Driven Algorithm Design, Programming Languages, Trustworthy ML