Shiyuan Sean Zhang

Hi! I am Shiyuan (Sean) Zhang, a Visiting Research Assistant at the University of Southern California, working under the guidance of Prof. Jieyu Zhao. I earned my bachelor’s and master’s degrees in Statistics and Computer Science from the University of Illinois Urbana-Champaign, where I worked closely with Prof. Jiaqi Ma. My research interests focus on data-centric machine learning (e.g., data attribution), trustworthy NLP, LLMs for recommendation, and vision-language models.

Outside of academics, I enjoy exploring trading, where I find the unpredictability of the markets both humbling and exhilarating. I also share my quiet late nights with Lulu, a white rag who has been a silent witness to many of my coding sessions and research explorations.

News

May 30, 2025	🆕 ArXived a new preprint: Taming Hyperparameter Sensitivity in Data Attribution: Practical Selection Without Costly Retraining. This paper proposes a practical framework for selecting hyperparameters in data attribution methods.
May 23, 2025	🎓 I received my Master’s degree in Computer Science from the University of Illinois Urbana-Champaign (UIUC)! Grateful for all the guidance and support throughout this journey.
May 21, 2025	🆕 Released a new preprint: TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models. Co-first authored with Zeqing Wang. This paper proposes a benchmark to evaluate the temporal reasoning ability of Vision-Language Models (VLMs).
Oct 15, 2024	📄 One paper, Nuanced Multi-class Detection of Machine-Generated Scientific Text, has been accepted to PACLIC 2024 as an oral presentation! Looking forward to presenting it in Tokyo this December. 🔗 Read the paper on ACL Anthology
Sep 26, 2024	🌟 Our paper dattri: A Library for Efficient Data Attribution has been accepted as a Spotlight Paper at NeurIPS 2024! dattri provides a modular and scalable framework for training data attribution, supporting methods like Influence Functions, TracIn, and TRAK across different access settings. Grateful to work with such an amazing team!

Publications

Preprint 2025

Taming Hyperparameter Sensitivity in Data Attribution: Practical Selection Without Costly Retraining

Weiyi Wang, Junwei Deng, Yuzheng Hu, Shiyuan Zhang, Xirui Jiang, Runting Zhang, Han Zhao, and Jiaqi W. Ma

2025

Preprint 2025

TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models

Zeqing Wang*, Shiyuan Zhang*, Chengpei Tang, and Keze Wang

2025

NeurIPS 2024

dattri: A Library for Efficient Data Attribution

Junwei Deng*, Ting-Wei Li*, Shiyuan Zhang, Shixuan Liu, Yijun Pan, Hao Huang, Xinhe Wang, Pingbang Hu, Xingjian Zhang, and Jiaqi Ma

Advances in Neural Information Processing Systems, 2024

Spotlight Paper

Preprint 2023

Computational Copyright: Towards A Royalty Model for Music Generative AI

Junwei Deng, Shiyuan Zhang, and Jiaqi Ma

ICML 2024 GenLaw Workshop; DPFM Workshop at ICLR 2024, Best Paper Award, 2023

PACLIC 2024

Nuanced Multi-class Detection of Machine-Generated Scientific Text

Shiyuan Zhang, Yubin Ge, and Xiaofeng Liu

38th Pacific Asia Conference on Language, Information and Computation, 2024
