Shiyuan Sean Zhang

Yesterday Today Tomorrow


Hi! I am Shiyuan (Sean) Zhang, a Visiting Research Assistant at the University of Southern California, working under the guidance of Prof. Jieyu Zhao. I earned my bachelor’s and master’s degrees in Statistics and Computer Science from the University of Illinois Urbana-Champaign, where I worked closely with Prof. Jiaqi Ma. My research interests focus on data-centric machine learning (e.g., data attribution), trustworthy NLP, LLMs for recommendation, and vision-language models.

Outside of academics, I enjoy exploring trading, where I find the unpredictability of the markets both humbling and exhilarating. I also share my quiet late nights with Lulu, a white Ragdoll cat who has been a silent witness to many of my coding sessions and research explorations.

News

May 30, 2025 🆕 ArXived a new preprint: Taming Hyperparameter Sensitivity in Data Attribution: Practical Selection Without Costly Retraining. This paper proposes a practical framework for selecting hyperparameters in data attribution methods.
May 23, 2025 🎓 I received my Master’s degree in Computer Science from the University of Illinois Urbana-Champaign (UIUC)! Grateful for all the guidance and support throughout this journey.
May 21, 2025 🆕 Released a new preprint: TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models.
Co-first authored with Zeqing Wang. This paper proposes a benchmark to evaluate the temporal reasoning ability of Vision-Language Models (VLMs).
Oct 15, 2024 📄 Our paper Nuanced Multi-class Detection of Machine-Generated Scientific Text has been accepted to PACLIC 2024 as an oral presentation! Looking forward to presenting it in Tokyo this December.
🔗 Read the paper on ACL Anthology
Sep 26, 2024 🌟 Our paper dattri: A Library for Efficient Data Attribution has been accepted as a Spotlight Paper at NeurIPS 2024!
dattri provides a modular and scalable framework for training data attribution, supporting methods like Influence Functions, TracIn, and TRAK across different access settings. Grateful to work with such an amazing team!

Publications

  1. Preprint 2025
    Taming Hyperparameter Sensitivity in Data Attribution: Practical Selection Without Costly Retraining
    Weiyi Wang, Junwei Deng, Yuzheng Hu, Shiyuan Zhang, Xirui Jiang, Runting Zhang, Han Zhao, and Jiaqi W. Ma
    2025
  2. Preprint 2025
    TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models
    Zeqing Wang*, Shiyuan Zhang*, Chengpei Tang, and Keze Wang
    2025
  3. NeurIPS 2024
    dattri: A Library for Efficient Data Attribution
    Junwei Deng*, Ting-Wei Li*, Shiyuan Zhang, Shixuan Liu, Yijun Pan, Hao Huang, Xinhe Wang, Pingbang Hu, Xingjian Zhang, and Jiaqi Ma
    Advances in Neural Information Processing Systems, 2024
    Spotlight Paper
  4. Preprint 2023
    Computational Copyright: Towards A Royalty Model for Music Generative AI
    Junwei Deng, Shiyuan Zhang, and Jiaqi Ma
    ICML 2024 GenLaw Workshop; DPFM Workshop at ICLR 2024, Best Paper Award, 2023
  5. PACLIC 2024
    Nuanced Multi-class Detection of Machine-Generated Scientific Text
    Shiyuan Zhang, Yubin Ge, and Xiaofeng Liu
    38th Pacific Asia Conference on Language, Information and Computation, 2024