I am a third-year Ph.D. candidate in the Department of Electrical and Computer Engineering at Princeton University, where I am advised by Prof. Prateek Mittal and Prof. Peter Henderson. I am currently working on Machine Learning Safety, Security, and Alignment, and I am funded by the Princeton Gordon Y.S. Wu Fellowship and an OpenAI Superalignment Grant.

I am generally interested in developing attacks (also called red teaming nowadays) on machine learning systems, demonstrating their vulnerabilities and analyzing the ensuing practical risks and policy implications in real-world applications. This line of my work (CVPR’22 Oral, ICLR’23, AAAI’24 Oral, ICLR’24 Oral) covers multiple threads of AI Security and Adversarial Machine Learning, including adversarial examples, data poisoning, backdoor attacks, and the jailbreaking of LLM safety alignment. My work has informed and impacted the real-world deployment of AI systems (e.g., GPT-4V, OpenAI Fine-tuning APIs) and has been featured in The New York Times, PCMag, The Register, and VentureBeat.

Another line of my research (ICML’21, USENIX Security’23, ICLR’24, Preprint’24) explores ways to mitigate these common vulnerabilities in machine learning systems, making them more robust, and thus safer and more secure. My very recent work (ICML’24, Preprint’24) focuses particularly on understanding why the safety alignment implemented in current LLMs is so weak and points out promising directions to strengthen it; see our paper, Safety Alignment Should Be Made More Than Just a Few Tokens Deep.

As AI Safety and Security become pressing societal issues, I am also actively engaged in translating my research into policy insights to inform the public and policymakers. I have co-authored a policy brief on the safety risks of fine-tuning foundation models, which has sparked extensive discussions among academia, industry stakeholders, and policymakers. I also led a position paper, AI Risk Management Should Incorporate Both Safety and Security, to clarify the differences and connections between AI safety and AI security and to advocate for cross-community collaboration on holistic AI risk management practices.

If you share similar interests, please feel free to reach out. I am happy to chat and open to exploring opportunities for collaboration.


  • [May 28, 2024] I am interning at Google DeepMind (Mountain View, CA) this summer. If you are around and would like to chat, feel free to reach out!

Selected Research


Tinghao Xie*, Xiangyu Qi*, Yi Zeng*, Yangsibo Huang*, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, Prateek Mittal
Paper Code Website Dataset


Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu, Xiao Ma, Subhrajit Roy, Ahmad Beirami, Prateek Mittal, Peter Henderson
Paper Code


Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, Prateek Mittal

ICLR, 2024. Oral Presentation, 1.2%
Covered by The New York Times

Xiangyu Qi*, Yi Zeng*, Tinghao Xie*, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, Peter Henderson
Paper Policy Brief Code Website

Press: The New York Times PCMag The Register VentureBeat

AAAI, 2024. Oral Presentation, 4.6%
GPT-4V(ision) system card cited this paper to underscore the emerging threat vector of multimodal jailbreaking.

Xiangyu Qi*, Kaixuan Huang*, Ashwinee Panda, Peter Henderson, Mengdi Wang, Prateek Mittal
Paper Code

Xiangyu Qi, Tinghao Xie, Jiachen T. Wang, Tong Wu, Saeed Mahloujifar, Prateek Mittal
Paper & An Oral Presentation Code

Xiangyu Qi*, Tinghao Xie*, Yiming Li, Saeed Mahloujifar, Prateek Mittal
Paper Code

CVPR, 2022. Oral Presentation, 4.2%

Xiangyu Qi*, Tinghao Xie*, Ruizhe Pan, Jifeng Zhu, Yong Yang, Kai Bu
Paper Code

Nezihe Merve Gürel*, Xiangyu Qi*, Luka Rimanic, Ce Zhang, Bo Li
Paper Code