Portfolio item number 1
Short description of portfolio item number 1
Short description of portfolio item number 2
Published in NAACL, 2025
Developed Layer-AdvPatcher, a framework for defending LLMs against jailbreak attacks through a three-step pipeline: (i) toxic layer identification, (ii) adversarial augmentation, and (iii) localized toxic layer editing. Across models including Mistral-7B and Llama2-7B, the method reduced Attack Success Rate by 25% compared with modification-based defense methods.
Recommended citation: Yang Ouyang, Hengrui Gu, Shuhang Lin, Wenyue Hua, Jie Peng, Bhavya Kailkhura, Meijun Gao, Tianlong Chen, Kaixiong Zhou. “Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense”, in submission to NAACL 2025
Published:
This is a description of your talk, which is a markdown file that can be all markdown-ified like any other post. Yay markdown!
Published:
This is a description of your conference proceedings talk; note the different field in type. You can put anything in this field.
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.