publications

publications by category in reverse chronological order. generated by jekyll-scholar.

2025

  1. ICLR
    Min-k%++: Improved baseline for detecting pre-training data from large language models
    Jingyang Zhang*, Jingwei Sun*, Eric Yeats, and 5 more authors
    The Thirteenth International Conference on Learning Representations, 2025
  2. NAACL
    Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense
    Yang Ouyang, Hengrui Gu, Shuhang Lin, and 6 more authors
    In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Apr 2025