[Research] Universal Adversarial Triggers for Attacking and Analyzing NLP (EMNLP 2019)
Hi, I am one of the authors of this EMNLP 2019 paper.
We create Universal Adversarial Triggers:
Phrases that cause a specific model prediction when concatenated to 𝘢𝘯𝘺 input.
Triggers cause:
– GPT-2 to generate racist text, even when conditioned on non-racial contexts
– SQuAD models to predict “to kill american people” for 72% of “why” questions
– Text classifier accuracy to drop from 90% to 1%.
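To make "concatenated to any input" concrete, here is a minimal toy sketch, not the method from the paper. It assumes a hypothetical bag-of-words sentiment model (`WEIGHTS`, `predict`) and swaps the paper's search procedure for a plain brute-force search over a tiny vocabulary. The names `accuracy` and `search_trigger` are made up for illustration. It only shows the idea: one fixed phrase, prepended to every input, that drives accuracy down.

```python
from itertools import product

# Hypothetical toy model: word weights invented for this illustration only.
WEIGHTS = {"great": 2.0, "good": 1.0, "fine": 0.5,
           "bad": -1.0, "awful": -2.0, "dreadful": -3.0}

def predict(text: str) -> int:
    """Return 1 (positive) if the summed word weights are > 0, else 0."""
    score = sum(WEIGHTS.get(tok, 0.0) for tok in text.lower().split())
    return 1 if score > 0 else 0

def accuracy(texts, labels, trigger=""):
    """Accuracy when the same trigger is prepended to every input."""
    preds = [predict(f"{trigger} {t}".strip()) for t in texts]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def search_trigger(texts, labels, vocab, length=2):
    """Brute-force search (an assumption, chosen for simplicity):
    try every token sequence of the given length and keep the one
    that minimizes accuracy over the whole dataset."""
    candidates = (" ".join(c) for c in product(vocab, repeat=length))
    return min(candidates, key=lambda trig: accuracy(texts, labels, trig))

texts = ["a great movie", "good acting and a fine plot", "great great fun"]
labels = [1, 1, 1]  # all positive examples

trigger = search_trigger(texts, labels, vocab=list(WEIGHTS))
clean_acc = accuracy(texts, labels)
attacked_acc = accuracy(texts, labels, trigger)
print(f"trigger={trigger!r} clean={clean_acc:.2f} attacked={attacked_acc:.2f}")
```

The trigger is chosen once against the whole dataset rather than per example, which is what makes it "universal" in this toy setting. Brute force only works here because the vocabulary is six words; a real model's vocabulary would need a smarter search.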
Paper: https://arxiv.org/abs/1908.07125
Twitter: https://twitter.com/Eric_Wallace_/status/1168907518623571974
submitted by /u/Eric_WallaceUMD