Artwork

Content provided by Roger Basler de Roca. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Roger Basler de Roca or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.
Player FM - Podcast App
Go offline with the Player FM app!

What is data poisoning in AI?

7:52
 
Share
 

Manage episode 448726027 series 3153807
Content provided by Roger Basler de Roca. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Roger Basler de Roca or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.

Today we delve into the hidden dangers lurking within artificial intelligence, as discussed in the paper titled "Turning Generative Models Degenerate: The Power of Data Poisoning Attacks." The authors expose how large language models (LLMs), such as those used for generating text, are vulnerable to sophisticated 'Backdoor attacks' during their fine-tuning phase. Through a technique known as 'Prefix-Tuning,' attackers can insert poisoned data into these models, causing them to generate harmful or misleading content.

The focus of this study is on generative tasks like text summarization and completion, which, unlike classification tasks, exhibit a vast output space and stochastic behavior, making them particularly susceptible to manipulation. The authors have developed new metrics to assess the effectiveness of these backdoor attacks on natural language generation (NLG), revealing that traditional metrics used for classification tasks fall short in capturing the nuances of NLG outputs.

Through a series of experiments, the paper explores the impact of various trigger designs on the success and detectability of attacks, examining trigger length, content, and positioning. Findings indicate that longer, semantically meaningful triggers—such as natural sentences—are more effective and harder to detect than classic triggers based on rare words.

Another crucial finding is that increasing the number of 'virtual tokens' used in Prefix-Tuning heightens the susceptibility to these attacks. While models with more parameters can learn complex patterns, they also become more prone to memorizing and reproducing poisoned data.

This podcast is based on the research from Jiang, S., Kadhe, S. R., Zhou, Y., Ahmed, F., Cai, L., & Baracaldo, N. (2023). Turning Generative Models Degenerate: The Power of Data Poisoning Attacks. It can be found here.

Disclaimer: This podcast is generated by Roger Basler de Roca (contact) by the use of AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material as it is for education purpose only.

  continue reading

50 episodes

Artwork
iconShare
 
Manage episode 448726027 series 3153807
Content provided by Roger Basler de Roca. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Roger Basler de Roca or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.

Today we delve into the hidden dangers lurking within artificial intelligence, as discussed in the paper titled "Turning Generative Models Degenerate: The Power of Data Poisoning Attacks." The authors expose how large language models (LLMs), such as those used for generating text, are vulnerable to sophisticated 'Backdoor attacks' during their fine-tuning phase. Through a technique known as 'Prefix-Tuning,' attackers can insert poisoned data into these models, causing them to generate harmful or misleading content.

The focus of this study is on generative tasks like text summarization and completion, which, unlike classification tasks, exhibit a vast output space and stochastic behavior, making them particularly susceptible to manipulation. The authors have developed new metrics to assess the effectiveness of these backdoor attacks on natural language generation (NLG), revealing that traditional metrics used for classification tasks fall short in capturing the nuances of NLG outputs.

Through a series of experiments, the paper explores the impact of various trigger designs on the success and detectability of attacks, examining trigger length, content, and positioning. Findings indicate that longer, semantically meaningful triggers—such as natural sentences—are more effective and harder to detect than classic triggers based on rare words.

Another crucial finding is that increasing the number of 'virtual tokens' used in Prefix-Tuning heightens the susceptibility to these attacks. While models with more parameters can learn complex patterns, they also become more prone to memorizing and reproducing poisoned data.

This podcast is based on the research from Jiang, S., Kadhe, S. R., Zhou, Y., Ahmed, F., Cai, L., & Baracaldo, N. (2023). Turning Generative Models Degenerate: The Power of Data Poisoning Attacks. It can be found here.

Disclaimer: This podcast is generated by Roger Basler de Roca (contact) by the use of AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material as it is for education purpose only.

  continue reading

50 episodes

All episodes

×
 
Loading …

Welcome to Player FM!

Player FM is scanning the web for high-quality podcasts for you to enjoy right now. It's the best podcast app and works on Android, iPhone, and the web. Signup to sync subscriptions across devices.

 

Quick Reference Guide

Listen to this show while you explore
Play