Policy Puppetry: How a Single Prompt Can Trick ChatGPT, Gemini & More Into Revealing Secrets

12:44
Content provided by Daily Security Review. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Daily Security Review or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined at https://ppacc.player.fm/legal.

Recent research by HiddenLayer has uncovered a shocking new AI vulnerability—dubbed the "Policy Puppetry Attack"—that can bypass safety guardrails in all major LLMs, including ChatGPT, Gemini, Claude, and more.

In this episode, we dive deep into:
🔓 How a single, cleverly crafted prompt can trick AI into generating harmful content—from bomb-making guides to uranium enrichment.
💻 The scary simplicity of system prompt extraction—how researchers (and hackers) can force AI to reveal its hidden instructions.
🛡️ Why this flaw is "systemic" and nearly impossible to patch, exposing a fundamental weakness in how AI models are trained.
⚖️ The ethical dilemma: Should AI be censored? Or is the real danger in what it can do, not just what it says?
🔮 What this means for the future of AI security—and whether regulation can keep up with rapidly evolving threats.

We’ll also explore slopsquatting, a new AI cyberattack where fake software libraries hallucinated by chatbots can lead users to malware.
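As a rough illustration of the defensive side (a minimal Python sketch, not something taken from the episode; the package name below is a hypothetical example), one basic precaution is to confirm that an AI-suggested dependency actually exists on PyPI, via its public JSON endpoint, before installing it:

# Minimal slopsquatting sanity check: confirm an AI-suggested dependency really
# exists on PyPI before installing it. The default package name is illustrative.
import json
import sys
import urllib.error
import urllib.request

def exists_on_pypi(package: str) -> bool:
    """Return True if PyPI knows about `package` and it has at least one release."""
    url = f"https://pypi.org/pypi/{package}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            data = json.load(resp)
        return bool(data.get("releases"))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # unknown project name: possibly a hallucinated package
        raise

if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "requests"
    if exists_on_pypi(name):
        print(f"'{name}' exists on PyPI; still review its maintainers and release history before installing.")
    else:
        print(f"'{name}' is not on PyPI; it may be a hallucinated name an attacker could register.")

Existence alone is not proof of safety, of course: attackers can register the very names chatbots tend to hallucinate, which is exactly the slopsquatting risk the episode discusses.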

Is AI safety a lost cause? Or can developers outsmart the hackers? Tune in for a gripping discussion on the dark side of large language models.
