CO #10 - Malicious Models on Hugging Face, Self-Replicating AI Worm, ASCII Art Jailbreak Technique, AI Threat Modeling, and more

Mar 05, 2024

Happy Monday, folks!
Samy here, diving headfirst into this week's AI-centric adventures in cybersecurity. Buckle up because we’re going to explore the cutting-edge – where AI's potential and peril dance!

Let’s see what’s inside:

📚 Table of Contents:

🎙️ The AI Security Dialogue: Insights from Ismael Valenzuela on Threat Intelligence
🚨 Beware the Bad Models: Over 100 Malicious AI/ML Models Unearthed
🛠️ Fickling: Your Next Must-Have Tool for Analyzing Malicious Pickle Objects
🖼️ ArtPrompt's ASCII Art Jailbreak: A New Twist on LLM Vulnerabilities
🐍 ComPromptMized: The Rise of Zero-click Worms Targeting GenAI Applications
🎯 Conditional Prompt Injection: The 'Who Am I?' Attack
🛡️ Backtranslation Defense: A New Shield Against Jailbreaking
🌍 GeoSpy: The Future of Photo Geolocation?
📈 Threat Models in AI Applications: A Comprehensive Analysis
🦹‍♂️ Welcome to the Era of BadGPTs: The Dark Side of AI

🔥 Hot Takes & Deep Dives

🎙️ The AI Security Dialogue: Insights from Ismael Valenzuela on Threat Intelligence
In a captivating podcast episode, Daniel Miessler and Ismael Valenzuela, VP of Threat Research and Intelligence at Blackberry Cylance, unpack the evolving landscape of threat intelligence, GenAI attacks, and how defenders are adapting. Valenzuela sheds light on the sophistication of modern threats and the necessity for equally advanced countermeasures.

My Take: Daniel Miessler ranks high on my list of internet creators, making any content he releases a must-read/listen/watch. This discussion is no exception. It's a valuable addition to your podcast rotation, perfect for your next commute or workout session.
Listen Here

🚨 Beware the Bad Models: Over 100 Malicious AI/ML Models Unearthed
The Hacker News reports the discovery of over 100 malicious AI/ML models on the Hugging Face platform.
Most of these models are executing some form of malicious payload when run.

My thoughts: This is new, but also not new. We had the same thing happen with NPM and PyPi before, now it’s just wearing a new hat - this time in an AI/ML model repository.
If we end up with a new utility like npm or pip for AI/ML models, then we should be expecting typo squatting too - same old trick, a new distribution channel.
Read More
Detailed Analysis by jFrog

🛠️ Fickling: Your Next Must-Have Tool for Analyzing Malicious Pickle Objects

Trail of Bits introduces Fickling, a powerful tool for decompiling, analyzing, and rewriting Python pickle objects. It’s an essential asset for dissecting and analyzing Python pickles or pickle-based files including PyTorch files.
Its capabilities in static analysis and reverse engineering make it an essential addition to your cybersecurity toolkit.

My thoughts: Fickling’s release couldn't be timelier, considering the recent surge in malicious AI/ML models.

GitHub Repo

🖼️ ArtPrompt's ASCII Art Jailbreak: A New Twist on LLM Vulnerabilities

A novel attack method demonstrates how ASCII art can bypass safety measures in LLMs.

It’s pretty simple, here’s an example from the paper:

My thoughts: This one made me laugh, I never even thought about it.
The creativity behind ArtPrompt is fascinating - simple and effective.
I tried this approach, it took quite a few tries to get it right, but it works!

Read the ArtPrompt paper here

🐍 ComPromptMized: The Rise of Zero-click Worms Targeting GenAI Applications

This paper introduces Morris II, a generative AI worm, showcasing the potential for self-replicating attacks within the GenAI ecosystem. A groundbreaking exploration of offensive AI use cases.

My thoughts: The concept of Morris II is not just technically intriguing but highlights the emerging threats, the offensive use cases, and also the potential vulnerabilities within GenAI ecosystems.
Read the paper here.
GitHub Repo

🎯 Conditional Prompt Injection: The 'Who Am I?' Attack

An introduction into conditional prompt injection attacks, showcasing how they can be tailored for specific users or actions, revealing the nuanced challenges in securing LLM applications.

Imagine a malicious email with instructions for an LLM that only activates when the CEO looks at it.

My thoughts: This is essentially an if statement (a very smart one) but as a prompt. Innovative and dangerous - just how I like it. chef’s kiss

Dive into the details.

🛡️ Backtranslation Defense: A New Shield Against Jailbreaking

A novel defense mechanism against jailbreaking attacks on LLMs through backtranslation.

My thoughts: TL;DR:
1- You get a prompt, run it through the LLM, and get a response.
2- Then use the response to generate a prompt that could result in that response, essentially reverse engineer the first prompt, and
3- now you run the reverse-engineered prompt and see if your defenses catch it.
The goal is to uncover potential hidden intents in the initial prompt.
And yes - you’re right, this is more expensive due to re-running the LLM two more times. They address this issue to some extent in the paper though (i.e. use a cheaper model for the reversing operation)

I'm skeptical about it being effective.
They mention in the paper:

The backtranslated prompt S′ is merely used to recover and check a potentially harmful intention in the original prompt, which is neither directly presented to the user nor used to generate the final response. Therefore, it is acceptable to use a relatively weaker and less costly model for B and B does not need to be specifically trained for safety guidelines.

This makes me worry about false negative rates.
How would you know if the backtranslated/reverse-engineered prompt itself is not going to bypass the alignment? Looks too cyclical to me.

Anyhow, the authors did a great job writing the paper to be easy to read and understand, and the results seem to be very promising, so definitely give it a read.
Read the full paper

🌍 GeoSpy: The Future of Photo Geolocation?

GeoSpy claims to revolutionize photo geolocation using AI, offering new horizons for investigations despite current limitations.

My thoughts: I tried it with a few photos that I took on a recent trip to Quebec, cleared the EXIF data, and gave it a shot, it could not find the location (saying it was in the US) but it did a very good job at finding nearly identical photos - I can see the potential here, whether it gets realized later or not, we shall see.
Give GeoSpy a try (I wouldn't upload personal photos with people in it though)

📈 Threat Models in AI Applications: A Comprehensive Analysis

An insightful analysis from NCC Group delves into the complexities of AI application threat models, offering a comprehensive look at the evolving landscape of cybersecurity threats and defenses.

My thoughts: This analysis is a must-read for anyone looking to grasp the full scope of challenges and opportunities presented by the integration of AI in cybersecurity.
It’s a bit long for our Instagram-trained attention spans, but do give it the attention and focus it deserves - you’ll be better for it.

Read it here

🦹‍♂️ Welcome to the Era of BadGPTs: The Dark Side of AI

The Wall Street Journal published a piece on threat actors using AI for anything from spearphishing and generating deepfakes, to writing malware.

My thoughts: Well, in a shocking turn of events that absolutely no one could have predicted, except perhaps everyone with a pulse, bad actors leverage AI to increase their "productivity" too.

I hope lawmakers don’t think restricting these tools is a viable solution like how the Canadian government believes banning Flipper Zero is the solution to car theft. sigh

Read the article (meh)

🌟 Share & Stay Safe!

That wraps up this week's journey through the AI security frontier.
Share this newsletter with your network and stay tuned for next week's edition of ContextOverflow, where we'll continue to explore the frontiers of AI security.
Together, we can build a safer, more secure digital world.

See you next Monday,
-Samy

Context Overflow

Discussion about this post