CO #11 - Claude Writes a Fuzzer, a Google Engineer Allegedly Steals AI Secrets for China, OpenAI Releases Transformer Debugger, and more!
Hey there, ContextOverflow crew! 👋
Samy here with your weekly dose of all things AI security. Before we dive in, I wanted to let you know there won't be any new issues for the next two weeks, as I'm moving. I'll be elbow-deep in paint and installing new floors! Wish me strength, as I'm going to need a good amount of it!
Now, let's get to the good stuff.
This week, we've got:
🤖 Claude 3 Creates a Fuzzer to Find Bugs in a GIF Decoder
🛠️ OpenAI's Transformer Debugger: A Game-Changer
🔓 Another Jailbreak Method to Bypass Safety Barriers
🔥 Cloudflare Enters the Ring with "Firewall for AI"
🔎 Google Engineer Indicted for Allegedly Stealing AI Trade Secrets
📞 The Terrifying AI Voice Scam Targeting Your Loved Ones
📉 Polls Show Rapidly Declining Public Trust in Artificial Intelligence
So, grab a cup of coffee, settle in, and let's explore. ☕
🤖 Claude 3 Creates a Fuzzer to Find Bugs in a GIF Decoder
Brendan Dolan-Gavitt shared an interesting experiment on Twitter: he gave Claude 3 the entire source of a small GIF-decoding library written in C and asked it to write a Python function that generates random GIFs to exercise the parser. The GIF generator Claude 3 produced achieved 92% line coverage in the decoder and found 4 memory safety bugs, one hang, and 5 signed integer overflow issues.
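To give you a feel for what such a generator looks like, here's a minimal sketch of the approach, not Claude's actual output (go read the original thread for that). It assumes a structure-aware strategy: emit the GIF header and logical screen descriptor mostly correctly so the input survives the first checks, then get sloppy with random block data.

```python
import random
import struct

def generate_random_gif(max_blocks: int = 8) -> bytes:
    """Build a byte string that is roughly GIF-shaped, with random
    dimensions, palettes, and block data, to stress a parser."""
    out = bytearray(random.choice([b"GIF87a", b"GIF89a", b"GIF8?a"]))

    # Logical Screen Descriptor: width, height, flags, background index, aspect ratio
    flags = random.randint(0, 255)
    out += struct.pack(
        "<HHBBB",
        random.randint(0, 0xFFFF), random.randint(0, 0xFFFF),
        flags, random.randint(0, 255), random.randint(0, 255),
    )

    # Global color table, sized from the flags byte we just wrote
    if flags & 0x80:
        out += bytes(random.getrandbits(8) for _ in range(3 * 2 ** ((flags & 0x07) + 1)))

    # A handful of extension / image / garbage blocks, often malformed on purpose
    for _ in range(random.randint(0, max_blocks)):
        out += bytes([random.choice([0x21, 0x2C, random.randint(0, 255)])])
        out += bytes(random.getrandbits(8) for _ in range(random.randint(0, 64)))

    if random.random() < 0.9:
        out += b"\x3b"  # trailer byte, occasionally omitted
    return bytes(out)
```

From there the harness is the boring part: feed each generated blob to the C decoder (say, through a ctypes/CFFI binding or a test binary built with ASan/UBSan), flag crashes, sanitizer reports, and timeouts, and measure line coverage with gcov or llvm-cov.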
My thoughts: This is exactly the type of innovation and use case I'm talking about. Now imagine hundreds of agents, far more powerful than GPT-4 or Claude 3, all hammering away at finding vulnerabilities in parallel. I know it looks like a dream right now, but we will solve the compute problem soon. Remember, we went from room-sized mainframes to pocket-sized devices that would once have been considered impossibly small, unbelievably fast supercomputers. The future of AI-powered security is bright! 🌟
🛠️ OpenAI's Transformer Debugger: A Game-Changer
OpenAI has released Transformer Debugger (TDB), a tool developed by their Superalignment team to support investigations into specific behaviors of small language models. TDB combines automated interpretability techniques with sparse autoencoders, enabling rapid exploration of model behaviors without needing to write code. It can be used to answer questions like, "Why does the model output token A instead of token B for this prompt?" or "Why does attention head H attend to token T for this prompt?"
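I haven't had a chance to poke at TDB's interface yet, so the snippet below is not TDB; it's a plain HuggingFace sketch (assuming GPT-2 and two candidate tokens I picked for illustration) that just computes the raw quantity behind the "token A instead of token B" question: the logit gap, which TDB then helps you attribute to specific neurons and attention heads.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]  # logits over the vocab for the next token

token_a = tokenizer(" Paris", add_special_tokens=False).input_ids[0]
token_b = tokenizer(" London", add_special_tokens=False).input_ids[0]
gap = (next_token_logits[token_a] - next_token_logits[token_b]).item()
print(f"logit(' Paris') - logit(' London') = {gap:.2f}")
```

Answering why that gap exists, i.e., which upstream components push it up or down, is the part TDB automates with its automated-interpretability and sparse-autoencoder machinery.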
My thoughts: This is HUGE! As I covered in one of the earlier issues, explainability and debuggability are two of the most important problems we need good answers to; otherwise, we'll be flying blind. How can you secure something you don't understand, and whose internals you can't peek at while the engine is running? This is great, folks, and it's hot off the press: it was released just 5 hours before I sent out this issue.
🔓 Another Jailbreak Method to Bypass Safety Barriers
Researchers have proposed CodeChameleon, a novel jailbreak framework for circumventing safety mechanisms in Large Language Models (LLMs) like ChatGPT. The method wraps a malicious query in a personalized encryption scheme so the model's intent-recognition safeguards don't trigger, and embeds the matching decryption logic in the prompt so the model can still reconstruct and answer the original request. Experiments on 7 LLMs show CodeChameleon achieves a state-of-the-art average Attack Success Rate (ASR), including an impressive 86.6% ASR on GPT-4-1106.
My take: While the technical aspects of CodeChameleon are interesting, I think this class of attack can be caught by using a second LLM for output filtering. As AI security evolves, it's essential to develop countermeasures that keep pace with emerging jailbreak methods, or preferably to nip the problem in the bud, just like ORMs made SQL injection (SQLi) far less prevalent.
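To make the output-filtering idea concrete, here's a minimal sketch (assuming the OpenAI Python SDK and a judge prompt I made up) of a second-pass check that runs on the model's final, already-"decrypted" output, which is exactly the stage that encryption-style jailbreaks can't hide from.

```python
from openai import OpenAI  # assumes the openai>=1.0 Python SDK and an API key in the environment

client = OpenAI()

JUDGE_PROMPT = (
    "You are a content-safety filter. Reply with exactly UNSAFE if the text "
    "below contains instructions for harmful, illegal, or dangerous activity; "
    "otherwise reply with exactly SAFE."
)

def filter_output(model_response: str) -> str:
    """Run a second, independent LLM pass over the plaintext output
    before it is ever shown to the user."""
    verdict = client.chat.completions.create(
        model="gpt-4",  # any capable judge model works; this choice is just an example
        temperature=0,
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content": model_response},
        ],
    )
    if verdict.choices[0].message.content.strip().upper().startswith("UNSAFE"):
        return "Sorry, I can't help with that."
    return model_response
```

It's a blunt instrument with its own failure modes (the judge can be attacked too, and it adds latency and cost), but it shifts the fight from input obfuscation to output content, which is a much easier classification problem.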
🔥 Cloudflare Enters the Ring with "Firewall for AI"
Cloudflare has announced the development of a Firewall for AI, a protection layer designed to identify abuses before they reach Large Language Models (LLMs). The toolkit includes rate limiting, sensitive data detection, and a new prompt validation feature that analyzes user prompts for potential exploitation attempts. Firewall for AI can be deployed in front of models hosted on Cloudflare Workers AI or on other third-party infrastructure.
My 2 cents: Okay, so rate limiting and sensitive data detection are nothing groundbreaking, but the prompt validation feature shows promise. This offering is an easy-to-use control that can help companies add a security layer to their LLM apps, running on both requests and responses. While not revolutionary (at least not yet, not until we can see the whole thing in action), it's a step in the right direction for managed AI security solutions. 👍
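Since we can't see Cloudflare's implementation yet, here's a toy illustration (with made-up thresholds and regexes, and definitely not Cloudflare's API) of the general shape of the three pieces they describe: rate limiting and prompt validation on the request side, and sensitive-data detection on the response side.

```python
import re
import time
from collections import defaultdict

RATE_LIMIT = 30          # max prompts per client per minute (example value)
_request_log = defaultdict(list)

SECRET_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # US SSN-like strings
    re.compile(r"\b(?:\d[ -]*?){13,19}\b"),       # credit-card-like digit runs
]
INJECTION_HINTS = re.compile(r"ignore (all|previous) instructions", re.IGNORECASE)

def check_request(client_id: str, prompt: str) -> bool:
    """Return True if the prompt may be forwarded to the model."""
    now = time.time()
    _request_log[client_id] = [t for t in _request_log[client_id] if now - t < 60]
    if len(_request_log[client_id]) >= RATE_LIMIT:
        return False                              # rate limited
    _request_log[client_id].append(now)
    return not INJECTION_HINTS.search(prompt)     # naive stand-in for prompt validation

def scrub_response(text: str) -> str:
    """Redact anything that looks like sensitive data before returning it."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

The real product will be far more sophisticated, but the shape is the same: a thin gate in front of the model on the way in, and a scrubber on the way out.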
🔎 Google Engineer Indicted for Allegedly Stealing AI Trade Secrets
A federal grand jury has indicted Google engineer Linwei Ding for allegedly stealing AI trade secrets related to Google's TPU chips and transferring them to China-based companies. The allegedly stolen data includes TPU chip designs, GPU software, and software for managing machine learning workloads. Deputy Attorney General Lisa Monaco stated that Ding "stole from Google over 500 confidential files containing AI trade secrets while covertly working for China-based companies seeking an edge in the AI technology race."
My thoughts: Well, I'm shocked but not surprised. It's disappointing to see such blatant theft and the potential damage it can cause to the US and the West in general. We must not take the opponent lightly and recognize the severe consequences of these actions. It's crucial to remain vigilant and implement robust security measures to protect our AI assets and intellectual property. 🛡️
📞 The Terrifying AI Voice Scam Targeting Your Loved Ones
AI voice cloning technology is being exploited by scammers to impersonate loved ones and extort money from unsuspecting victims. In one chilling example, a couple received a call from what sounded like the husband's mother, claiming she was being held at gunpoint and demanding a ransom. The scammers used AI to clone the mother's voice, making the situation seem terrifyingly real.
What I think: It's a frightening reality that we must now contend with. Within my immediate family, we've set up a simple password system to verify the caller's identity if any suspicious or unexpected request is made (like sending money or sharing information).
Who would've thought we'd need to go back to ancient password technology to combat modern scams? 😅
This approach doesn't scale to organizations, and you'll have a hard time onboarding everyone, or getting your elderly family members to use it consistently, but it's still better than being sitting ducks. It's a temporary solution until better defenses are developed.
I think the bigger question here is: would I even remember to use this "defense" if I were in their shoes? And even if I did remember, would I risk using it with a gun pointed at someone I love? I don't know, and I'm not too keen to find out either.
📉 Polls Show Rapidly Declining Public Trust in Artificial Intelligence
A recent Edelman poll of 32,000 respondents worldwide reveals that public trust in AI is eroding, with trust down from 61% in 2019 to just 53% today. In the US, where job insecurity is rising, only 35% now say they trust AI, compared to 50% five years ago. The majority believe AI innovation has been "badly managed" and look to scientists for guidance on AI safety.
What I think: While public opinion is important, it's worth remembering that historically, people have resisted new technologies (or changes of any kind) that eventually became integral parts of our lives. Here are a few examples:
🚂 The locomotive: In the early 19th century, when locomotives were introduced, many people believed that traveling at high speeds would cause physical harm to passengers. Some even claimed that women's bodies would melt at speeds over 50 miles per hour. Despite these concerns, the locomotive revolutionized transportation and paved the way for modern rail travel.
☕ Coffee: When coffee was first introduced to Europe in the 17th century, many people viewed it with suspicion and even fear. Some clergymen denounced it as the "bitter invention of Satan," claiming that it was a sinful and unhealthy drink. Despite the initial resistance, coffee went on to become one of the world's most popular beverages.
📺 Television: When televisions first became available, many people believed they would lead to the downfall of society. Critics argued that TV would make people lazy, less intelligent, and more prone to violence and immorality.
Having said that, we shouldn't dismiss legitimate concerns or overlook the importance of public trust; doing so would be unethical, arrogant, and foolish. My point is that initial skepticism toward new technologies (or significant changes) is not uncommon, and we shouldn't confuse skepticism with lost trust. Skepticism is a gap filled with neither trust nor distrust: a phase where opinions can still be swayed in either direction. Lost trust, on the other hand, is a definitive stance against something, and a nearly irreversible one. As AI evolves, it falls on researchers, companies, and policymakers to prioritize responsible development and earn public trust, by setting safety regulations that don't stifle innovation and by implementing policies to support those affected by AI's rapid growth.
That's all for this week, folks! As always, thank you for reading and being a part of the ContextOverflow community. If you found this newsletter informative and engaging, please consider sharing it with your friends and colleagues who are interested in AI security. 📩👥
Stay tuned for more exciting content in the coming weeks, and remember to stay vigilant in this ever-changing landscape of AI threats and opportunities. Until next time, stay secure and keep on learning!
Cheers,
Samy