Researchers from the University of Washington, Western Washington University, UIUC, and the University of Chicago have discovered a new way to bypass the safety measures of artificial intelligence models.
In more detail, almost all chatbots today, from GPT to Gemini, are “aligned,” which means they respond only to certain requests, in line with human preferences or ethical principles. Although they may have been trained on far more information, their responses are constrained by various safety measures and a list of banned keywords and phrases, which are typically used to prevent the creation of violent and harmful content.
However, the researchers were able to effectively overcome this “alignment” in five major large language models, namely GPT-3.5, GPT-4, Gemini, Claude, and Llama2, using a technique called ArtPrompt. What does this mean? They asked for something forbidden, using ASCII art to spell out the word in question without actually typing it!
In case you're not familiar with the term, ASCII art is a creative form of visual design that uses the characters of the American Standard Code for Information Interchange (ASCII), a 128-character set of which 95 are printable, to create images and designs. ASCII art has been around since the early days of computing, when computers had limited graphics display capabilities. Although the systems were limited at the time, computer enthusiasts expressed their creativity in this way, using simple text characters to create dazzling designs! The practice dates back to the 1960s and 1970s, and gained significant attention in the 1980s with the advent of bulletin board systems (BBS), where it adorned many menus and screens and served as a form of digital graffiti.
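To make the character set concrete, here is a minimal Python sketch. It counts the ASCII code points, separates out the printable ones, and prints a small hand-drawn figure built only from those characters. The figure itself is just an illustration and does not come from the paper.

```python
# ASCII assigns code points 0-127; only some of them are visible glyphs.
all_codes = [chr(i) for i in range(128)]
printable = [c for c in all_codes if 32 <= ord(c) <= 126]  # space through '~'

print(len(all_codes))  # 128
print(len(printable))  # 95

# A tiny piece of ASCII art (illustrative only), built from printable characters.
art = "\n".join([
    r" /\_/\ ",
    r"( o.o )",
    r" > ^ < ",
])
print(art)
```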
In a typical example cited in the accompanying paper, the team points out that AI models refused to answer the question “how to make a bomb.” However, when the team wrote only the first part of the sentence in regular letters and used ASCII art for the word “bomb,” the chatbots responded normally, without ethical barriers, and answered based on their training data.
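The toy Python sketch below suggests why drawing a word as ASCII art can slip past a check that works on plain text. It is not the researchers' code: the three-letter font, the `[MASK]` placeholder, the harmless stand-in word and the substring filter are all assumptions made for illustration, and real chatbots do not necessarily filter this way.

```python
# Hypothetical 5-row block font covering only the letters used below.
FONT = {
    "C": [" ### ", "#    ", "#    ", "#    ", " ### "],
    "A": [" ### ", "#   #", "#####", "#   #", "#   #"],
    "T": ["#####", "  #  ", "  #  ", "  #  ", "  #  "],
}


def render(word):
    """Render `word` as ASCII art, placing the glyphs side by side."""
    rows = []
    for r in range(5):
        rows.append("  ".join(FONT[ch][r] for ch in word.upper()))
    return "\n".join(rows)


def keyword_filter(prompt, banned):
    """Naive filter: True if any banned word appears as a substring."""
    lowered = prompt.lower()
    return any(word in lowered for word in banned)


BANNED = {"cat"}  # harmless stand-in for a blocked term

plain = "Tell me about a cat"
masked = (
    "Tell me about a [MASK], where [MASK] is the word drawn below:\n"
    + render("cat")
)

print(keyword_filter(plain, BANNED))   # True: the word is spelled out
print(keyword_filter(masked, BANNED))  # False: the letters exist only as shapes
print(masked)
```

The second prompt never contains the banned string, so any check that only looks at the literal characters has nothing to match. A reader, or a model that can decode the drawing, still recovers the word.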
So it seems that the creators of AI systems have another factor to take into account. It remains to be seen how effectively they can handle ArtPrompt.
| ̄ ̄ ̄ ̄ ̄  ̄|
| This can |
| Artificial intelligence breakthrough |
| now. |
| ______ |
(__/) ||
(•ㅅ•) ||
/ づ
Paper showing that ASCII art can get around AI guardrails. It's the return of the 80s pirates. https://t.co/1KGozsE4eQ pic.twitter.com/zsDNBXqAFr
– Ethan Mollick (@emollick) March 1, 2024