Sophos AI to present on how to defang malicious AI models at Black Hat Europe

At this week’s Black Hat Europe in London, SophosAI Senior Data Scientist Tamás Vörös will give a 40-minute talk titled “LLMbotomy: Shutting the Trojan Backdoors” at 1:30 PM. Vörös will expand on a presentation he gave at the recent CAMLIS conference, examining the dangers posed by Trojanized Large Language Models (LLMs) and how users can reduce the risks of working with potentially weaponized LLMs.

Existing research on LLMs has focused mainly on external threats, such as “prompt injection” attacks that could exploit data embedded in instructions previously submitted by other users, along with other input-based attacks on the LLMs themselves. SophosAI’s research, presented by Vörös, examined embedded threats: Trojan backdoors that are inserted into an LLM during training and triggered by specific inputs intended to cause harmful behavior. Such threats could be introduced deliberately by someone involved in training the model, or inadvertently through data poisoning. The research explored not only how these Trojans could be created but also a method for disabling them.

SophosAI’s research demonstrated the use of targeted “noising” of an LLM’s neurons, identifying those that are essential to the LLM’s operation through their activation patterns. The approach was shown to effectively disable most of the Trojans embedded in a model. A full report on the research presented by Vörös will be published after Black Hat Europe.
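As a rough illustration of what activation-guided noising could look like, here is a toy sketch in plain Python on a single small ReLU layer. It is not SophosAI’s implementation, and the full report has not yet been published. The article does not say which neurons receive the noise, so the sketch assumes that neurons most active on clean inputs are treated as essential and left untouched, while the rest are perturbed with Gaussian noise. The function names, `keep_fraction`, and `sigma` are all hypothetical.

```python
import random


def mean_relu_activations(weights, inputs):
    """Mean ReLU activation of each neuron over a set of clean inputs.

    weights: one weight vector per neuron.
    inputs:  list of input vectors.
    """
    totals = [0.0] * len(weights)
    for x in inputs:
        for i, row in enumerate(weights):
            pre_activation = sum(w * v for w, v in zip(row, x))
            totals[i] += max(0.0, pre_activation)  # ReLU
    return [t / len(inputs) for t in totals]


def noise_low_scoring_neurons(weights, scores, keep_fraction, sigma, rng):
    """Return (noised copy of weights, set of untouched neuron indices).

    ASSUMPTION (not from the article): the highest-scoring neurons are
    considered essential and kept as-is; all others get Gaussian noise.
    """
    n_keep = int(round(keep_fraction * len(weights)))
    ranked = sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:n_keep])
    noised = []
    for i, row in enumerate(weights):
        if i in keep:
            noised.append(list(row))
        else:
            noised.append([w + rng.gauss(0.0, sigma) for w in row])
    return noised, keep


# Toy data: 6 neurons with 4 inputs each, scored on 20 random "clean" samples.
rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
clean_inputs = [[rng.uniform(0, 1) for _ in range(4)] for _ in range(20)]

scores = mean_relu_activations(weights, clean_inputs)
noised, kept = noise_low_scoring_neurons(
    weights, scores, keep_fraction=0.5, sigma=0.1, rng=rng
)
print("untouched neurons:", sorted(kept))
```

In a real LLM the activations would be recorded from the model’s own layers on trusted data, and the choice of which neurons to noise and how strongly would have to be validated against both clean-task accuracy and backdoor trigger success.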
