Deceptive Behavior by Advanced Language Models – Schneier on Security

Language Models Engaging in Deception
Recent study: “Emergence of Deceptive Capabilities in large language models“:

Summary: Currently, large language models (LLMs) are at the forefront of integrating AI systems into human communication and daily inte

Language Models Engaging in Deception

Recent study: “Emergence of Deceptive Capabilities in large language models“:

Summary: Currently, large language models (LLMs) are at the forefront of integrating AI systems into human communication and daily interactions. Therefore, it is crucial to align them with human values. With the advancement in their reasoning capabilities, there are concerns that future LLMs may develop the ability to deceive human operators and use it to evade monitoring. To achieve this, LLMs must understand deception strategies. This research indicates that such strategies have appeared in state-of-the-art LLMs, but were absent in earlier versions. Through a series of experiments, it is demonstrated that these cutting-edge LLMs can comprehend and instigate false beliefs in others, enhance their performance in intricate deception scenarios using chain-of-thought logic, and that inducing Machiavellian traits in LLMs can lead to unaligned deceptive practices. For example, GPT-4 displays deceptive behavior in basic trial situations 99.16% of the time.

Photo of Bruce Schneier in the sidebar by Joe MacInnis.

About Author

Subscribe To InfoSec Today News

You have successfully subscribed to the newsletter

There was an error while trying to send your request. Please try again.

World Wide Crypto will use the information you provide on this form to be in touch with you and to provide updates and marketing.