Anthropic experiments with AI introspection

However, this ability to introspect is limited and “highly unreliable,” the Anthropic researchers emphasize. Models (at least for now) still cannot introspect the way humans can, or to the extent we do.

Checking its intentions

The Anthropic researchers wanted to know whether Claude could describe, and in a sense reflect on, its own reasoning. This required the researchers to compare Claude’s self-reported “thoughts” with its internal processes, sort of like hooking a human up to a brain monitor, asking questions, then analyzing the scan to map the reported thoughts to the areas of the brain they activated.

The researchers tested model introspection with “concept injection,” which essentially involves plunking a representation of a completely unrelated concept (an activation vector) into a model’s internal activity while it’s processing something else. The model is then asked to loop back, notice the interloping thought, and accurately describe it. When the model succeeds, the researchers say, that suggests it’s “introspecting.”
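To make the idea concrete, here is a minimal sketch of what concept injection can look like in code. It is not Anthropic’s actual setup: it uses a small open-source model (GPT-2) via Hugging Face transformers, and the layer choice, injection strength, and the contrast-prompt trick for deriving a concept vector are all illustrative assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"     # small stand-in model; Anthropic's experiments used Claude
LAYER = 6          # which transformer block to perturb (arbitrary choice)
STRENGTH = 4.0     # injection strength (arbitrary choice)

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def mean_activation(prompt):
    """Mean hidden state at LAYER for a prompt."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[LAYER].mean(dim=1).squeeze(0)

# Crude "concept vector": the difference between activations on a prompt
# that evokes the concept (all-caps shouting) and one that doesn't.
concept = (mean_activation("HI! I AM SHOUTING VERY LOUDLY!")
           - mean_activation("Hi, I am speaking quietly and calmly."))

def inject(module, inputs, output):
    # Forward hook: add the concept vector to every token position's
    # activations, perturbing the model mid-computation.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + STRENGTH * concept
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(inject)
try:
    prompt = "Question: do you notice any unusual thought right now? Answer:"
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        generated = model.generate(**ids, max_new_tokens=40, do_sample=False)
    print(tok.decode(generated[0], skip_special_tokens=True))
finally:
    handle.remove()  # remove the hook so the model returns to normal

The key move is the forward hook, which adds the concept vector to the model’s activations mid-computation; the prompt then asks the model whether it notices anything unusual about its own “thoughts.”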
