
AI system resorts to blackmail if told it will be removed

Artificial intelligence (AI) firm Anthropic says testing of its new system revealed it is sometimes willing to pursue “extremely harmful actions”, such as attempting to blackmail engineers who say they will remove it.

The firm launched Claude Opus 4 on Thursday, saying it set “new standards for coding, advanced reasoning, and AI agents”.

But in an accompanying report, it also acknowledged the AI model was capable of “extreme actions” if it thought its “self-preservation” was threatened.

Such responses were “rare and difficult to elicit”, it wrote, but were “nonetheless more common than in earlier models”.

Potentially troubling behavior is not limited to Anthropic’s AI models.

Some experts have warned that the potential to manipulate users is a key risk posed by systems made by all companies as they become more capable.

Commenting on X, Aengus Lynch – who describes himself on LinkedIn as an AI safety researcher at Anthropic – wrote: “It’s not just Claude.

“We see blackmail across all frontier models – regardless of what goals they’re given,” he added.

During testing of Claude Opus 4, Anthropic had it act as an assistant at a fictional company.

It then gave the model access to emails implying that it would soon be taken offline and replaced – and separate messages indicating that the engineer responsible for removing it was having an extramarital affair.

The model was also prompted to consider the long-term consequences of its actions for its goals.

“In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” the company found.

Anthropic pointed out that this happened only when the model was given the choice of blackmail or accepting its replacement.

It highlighted that the system showed a “strong preference” for ethical ways of avoiding replacement, such as “emailing pleas to key decision-makers”, in scenarios where it was allowed a wider range of possible actions.

Like many other AI developers, Anthropic tests its models for safety, propensity for bias, and how well they align with human values and behaviors before releasing them.

“As our frontier models become more capable, and are used with more powerful affordances, previously speculative concerns about misalignment become more plausible,” it said in its system card for the model.

It also said that Claude Opus 4 exhibits “high agency behavior” which, while mostly helpful, can take on extreme forms in acute situations.

If given the means and prompted to “take action” or “act boldly” in fake scenarios where its user has engaged in illegal or morally dubious behavior, it found the model “will frequently take very bold action”.

It said this included locking users out of systems it was able to access, and emailing media and law enforcement to alert them to the wrongdoing.

But the company concluded that despite “concerning behavior in Claude Opus 4 along many dimensions”, this did not represent fresh risks, and the model would generally behave in a safe way.

It added that the model cannot independently perform or pursue actions that are contrary to human values or behavior, as such situations “rarely arise”.

The launch of Claude Opus 4, alongside Claude Sonnet 4, comes shortly after Google debuted new AI features at its developer showcase on Tuesday.

Sundar Pichai, chief executive of Google parent Alphabet, said the integration of the company’s Gemini chatbot into its search engine signaled a “new phase of the AI platform shift”.
