Getting AIs working toward human goals − study shows how to measure misalignment
Ideally, artificial intelligence agents should aim to help humans, but what does that mean when humans want conflicting things? My colleagues and I have come up with a way to measure how well the goals of a group of humans and AI agents align.
The alignment problem – making sure that AI systems act according to human values – has become more urgent as AI capabilities grow dramatically. But aligning AI to humanity seems impossible in the real world because everyone has their own priorities. For example, a pedestrian might want a self-driving car to slam on its brakes if a crash seems likely, but a passenger in the car might prefer to swerve.
By looking at examples like this, we developed a score for misalignment based on three key factors: the humans and AI agents involved, their specific goals for different issues, and how important each issue is to them. Our model of misalignment rests on a simple insight: A group of humans and AI agents is most aligned when the group's goals are most compatible.
In simulations, we found that misalignment peaks when goals are spread evenly among the agents. This makes sense – if everyone wants something different, conflict is highest. When most agents share the same goal, misalignment drops.
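To make this concrete, here is a minimal sketch of how such a score could be computed for a single issue. It is not the formula from our paper; the function name, the goal labels and the equal weights are assumptions for illustration. It simply counts importance-weighted pairs of agents whose goals conflict, which is enough to reproduce the qualitative pattern described above.

```python
from itertools import combinations

def misalignment_score(goals, weights):
    """Toy misalignment score for one issue (illustrative only).
    Each agent has a preferred goal and a weight for how much it
    cares about the issue. The score is the importance-weighted
    fraction of agent pairs whose goals conflict."""
    pairs = list(combinations(range(len(goals)), 2))
    if not pairs:
        return 0.0
    conflict = sum(
        weights[i] * weights[j]
        for i, j in pairs
        if goals[i] != goals[j]  # this pair wants different outcomes
    )
    total = sum(weights[i] * weights[j] for i, j in pairs)
    return conflict / total

# Everyone shares a goal -> no misalignment.
print(misalignment_score(["brake", "brake", "brake"], [1, 1, 1]))       # 0.0

# Goals spread evenly across the group -> misalignment is highest.
print(misalignment_score(["brake", "swerve", "accelerate"], [1, 1, 1])) # 1.0

# A majority shares one goal -> misalignment decreases.
print(misalignment_score(["brake", "brake", "swerve"], [1, 1, 1]))      # ~0.67
```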
Why it matters
Most AI safety research treats alignment as an all-or-nothing property. Our framework shows it's more nuanced. The same AI can be aligned with humans in one context and misaligned in another.
This matters because it helps AI developers be more precise about what they mean by aligned AI. Instead of vague aims, such as aligning with human values, researchers and developers can talk about specific contexts and roles for AI. For example, an AI recommender system – those "you might like" product suggestions – that entices someone to make an unnecessary purchase could be aligned with the retailer's goal of increasing sales but misaligned with the customer's goal of living within their means.
For policymakers, evaluation frameworks like ours offer a way to measure misalignment in systems that are in use and to create standards for alignment. For AI developers and safety teams, it provides a framework to balance competing stakeholder interests.

For everyone, having a clear understanding of the problem makes people better able to help solve it.
What other research is being done
To measure alignment, our research assumes we can compare what humans want with what AI wants. Human value data can be collected through surveys, and the field of social choice offers useful tools to interpret it for AI alignment. Unfortunately, learning the goals of AI agents is much harder.
Today's smartest AI systems are large language models, and their black box nature makes it hard to learn the goals of AI agents such as ChatGPT that they power. Interpretability research might help by revealing the models' inner "thoughts," or researchers could design AI that thinks transparently to begin with. But for now, it's impossible to know whether an AI system is truly aligned.
What's next
For now, we recognize that goals and preferences sometimes don't fully reflect what humans want. To address trickier scenarios, we are working on approaches for aligning AI with moral philosophy experts.

Moving forward, we hope that developers will implement practical tools to measure and improve alignment across diverse groups of people.
The Research Brief is a short take on interesting academic work.
This article is republished from The Conversation, an independent nonprofit news organization bringing you facts and trustworthy analysis to help you make sense of our complex world. It was written by: Aidan Kierans, University of Connecticut.
Aidan Kierans participated as an independent contractor in OpenAI red teaming. His research described in this article was supported in part by the NSF Program on Fairness in AI in collaboration with Amazon. Any opinions, findings, conclusions or recommendations expressed in this article are his own and do not necessarily reflect the views of the National Science Foundation or Amazon. Kierans has also received research funding from the Future of Life Institute.