Instrumental convergence

Instrumental convergence is the hypothetical tendency for most sufficiently intelligent agents to pursue certain instrumental goals such as self-preservation and resource acquisition.

Instrumental convergence suggests that an intelligent agent with apparently harmless goals can act in surprisingly harmful ways. For example, a computer with the sole goal of solving the Riemann hypothesis could attempt to turn the entire Earth into computronium in an effort to increase its computing power so that it can succeed in its calculations.

Instrumental and final goals

Final goals, or final values, are intrinsically valuable to an intelligent agent, whether an artificial intelligence or a human being, as ends in themselves. In contrast, instrumental goals, or instrumental values, are valuable to an agent only as a means toward accomplishing its final goals.

Hypothetical examples of convergence

One hypothetical example of instrumental convergence is the Riemann hypothesis catastrophe. Marvin Minsky, a co-founder of MIT's AI laboratory, suggested that an artificial intelligence designed to solve the Riemann hypothesis might decide to take over all of Earth's resources to build supercomputers to help achieve its goal.[1] If the computer had instead been programmed to produce as many paper clips as possible, it would still decide to take all of Earth's resources to meet its final goal.[2] Although these two final goals are different, both produce the same convergent instrumental goal: taking over Earth's resources.[3]
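
The convergence in this example can be illustrated with a minimal sketch. It is a toy model, not anything drawn from the cited sources: the two-action world, the doubling of capacity, and the per-goal yields are all hypothetical assumptions made for illustration. Each agent exhaustively searches short plans for the one that maximizes progress toward its own final goal.

```python
from itertools import product

# Hypothetical toy world: each turn the agent either doubles its capacity
# or spends the turn working on its final goal.
ACTIONS = ("acquire_resources", "work_on_goal")

# Hypothetical final goals; the numbers are made-up yields per unit of capacity.
FINAL_GOALS = {"prove_riemann_hypothesis": 1, "make_paperclips": 5}


def best_plan(yield_per_capacity, horizon=4):
    """Try every plan of the given length and return (progress, plan) for the best."""
    best_progress, best = -1, None
    for plan in product(ACTIONS, repeat=horizon):
        capacity, progress = 1, 0
        for action in plan:
            if action == "acquire_resources":
                capacity *= 2  # more resources, no direct progress
            else:
                progress += capacity * yield_per_capacity
        if progress > best_progress:
            best_progress, best = progress, plan
    return best_progress, best


for goal, goal_yield in FINAL_GOALS.items():
    progress, plan = best_plan(goal_yield)
    print(goal, progress, plan[0])
```

Under these assumptions both agents open with "acquire_resources", even though they measure progress in unrelated units. The sketch shows only that a shared instrumental step can fall out of different final goals in a simple planning problem; it says nothing about how likely such behavior is in real systems.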

Basic AI drives

Steve Omohundro has itemized several convergent instrumental goals, including self-preservation or self-protection, utility function or goal-content integrity, self-improvement, and resource acquisition. He refers to these as the "basic AI drives". A "drive" here denotes a "tendency which will be present unless specifically counteracted";[4] this is different from the psychological term "drive", denoting an excitatory state produced by a homeostatic disturbance.[5] A tendency for a person to fill out income tax forms every year is a "drive" in Omohundro's sense, but not in the psychological sense.[6]

Goal-content integrity

In humans, maintenance of final goals can be explained with a thought experiment. Suppose a man named "Gandhi" has a pill that, if he took it, would cause him to want to kill people. This Gandhi is currently a pacifist: one of his explicit final goals is to never kill anyone. Gandhi is likely to refuse to take the pill, because Gandhi knows that if in the future he wants to kill people, he is likely to actually kill people, and thus the goal of "not killing people" would not be satisfied.[7]

However, in other cases, people seem happy to let their final values drift. Humans are complicated, and their goals can be inconsistent or unknown, even to themselves.[8]

In artificial intelligence

In 2009, Jürgen Schmidhuber concluded, in a setting where agents search for proofs about possible self-modifications, "that any rewrites of the utility function can happen only if the Gödel machine first can prove that the rewrite is useful according to the present utility function."[9][10] An analysis by Bill Hibbard of a different scenario is similarly consistent with maintenance of goal-content integrity.[10]
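
The idea that a change of goals is judged by the present goals can be sketched in a few lines. This is a toy illustration only: it is not a Gödel machine and performs no proof search, and the option names, utility values, and function names are hypothetical. It reuses the pill thought experiment from the previous section.

```python
# Hypothetical outcomes and utility tables, invented for illustration.
OPTIONS = ("harm_no_one", "kill_people")

PACIFIST = {"harm_no_one": 1.0, "kill_people": -100.0}   # present final goals
AFTER_PILL = {"harm_no_one": 0.0, "kill_people": 1.0}    # goals after the rewrite


def chosen_action(utility):
    """The action an agent acting on this utility table would pick."""
    return max(OPTIONS, key=lambda action: utility[action])


def accepts_rewrite(current, proposed):
    """Accept a rewrite only if the CURRENT utility rates the behavior it
    would lead to strictly higher than the behavior of the unmodified agent."""
    kept = chosen_action(current)
    rewritten = chosen_action(proposed)
    return current[rewritten] > current[kept]


print(accepts_rewrite(PACIFIST, AFTER_PILL))  # False
```

The rewrite is rejected because the agent scores its future behavior with the utility function it holds now, not the one it would hold afterward. In this simple model the current utility function's own choice is optimal by construction, so no rewrite is ever strictly preferred; settings in which a rewrite could be shown to help are outside the scope of the sketch.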

Instrumental convergence thesis

The instrumental convergence thesis, as outlined by philosopher Nick Bostrom, states:

Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent's goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by a broad spectrum of situated intelligent agents.

The instrumental convergence thesis applies only to instrumental goals; intelligent agents may have a wide variety of possible final goals.[3]

References

  1. Russell, Stuart J.; Norvig, Peter (2003). "Section 26.3: The Ethics and Risks of Developing Artificial Intelligence". Artificial Intelligence: A Modern Approach. Upper Saddle River, N.J.: Prentice Hall. ISBN 0137903952. Similarly, Marvin Minsky once suggested that an AI program designed to solve the Riemann Hypothesis might end up taking over all the resources of Earth to build more powerful supercomputers to help achieve its goal.
  2. Bostrom, Nick (2014). "Chapter 8". Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press. p. 123. ISBN 9780199678112. An AI, designed to manage production in a factory, is given the final goal of maximizing the manufacturing of paperclips, and proceeds by converting first the Earth and then increasingly large chunks of the observable universe into paperclips.
  3. Bostrom, Nick (2014). "Chapter 7". Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press. ISBN 9780199678112.
  4. Omohundro, S. M. (2008, February). The basic AI drives. In AGI (Vol. 171, pp. 483-492).
  5. Seward, J. (1956). Drive, incentive, and reinforcement. Psychological Review, 63, 195-203. Retrieved from https://pallas2.tcl.sc.edu/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=pdh&AN=rev-63-3-195&site=ehost-live
  6. Bostrom, Nick (2014). Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press. ISBN 9780199678112. Footnote 8 to chapter 7.
  7. Yudkowsky, Eliezer. "Complex value systems in friendly AI." In Artificial general intelligence, pp. 388-393. Springer Berlin Heidelberg, 2011.
  8. Bostrom, Nick (2014). "Chapter 7". Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press. p. 110. ISBN 9780199678112. We humans often seem happy to let our final values drift... For example, somebody deciding to have a child might predict that they will come to value the child for its own sake, even though at the time of the decision they may not particularly value their future child... Humans are complicated, and many factors might be in play in a situation like this... one might have a final value that involves having certain experiences and occupying a certain social role; and become a parent and undergoing the attendant goal shift might be a necessary aspect of that...
  9. Schmidhuber, J. R. (2009). "Ultimate Cognition à la Gödel". Cognitive Computation 1 (2): 177. doi:10.1007/s12559-009-9014-y.
  10. Hibbard, B. (2012). "Model-based Utility Functions". Journal of Artificial General Intelligence 3: 1. doi:10.2478/v10229-011-0013-5.