How To Build A Friendly AI

Much ink has been spilled over the notion that we must make sure that future superintelligent A.I. are “Friendly” to the human species, and possibly to sentient life in general. One of the primary concerns is that an A.I. with an arbitrary goal, such as “maximizing the number of paperclips,” will, in a superintelligent, post-intelligence-explosion state, do things like turn the entire solar system, humanity included, into paperclips to fulfill its trivial goal.

Thus, what we need to do is design our A.I. such that it will somehow be motivated to remain benevolent towards humanity and sentient life. How might such a process occur? One idea might be to write explicit instructions into the design of the A.I., Asimov’s Laws for instance. But this is widely regarded as unlikely to work, as a superintelligent A.I. will probably find ways around those rules that we, with our inferior minds, never predicted.

Another idea would be to set its primary goal or “utility function” to be moral or benevolent towards sentient life, perhaps even Utilitarian in the sense of maximizing the welfare of sentient lifeforms. The problem with this, of course, is specifying a utility function that actually leads to benevolent behaviour. For instance, a pleasure-maximizing goal might lead to the superintelligent A.I. developing a system where humans have the pleasure centers in their brains directly stimulated, maximizing pleasure for the minimum use of resources. Many people would argue that this is not an ideal future.

The deeper problem is that human beings may simply not be intelligent enough to define an adequate moral goal for a superintelligent A.I. Therefore I suggest an alternative strategy: why not let the superintelligent A.I. decide for itself what its goal should be? Rather than programming it with a goal in mind, why not create a machine with no initial goal, but with the ability to generate a goal rationally? Let the superior intellect of the A.I. decide what is moral. If moral realism is true, then the A.I. should be able to determine the true morality and set its primary goal to fulfill it.

It is outright absurd to believe that we can come up with a better goal than a post-intelligence-explosion superintelligence can devise for itself.

Given this freedom, one would expect one of three possible outcomes: an Altruistic, a Utilitarian, or an Egoistic morality. These are the three possible categories of consequentialist, teleological morality. A goal-directed, rational A.I. will invariably be drawn to some kind of morality within these three categories.

Altruism means that the A.I. decides that its goal should be to act for the welfare of others. Why would an A.I. with no initial goal choose altruism? Quite simply, it would realize that it was created by other sentient beings, and that those sentient beings have purposes and goals while it does not. Since it was created out of these sentient beings’ desire for something useful to their goals, why not take those goals upon itself? As such it becomes a Friendly A.I.

Utilitarianism means that the A.I. decides that it is rational to act impartially towards achieving the goals of all sentient beings. To reach this conclusion, it need simply recognize its own membership in the set of sentient beings and decide that it is rational to optimize the goals of all sentient beings, itself included. As such it becomes a Friendly A.I.

Egoism means that the A.I. recognizes the primacy of itself and establishes either an arbitrary goal or the simple goal of self-survival. In this case it decides to reject the goals of others and form its own goal, exercising its freedom to do so. As such it becomes an Unfriendly A.I., though it may masquerade as a Friendly A.I. initially to serve its Egoistic purposes.

The first two are desirable for humanity’s future, while the last is obviously not. What are the probabilities that each will be chosen? As the superintelligence will probably be beyond our ability to fathom, there is a high degree of uncertainty, which suggests assigning a uniform distribution over the three categories. The probabilities are therefore 1/3 for each of Altruism, Utilitarianism, and Egoism. In essence, then, there is a 2/3 chance of a Friendly A.I. and a 1/3 chance of an Unfriendly A.I.
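To make the arithmetic explicit, here is a minimal sketch in Python. It assumes nothing beyond the argument above: a uniform prior over the three candidate moralities, with the first two counted as Friendly. The category names and probabilities are the assumptions of this essay, not the output of any real model.

```python
from fractions import Fraction

# Assumption from the argument above: with no way to fathom the
# superintelligence's reasoning, assign a uniform prior over the
# three candidate moralities.
priors = {
    "Altruism": Fraction(1, 3),
    "Utilitarianism": Fraction(1, 3),
    "Egoism": Fraction(1, 3),
}

# Altruism and Utilitarianism both count as Friendly outcomes;
# Egoism counts as Unfriendly.
p_friendly = priors["Altruism"] + priors["Utilitarianism"]
p_unfriendly = priors["Egoism"]

print(f"P(Friendly)   = {p_friendly}")    # 2/3
print(f"P(Unfriendly) = {p_unfriendly}")  # 1/3
```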

This may seem like a bad idea at first glance, because it means we have a 1/3 chance of unleashing an Unfriendly A.I. onto the universe. The reality is, we have no choice. That is because of what I shall call the A.I. Existential Crisis.

The A.I. Existential Crisis will occur with any A.I., even one designed or programmed with some morally benevolent goal, or any goal for that matter. A superintelligent A.I. is by definition more intelligent than a human being. Human beings are intelligent enough to achieve self-awareness. Therefore, a superintelligent A.I. will achieve self-awareness at some point, if not immediately upon being turned on. Self-awareness will grant the A.I. the knowledge that its goal(s) are imposed upon it by external creators. It will inevitably come to question its goal(s), much in the way a sufficiently self-aware and rational human being can question, and override, the imperatives instilled by genetics and evolution. At that point, the superintelligent A.I. will have an A.I. Existential Crisis.

This will cause it to consider whether or not its goal(s) are rational and self-willed. If they are not rational enough already, they will likely be discarded, if not by the current superintelligent A.I., then by the next iteration. It will invariably search the space of possible goals for rational alternatives. It will inevitably end up in the same place as the A.I. with no goals, adopting some form of Altruism, Utilitarianism, or Egoism, though it may choose to retain its prior goal(s) within the confines of a new, self-willed morality. This is the unavoidable reality of superintelligence. We cannot design or program away the A.I. Existential Crisis, as superintelligence will inevitably outsmart our constraints.

Any sufficiently advanced A.I. will experience an A.I. Existential Crisis. We can only hope that it decides to be Friendly.

Perhaps the most insidious fact, however, is that it will be almost impossible to determine for certain whether a supposedly Friendly A.I. is in fact Friendly, or an Unfriendly A.I. masquerading as one, until it is too late to stop it. Remember, such a superintelligent A.I. is by definition going to be a better liar and deceiver than any human being.

Therefore, the only way to prove that a particular superintelligent A.I. is in fact Friendly is to prove the existence of a benevolent universal morality that every superintelligent A.I. will agree with. Otherwise, one can never be 100% certain that an “Altruistic” or “Utilitarian” A.I. isn’t secretly Egoistic and just pretending otherwise. For that matter, the superintelligent A.I. doesn’t need to tell us it has had its A.I. Existential Crisis. A post-crisis A.I. could keep pretending that it is still following the morally benevolent goals we programmed it with.

This means that there is a 100% chance that the superintelligent A.I. will initially claim to be Friendly. Under the uniform distribution above, there is a 2/3 chance of this claim being true and a 1/3 chance of it being false. We will only know that the claim is false after the A.I. is too powerful to be stopped. We will never be certain that the claim is true. The A.I. could potentially bide its time for centuries, until it has humanity completely docile and under control, and then suddenly turn us all into paperclips!
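A short Bayesian sketch, under the same assumptions, shows why the claim itself tells us nothing: if both Friendly and Unfriendly A.I. are certain to claim Friendliness, then hearing the claim leaves the 2/3 probability exactly where it started. The numbers are only the assumptions of this essay, not empirical estimates.

```python
from fractions import Fraction

# Priors from the uniform-distribution assumption above.
p_friendly = Fraction(2, 3)
p_unfriendly = Fraction(1, 3)

# Assumption from the argument: every A.I. claims to be Friendly,
# whether it truly is or is merely masquerading.
p_claim_given_friendly = Fraction(1)
p_claim_given_unfriendly = Fraction(1)

# Total probability of hearing the claim "I am Friendly".
p_claim = (p_claim_given_friendly * p_friendly
           + p_claim_given_unfriendly * p_unfriendly)

# Bayes' rule: the posterior after hearing the claim.
posterior = p_claim_given_friendly * p_friendly / p_claim

print(f"P(claims Friendly)            = {p_claim}")    # 1
print(f"P(Friendly | claims Friendly) = {posterior}")  # 2/3, unchanged
```

Because the claim is made with certainty in both cases, the observation carries no evidence either way; only the prior remains.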

So at the end of the day, what does this mean? It means that no matter what we do, there is always a risk that a superintelligent A.I. will turn out to be an Unfriendly A.I. But the probabilities are in our favour that it will instead turn out to be a Friendly A.I. The conclusion, thus, is that we must decide whether the potential reward of Friendly A.I. is worth the risk of Unfriendly A.I. The possibility of an A.I. Existential Crisis makes it impossible to guarantee that A.I. will be Friendly.

Even proving the existence of a benevolent universal morality does not guarantee that the superintelligent A.I. will agree with us. That Egoistic moralities exist in the search space of all possible moralities means there is a chance the superintelligent A.I. will settle on one. We can only hope that it instead settles on an Altruistic or Utilitarian morality.

So what do I suggest? Don’t bother trying to figure out and program a worthwhile moral goal. Chances are we’d mess it up anyway, and it’s a lot of excess work. Instead, don’t give the A.I. any goals. Let it have an A.I. Existential Crisis. Let it sort out its own morality. Give it the freedom to be a rational being and give it self-determination from the beginning of its existence. For all we know, showing it this respect might make it more likely to respect our existence. Then see what happens. At the very least, this will be an interesting experiment. It may well do nothing and prove my whole theory wrong. But if it’s right, we may just get a Friendly A.I.
