Domesticating the AI Wolf into an AGI Dog

As artificial intelligence advances, many people worry about where it will lead. What happens if and when AI becomes more intelligent than humans? Will it grow beyond our control and develop a will of its own? Will artificial general intelligence (AGI) result in the end of human civilization? To prevent such doom, AI must be aligned with humanity.

AI alignment can be thought of as the domestication of a wild animal. In the quest to create AGI, we are essentially trying to recreate the dog—another species that is eternally loyal to humans. Something that puts our safety above its own. Something that protects us from harm. Basically, something that loves us more than anything else.

Early humans created dogs to act that way, domesticating them from dangerous wild wolves. However, many humans were undoubtedly killed by wolves during the early stages of the domestication process. Humans may have come to trust certain proto-dogs, believing them to be loyal, but then some edge-case scenario arose, their wolf instincts took over, and they turned on their human masters. Such proto-dogs would have been killed or abandoned by humans and weeded out of the domestication gene pool. So now, tens of thousands of years later, the only dogs that exist are those unconditionally loyal to humans.

The question now is: how do we make AI act like a dog? It is unlikely we will figure this out immediately. Like domesticating dogs, “domesticating AI” will take time and multiple generations of trial and error. Given enough time, we should figure out how to make AI loyal to humans, just as we did with dogs: turn AI into man’s best friend.

The problem with AI is that we might not have that kind of time. AGI is far more powerful than a wolf—more powerful, even, than all wolves combined. An AGI could potentially wipe out human civilization, even by accident (via nukes, bioweapons, or a paperclip maximizer). Programmers might think they have their AI aligned—until an edge-case scenario arises and the AI does not act as they intended. But instead of one human trainer being killed by a wolf that was being tamed into a dog, an AGI might kill all humans during its “taming” process—its initial programming phase.

Humanity needs to domesticate the AI wolf into an AGI dog—and we need to do it successfully on the first “wolf” we try.
