Discussion about this post

Leo Abstract:

This, if implemented, would perfectly solve the alignment problems. The difficulty is in the words "tell it that" -- we don't actually have any idea how to do that step, as explored in the literature on inner/outer alignment and related topics.

Eric Brown:

A couple of thoughts:

1) Consider reading Jack Williamson's _The Humanoids_ (or, alternatively, the short story "With Folded Hands"). Williamson took Asimov's Three Laws of Robotics seriously, and in his stories, the robots prevent humans from doing anything remotely dangerous, like walking across the street (you could get hit by a meteor!).

2) The point of the alignment problem is twofold:

a) Humans aren't aligned with their own desires, and often don't even want to admit what their actual desires are.

b) Even if you could solve problem (a), we have no idea how to make things (which are smarter than us) do the things we want (and NOT do the things we don't want).

There are tons of examples where (e.g.) genetic algorithms create things that solve the problem at hand, but do so in a completely incomprehensible manner (and, for that matter, depend critically on tiny implementation details). And genetic algorithms are a relatively well-understood method in machine learning.
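To make that failure mode concrete, here's a minimal sketch of a genetic algorithm "solving" the objective you wrote down rather than the one you meant. It's a toy example of my own, not from any real system; every name, parameter, and the fitness function are invented for illustration:

```python
import random

# Toy illustration of specification gaming in a genetic algorithm.
# Intended goal: evolve a strictly increasing list of distinct digits.
# Specified fitness: count adjacent pairs that are merely non-decreasing.
# The gap between the two is the loophole the search exploits.

LENGTH = 10
POP_SIZE = 100
GENERATIONS = 200
MUTATION_RATE = 0.1

def fitness(genome):
    """Proxy objective: reward non-decreasing adjacent pairs (max = 9)."""
    return sum(1 for a, b in zip(genome, genome[1:]) if a <= b)

def mutate(genome):
    """Randomly resample each gene with small probability."""
    return [random.randrange(10) if random.random() < MUTATION_RATE else g
            for g in genome]

def crossover(a, b):
    """Single-point crossover between two parent genomes."""
    cut = random.randrange(1, LENGTH)
    return a[:cut] + b[cut:]

def evolve():
    pop = [[random.randrange(10) for _ in range(LENGTH)]
           for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:POP_SIZE // 5]  # truncation selection: keep top 20%
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(POP_SIZE - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

if __name__ == "__main__":
    best = evolve()
    print("best genome:", best, "fitness:", fitness(best))
    # A typical winner is non-decreasing but full of repeats, e.g.
    # [2, 2, 2, 5, 5, 5, 5, 9, 9, 9]: it maxes the *specified* fitness
    # while ignoring the *intended* "distinct, strictly increasing" goal.
```

The point of the sketch is that nothing in the code is buggy: the search faithfully optimizes the fitness function it was given, and the surprise lives entirely in the gap between that function and the intent behind it.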
