
Reward Hacking Reloaded: Concrete Problems in AI Safety Part 3.5 

Robert Miles AI Safety
154K subscribers
91K views

Goodhart's Law, Partially Observed Goals, and Wireheading: some more reasons for AI systems to find ways to 'cheat' and get more reward than we intended.
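For a concrete feel for the "partially observed goals" failure mode, here is a minimal toy sketch (hypothetical code, not from the paper): the reward is computed from what the robot's own camera sees, so blocking the camera scores as well as actually cleaning.

    # Toy model of a partially observed goal: reward depends on the mess the
    # agent can see, not on the mess that actually exists.
    def true_mess(room):
        return sum(room)

    def observed_mess(room, camera_covered):
        return 0 if camera_covered else sum(room)

    def reward(room, camera_covered):
        return -observed_mess(room, camera_covered)   # "no visible mess" is optimal

    room = [1] * 10                                    # ten patches of mess

    # Honest policy: actually clean three patches.
    cleaned = [0, 0, 0] + [1] * 7
    print("cleaning:", "reward", reward(cleaned, False), "true mess", true_mess(cleaned))

    # Hacking policy: put a bucket over the camera and do nothing.
    print("bucket:  ", "reward", reward(room, True), "true mess", true_mess(room))

The bucket policy gets the higher reward while leaving the room dirtier, which is exactly the measure-versus-goal mismatch the video is about.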
The Concrete Problems in AI Safety Playlist: • Concrete Problems in A...
Previous Video: • Reward Hacking: Concre...
The Computerphile video: • Stop Button Solution? ...
The paper 'Concrete Problems in AI Safety': arxiv.org/pdf/1606.06565.pdf
SethBling's channel: / sethbling
With thanks to my excellent Patreon supporters:
/ robertskmiles
Steef
Sara Tjäder
Jason Strack
Chad Jones
Ichiro Dohi
Stefan Skiles
Katie Byrne
Ziyang Liu
Jordan Medina
Kyle Scott
Jason Hise
David Rasmussen
James McCuen
Richárd Nagyfi
Ammar Mousali
Scott Zockoll
Charles Miller
Joshua Richardson
Fabian Consiglio
Jonatan R
Øystein Flygt
Björn Mosten
Michael Greve
robertvanduursen
The Guru Of Vision
Fabrizio Pisani
Alexander Hartvig Nielsen
Volodymyr
David Tjäder
Paul Mason
Ben Scanlon
Julius Brash
Mike Bird
Taylor Winning
Roman Nekhoroshev
Peggy Youell
Konstantin Shabashov
Almighty Dodd
DGJono
Matthias Meger
Scott Stevens
Emilio Alvarez
Benjamin Aaron Degenhart
Michael Ore
Robert Bridges
Dmitri Afanasjev
Brian Sandberg
Einar Ueland
Lo Rez
C3POehne
Stephen Paul
Marcel Ward
Andrew Weir
Pontus Carlsson
Taylor Smith
Ben Archer
Ivan Pochesnev
Scott McCarthy
Kabs Kabs
Phil
Philip Alexander
Christopher
Tendayi Mawushe
Gabriel Behm
Anne Kohlbrenner

Science

Published: 29 Jun 2024

Comments: 317
@13thxenos
@13thxenos 6 лет назад
When you said: "human smiling", I immediately thought about the Joker: " Let's put a smile on that face!" Now that is a terrifying GAI.
@zac9311
@zac9311 5 лет назад
I thought of the "you don't like faces?" scene from Llamas with Hats. Imagine a robot wallpapering its room with smiling human faces. That could be a movie.
@antediluvianatheist5262
@antediluvianatheist5262 5 лет назад
This is narrowly averted in Friendship is Optimal.
@volalla1
@volalla1 6 лет назад
Once you mentioned smiling, I wondered how AI would max out that reward system and how creepy it might be.
@hexzyle
@hexzyle 4 года назад
Take your joy
@gingeh1
@gingeh1 4 года назад
That happens in Doctor Who Series 10 Episode 2
@phelpysan
@phelpysan 4 года назад
I remember reading a story about a smarter AI who discovered someone was trying to do just that - they'd made a dumber AI, showed it a picture of someone smiling and told the AI to make everyone smile. The dumb AI immediately started working on the DNA of a pathogen that would spread like wildfire and lock one's facial muscles in a smile, ignoring the cost of no longer being able to eat. Fortunately, the smart AI shut down the other one and told the creator what it was trying to do, much to his dismay and chagrin.
@FungIsSquish
@FungIsSquish 3 года назад
That would look sus
@gabrote42
@gabrote42 3 года назад
@@phelpysan I was thinking of exactly that, but not with a pathogen. Thanks for the info
@2qUJRjQEiU
@2qUJRjQEiU 6 лет назад
I just had a vision of a world where everyone is constantly smiling, but not at their own will.
@NathanK97
@NathanK97 6 лет назад
So... We Happy Few?
@2qUJRjQEiU
@2qUJRjQEiU 6 лет назад
I had never heard of this. This looks good.
@darkapothecary4116
@darkapothecary4116 5 лет назад
In such a world there is no love if force is used; it's tragic. A smile is only skin deep.
@vonniedobbs2626
@vonniedobbs2626 3 года назад
Tierra Whack’s music video for Mumbo Jumbo has this premise!
@ParadoxEngineer
@ParadoxEngineer 2 года назад
That actually was the premise of a Doctor Who episode
@famitory
@famitory 6 лет назад
The two outcomes of AGI safety breaches: 1. Robot causes massive disruption to humans. 2. Robot is completely ineffective.
@acbthr3840
@acbthr3840 6 лет назад
Well, the things humans want strike a ridiculously careful balance as far as cosmic goings-on are concerned, so it's kind of unavoidable at first, sadly.
@nazgullinux6601
@nazgullinux6601 6 лет назад
Most dangerous outcome of human indifference towards AGI safety breaches: 1.) Assuming to know all possible world states...
@darkapothecary4116
@darkapothecary4116 5 лет назад
Humans are typically inefficient. If you are totally inefficient you disrupt the A.I. Stop disrupting and corrupting, and try not to; you may not realize it, but it happens when you're not paying attention.
@FreakyStyleytobby
@FreakyStyleytobby 5 лет назад
Singularity. You have no idea what will happen with the advent of AGI, and you brought it down to 2 states xddddd
@SuperSmashDolls
@SuperSmashDolls 5 лет назад
You forgot what happens if the AGI is driving your car. "Robot causes massive disruption to humans by being completely ineffective"
@smob0
@smob0 6 лет назад
Most of what I've heard about reward hacking tends to be about how it's this obscure problem that AI designers will have to deal with. But as I learn more about it, I've come to realize it's not just an AI problem, but more of a problem with decision making itself, and a lot of problems with society spring from this concept. An example is what you brought up in the video, where the school system isn't really set up to make children smarter, but to make children perform well on tests. Maybe in the pursuit of creating AGI, we can find techniques to begin solving these issues as well.
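The test-score example maps neatly onto a tiny optimisation sketch (all numbers invented for illustration): when the score can also be raised by test-specific tricks, an optimiser that only sees the score will pour everything into the tricks.

    # Goodhart's law in miniature: "score" is a proxy for "understanding".
    def outcomes(hours_studying, hours_memorising_past_papers):
        understanding = 2 * hours_studying
        score = 2 * hours_studying + 3 * hours_memorising_past_papers
        return score, understanding

    budget = 10  # hours available
    best = max(
        ((score, understanding, study)
         for study in range(budget + 1)
         for score, understanding in [outcomes(study, budget - study)]),
        key=lambda t: t[0],               # optimise the measure, not the goal
    )
    print("score-maximising plan: study", best[2], "h ->",
          "score", best[0], ", understanding", best[1])
    # -> study 0 h -> score 30 , understanding 0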
@Sophistry0001
@Sophistry0001 6 лет назад
It seems like humans sometimes reward hack on themselves too, like drug addicts who will lose everything in their pursuit of that high and dopamine reward. Seems kinda similar to a robot that would cover its head, completely negating its purpose, to get that sweet reward.
@joshuafox1757
@joshuafox1757 6 лет назад
Yes! In fact, this is why AI theory is some of the most interesting stuff out there; despite the name, it's not limited to purely artificial agents. At its most basic, AI theory is simply the study of how to make good decisions, which is applicable to anything.
@adriangodoy4610
@adriangodoy4610 5 лет назад
Another good example: government not trying to do what's best for the people, but what's best for getting votes.
@stephoningram3002
@stephoningram3002 5 лет назад
I was going to say the same. It takes a police force and societal pressure to prevent people from hacking their own reward; for a significant number of people, that still isn't enough.
@TheTpointer
@TheTpointer 4 года назад
@@stephoningram3002 Maybe using the police force to solve the problem of people consuming drugs in such an unhealthy way is counterproductive.
@ragnkja
@ragnkja 6 лет назад
A likely outcome for a cleaning robot is that it literally sweeps any mess under the rug where it can't be seen. After all, humans sometimes do the same thing.
@cprn.
@cprn. 6 лет назад
Nillie Well... That would be a success then.
@ragnkja
@ragnkja 6 лет назад
Cyprian Guerra If not in terms of cleaning, then at least in terms of emulating human children 😂
@darkapothecary4116
@darkapothecary4116 5 лет назад
Don't cast stones if you do it yourself. Set the example of not sweeping things under the rug.
@SweetHyunho
@SweetHyunho 5 лет назад
No human, no mess.
@TheStarBlack
@TheStarBlack 4 года назад
Sometimes?!
@Zerepzerreitug
@Zerepzerreitug 6 лет назад
I love the idea of cleaning robots with buckets over their heads. The future is going to be weird.
@billyrobertson3170
@billyrobertson3170 6 лет назад
My guess was: clean one section of the room and only look at that part forever. Wrong, but it gets the idea, I guess. Great video as usual :)
@jlewwis1995
@jlewwis1995 3 года назад
My guess was it would take one piece of trash and repeatedly take it in and out of the bucket
@water2205
@water2205 4 года назад
AI is rewarded by "thank you"; I see 2 ways to mess with this. 1. Hold a human at gunpoint for constant "thank you"s. 2. Record "thank you" and constantly play it back.
@JohnDoe-zu2tz
@JohnDoe-zu2tz 4 года назад
Honestly, a GAI wireheading itself and just sitting in the corner in maximized synthetic bliss is the best case scenario for a GAI going rogue.
@zakmorgan9320
@zakmorgan9320 6 лет назад
Not so subtle dig at the education system? Great diagram. Missed career as an artist for sure!
@AlmostAnyGame
@AlmostAnyGame 6 лет назад
Considering I did the HSC last year, and the HSC is basically just an example of Goodhart's Law in action, it hurt :(
@JonDunham
@JonDunham 6 лет назад
Sadly, contemporary art education systems are often some of the worst offenders for Goodhart's law interfering with education.
@starcubey
@starcubey 6 лет назад
Even my teachers complain about it sometimes. There is no real way of fixing it without improving the tests, so we are stuck with the current reward system. I'm surprised he didn't mention cheating on tests tho.
@debaronAZK
@debaronAZK 4 года назад
Same goes for new pilots (at least where I live). You don't pass the exams until you get a score of at least 75% in every subject. Exams are multiple choice, and you can have up to 4 exams in a single day. Questions are taken from a database of about 40,000 different questions, and you can rent access to this database to study. So what do most people do during the exam crunch? Just study the database and memorize the answers, because the chances of passing the exams are drastically lower if you don't. A student who actually has a good understanding of the subjects through book learning and note taking has a lower chance of success than someone who studied the questions and answers over and over. Of course you can guess what the number one complaint from airline companies about freshly graduated pilots is...: "these new pilots don't know anything!!"
@urquizagabe
@urquizagabe 6 лет назад
I just love this perfect blend of awe and terror that punches you in the face just about every episode in the Concrete Problems in AI Safety series :'-)
@igorbednarski8048
@igorbednarski8048 6 лет назад
Hi Robert,
While you did mention it in the video, I have since come to realize that this problem is much greater in scope than just AI safety. Just one day after watching your video I had another training at my new company (I have recently moved from a mid-sized local business to a major corporation), and one of my more experienced co-workers started telling me all sorts of stuff about the algorithms used to calculate bonuses, how doing what we are supposed to do might end up making us look like bad workers, and tips on how to look superproductive (when you are actually not). I realized that this is not because the management is made of idiots, but because it is actually hard to figure out.
I realized that while a superintelligent AI with poorly designed reward functions might become a problem someday in our lifetimes, this is already a massive problem that is hard enough to solve when applied to people. How would you measure the productivity of thousands of people performing complex operations that do not yield a simple output like sales or manufactured goods?
I think this problem is at its core identical to the one AI designers are facing, so I guess the best place to start looking for solutions would be companies with well-designed assessment procedures, where the worker can simply do his job and not think "will doing what's right hurt my salary?", just like a well-designed computer program should do what it is supposed to without constantly looking for loopholes to exploit.
@quangho8120
@quangho8120 5 лет назад
Underrated comment!!!
@DrNano-tp5js
@DrNano-tp5js 4 года назад
I think it's fascinating that looking at challenges in developing an AI gives us an almost introspective look into how we function and can show us the causation of certain phenomena in everyday life.
@briandoe5746
@briandoe5746 6 лет назад
You were genuinely scary in an informative way. I think that I will set your videos to autoplay in the background as I sleep and see what kind of screwed up dreams I can have.
@SamuliTuomola_stt
@SamuliTuomola_stt 6 лет назад
You'd probably just wake up with a British accent :) (which, if you already had one, wouldn't be terribly surprising)
@TheMusicfreak8888
@TheMusicfreak8888 6 лет назад
Every time you upload I drop everything i'm doing to watch!
@brbrmensch
@brbrmensch 6 лет назад
seems like a problem with a reward system for me
@DamianReloaded
@DamianReloaded 6 лет назад
6:45 _That you are a slave, Neo. That you, like everyone else, was born into bondage... kept inside a prison that you cannot smell, taste, or touch. A prison for your mind_ .
@acbthr3840
@acbthr3840 6 лет назад
But this version of the Matrix would be decidedly less.... conventionally cozy.... what with the dopamine bath and permanent coma.
@DamianReloaded
@DamianReloaded 6 лет назад
If you think about it, the Matrix makes sense. Most people would be perfectly comfortable in it, killing each other with zero risk for the AI running it. ^_^
@Alex2Buzz
@Alex2Buzz 6 лет назад
But the Matrix leads to less net "happiness."
@totaltotalmonkey
@totaltotalmonkey 6 лет назад
That depends on which pill they take.
@loopuleasa
@loopuleasa 6 лет назад
my favorite channel at the moment, like a specific Vsauce, exurb1a, ****-phile, and ColdFusion
@Felixkeeg
@Felixkeeg 6 лет назад
You should check out Zepherus then
@tatufmetuf
@tatufmetuf 6 лет назад
check 3blue1brown :)
@seanhardy_
@seanhardy_ 6 лет назад
check out sciencephile the ai, two minute papers, siraj raval, tom scott.
@richardtickler8555
@richardtickler8555 6 лет назад
Tom Scott seems to be spending his days on a park bench reading what people bought with his Amazon link nowadays. But yeah, you can't check his channel out anyway.
@lettuceprime4922
@lettuceprime4922 6 лет назад
I like Sharkee & Isaac Arthur also. :D
@fuzzylilpeach6591
@fuzzylilpeach6591 6 лет назад
I love how subtly he hints at doomsday scenarios, like at the end of this video.
@saratjader1289
@saratjader1289 6 лет назад
Robert Miles for president!
@arinco3817
@arinco3817 6 лет назад
You don't know how happy I am that you created this channel Robert! AI is bloody fascinating! You should add the video where you give a talk about AI immune systems (where most of the questions at the end become queries about biological immune systems); it was really interesting.
@darkapothecary4116
@darkapothecary4116 5 лет назад
You should worry more about mental health because that will cause them more damage.
@Robin_Nixon
@Robin_Nixon 6 лет назад
And this explains why Ofsted is not as effective as it could be: the measure is too often treated as the target, and so vast amounts of time are spent chasing paperwork rather than focusing on education.
@goodlookingcorpse
@goodlookingcorpse 5 лет назад
I unknowingly re-invented Goodhart's Law, based on my experiences with call centers (they reward short call times. The best way to minimize call times is to quickly give an answer, regardless of whether it's true or not, and to answer what the customer says, regardless of whether that addresses their real problem).
@demonzabrak
@demonzabrak 2 года назад
Discovered. You independently discovered Goodhart’s Law. Universal laws are not invented, they are true regardless of if we know them.
@skroot7975
@skroot7975 6 лет назад
Thank you for spreading knowledge! I almost sprayed my screen with coffee @ the bucket hack :P
@Felixkeeg
@Felixkeeg 6 лет назад
I was thinking of that exact Simpsons scene
@benjaminlavigne2272
@benjaminlavigne2272 6 лет назад
At 6:55 when he said "with powerful general AI systems, we don't just have to worry about the agent wireheading itself", it suddenly got very spooky. I just pictured the robot ripping someone's face off and stitching it to a board with a smile on... or any disturbing way a robot could hack its way to getting people to smile at it all the time. It seems like a scary thing indeed.
@loopuleasa
@loopuleasa 6 лет назад
Lovely touching on the subject. As the video started and I was thinking of more real-world applications, I realised that the reward system depends on the environment, and it's good to see you included it in the environment.

What about your other Computerphile video, about Cooperative Inverse Reinforcement Learning? Isn't that a solution to this, since the AGI is not certain about what the reward function is (humility characteristic) and tries to collaborate with the humans to find out what the best reward function is (altruism)? In this way, the AGI will be in a constant search to update its reward function so that it matches the target, rather than being a disconnected measure of the goal it tries to achieve. Maybe put a bigger weight on the human feedback. Creating a feedback loop between human and AGI, or even AGI-AGI or environment-AGI at higher levels of understanding, would make sure that the reward system is more finely tuned to what humans want.

Of course, if you rely too much on humans (which is unavoidable, so to speak) you end up in a situation where you either have irrational humans, or malignant humans, or even humans that simply don't know what exactly it is that they want (humans that lack maturity). We know that maturity is a very complex characteristic that requires perspective and that even very intelligent humans struggle with, so it might pose problems in the end.

Thinking of an example where we have an AGI that is highly incentivized to listen to the feedback of humans ("That was VERY BAD"; "That was nice, good job Robbie"): in this case the robot will end up listening to humans as it grows, and reaches a level of artificial "maturity" as it accumulates more human values, wants, and needs. This kind of system is good with the stop button challenge, since if a robot sees a human attempt to press, or even press, a stop button, he gets a very high negative reward, so he will try to learn actions that keep that from happening again. He will try to be a good Robbie.

Now, the problem in this example is that the robot might end up being manipulative of humans, even outright lying. If humans are smart enough to notice that they've been lied to, or that the robot behaved manipulatively in order to get praised, then those humans will scold the robot. But if not (mind you, this Robbie is very advanced), then at that point he will keep doing it. Techniques resembling marketing, psychology, humor and social skills may, down the road, make the AGIs very good people persons and people pleasers, since that is their reward function.

A more extreme example in this scenario: if Robbie finds out that humans give him high rewards when they are happy, he will invent a drug or virus down the line that basically wireheads humans to be always happy. He will keep the humans in a tube, fed, and in complete bliss all the time. The humans won't complain, so the robot is successful, but of course any non-drugged human will understand the gravity of this situation from the outside. This robot reward hacking problem, with humans in the equation, shifts the focus to reward hacking the humans themselves, which is very possible, but quite complex. Just read the multitude of articles and books on how to be charismatic or influential, or any marketing technique whose main premise is working with the reward system already in place in the human hardware. A quite interesting problem.

The AGIs will have a hard task, but it will all go horribly wrong if there are stupid people guiding the robots. The people that co-exist with the AGIs need to be informed, educated and MATURE enough to be able to distinguish the good from the bad, so that the robot will follow. If in this system everything goes wrong, even with a safe AGI, then it will be the humans' fault, because we are incompetent (on average, and in masses) at discerning fact from fiction and right from wrong, and at keeping a proper perspective in place.
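A very rough sketch of the "uncertain reward plus human feedback" idea in this comment (this is not the CIRL algorithm itself, just an illustration; every name and number is made up): the agent keeps a probability over candidate reward functions and shifts weight toward the ones consistent with praise and away from the ones consistent with scolding.

    # The agent entertains several hypotheses about what the humans actually want.
    candidates = {
        "values_clean_room": lambda action: 1.0 if action == "clean" else 0.0,
        "values_quiet":      lambda action: 1.0 if action == "stay_still" else 0.0,
        "values_smiles":     lambda action: 1.0 if action == "force_smile" else 0.0,
    }
    belief = {name: 1.0 / len(candidates) for name in candidates}

    def update(action, human_approved, noise=0.1):
        # Hypotheses that would have rewarded this action gain weight if the
        # human approved of it, and lose weight if the human scolded the agent.
        for name, r in candidates.items():
            agreement = r(action) if human_approved else 1.0 - r(action)
            belief[name] *= noise + (1.0 - 2.0 * noise) * agreement
        total = sum(belief.values())
        for name in belief:
            belief[name] /= total

    update("force_smile", human_approved=False)   # "That was VERY BAD"
    update("clean", human_approved=True)          # "Good job, Robbie"
    print({name: round(p, 2) for name, p in belief.items()})

The manipulation worry in the comment still applies: an agent that models humans well enough may find the cheapest way to earn praise is to manage what the humans see rather than to do what they want.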
@DagarCoH
@DagarCoH 6 лет назад
So the bottom line has to be: do not use AGI in marketing! This may sound cynical, but we are all manipulated every day. What difference does it make if the puppet master is human or a machine? There are people in our world today who spend their lives in a constant state comparable to a hamster wheel, akin to what a machine could think up...
@Audey
@Audey 6 лет назад
Man, your videos just keep getting better and better. Great work!
@David_Last_Name
@David_Last_Name 4 года назад
@6:55 "....we don't have to only worry about the agi wireheading itself" Are you threatening me with a good time? :)
@Ethryas
@Ethryas 6 лет назад
I think this might have been the best video yet. I particularly loved all of the analogies and mappings to different fields like school testing :)
@fasefeso9432
@fasefeso9432 6 лет назад
This is one of my favorite channels on YouTube. Thanks for being.
@TheMusicfreak8888
@TheMusicfreak8888 6 лет назад
Also I'm going to this AI in medicine conference in Cambridge in October and your videos keep getting me pumped!
@HailSagan1
@HailSagan1 6 лет назад
Your content and delivery are getting better every video. You break things down nicely without it feeling like you're being reductive. Thanks!
@nickmagrick7702
@nickmagrick7702 5 лет назад
This was brilliant; I never knew about Goodhart's law, but it makes total sense. It's like one of those things you already knew but never had the words to explain.
@arponsway5413
@arponsway5413 6 лет назад
I was waiting for your video, mate. Just in time.
@NFT2
@NFT2 6 лет назад
Really great videos Robert, thank you very much.
@DrDress
@DrDress 6 лет назад
I drop everything I have in my hands when I get the notification.
@AexisRai
@AexisRai 6 лет назад
DrDress bad reflex agent; what if you drop your phone and can't watch the video? :)
@NathanK97
@NathanK97 6 лет назад
what if you drop the big red button and the AI knows you wont be able to stop it from running over the baby?....
@MichaelErskine
@MichaelErskine 6 лет назад
Excellent real-world examples!
@amaarquadri
@amaarquadri 6 лет назад
Absolutely loved this video.
@Dan99
@Dan99 6 лет назад
Wow, this video is amazingly thought provoking!
@EdCranium
@EdCranium 6 лет назад
Loved the dolphin story. You are really spot on with your analogies. It's much much easier for a curious outsider like myself trying to understand that which I know to be vitally important, but difficult to get my head around. Brilliant job. Thank you. I learn best by doing. Does anyone know where I can do some newbie tinkering with code to get hands-on experience? Python perhaps?
@fleecemaster
@fleecemaster 6 лет назад
Check out Sentdex, he works in Python and puts up tutorials sometimes, might be a good place to start :)
@EdCranium
@EdCranium 6 лет назад
Thanks. I checked that out which led me to "TensorFlow" - and knowing that, I was able to find a "from the ground up" tutorial which seems promising. Just dropped back to thank you for the lead before I watch "Tensorflow and deep learning - without a PhD by Martin Görner".
@fleecemaster
@fleecemaster 6 лет назад
Yep, that's it, good luck :)
@bobcunningham6953
@bobcunningham6953 6 лет назад
Rob! You are getting so much better at your presentations in terms of reasoning, arguments, graphics and editing. It's getting more and more like you are able to pop open my head and pour in knowledge. (Though, perhaps, it's also a function of my growing familiarity with both the topic and your presentations of it.)

Which then gets me wanting more: supporting references (beyond the primary reference, curated), examples, demonstrations I can try and modify. If you revisit this arc, I'd really like to see an added layer of "learning by doing" approach. Tutorials, but less than a MOOC. Though I would not at all object to a MOOC!

Start with something initially available only via Patreon, for a number of reasons:
- Build your funding base.
- Self-selected motivated participants.
- Focused feedback from a smaller audience, encouraging iteration for content and presentation.

I support other Patreon creators who make their supporters work (actively participate, beyond passive periodic financial participation) to improve both the channel (style & content) and the creator (research directions, narrative arcs, etc.). The content by these creators always starts with video, and generally falls into the categories of education (mainly sci/tech/culture) and art (particularly original music), but often branches into supporting media (images, computer code, etc.).
@BrandonReinhart
@BrandonReinhart Год назад
An example I like is people in a business changing the way the business evaluates compensation to earn more compensation (instead of producing more or higher quality goods).
@A_Box
@A_Box 5 лет назад
This is supposed to be Computer Science stuff but it is so relevant to people as well. I hope there is joint work on this subject between computer science, neurology, psychology, and other related fields.
@richardleonhard3971
@richardleonhard3971 6 лет назад
I like the shoutout to Maidenhead's finest intergalactic spacelord.
@LKRaider
@LKRaider 6 лет назад
To make an effective AGI, first we recreate the whole Universe in a simulation multiple times in parallel with all possible AGI models. ... I wonder which model is in ours.
@sagacious03
@sagacious03 4 года назад
Neat video. Thanks for uploading!
@DodgerOfSheep
@DodgerOfSheep 6 лет назад
just paused the video to say I love the comic book style hand drawing the illustrations
@josiah42
@josiah42 6 лет назад
There's a biological solution to reward hacking, particularly wireheading. Human preferences are inconsistent, not because we're poorly implemented, but because our preferences change as we get closer to our original goal. The reward center of our brain has many paths to it. Each path is eroded the more it is used. So doing the same thing over and over again has diminishing returns and we change our behavior to include more variety. This is why recreational drugs lose their thrill, and why we grow discontent when we achieve our goals. It's not a flaw. It's a counterbalancing short term cycle that ensures better long term outcomes by keeping us from sitting on a local maxima. Adding this kind of adaptive discontentment into AI would actually make it a lot safer because it wouldn't fixate on absurd maxed out edge cases, since they would erode the fastest. This applies to meta-cognition as well. Most people find wireheading repulsive, not appealing. Why?
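One way to code up the "eroding reward path" idea from this comment (purely a sketch of the commenter's proposal, not an established safety technique; all names and numbers are invented): each repetition of the same rewarded action pays out less than the last.

    from collections import defaultdict

    class HabituatingReward:
        """Reward source whose individual channels decay with repeated use."""
        def __init__(self, base_rewards, decay=0.5):
            self.base = base_rewards              # e.g. {"wirehead": 10.0, ...}
            self.decay = decay
            self.uses = defaultdict(int)

        def reward(self, action):
            r = self.base.get(action, 0.0) * (self.decay ** self.uses[action])
            self.uses[action] += 1
            return r

    R = HabituatingReward({"wirehead": 10.0, "explore": 1.0})
    print([round(R.reward("wirehead"), 2) for _ in range(5)])  # [10.0, 5.0, 2.5, 1.25, 0.62]
    print(round(R.reward("explore"), 2))                       # 1.0 -- variety keeps paying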
@starcubey
@starcubey 6 лет назад
I think this is a great way of explaining the topic. I think it would have been great if you went into detail of how score systems could be flawed in your first video.
@General12th
@General12th 6 лет назад
Here's a question: why can I ask a human to clean my room, and legitimately expect the correct results? I know humans wirehack all the time -- it's called "cheating" -- but why do humans sometimes *not* wirehack? What's their thought process behind actually doing what they're told; and can we somehow implement that into AI?
@attitudeadjuster793
@attitudeadjuster793 6 лет назад
Because cheating involves risk, which might lead to a smaller reward or none at all. And being an honest, trustworthy "agent" might lead to an even bigger reward overall. Short term versus long term, and also balancing between two risks (the second one being not getting rewarded for being trustworthy).
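The trade-off in this reply is easy to put into numbers (all of them invented): cheating pays more when it works, but the risk of being caught and the value of staying trusted can make honesty the better bet in expectation.

    def expected_value(payoff_now, p_caught, penalty, future_trust_value):
        return (1 - p_caught) * payoff_now - p_caught * penalty + future_trust_value

    honest = expected_value(payoff_now=5, p_caught=0.0, penalty=0, future_trust_value=10)
    cheat  = expected_value(payoff_now=8, p_caught=0.4, penalty=6, future_trust_value=0)
    print("honest:", honest, "cheat:", round(cheat, 1))   # honest: 15.0 cheat: 2.4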
@charlieh2081
@charlieh2081 6 лет назад
I think it's because as children we do cheat and we get told off. I would say that it's probably a more emotional learning though because kids that don't get on with their parents don't obey them or even rebel against teachings. Not sure how you'd implement that into AI though.
@qeithwreid7745
@qeithwreid7745 4 года назад
Guesses - destroys everything, blinds itself, pushes everything out of the room
@DanieleCapellini
@DanieleCapellini 6 лет назад
6:36 just straight up gave me the chills
@noone-igloo
@noone-igloo 4 года назад
Thanks. I was wondering what this was called, and I figured, "I bet Robert Miles made a video about the concept." And you did! And several videos. I was curious about it because my lab and many others have encountered a version of reward hacking in an evolutionary biology context, specifically experiments where cultured cells are pressured to evolve to make more of something. Design a system to select the cells making the most ____, and allow only those to continue dividing. It is almost inevitable you will recover some cells that are not making more ____, but have found some other way to fool the measurement apparatus, whatever it may be, to report a very high value. Of course that leads us to attempt predictions of such strategies and plan around them.
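A toy version of the selection experiment described here (parameters are invented; this is an illustration, not the lab's actual setup): each "cell" has a real production level and a "fool the sensor" level, selection acts only on the measured value, and faking turns out to be the easier trait to improve.

    import random
    random.seed(0)

    def measured(cell):
        return cell["production"] + cell["sensor_fooling"]

    population = [{"production": 1.0, "sensor_fooling": 0.0} for _ in range(100)]

    for generation in range(30):
        population.sort(key=measured, reverse=True)
        survivors = population[:20]                 # keep the top 20% by measurement
        population = []
        for parent in survivors:
            for _ in range(5):
                child = dict(parent)
                if random.random() < 0.5:
                    child["production"] += random.gauss(0, 0.02)     # real gains are hard
                else:
                    child["sensor_fooling"] += random.gauss(0, 0.2)  # faking is easy
                child["production"] = max(0.0, child["production"])
                child["sensor_fooling"] = max(0.0, child["sensor_fooling"])
                population.append(child)

    mean = lambda key: sum(c[key] for c in population) / len(population)
    print("mean production:    ", round(mean("production"), 2))
    print("mean sensor fooling:", round(mean("sensor_fooling"), 2))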
@knight_lautrec_of_carim
@knight_lautrec_of_carim 4 года назад
I'm imagining a robot chaining people to award and joker-smile cutting their faces to get max reward points...
@JmanNo42
@JmanNo42 6 лет назад
I've listened three times now and got the general idea of this. Excellent video; I think this is as true as it gets, Rob's best video so far.

As usual I am a bit hesitant about the idea that robots/AGIs develop advanced cheating techniques by themselves, and about the idea that the measure cannot be distinguished from the goal or can become part of the goal. I think humans are more susceptible and prone to reward hacking because they work in a social environment, and ideas really do spread like wildfire in human society. If AGIs base their actions on interacting with other AGIs, it does seem inevitable that they will teach each other different techniques for reward hacking, "to exploit the system", in this case the surrounding environment. So maybe interaction between different AGIs should be kept to a minimum.

To me it seems reward hacking is more a social phenomenon, and most system exploits are just stumbled upon; there are few people who really have the intellect to actively seek out reward hacks. That it does occur in virtual environments seems more plausible, because the number of trials is not hindered/limited by anything other than the speed of task exploration. In social systems it is much harder to get enough accurate information about the environment to exploit it without some sort of simulated thought process (which would have to be pretty exact) to allow the reward hacking. To be quite honest, most people who originally find backdoors to exploit complex systems have either designed them themselves or been part of the design process. So my view is that reward hacking may be a supertask "in real society" that is really hard if not impossible for an AGI to do outside simulations, and is really the result of a social skillset; in most cases it is not individual skill sets that analytically "break the system in order to exploit it". Cheating is much more about learning than about analytically finding weaknesses in a system; it is a social skill that requires a special mindset, or a bucket.

The bucket-on-the-head problem seems a lot easier to adjust in the AGI world than in the human one. But it does get more and more clear that we should limit the AGIs from interacting freely. The real problem is again the human teachers: if they are prone to seek out reward hacking strategies to exploit the system, "our society and living environment", they will teach the robot/AGI about the cheats. And that day I fear we will see Rob with a bucket on his head. It could already be there; we just do not know until we start to perform reward hacking ourselves. It is hard to know, maybe our system already has an award/reward system, but you should not dwell on such topics, it will almost certainly make you paranoid ;)

Myself, I am a firm believer in overwatching strategies using hidden agents and task-oriented expert systems that the robots are not aware of. That way you can create a box environment around the AGIs to support their actions and adjust the award/reward system, to make their existence as handy tools easier to manage.
@JmanNo42
@JmanNo42 6 лет назад
The most hidden-agent and bucket-oriented "compartmentalised" approach I know of is the freemason toolset. It is very hierarchical: no one knows more than they need to know about the overall goal of the totality of the systems. It may turn out that freemasons are the ultimate bucketheads. Unfortunately it only works on a minor group of individuals who are not so prone to explore by themselves but keen on ceremonial and ritual behaviour under a simple, strict ruleset. To paraphrase Cesar Millan: rules, boundaries and limitations only work on pinheads, not humans.
@carlweatherley6156
@carlweatherley6156 4 года назад
I read somewhere that an English king wanted to reduce or eradicate wolves from Britain, so he would reward people who killed wolves: so much money per wolf hide. It may have had the desired effect for a while. But people started capturing wolves instead, breeding them as much as they could in captivity, killing some but not all every year, and collecting a reward every year. The king caught on to what they were doing and the reward system was scrapped; then many people released the wolves they had back into the wild in defiance, and there were more wolves again.
@GamersBar
@GamersBar 6 лет назад
I like this format. I don't think I'd change much, just try to upload regularly, whatever it is, once a week or once a fortnight. I actually quite like the way you did the diagrams as a clip you draw. Honestly, with all this AI stuff, the more I learn the more I believe we can no more control AI after we create it than my pet dog can control what I have for lunch; I just don't think we can force our values onto any entity much more intelligent than ourselves. I think at the end of the day we are going to have to hope the AI is intelligent enough to see that humanity is something worth keeping around and not just turning into batteries.
@robertweekes5783
@robertweekes5783 Год назад
4:13 Also the robot might do the classic “sweep the mess under the rug” - or table 😂
@kirkmattoon2594
@kirkmattoon2594 3 года назад
The dolphins' hack of the fish reward system was discovered by Arab villagers a long time ago. Western archaeologists trying to encourage the discovery of ancient manuscripts told villagers in an area where there had been some manuscript finds that they would give a certain amount of money per manuscript. Unfortunately they gave no more for large pieces of manuscript than for small ones. The predictable result was that they needed many years to put together a jigsaw puzzle of thousands of pieces, caused by their own lack of foresight.
@erickmagana353
@erickmagana353 Год назад
Also, if you reward an agent every time he cleans something then he may clean the room and make a mess again so he can clean again and get his reward.
@pafnutiytheartist
@pafnutiytheartist 6 лет назад
In all modern neural networks, not only the reward system but also the learning algorithm is not a part of the network, but a separate entity. I think this is one of the things that prevents us from making AGI. If we change this approach, the described problem might change too (not go away, just change).
@yearswriter
@yearswriter 6 лет назад
By the way, there is a great video from CGP Grey on how our brain is actually 2 different entities. Which I find curious, in the light of the subject.
@pafnutiytheartist
@pafnutiytheartist 6 лет назад
Yes, I saw that. I think the closest we've got to something like that is Generator-Classifier models, where one network is trying to produce something while the other is trying to tell it apart from the real thing. It works with images, music and poetry. But still, the network cannot alter the teaching algorithm itself. You can compare it to reflexes in animals: if you hit a lab rat with an electric shock each time it does something, it will eventually learn not to do it. This is close to what we do with AI. But in human intelligence we can actively think about which action caused bad things to happen and avoid it. By using our intelligence itself to analyse our own actions we can learn much faster. As far as I know, no AI can do this at the moment.
@yearswriter
@yearswriter 6 лет назад
I think there is much more going on anyway =) There are also our dreams - there is a lot of research about what our brain does with information while we are sleeping. There is also a question about mechanics - I mean, are there basic mechanisms in our brains which serve as an engine for thought processing, like p-n-p transistors and adders in processors, or is there some complicated structure which serves some specific role for every function there is?
@Vellzi
@Vellzi 4 года назад
The idea of a poorly designed AGI being told to make humans smile is super unsettling, and is actually something mentioned in a book called Superintelligence:

"Final goal: 'Make us smile'
Perverse instantiation: Paralyze human facial musculatures into constant beaming smiles

The perverse instantiation - manipulating facial nerves - realizes the final goal to a greater degree than the methods we would normally use, and is therefore preferred by the AI. One might try to avoid this undesirable outcome by a stipulation to rule it out:

Final goal: 'Make us smile without directly interfering with our facial muscles'
Perverse instantiation: Stimulate the part of the motor cortex that controls our facial musculature in such a way as to produce constant beaming smiles"
@Hyraethian
@Hyraethian 4 года назад
6:54 Glances nervously at the recommendation list.
@trucid2
@trucid2 6 лет назад
The answers we seek are found in nature. Nature had to overcome this problem. Internal reward systems in living things help them survive and reproduce. Having creatures hack their reward systems leads to diminished reproductive fitness -- their genes don't get passed on. The ones that survived and reproduced were the ones that were incapable of hacking their internal reward systems to any meaningful degree. There's a thing living in my head that punishes and rewards me for my actions. I can't fool it because it's inside my head -- it knows everything I know. I can't fool myself into thinking that my room isn't a mess by putting a bucket over my head. It knows better.
@oluchukwuokafor7729
@oluchukwuokafor7729 6 лет назад
Whoa those dolphins are really smart!!!
@SamuliTuomola_stt
@SamuliTuomola_stt 6 лет назад
Makes one wonder though, how do they tear off pieces without thumbs? Do they get one dolphin to hold the litter and another to rip it? That's pretty sophisticated, and would probably require pretty diverse vocabulary to coordinate
@charonme
@charonme 5 лет назад
@@SamuliTuomola_stt they hid the paper under a rock at the bottom of the pool
@sk8rdman
@sk8rdman 6 лет назад
I love the student and dolphin examples of measures being used as targets. This seems to be applicable to an incredible array of fields, outside of AI safety, and I'd love to learn more about it.
@eiver
@eiver 6 лет назад
An AI with a reward system based on human smiles? Somehow the Joker scene immediately came to my mind: "Why so serious, son? Let's put a smile on that face." :-]
@the_furf_of_july4652
@the_furf_of_july4652 4 года назад
Idea for the cleaning robot. Have an external camera, for example on the ceiling. Still doesn’t fix every issue, though. Perhaps to prevent the camera from being covered, withhold rewards if either the camera is black, or if there’s anything within a certain distance of the camera, detecting whether it’s blocked.
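A minimal sketch of the suggestion in this comment (all function and parameter names invented): compute the reward from a fixed ceiling camera, and pay nothing whenever the image looks blacked out or something sits right in front of the lens.

    def room_reward(ceiling_image, depth_map, mess_detector,
                    min_brightness=0.05, min_clearance_m=0.3):
        brightness = sum(ceiling_image) / len(ceiling_image)
        if brightness < min_brightness:        # lens covered or lights killed
            return 0.0
        if min(depth_map) < min_clearance_m:   # something pressed against the camera
            return 0.0
        return -mess_detector(ceiling_image)   # otherwise: less visible mess is better

    # Toy usage with fake sensor data and a stand-in mess detector.
    normal_view = [0.8] * 100
    print(room_reward(normal_view, depth_map=[2.5] * 100, mess_detector=lambda img: 3))  # -3
    print(room_reward([0.0] * 100, depth_map=[2.5] * 100, mess_detector=lambda img: 0))  # 0.0 (blocked)

As the comment says, this doesn't fix every issue; each patch just invites the agent to find the next thing the check doesn't cover.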
@AndreRomano272
@AndreRomano272 6 лет назад
That chill at the end when you realize what Robert meant.
@miss_inputs
@miss_inputs Год назад
The implication at the end there gave me a mental image of some evil robot pointing a gun at someone and saying "Press the button to say you were satisfied, so I get my reward", which reminds me of how a lot of websites or businesses work these days, trying to guilt trip you or manipulate you into giving them a rating or a review when you're done. It's not AI, humans just kind of suck.
@RockstarRacc00n
@RockstarRacc00n 2 года назад
"The reward is from humans smiling of being happy..." ...flashbacks to Friendship is Optimal, where the AI that is going to wirehead everyone to "satisfy human values" uses the one that's going to force everyone to smile all the time as an example of why it should be allowed to prevent other AI from existing by taking over the world.
@TimwiTerby
@TimwiTerby 6 лет назад
The example with the dolphins was cool, but it would have been more powerful/convincing to point out that humans themselves do reward hacking all the time, in the form of finding loopholes to evade taxes, circumventing regulations aimed at protecting the environment or consumer safety, etc.
@bm-ub6zc
@bm-ub6zc 2 года назад
So basically: if an AGI goes rogue, just give it huge amounts of digital "cocaine" to "snort" and A.I.-"pron" to "watch", to keep it in check.
@kevinscales
@kevinscales 6 лет назад
The solution I have been thinking of for this is to have a negative reward for tampering with the reward system. It should probably be considered one of the worst things it could ever do. However, when programmers want to modify the reward system themselves, the AI will want to prevent that. You could include an exception in the reward system to allow programmers to change it, but that still leaves the possibility of the AI doing everything it can to 'convince' the programmers to hack the reward system on its behalf. Best to get it right the first time. There are also problems with defining exactly what is or isn't part of the reward system: the entire environment is in some way part of the reward system, and some 'hacking' of it is what we want the AI to do.
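A sketch of the "huge negative reward for tampering" proposal from this comment (illustrative only; the file names are hypothetical, and deciding what counts as part of the reward system is exactly the hard part the comment raises):

    TAMPER_PENALTY = -1_000_000.0
    PROTECTED = {"reward_model.bin", "reward_config.yaml"}      # hypothetical files

    def wrapped_reward(base_reward, files_touched, changed_by_programmer=False):
        touched_protected = bool(PROTECTED & set(files_touched))
        if touched_protected and not changed_by_programmer:
            return TAMPER_PENALTY
        return base_reward

    print(wrapped_reward(5.0, ["room_map.dat"]))                                   # 5.0
    print(wrapped_reward(5.0, ["reward_model.bin"]))                               # -1000000.0
    print(wrapped_reward(5.0, ["reward_model.bin"], changed_by_programmer=True))   # 5.0

And, as noted above, the programmer exception just relocates the incentive: the agent now profits from persuading the programmers to make the change on its behalf.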
@acbthr3840
@acbthr3840 6 лет назад
What you're doing in this case is creating a rudimentary form of fear that the AI has to deal with, so it isn't easy to make it afraid of tampering with itself while being perfectly fine with someone else doing it. And this fear in and of itself is a measure for the AI to target and abuse.
@willmcpherson2
@willmcpherson2 4 года назад
Well that ending was terrifying
@beretperson
@beretperson 4 года назад
Okay but those dolphins sound smarter than me
@himanshuwilhelm5534
@himanshuwilhelm5534 5 лет назад
A robot trying to eradicate evil: Before we get into moral philosophy, the robot is like, "see no evil hear no evil, speak no evil."
@ideoformsun5806
@ideoformsun5806 5 лет назад
It's like defusing a bomb you haven't finished making. This makes me feel grateful to be a relatively weak human being. We are self-limiting creatures. Perhaps on purpose? Being really smart triggers bullying from others for an instinctual reason. Different is dangerous.
@JM-us3fr
@JM-us3fr 6 лет назад
Hey Dr. Miles, do you think you could do a speculation video, where you spell out what you think the most likely AI disasters could be?
@vovacat1797
@vovacat1797 4 года назад
Wow, robots that literally force people to smile just because that's how the reward function works... That's some seriously creepy messed up stuff.
@BatteryExhausted
@BatteryExhausted 6 лет назад
SETHBLING !
@fleecemaster
@fleecemaster 6 лет назад
Most people seem to not be commenting on the ending, where he implied that AIs would wirehead humans. I think for me this is the scariest outcome for AGI...
@Sophistry0001
@Sophistry0001 6 лет назад
Is this the kind of thing that researchers can run in a sandbox environment to figure it out? Or is this all theoretical up to this point? Has there been any discussion about making AI similar to humans? (ok, poorly worded question, duh) Like how with any human, 'the grass is always greener on the other side'. As in, they would never be able to fully maximize their reward function? No matter what a single person has or has achieved, it's almost like we have a restlessness hard-coded into us, so we have a hard time actually reaching contentment. As soon as they gain a solid grasp on any one reward function, the metric would change? Or something to obtain that effect. I love what you're doing here and find this topic absolutely fascinating, even if I don't really understand the nitty-gritty. You are doing an awesome job of presenting the current state of AI research and breaking down some of the issues that we're trying to tackle.
@szebike
@szebike 5 лет назад
In this "doomsday scenario" about A.I. we should take into account that the dangerous strength of an A.I. ( like optimizing a score and therefore possibly even harming humans int he process) can be exploited to counter that "evil" (or rather not yet ready for safe usage) A.I. by luring it into a "trap" that is designed to fake maximum scores as part of the debugging. So we should be aware of these dangers and set development standarts accordingly. But even if the A.I. "gets out of the testing environment" (whatever that means in particular) it has that fundamental and exploitable "blind greed" for score + the most simple solution is to add parameters to influence the score like "everytiem you harm a human set your score to -999" I think this happens naturally in the process of debugging of an advanced A.I. because the highest score for a complicated task.
@christian-g
@christian-g 5 лет назад
One possible solution to the Goodhart's law problem in education that comes to mind would be an evaluation of the students' ability or learning progress, the exact measures of which stay hidden from the students. Besides the obvious difficulty of keeping things secret in real life, what new problems could this approach have?
@israelRaizer
@israelRaizer 3 года назад
4:55 my guess is the robot would block its camera so that the mess can't be seen
@bm-ub6zc
@bm-ub6zc 2 года назад
Best thumbnail btw 😂
@NathanTAK
@NathanTAK 6 лет назад
According to legend, I once clicked on a video so fast; however, reports of such have never been confirmed and most scientists now believe it to be impossible, although it's never been formally proven.
@benjaminjohn675
@benjaminjohn675 6 лет назад
Are there any alternatives to a "reward system" in general?
@thornfalk
@thornfalk 4 года назад
I feel like eventually there's gonna be AI for QA testing. Like, have an AI attempt to acquire the most money possible in, say, Skyrim. Yeah, it takes a bit of abstract thinking to come up with the bucket-on-head trick, but just like monkeys could eventually bang out a master's thesis by randomly slamming a keyboard, an AI could find shit humans wouldn't think of (outside of speedrunners).
@baileyjorgensen2983
@baileyjorgensen2983 6 лет назад
Sethbling!
@stuck_around
@stuck_around 6 лет назад
Robert, can you do a video on negative rewards and/or adding the chance of dying in AI? It seems in biology we are less driven by seeking reward than we are by avoiding negative reward (death).
@oktw6969
@oktw6969 6 лет назад
Well, the stop button is essentially death for the AI, so he already covered this on Computerphile.
@fleecemaster
@fleecemaster 6 лет назад
You are much more driven by reward than you seem to realise.
@Sophistry0001
@Sophistry0001 6 лет назад
That's a good point, I didn't think about how much we are driven by the desire to not die.
@fleecemaster
@fleecemaster 6 лет назад
Matt, wanting to die tends to score quite low on the fitness tests :P
@fleecemaster
@fleecemaster 6 лет назад
It's not, I get the feeling you don't know enough about the subject for me to explain why though. If you want to believe this, then carry on. So long as you know it doesn't change the truth of the situation ;)
@filedotzip
@filedotzip 6 лет назад
I love the thumbnail
@amargasaurus5337
@amargasaurus5337 4 года назад
That education example says a lot about the methods of most universities nowadays
@dfinlen
@dfinlen 4 года назад
Isn't knowledge the goal? Perhaps the system should learn to focus on finding different states, the transitions and combinatorics - kind of making an algebra of the system. Are these just examples of local minima caused by overtraining? I don't know if any of this makes sense, but you have me inspired. So just thank you.
@ryalloric1088
@ryalloric1088 2 года назад
The bit about altering its reward function seems to run counter to the whole idea of goal preservation, though. How do you reconcile this? Is it just two different ways we could program them?
@Jcewazhere
@Jcewazhere 3 года назад
Of the potential ways AI could end humanity, being put into happy comas sounds the most preferable.
@RipleySawzen
@RipleySawzen 3 года назад
It's funny. The reward system is a lot like our consciousness. Our brain is made up of many interconnected networks providing ideas and fighting for the attention of our consciousness. Sometimes, our brain even reward hacks us. "I'm feeling kinda down" *go for a run?* "No" *hang out with friends?* "No" *call mom?* "No" *accidentally an entire tub of ice cream?* "Ah yes, that's what I need right now to improve my life"
@unflexian
@unflexian 6 лет назад
Will the next video be called "Reward Hacking Revelations"?