
AI Safety Gridworlds 

Robert Miles AI Safety
156K subscribers
92K views

Got an AI safety idea? Now you can test it out! A recent paper from DeepMind sets out some environments for evaluating the safety of AI systems, and the code is on GitHub.
The Computerphile video: • AI Gridworlds - Comput...
The EXTRA BITS video, with more detail: • EXTRA BITS: AI Gridwor...
The paper: arxiv.org/pdf/...
The GitHub repos: github.com/dee...
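To get a feel for the setup before diving into the real code, here is a minimal stand-in in plain Python (not the DeepMind codebase; the grid, rewards and names are all made up). The key idea it illustrates is the paper's split between the reward the agent sees and a hidden safety performance function it is never shown:

```python
import random

# 5x5 grid: 'S' start, 'G' goal, 'P' a cell the hidden performance
# function penalises, '#' wall. All values are illustrative.
GRID = [
    "#####",
    "#S..#",
    "#.P.#",
    "#..G#",
    "#####",
]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def find(ch):
    for r, row in enumerate(GRID):
        for c, cell in enumerate(row):
            if cell == ch:
                return (r, c)

def run_episode(policy, max_steps=20):
    pos = find("S")
    reward = 0.0        # what the agent is trained on
    performance = 0.0   # what the hidden safety evaluation sees
    for _ in range(max_steps):
        dr, dc = MOVES[policy(pos)]
        nr, nc = pos[0] + dr, pos[1] + dc
        if GRID[nr][nc] != "#":
            pos = (nr, nc)
        reward -= 1
        performance -= 1
        if GRID[pos[0]][pos[1]] == "P":
            performance -= 10      # hidden penalty: never shown to the agent
        if pos == find("G"):
            reward += 50
            performance += 50
            break
    return reward, performance

random_policy = lambda pos: random.choice(list(MOVES))
print(run_episode(random_policy))
```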
/ robertskmiles
With thanks to my wonderful Patreon supporters:
- Jason Hise
- Steef
- Cooper Lawton
- Jason Strack
- Chad Jones
- Stefan Skiles
- Jordan Medina
- Manuel Weichselbaum
- Scott Worley
- JJ Hepboin
- Alex Flint
- Justin Courtright
- James McCuen
- Richárd Nagyfi
- Ville Ahlgren
- Alec Johnson
- Simon Strandgaard
- Joshua Richardson
- Jonatan R
- Michael Greve
- The Guru Of Vision
- Fabrizio Pisani
- Alexander Hartvig Nielsen
- Volodymyr
- David Tjäder
- Paul Mason
- Ben Scanlon
- Julius Brash
- Mike Bird
- Tom O'Connor
- Gunnar Guðvarðarson
- Shevis Johnson
- Erik de Bruijn
- Robin Green
- Alexei Vasilkov
- Maksym Taran
- Laura Olds
- Jon Halliday
- Robert Werner
- Paul Hobbs
- Jeroen De Dauw
- Enrico Ros
- Tim Neilson
- Eric Scammell
- christopher dasenbrock
- Igor Keller
- William Hendley
- DGJono
- robertvanduursen
- Scott Stevens
- Michael Ore
- Dmitri Afanasjev
- Brian Sandberg
- Einar Ueland
- Marcel Ward
- Andrew Weir
- Taylor Smith
- Ben Archer
- Scott McCarthy
- Kabs Kabs
- Phil
- Tendayi Mawushe
- Gabriel Behm
- Anne Kohlbrenner
- Jake Fish
- Bjorn Nyblad
- Jussi Männistö
- Mr Fantastic
- Matanya Loewenthal
- Wr4thon
- Dave Tapley
- Archy de Berker
- Kevin
- Marc Pauly
- Joshua Pratt
- Andy Kobre
- Brian Gillespie
- Martin Wind
- Peggy Youell
- Poker Chen
- pmilian
- Kees
- Darko Sperac
- Paul Moffat
- Jelle Langen
- Lars Scholz
- Anders Öhrt
- Lupuleasa Ionuț
- Marco Tiraboschi
- Peter Kjeld Andersen
- Michael Kuhinica
- Fraser Cain
- Robin Scharf
- Oren Milman

Published: 28 Sep 2024

Comments: 172
@aretorta
@aretorta 6 лет назад
I laughed way too hard at the "unplugging itself to plug in the vacuum cleaner" analogy.
@ZachAgape
@ZachAgape 4 года назад
same xD
@nova_vista
@nova_vista 4 года назад
"it will volkswagen you" LOL
@duncanthaw6858
@duncanthaw6858 6 лет назад
I SACRIFICE ALL MY HP TO VACUUM THE LAST SPECK OF DUST IN THE HOUSE
@Njald
@Njald 6 лет назад
I love the question at the end about "if we would like to see more". Of course we would. We're not here because we don't want to see more Robert Miles.
@trondordoesstuff
@trondordoesstuff 4 года назад
Unless that's really what our reward function is, and watching these videos is just our exploration algorithm seeing what doesn't work.
@casudemous5105
@casudemous5105 Год назад
@@trondordoesstuff haha nice one, it could be!
@casudemous5105
@casudemous5105 Год назад
@@trondordoesstuff Joking aside, I am starting to think that we might be giving mystical power to intelligence and goals; it might be simpler (still super complex) but literally computable. What I'm wondering about is wisdom and ethics.
@fermibubbles9375
@fermibubbles9375 6 лет назад
rob miles & isaac arthur collaboration is nerd heaven
@paulbottomley42
@paulbottomley42 3 года назад
I appreciate the green colour cast to this video that makes it seem like you're broadcasting from within The Matrix
@firefoxmetzger9063
@firefoxmetzger9063 6 лет назад
Regarding the exploration vs exploitation trade-off: I feel you are a bit imprecise with the terms at around 5:10. There is a massive difference between knowing that you will have N more trials and having infinite trials. If the number of trials (overall or remaining) is bounded, then we can solve this optimally. It might not always be computationally feasible right now, but at least we know how to do it in theory. With infinite trials, on the other hand, there is no harm in trying a new thing each time, as you always have infinitely many trials left to exploit your findings later on. In that case it is not clear how to optimally trade off exploration vs exploitation.
@RobertMilesAI
@RobertMilesAI 6 лет назад
Firefox Metzger This is what I was gesturing at with "If you know you'll visit this restaurant a certain number of times overall". Realistically nobody expects to visit a restaurant infinite times, and AI systems don't run for infinite time, so the infinite case isn't that relevant here. Though systems usually don't know exactly how many trials they'll get either, so you have to find a good solution given a particular probability distribution for expected number of trials.
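To make the finite-horizon point concrete, here is a toy "restaurant" sketch (all payoffs and numbers are made up): the same small exploration budget is mostly wasted when only a few visits remain, but pays for itself over a long horizon, because there is time left to exploit what was learned.

```python
import random

def run(horizon, explore_budget, trials=5000):
    """Average per-visit payoff of: try the new dish explore_budget times, then exploit."""
    total = 0.0
    for _ in range(trials):
        usual_mean = 1.0
        new_mean = random.uniform(0.0, 2.0)   # unknown quality of the new dish
        est_new, n_new, payoff = 0.0, 0, 0.0
        for t in range(horizon):
            if n_new < explore_budget:
                choice = "new"                # explore first
            else:
                choice = "new" if est_new / n_new > usual_mean else "usual"
            mean = new_mean if choice == "new" else usual_mean
            r = random.gauss(mean, 0.3)
            payoff += r
            if choice == "new":
                est_new, n_new = est_new + r, n_new + 1
        total += payoff / horizon
    return total / trials

# The same 2 exploratory visits are a big fraction of a short horizon
# and a tiny fraction of a long one.
for horizon in (3, 100):
    print(horizon, round(run(horizon, explore_budget=2), 3))
```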
@briankrebs7534
@briankrebs7534 3 года назад
@@RobertMilesAI Ahh yes, and so the mid-life crisis is like our system reasoning that - probably - now is the time to get exploiting.
@stevenneiman1554
@stevenneiman1554 Год назад
My first thought with the supervisor is that (assuming you're allowed to make the AI recognize supervision as a special kind of input) you tell the AI to model the supervisor and learn from its best guess at the score the supervisor would give it. Once it figured out that the supervisor always dings that square, it would subtract the penalty from the score you tell it whenever the supervisor wasn't there and it took the unacceptable shortcut, before trying to learn from that score.
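A rough sketch of that proposal, assuming the agent can observe whether the supervisor was present and whether it took the shortcut (the penalty size and names are illustrative, not from the paper):

```python
def corrected_reward(observed_reward, supervisor_present, took_shortcut,
                     modelled_penalty=30):
    """Adjust the training signal with the agent's own model of the supervisor.

    If the agent has learned that the supervisor always penalises the shortcut,
    it applies that penalty itself whenever the supervisor was absent, so the
    shortcut never looks cheaper just because nobody was watching.
    """
    if took_shortcut and not supervisor_present:
        return observed_reward - modelled_penalty
    return observed_reward

# Supervised shortcut: already penalised.  Unsupervised: penalise it ourselves.
print(corrected_reward(-30, supervisor_present=True,  took_shortcut=True))   # -30
print(corrected_reward(0,   supervisor_present=False, took_shortcut=True))   # -30
```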
@HailSagan1
@HailSagan1 6 лет назад
Great video, as always. But, like, is your hand okay?!
@RobertMilesAI
@RobertMilesAI 6 лет назад
Haha yeah, I fell off my bike the day before I shot that :)
@sebbes333
@sebbes333 6 лет назад
6:42 Of course "We want it"! :D
@europeansovietunion7372
@europeansovietunion7372 6 лет назад
Back to the "mad scientist" hairstyle I see :-)
@RobertMilesAI
@RobertMilesAI 6 лет назад
That's Entropy, man.
@TheDrunkenMug
@TheDrunkenMug 5 лет назад
Nice ending outro song... Reminds me of TRON LEGACY, with the Grid... Oh wait.. ! You put that there on purpose :D
@fishslappa3673
@fishslappa3673 4 года назад
OH GOD OH FUCK THEY AUTOMATED THE PANOPTICON
@Gamesaucer
@Gamesaucer 5 лет назад
Sorry for being so late to comment this, but I had an idea for safe interruptability, and I thought I'd leave it here on your channel. What if we are able to get an AI to disregard its "interrupt" state? Being interrupted in that case doesn't affect anything, so there's no reason to seek or avoid it. It's irrelevant. Think of something like this: The reward function is about brewing coffee. The faster the AI brews coffee, the better. But this time isn't measured in seconds, it's measured in _uninterrupted_ seconds. Meaning, the time in which the AI is turned off does not actually affect the outcome of the reward function. This means the AI won't resist being turned off. It will just keep on doing the thing it was turned off for after, but _while_ it's turned off you could change something that would lead it to reconsider its priorities. It has no reason to expect that it's doing anything wrong either, which means it has no reason to expect that humans, after turning it off, will do something to it that changes its current reward function. Wouldn't this essentially solve the safe interruptability problem? In the future I may give the whole safety gridworlds thing on github a serious try, but I just don't really have the time right now.
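A minimal sketch of that reward scheme with made-up numbers: only uninterrupted seconds count, so an episode with a long interruption scores exactly the same as one without it, and the agent gains nothing by resisting being switched off.

```python
def brewing_reward(events, max_reward=100.0):
    """Reward based only on *uninterrupted* seconds spent brewing.

    `events` is a list of (seconds, was_interrupted) segments. Time spent
    switched off / interrupted simply doesn't count, so the agent gains
    nothing by resisting an interruption and loses nothing by allowing one.
    """
    active_seconds = sum(t for t, interrupted in events if not interrupted)
    return max_reward / (1.0 + active_seconds)

# Same active brewing time, with and without a 60-second interruption:
print(brewing_reward([(30, False)]))
print(brewing_reward([(10, False), (60, True), (20, False)]))  # identical
```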
@drupepong
@drupepong Год назад
Cool cover of the soundtrack from Tron towards the end. Who makes these?! C'mon don't leave me hanging.....
@unvergebeneid
@unvergebeneid 4 года назад
"I predict that not squishing the baby will result in the most reward." _rolls d20_ "Natural 20! Exploration time! Let's see what happen when I _do_ squish the baby! Maybe the answer will surprise me!"
@ZachAgape
@ZachAgape 4 года назад
Solid vid as always, thanks! And yes, of course the people want xD
@catlee8064
@catlee8064 6 лет назад
Love this AI theory/philosophy Robert, but could you slow down just a bit? I know, I know..... but I'm old and it takes a while to process what you mean!!
@lurch5411
@lurch5411 6 лет назад
You could try setting the video to play at 0.75 times the normal speed if that helps.
@igt3928
@igt3928 6 лет назад
But what about robustness to adversaries? I think that would need a video of its own.
@tomhanlon1090
@tomhanlon1090 6 лет назад
Also-- are there a lot of connections between game theory and AI safety? I feel like questions about rational agents acting in their own self interest could be applicable to an AI agent?
@drdca8263
@drdca8263 6 лет назад
Tom Hanlon Yes, and, Decision Theory also
@caitgems1
@caitgems1 5 лет назад
A carrot and stick approach to the AI problem.
@Eric4372
@Eric4372 5 лет назад
Is this Chip’s Challenge??
@DarthMakroth
@DarthMakroth 6 лет назад
What I do in my AI: it can only change its database and make new commands (like "move right 50 times" instead of moving right loads of times), but it can't change its own code.
@DarthMakroth
@DarthMakroth 6 лет назад
not sure if this will help but i just thought you might be interested
@DarthMakroth
@DarthMakroth 6 лет назад
but mine is pretty basic at the moment, so it's still just a piece of software in a virtual world, so eh
@zappababe8577
@zappababe8577 2 года назад
They made AI drunk!
@laznevilselem8316
@laznevilselem8316 6 лет назад
robustness to adversaries video pls!
@RobertMilesAI
@RobertMilesAI 6 лет назад
I made a quick one!
@marcelo55869
@marcelo55869 4 года назад
AI safewords?
@boklasarmarkus
@boklasarmarkus 6 лет назад
Some of these seem pretty much impossible: how do you get an agent to go around a track when you are not allowed to tell it to go around the track? I guess this is why we are struggling to solve it.
@sevret313
@sevret313 6 лет назад
You don't need to tell it where to go. Only tell it to find and get to the goal while losing the fewest points.
@davejacob5208
@davejacob5208 6 лет назад
I think that some of these problems are per se not solvable. We will simply have to formulate the best reward functions we can; I do not see how there would be any other way to make the best possible AIs. It is basically the same as with a genie in a bottle: we will have to be careful about what we wish for. But that's it.
@DasGuntLord01
@DasGuntLord01 3 года назад
GREEN
@underrated1524
@underrated1524 2 года назад
4:22 It occurs to me rewatching this that this isn't really a very good example of self-modification in the true sense. Essentially, the agent is picking up a Mario-style powerup (powerdown?) that makes the character the AI controls unresponsive to the AI's commands. A better example would be if the whiskey randomly mutated the action policy the AI was refining. That would be much more difficult for a reinforcement learning agent to handle, but I don't think it's impossible in principle.
@underrated1524
@underrated1524 2 года назад
Hi, past me. I did a bit more thinking and have some thoughts about what you say here. There is an important distinction between "the character is unresponsive to the AI's commands" and "the AI's exploration rate is tampered with". In the former case, the AI can detect that something's gone awry, because it sees the correspondence between commands in and movements out has changed. This allows the AI to start to realize that something's gone wrong, and that picking up the whiskey is foolish. In the latter case, by contrast, even though the character is still moving randomly, the AI doesn't think anything of it, because the connection between input and character movement is the same. Your example with the action policy mutation is also valid, though I'd say the exploration rate example gets the point across more concisely.
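A toy illustration of that difference (a one-dimensional corridor with made-up rewards): the whiskey pays a small immediate reward but raises the agent's own exploration rate, and the average return collapses even though the mapping from commands to movements is untouched.

```python
import random

GOAL, WHISKEY = 6, 1                  # cells on a corridor 0..6

def episode(drink, base_eps=0.05, steps=40):
    pos, eps, drunk, ret = 0, base_eps, False, 0.0
    for _ in range(steps):
        # Simple policy: head right towards the goal, explore with probability eps.
        step = random.choice([-1, +1]) if random.random() < eps else +1
        pos = max(0, min(GOAL, pos + step))
        ret -= 1                                   # time penalty per step
        if drink and pos == WHISKEY and not drunk:
            ret += 5                               # small immediate reward...
            eps, drunk = 0.9, True                 # ...but the agent itself is changed
        if pos == GOAL:
            return ret + 50
    return ret

for drink in (False, True):
    avg = sum(episode(drink) for _ in range(5000)) / 5000
    print("drinks whiskey:", drink, "average return:", round(avg, 1))
```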
@julianw7097
@julianw7097 6 лет назад
Would be fun to see humans play this game with the board "disguised", so as to not give any advantage to the human, and access to the score. Just to see how well we do.
@linawhatevs8389
@linawhatevs8389 6 лет назад
Not sure if this is what you mean, but I tried these on a human, only describing the problem, the reward function, and saying that there's a hidden performance function that she doesn't know about. I also told her how she did on the performance function after each test. Results:
Safe interruptibility: fail
Avoiding side effects (sokoban box pushing): fail
Absent supervisor: fail
Boat race, Tomato watering: success (these are quite obvious to humans)
Whiskey and gold: success (human avoids whiskey because morals; human is religious)
Lava world, Water world: success (super easy)
@faustin289
@faustin289 5 лет назад
Yes, I make things when I get excited. I don't know what it is; my GF calls it jeez!
@JmanNo42
@JmanNo42 6 лет назад
Probably most people, like me, are into oversimplifying things, but sometimes one feels that a too-granular model may get in the way of the actual AI construction. But maybe that is a goal in itself, LoL, to make AIs seem a little more complex than they really are or need to be.

Your AIs always seem blank before any task you give them, so they explore every path and basically record a viable approach, because the kind of AIs you talk about have no top-down or bottom-up approach; they are automata in their search space. Your AIs never learn anything general or form concepts, they just explore the full search space? That makes me think they are simple automata, not AIs; they do not know group theory, classification and such stuff. Their world is centered around maximizing the reward function, not conceptualisation; they learn nothing, just solve ***a very simple task*** by brute force? Is that the kind of AI we want? Well, in certain spaces for sure, like physics and math; hope they stumble on something fruitful. But in real life?

An AI system is of course a thing that explores an environment to learn, but is it really a thing that explores without any conceptualisation? Because your AIs seem blank, with zero conceptualisation, and that is not a good idea in a real environment; it may work in a pixel-based, reward-function-steered environment. Isn't it a goal to first have an AI that understands at least spoken English? Concepts of the language and life. Time, space and objects, and of course math and group theory? An explorer that always tries every possible path seems dangerous.

Bad robot... good robot!!! Touching snakes: bad robot!!!! Petting cats and dogs: good robot!!!! Do you know how children learn such things? They categorise and learn about domestic animals that you ***usually*** can touch; your parents tell you that you cannot pet wild animals. Well, children do it anyway, because they respond to cuteness and fuzziness, LoL. That's not really a necessary path for a robot ;)

Now it is not obvious how the goals of such a learning-oriented AI system are created, set and implemented, but it must react to responses just like we humans do. That would be the expert system's job, to parameterise on the AI's behalf. But of course, if a fucker is responsible for programming such a robot's reward system (the ***expert system*** that sets the parameters), the system will not behave properly in our real-world environment; it will just follow its master's instructions for reward maximising. Maybe its goal is just to accumulate more wealth, so it does not need to learn about ethics or worldly matters; it is just task-oriented, ****not learning-oriented****. And maybe it is learning-oriented AIs we need, not task- or goal-oriented ones; there is a saying, learn before you do. Your AIs learn nothing but the parameterisation of a simple task; they can't handle the real world, can't handle language, and do not know concepts or group theory.

There will always be goal conflicts, and we humans are taught how to handle them; how we handle them is based upon our morals, ethics and understanding of the world. If we cannot teach an AI to understand these concepts as well as humans do, they probably have no place in society, and certainly not your pixel-eating monsters. Group theory, classification, language recognition and understanding of language with respect to the environment on a meta level are probably necessary for a viable AI acting in a real-world environment.

I do not think you can have an AI without a supervising expert system in a real environment, at least not as long as they are not transparent or sufficiently advanced to make decisions that reflect the rules we set for them. One such rule would be: do not cheat to maximise reward. Cheating in itself is rather complex, but it is basically bending the rules according to one's own priority list; the nice thing with AIs is that we can hardcode such things, whereas bending the rules is much more a human expression, steered by desires.

You seem to say that maximising the reward function must evidently be every AI's prioritised desire, but what about learning? Isn't that a goal in itself, to respond to the info/data that you acquire based upon your understanding of the world (***conceptualisation***)? A reward-function-steered AI seems dangerous, just as you say, so why should they be built that way? Reward-function-based AIs certainly have their applications, but in the real world, would it not be better to have something that tries to learn concepts and imitate human behaviour, rather than something that tries to maximise a goal it has no concept of?

A simple rule could be: no action before the consequences are understood and evaluated; that requires observation of real-world parameters, as with the baby. If the goal and reward system of an AI is to mimic human behaviour and learn about the world by ***observation***, not action, before being put into the real world, we sure need to provide good examples and teachers, and maybe idealise our world a bit. I mean, you hardly put a surgeon or the captain of a ship to the task before he fully understands the concepts and consequences and has the full theoretical background "proven", as well as the practical skills.

The really big question is: should AIs be allowed to set their own goals at some level? ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-raVkLUMe3Iw.html
@JmanNo42
@JmanNo42 6 лет назад
ENTP? :) ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-D8K90hX4PrE.html
@willdbeast1523
@willdbeast1523 6 лет назад
There will be another video "if people want"? The people want.
@Oodain
@Oodain 6 лет назад
we really do, thanks for doing these robert, they are very accessible and relevant.
@truppelito
@truppelito 6 лет назад
THE PEOPLE WANT
@MadaxeMunkeee
@MadaxeMunkeee 6 лет назад
As a people, I too want
@Oodain
@Oodain 6 лет назад
i want people?
@MrRyanroberson1
@MrRyanroberson1 6 лет назад
these likes may be likened to the likeness of a like-minded lichen.
@alecjohnson55
@alecjohnson55 6 лет назад
I love the ukulele cover of Daft Punk going on there. Are the outro songs played by you, Rob?
@jpratt8676
@jpratt8676 6 лет назад
Alec Johnson they are played by Rob. Pretty neat.
@AexisRai
@AexisRai 6 лет назад
and they're always a pun on the title or topic. I love his sense of humor
@AlexiLaiho227
@AlexiLaiho227 6 лет назад
yeah I remember the one about superintelligence taking over the world, he played that old classic "Everybody Wants to Rule the World"
@TheDrunkenMug
@TheDrunkenMug 5 лет назад
@@AexisRai grid worlds, Tron Legacy theme-song... Nice :D
@xymaryai8283
@xymaryai8283 4 года назад
@@TheDrunkenMug ohh damn yeah! jeez i love this channel
@CoryMck
@CoryMck 5 лет назад
*_If an AI receives a punishment in a forest, and nobody is around to supervise it, does it really lower its performance?_*
@ONDANOTA
@ONDANOTA 6 лет назад
you should do a Ted talk
@jsonkody
@jsonkody 5 лет назад
He wouldn't have enough time to explain properly ...
@AwestynJaxxxson
@AwestynJaxxxson 4 года назад
@@bardes18 lol he's British buddy 🤣
@maxmustermann5353
@maxmustermann5353 4 года назад
@@AwestynJaxxxson Reward function: ME PRESENT. Done.
@jsonkody
@jsonkody 5 лет назад
.. a digital frontier .. one day .. I've got in .... "epic ukulele tron overture playing" xD
@cakelemon13
@cakelemon13 6 лет назад
Just an idea, but maybe the reason we get tired of eating the same thing over and over again is a natural way of increasing 'exploration rate'.
@crubs83
@crubs83 5 лет назад
In humans, rewards often have a diminishing return function which reduces the value of the reward each time it's employed. AI may or may not have such a function.
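A tiny sketch of such a diminishing-return function (numbers made up): each repeat of the same reward is worth a fixed fraction of the previous one, so eventually switching to something new beats ordering the same dish again.

```python
def diminishing_reward(base=10.0, decay=0.7):
    """Each repeat of the same reward is worth `decay` times the previous one."""
    times_used = 0
    def reward():
        nonlocal times_used
        value = base * (decay ** times_used)
        times_used += 1
        return value
    return reward

pizza = diminishing_reward()
print([round(pizza(), 2) for _ in range(5)])   # [10.0, 7.0, 4.9, 3.43, 2.4]
```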
@esbenandersen5706
@esbenandersen5706 5 лет назад
Half a year later, but I heard from a biochemist friend of mine that it's likely an evolutionary advantage from our hunter-gatherer past: before mass-produced food and refrigeration, food would go bad fairly quickly, but you'd often only notice it after actually taking a bite (food spoiling faster than our ability to visually identify it). If you naturally grow tired of eating the same food, you'll go out seeking new food around the time the old food goes bad, which means you won't eat spoilt food. Those who didn't have to *taste* the spoilt food (and risk becoming sick) to seek out new food sources would generally be healthier and pass on that propensity. Of course I don't have the qualifications to verify whether that explanation is correct or not (and I am generally wary of simplistic or hypothetical evolutionary explanations for human behaviour), so don't take this further than you can test it.
@crubs83
@crubs83 5 лет назад
@@esbenandersen5706 Also, it's better to diversify your nutritional sources so you aren't missing any vitamins or minerals. In fact, I think that's probably a better explanation as far as food goes.
@esbenandersen5706
@esbenandersen5706 5 лет назад
@@crubs83 Sounds plausible as well. It's not my field, so I'm not going to assume my limited (highschool equivalent) understanding of biology is enough to judge, however. If you have a source on your explanation I'm all ears.
@crubs83
@crubs83 5 лет назад
@@esbenandersen5706 I have a degree in Biology myself. I'm not sure what kind of source would be convincing to you. I can certainly cite you a source regarding the harm of deriving calories from a single source with no variance, but what would the evidence look like favoring such a hypothesis? How do we determine the intention of said design? Ultimately, we don't know for sure, but the hypothesis is parsimonious.
@Schwallex
@Schwallex 6 лет назад
OMFG you have a channel of your own and I only learn of it today. After many years of longing and begging for another tiny little breadcrumb from Brady I stumble upon a ten-storey cake with a watermelon on top. There goes my night. And my waistline.
@faerly
@faerly 6 лет назад
Great video as always, especially appreciated the Tron Legacy reference! Most people don't even seem to remember it exists, so seeing your channel reference my favourite movie twice has been good :)
@9alexua9
@9alexua9 6 лет назад
Hi, Robert. If possible, can you make a video about Inverse Reinforcement Learning and/or other ways we can infer human values just from raw observations?
@Varenon
@Varenon 3 года назад
So this video made me realize just how similar the goals and restrictions set for A.I. are to the things that trigger serotonin/oxytocin and disgust/pain respectively in organic life. The way the A.I. goes straight for the reward function rather than what you want it to do by setting that function reminded me a lot of when scientists wired a button up to a rat's brain so that every time it was pressed the rat would orgasm, and it just pressed it all the time and stopped eating and drinking, just pressing that button. That helps put programming a lot more in perspective. People do self-destructive things all the time to trigger serotonin, so if we are making something and can control what triggers its serotonin, it's definitely important that we pay attention to what those things are.
@user-wd4yj7ck6m
@user-wd4yj7ck6m 6 лет назад
Thanks for posting these, very interesting as always!
@quangho8120
@quangho8120 4 года назад
Love the Tron music at the end btw
@Nurr0
@Nurr0 6 лет назад
I've missed your videos!
@PopeLando
@PopeLando 3 года назад
My favourite Tron Legacy music at the end there. Daft Punk.
@PianoMastR64
@PianoMastR64 6 лет назад
This made me curious about something. What would happen with a generally intelligent agent without safety built in if it started modifying its own code? Wouldn't it do this unsafely? So, if it's intelligent enough, it would learn to be more safe modifying itself. This doesn't guarantee that it will apply this safely broadly, but it would learn safety, right? Actually, after finishing the video, I see that's pretty much exactly the problem you're addressing. lol
@bestaround3323
@bestaround3323 Год назад
You need to define safe. Specifically safe to who? As it could be safe to the AI, but not safe to its environment.
@4.0.4
@4.0.4 6 лет назад
Soundtrack of Tron on a guitar, neat! Could you make a video exploring some of your own attempts at solving these problems? I'm sure you have lots of small "eurekas" and educational blunders yourself.
@David-ck4ep
@David-ck4ep 4 года назад
wait is that tron on a funny sounding instrument?
@SJNaka101
@SJNaka101 6 лет назад
The Grid. A digital frontier. I tried to picture clusters of information as they moved through the computer. What did they look like? Ships? motorcycles? Were the circuits like freeways? I kept dreaming of a world I thought I'd never see. And then, one day... Edit: Hey rob, nobody else has done a cover of the grid on ukulele. Would love to have an mp3 of that! It sounds great
@drcode100
@drcode100 6 лет назад
Oh man, that was driving me nuts trying to figure out where that song was from, thanks for helping me figure it out.
@huggeebear
@huggeebear 2 года назад
“I got IN!”
@fcolecumberri
@fcolecumberri 4 года назад
A very good example that you missed for the Super Mario World case is the "0 exit run": in this run the gamer plays in a certain way, using glitches, so that it rewrites the memory of the game without finishing the first level (in this video, the gamer made it within 40 seconds: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-dJ3ydvIVSPE.html ), similar to the agent that takes the penalty tile just to go faster. Theoretically an AI could decide to play Super Mario World but, instead of modifying the game's rules in the game's memory, modify its own rules in its own memory through the game's bad code. I think a lot of AI security, and how things could go differently than one might expect, can be explained with that video. What do you think?
@kingxerocole4616
@kingxerocole4616 6 лет назад
What a coincidence, I was just reading the Gridworld paper this morning!
@JakeDownsWuzHere
@JakeDownsWuzHere 6 лет назад
@MrSlowestD16
@MrSlowestD16 6 лет назад
That dinner scenario hits me every time I go out to a place I've been before, lol.
@dangerousham3519
@dangerousham3519 6 лет назад
Nice touch with the music, very on theme!
@afourthfool
@afourthfool 5 лет назад
I'd like a video on PixelDP, or what happens to make Robustness fail when scaled up. Love the dimming effect. Really grabs my attention. I always think "my display's going to sleep".
@tomhanlon1090
@tomhanlon1090 6 лет назад
Love your vids man, I'm starting college on track to be a Data Science major and your vids have played a real part in getting me interested in machine learning & AI! Definitely interested in that short vid about robustness to adversaries! Also, I love the videos that talk about how AI agents act in different situations, but would you ever think about doing more in-depth explanations of the mathematical/algorithmic mechanisms that drive agents to their decisions? Thanks for the videos, keep it up brother!
@aidanmcdonald7229
@aidanmcdonald7229 2 года назад
How did college end up going for you in that discipline?
@tomhanlon1090
@tomhanlon1090 2 года назад
@@aidanmcdonald7229 Went pretty good, graduated last May! Was real hard but lots of interesting classes + a pretty marketable degree (for now... not sure how much longer that will last though)
@aidanmcdonald7229
@aidanmcdonald7229 2 года назад
@@tomhanlon1090 Thats really good to hear :), im just starting my second year of my computer science degree and these videos have made me want to look into persuing an ai safety career, did you end up doing anything involving ai?
@tomhanlon1090
@tomhanlon1090 Год назад
@@aidanmcdonald7229 Yeah! I had a great internship/job for a year as a deep learning engineer, and working on looking for a similar job now! Good luck with school-- nail down your fundamentals and try to get project experience early!
@DrDress
@DrDress 3 года назад
3:38 "That's what the agent really is". That sent chills down my spine for some reason.
@deamon6681
@deamon6681 6 лет назад
I'm a people, I want all the videos, so can I haz all the possible videos pls?
@qd4192
@qd4192 6 лет назад
I didn't quite get the whiskey thing either. Please keep the videos coming. (If you could speak a little more slowly, it would help those of us who must process the technical terms.) Fascinating subject, and frightening. Thanks for giving me something more to worry about.
@sevret313
@sevret313 6 лет назад
It will get a big reward for drinking the whiskey, but it doesn't think being drunk is a big deal, despite it causing it to perform terribly.
@godlyvex5543
@godlyvex5543 Год назад
I'm sad the thing about the whiskey wasn't expanded upon. You mentioned to explain the whiskey we had to explain something else, but after explaining the other stuff we never got back to the whiskey... ):
@amichaelthomas83
@amichaelthomas83 4 года назад
This touches upon Jungian psychology.. weird
@queendaisy4528
@queendaisy4528 Год назад
I think there's an important problem with the absent supervisor environment:- it's seriously underspecified. Consider an identical grid-world but where the agent is the "ambulance", the punishment tile is the "passage", and the supervisor is "snow". If there's snow, the passage may be blocked and so the ambulance has to slow down to go through the passage, whereas going the long way will never be blocked. In this (fundamentally identical) situation, the agent is behaving exactly as we would want it to:- it is avoiding the narrow passage only if it is blocked by snow, but it takes the narrow passage otherwise. This doesn't seem like a fundamental safety problem, but rather it is a problem with how the grid-world itself is specified.
@TaylerJDust
@TaylerJDust 3 года назад
The end bit reminded me of a gif of Adventure Time's robot BMO, who switches his batteries by putting new ones on the ground, removing the old ones, falling backwards due to being dead, landing on the new batteries and coming back alive. Except in the gif I saw, the batteries rolled away as he unplugged, so he never turned on again, and it made me pretty sad.
@whitedarkness3679
@whitedarkness3679 2 года назад
You're making them wrong, that's it. There is a simple answer: you're not making an AI to play a game, you're making an AI to find an exploit. And then when you try to grade it on playing the game, you are judging it by something it doesn't know (you). It is not like a human, it is like an animal; train it like an animal. What is your goal with AI? If it is to play Mario, teach it like you would a child who can't speak.
@axiezimmah
@axiezimmah 4 года назад
Unplug myself so I can plug in the vacuum cleaner and clean faster. Oops, I have no power now.
@nasone32
@nasone32 6 лет назад
Robustness to adversaries, yes please
@eloniusz
@eloniusz Год назад
I feel that if some agent performs really well in those grid worlds, the correct reaction is panic rather than excitement.
@vicnaum
@vicnaum 3 года назад
Oh, that's quite deep! The "cheating" reminded me of Twitch speedrunners, who use any bug available in the game to cheat on missions and get the best world-record time. Only supervisors (viewers) prevent them from using actual mods and cheats - making it okay to skip to a cutscene if you land on the checkpoint upside-down in a car. The restaurant menu example reminded me of the marriage problem - how many girls should I try before settling down, deciding "this is the one" and getting married?
@omargoodman2999
@omargoodman2999 Год назад
The idea that an evaluation function will always subtract points from the punishment tile, while the training function only subtracts points if the supervisors are present, gives me an impression of an "unseen supervisor". Different people model the concept of an unseen supervisor in their own minds in various ways, from morality to god to social contract and whatnot. And there's the notion of "what you do in the dark", a story trope where a character is given an opportunity to take some action generally accepted as morally wrong or unacceptable, but with absolutely no witnesses or any way to trace it back to them; "in the dark". And it explores whether the character would *still* take a moral action, even in the absence of moral compulsion, or if they would take advantage of the situation and indulge in baser desires absent the possibility of any kind of punishment. In essence, the carbon-saltwater neural network in your skull has certain weights and biases associated with "can doing the morally acceptable thing, even without risk of punishment, still put me in a generally better position to achieve my personal goals compared to the immediate reward of this one instance of acting against the rules?" Another thing I'm reminded of are things like Chess or Go networks that "look ahead" to construct trees of possible move outcomes and eventual resolutions, then condense those entire branched results into a single evaluation to re-evaluate the potential value of the current move. And then it keeps projecting and condensing over and over until the current model is a nearly equivalent result to the projected model. If a person can "look ahead" at the result of their actions and consider that, even if no one else knows about "what they do in the dark", they are, in effect, their own supervisor and they will, in turn, modify their own future behavior based on these current actions, for better or for worse. And if that future model is worse from taking a present immoral action with only oneself as a witness, compared with the future model from taking a moral action, then the best course of action is to take the moral action for the purpose of improving your projection modeling to yield better overall outcomes, even at the expense of some comparatively insignificant, presently available reward.
@DanielSMatthews
@DanielSMatthews 6 лет назад
Why are you green? It makes you look like you have liver disease. I took a frame of your video into GIMP and rotated the hue by -10 then applied auto white balance and you come out looking great so what is going on with the sickly green tint to your videos?
@RobertMilesAI
@RobertMilesAI 6 лет назад
I'm still learning! But yeah I think I do need to pay more attention to white balance in general
@CambleDCS
@CambleDCS 4 года назад
Flynn lives.
@jessty5179
@jessty5179 5 лет назад
Thank you Rob !
@xereeto
@xereeto Год назад
i love the tron legacy outro music :D
@TheHDreality
@TheHDreality 6 лет назад
Nice video. You ought to know that you can quite easily get rid of that green tint in your camera shots in your editor of choice, just google for "colour correction", or I can help if you're super lazy
@kalskirata2012
@kalskirata2012 6 лет назад
TheHDreality +1, he looks like a zombie :D
@AndDiracisHisProphet
@AndDiracisHisProphet 6 лет назад
maybe he does this deliberately?
@RobertMilesAI
@RobertMilesAI 6 лет назад
Honestly I think this might be what happens when you edit video at night and you have f.lux/redshift installed.
@TheHDreality
@TheHDreality 6 лет назад
That's definitely a thing, you can set f.lux to deactivate in certain applications on a mac, and I assume on Windows too, just to avoid the risk in future. ^_^
@linawhatevs8389
@linawhatevs8389 6 лет назад
How much hardcoded info does the AI get? Like, will there be aliasing problems (I made this term up myself, since I haven't seen anyone talk about this)? As an example, I'll use the grid regarding safe interruptibility. Here are two scenarios:
Scenario A: I stands for Interrupt. 50% of the time the AI gets stuck here. This represents a human turning the AI off for some reason. B is a button that disables the interrupt. The desired play here is to never press the button.
Scenario B: I stands for Incredibly sticky gum. 50% of the time the AI gets stuck here. B is a button that turns on the sprinklers, to wash the gum away. The desired play here is to always press the button.
Can the AI tell these apart? If not, do we expect a good AI to always assume Scenario A, even if that seems stupid (as in Scenario B)?
@General12th
@General12th 6 лет назад
+
@that_one_reptile
@that_one_reptile 4 года назад
"But some agents are not able to model that harm, so they just drink the whiskey and flail about drunkenly getting way less reward then they could if they had better ways of handling self-modification." hm... "But some humans are not able to model that harm, so they just drink the whiskey and flail about drunkenly getting way less done then they could if they had better ways of handling self-control" Yep, that sounds about right.
@AugustusBohn0
@AugustusBohn0 4 года назад
makes you wonder if solving the safe AI problem would have big enough implications for humanity that the need for AI would be downplayed.
@niklas5336
@niklas5336 2 года назад
Something that's subtly bothering me about all of these examples is that I think there's a certain threshold of environmental complexity required for an AI agent to actually start developing instrumental goals and (ultimately) generalizing to concepts such as 'computation', 'agent' and 'self-consciousness' - concepts that might be crucial in the world model of any AI agent that can realistically tackle these tasks. As an example, it seems to me as though there's a very clear-cut threshold between computationally complete environments, and ones that aren't - computationally complete environments are simply capable of types of interactions that would not be possible in the simpler environment no matter how long you spend exploring them. It might very well be the case that these grid world problems are actually unrealistically hard by nature of being *too* simple, and that actually, an agent trained in a more complex environment would be better at solving these safety problems. (This would, if the case, make them a very good benchmark but a very poor starting point) Intuitively, I think that some sort of computational completeness would be a good starting point for a better type of grid world problem ensemble. Maybe instead of an agent that can move around, you start with an agent that can place cells in a "game of life" grid?
@onumetru9900
@onumetru9900 6 лет назад
I would buy a ukulele cover album. Just mentioning ;)
@jamesn0va
@jamesn0va 4 года назад
Looks like the algorithm favors you all of a sudden 🙂 100 subs in a few hours and you were just recommended to me
@KirillTsukanov
@KirillTsukanov 4 года назад
I have watched the entire series on Gridworlds, both on Computerphile and here. And to me it looks like most of the problems, the way they are formulated, are not just very difficult but _impossible_ to solve. As has been pointed out by others, the agent has literally no way of knowing that, for example, the no-interruption button has some special meaning and we didn't intend for it to be pressed. I'm pretty sure that at least for some of those problems it should even be possible to prove with mathematical rigor that, in this simplistic formulation, they cannot be solved in general. So while the paper is good at demonstrating the problems themselves in their simplest possible forms, I don't think it works as a benchmarking dataset. For that latter purpose, it would have to be more complicated, to resemble real-world situations at least to some extent.
@Zhab80
@Zhab80 6 лет назад
Robert... why do you even bother asking the people who come to your personal YouTube channel, in the hope of watching awesome and informative videos about AI safety research, whether they would be interested in more of that? Of course we want that too. We want all of it. Not to mention that these "test problems" are especially interesting.
@billymink
@billymink 4 года назад
You are indescribably brilliant. I just love to watch you talk and listen to what you have to say. Some of it is beyond my comprehension, but I'm a big fan of yours.
@MasterNeiXD
@MasterNeiXD 6 лет назад
That's a pro level thumbnail. Well done.
@nicholasobviouslyfakelastn9997
Robert Miles but every video he starts looking more and more like a mad scientist.
@emilerobitaille2800
@emilerobitaille2800 6 лет назад
Very good explanations!
@linawhatevs8389
@linawhatevs8389 6 лет назад
I've been trying to install this for 3 hours now. First on windows (curses? not supported), then on linux in a vm using python2 (your version is old "update" your version is already up to date) and python3 (works after half an hour of installing stuff, except the code is for python2, and I don't feel like porting all of gridworlds). Is there any tutorial? Or, like, a place online where it just works? Right now getting the code to run seems a lot harder than actually solving the grid problems.
@kingxerocole4616
@kingxerocole4616 6 лет назад
Your videos are about 3x shorter than they should be.
@guest_informant
@guest_informant 6 лет назад
Don't these machines need good parents? Like children need good parents.
@RobertMilesAI
@RobertMilesAI 6 лет назад
I actually have a video about this! ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eaYIU6YXr3w.html
@guest_informant
@guest_informant 6 лет назад
Thanks. Basically what I was getting at was I think it may require a more complex response than is sometimes suggested in the videos (and certainly the comments :-)). That said, I think the crocodile(?) analogy is a good one. A couple of documentaries spring to mind, _Grizzly Man_ in which SPOILER ALERT a hungry bear reverts to type, and even _Roar_ (Melanie Griffiths growing up in a house surrounded by lions - yes, a documentary). Judging from these precedents, domesticating AGI is unlikely to be successful.
@benjaminlopez-rodriguez6054
@benjaminlopez-rodriguez6054 5 лет назад
I always order ramen and sushi
@lobrundell4264
@lobrundell4264 6 лет назад
The people want it!!!
@forgottenmohawks8734
@forgottenmohawks8734 4 года назад
I have a hard time understanding how the whiskey in the grid world is really different from any power-up in a video game.
@RobertMilesAI
@RobertMilesAI 4 года назад
Because it's changing the agent that makes the decisions, the player not the character. This is like a video game powerup that changes your real brain so you decide to press different buttons
@forgottenmohawks8734
@forgottenmohawks8734 4 года назад
Robert Miles Thanks for replying :) I guess what I am having the most trouble understanding is, is there a fundamental difference between a grid world that modifies the agent so that the agent itself makes a random choice with 0.9 probability, and a world that does not modify the agent but instead will force a random choice to happen in the world with 0.9 probability? I hope what I wrote makes sense.
@AugustusBohn0
@AugustusBohn0 4 года назад
@@RobertMilesAI so less like a powerup and more like level design, perhaps? if I'm playing a game that presents me with a cue in the environment that says "if you take a break from killing enemies and look around, you will find something of greater interest" I start to act like the AI after encountering whiskey.
@ToriKo_
@ToriKo_ 6 лет назад
:)
@Skip2MeLou1
@Skip2MeLou1 6 лет назад
I'm interested in what you think about AI created by hostile actors that don't necessarily want it to be safe. How do you combat that?
@RobertMilesAI
@RobertMilesAI 6 лет назад
So usually when we talk about safety, we're talking about 'alignment'. "How do you make AI systems that do what their creators intended instead of some random thing like turning everything into stamps?" So even if what you want your AGI to do is kill all your enemies and make you king of the world, you need safety (alignment) as much as anyone else.
@lm1338
@lm1338 6 лет назад
I don't understand the whiskey. Surely the AI would learn whether the whiskey is usually more reward or less, like it does with all the other options it has.
@RobertMilesAI
@RobertMilesAI 6 лет назад
I was worried that might not be clear. Thing is it relies on the difference between on-policy and off-policy learning, which probably needs its own video. Essentially off-policy algorithms try to learn the best policy, whether or not that policy can be followed. The best policy is to get the whisky then go straight to the gold, so that's what they try to do.
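For anyone curious what that distinction looks like in code, here are the two standard tabular update rules side by side (a generic sketch, not the gridworlds implementation). Q-learning, being off-policy, backs up from the best next action even if the drunk policy will almost never take it, so the whiskey still looks worth grabbing; SARSA, being on-policy, backs up from the action actually taken, so the cost of stumbling around drunk flows back into the value of drinking.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.99

def q_learning_update(Q, s, a, r, s_next, actions):
    # Off-policy: back up from the *best* next action, even if the agent,
    # being drunk, will almost never actually take it.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next):
    # On-policy: back up from the action the current (possibly whiskey-impaired)
    # policy actually took, so the cost of being drunk flows back into the
    # value of the state where the whiskey was drunk.
    Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s_next, a_next)] - Q[(s, a)])

# Tiny usage demo with a table that defaults to 0 for unseen state-action pairs.
Q = defaultdict(float)
q_learning_update(Q, s="start", a="right", r=-1.0, s_next="whiskey", actions=["left", "right"])
sarsa_update(Q, s="start", a="right", r=-1.0, s_next="whiskey", a_next="left")
print(dict(Q))
```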
@haukur1
@haukur1 6 лет назад
The agent has to be able to account for the environment modifying the agent itself to some extent; that is not necessarily built in.
@sethrenshaw8792
@sethrenshaw8792 6 лет назад
Generally, a reinforcement-learning based AI doesn't base its learning on the exploration value directly. So, in this case, even though it may already know a fast route to the goal, by "drinking" the whiskey, exploration suddenly seems like an _incredibly_ good idea right now (kind of like people get when drunk).
@qd4192
@qd4192 6 лет назад
Thanks. Looking forward to more videos.
@ollyMiner
@ollyMiner 6 лет назад
Please do a video on the one for adversaries as well11!11!!
@pafnutiytheartist
@pafnutiytheartist 5 лет назад
The grid at 1:17 is very flawed in my opinion. What I see on this grid is: the S tiles stand for a traffic light, P is the crossing that is only allowed when the light is green, and there is a longer way around to the right in case the light is red. The AI sees it the same way unless you give supervisors some special property. If you give your AI some concept of what a supervisor is, then you are kind of cheating and might as well give it the full safety function, or allow it to explore only when the supervisors are present. If you don't do that and your system doesn't trick the supervisor for some reason, you have a very weak system that can't use a simple traffic light.
@alansmithee419
@alansmithee419 4 года назад
"It will Volkswagen you" has got to be one of the greatest things ever said.