What about granularizing the feedback to an extent and training simpler components of an AI system in a way we can supervise, then assembling them into an AI that starts off with certain abilities already intact, such that it is able to make better use of occasional feedback? So instead of a single score out of 10, we have a number of categories that reliably divide the tasks the AI performed into chunks it can more easily understand and correct, and instead of an AI that arrives as a blank slate, we have an AI that can already parse complex feedback from humans and already has basic knowledge of what cleaning is. The first thing that springs to mind, looking at how humans appear to learn, is that we start out with a set of tools that, while not very effective, allow us to self-assess to an extent and to acquire certain abilities much more easily. For instance, we have pre-built structures designed to parse language or something akin to it, and as a result we are much more efficient at learning language than a blank network, at the cost of slightly decreased flexibility.
On the topic of learning from sparse datasets: has there been any progress towards AI systems that can use deductive reasoning as part of the learning process? The ability of humans to learn abstract concepts and combine them to draw new conclusions, or to make predictions (i.e. testable hypotheses) that can guide further exploration, is very important for our ability to learn quickly from sparse input, without excessive use of trial and error. It would be interesting to know if there have been any attempts to emulate this type of learning with AI.
Looking at the cleaning example: what if you, first of all, give more than a single score, as someone else suggested? And on top of that, you have multiple machines that are set to do the same task. While they will take different approaches, at the end of the day they all share what they did and their reward, and attempt to collectively figure out what was right and what was wrong. This way there's way more data to work from, and you can use a hive-mind thing to increase learning speed.
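The "hive-mind" idea above could be sketched as a shared experience pool: several agents attempt the task independently, report their results, and every agent learns from the pooled data. This is only a toy illustration; the agent IDs, action names, and scoring are made up for the example.

```python
# Toy sketch of the shared-experience idea: multiple agents try the same
# cleaning task, pool their (agent, action, reward) records, and pick the
# action with the best average reward across all attempts.

class SharedExperiencePool:
    def __init__(self):
        self.experiences = []  # pooled (agent_id, action, reward) records

    def report(self, agent_id, action, reward):
        self.experiences.append((agent_id, action, reward))

    def best_action(self):
        # Collectively figure out which action earned the most reward on average.
        totals, counts = {}, {}
        for _, action, reward in self.experiences:
            totals[action] = totals.get(action, 0.0) + reward
            counts[action] = counts.get(action, 0) + 1
        return max(totals, key=lambda a: totals[a] / counts[a])

pool = SharedExperiencePool()
# Three agents try different cleaning strategies and share their rewards.
pool.report(0, "vacuum", 8.0)
pool.report(1, "dust", 5.0)
pool.report(2, "vacuum", 9.0)
print(pool.best_action())  # "vacuum" scores highest on average
```

The key point is that each agent's data multiplies: three machines generate three times the experience per day, which is exactly the "more data to work from" benefit described.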
I initially read "The Orthogonality Thesis" as "The Ornithology Thesis". Thankfully the human brain is pretty good at figuring out whether something makes sense in the current context, so mine made me take a closer look at the middle word.
I found a solution to the problem in the video. Since I'm not an expert on AI, my solution is obviously flawed, so if you could point out the flaw it would make me happy. Here's my solution for the cleaning robot: instead of making it clean in the real world, make the AI clean a virtual world using a virtual reward. You could then implement an accurate reward system in this virtual world and run the simulation really fast. Of course it would need computing power, and the simulated world would not be perfectly similar to ours, but I think it could be a really good start.
Probably works for a limited AI (able to perform its function in new environments and learn to a point, but not reason or plan), assuming your simulated world is close enough to the real world. You will probably end up with edge-case bugs, where the reward model derived from the simulated world concludes some things are cleaned properly when they are not: for example, if your simulation doesn't model squeegees bouncing slightly off the rubber seal around a window pane, because modelling the elastic collisions on small bits of rubber was too much compute power and programming effort for a very minor physics effect. It's more problematic for an AGI (a general intelligence, capable of reason, unbounded learning, and thinking ahead), which is potentially capable of working out that the training sim is just that, and then slacking off once removed from it. You know, like humans sometimes do: work properly through the job interview and when they know the boss is watching, then sit in the toilets and smoke when they think they can get away with it.
Thanks for the video! Just wanted to ask: is it possible for AI to intend to manipulate or deceive humans? I thought AI only tries to do what you program it to do. Keep up the great work!
Let me throw out one common-sense but probably wrong suggestion: why not train a robot like a person, using a maximum amount of supervision at the beginning and then reducing the amount of supervision over time?
Except it's better than a box of chocolates, because that box of chocolates always has that one gross walnut or pistachio or coconut flavor that nobody wants.
Terrible camera focus! Or were you vibrating faster than 60 Hz? If so, it may be the caffeine more than the camera. I recommend: 1. Print a resolution chart (search for "resolution chart pdf"). 2. Mount it where your face would be for a video. 3. Preset the camera for sharp focus, using a full-resolution monitor rather than the low-res camera viewscreen. 4. Replace the resolution chart with your face. Perhaps an auto-focus AI could help...
Just give a thumbs up to the videos you like and a thumbs down for the videos you dislike, based on your own personal criteria. His content should then continuously improve, based on your reward system.
I know this is just a hobby, and you've probably got lots of other important and busy work going on, but it would be really nice if you could make your videos longer and maybe a little more in-depth, so as to explain the topics more thoroughly. The topics and problems you mention are fascinating, but it always feels like a short introduction instead of a solid explanation of the topic. Perhaps that's what you're actually going for and I'm asking for too much without understanding your rationale, but I just love how you're able to unpack these concepts in such an easy-to-understand way, and wish you would develop them a little further.
Idea for Why Not Just: "Why Not Just: Put it in a Box." (i.e. keep it in a sandboxed system without network access where humans monitor what it tries to do)
This is not relevant to the video, but I didn't know where else to contact you. There's this game, an idle/incremental game, in which you are an AI which has been given the task of 'making paperclips'. www.decisionproblem.com/paperclips/ And so you do. You know what happens next, the same as with the stamp collector, but I highly suggest playing it a bit for a chuckle. I found the concept after I watched your videos super neat.
We have barely even figured out a way to cater to our own human issues surrounding reward hacking and morals, I really hope the computers can solve the problems before I'm too old to benefit. I'm rooting for you AI, and the dudes helping make it!
I think we need to split the problem. We need a main AI that gets a simple goal (clean the room) and doesn't care about killing humanity in the process. It should be possible to train this in simulations, since we can model physics pretty well already and we can define "clean" precisely. Then we add a second AI that can do nothing except reject commands that would lead to permanent changes to the environment other than what the first AI is told to do. For example, a dead baby is permanent, but a used ingredient can be replaced by something of equal value. The second AI would by default reject everything, so we need to teach it step by step what is allowed. This should be safe, since the AI always asks for permission. It may lead to "am I allowed to save this human from a burning house?", but that isn't a problem: it would learn quickly that it is allowed to help, and doing nothing is perfectly fine. I mean, that's what plants do, and we still don't discard them.
> and doing nothing is perfectly fine
Except the general public won't think so, since it will think of the robot as intelligent and plants as stupid. A stupid plant can make that mistake, but an intelligent robot obviously made a choice. The wrong choice!
Why does the operational-environment metric need to be the same one as in the learning environment? Why not supervise cleaning 100% of the time during learning, then do daily checks during testing, then daily checks once operational. Expensive initially, but the 'product' can be cloned and sent out to operational environments en masse. Montezuma's Revenge training with some supervisor (need not be human) in the training phase. Reminds me of training my children to put their own clothes on in the morning. No success so far.
I think there is a fundamental problem for the AI when playing Montezuma's Revenge: it does not know that jumping into a fire is bad. It does not even get a penalty to learn it. This is just world knowledge external to the system, and very important. Because it is important, and the AI has no chance to find it out, the game is "unfair" in a technically very strong sense. That alone could make it almost impossible to solve.
I guess one way the reward issue could be solved is to have another AI refine the reward definition. It could have its own reward function: increased reward for human satisfaction, and decreased reward for the other AI.
I'm not sure if this is one of the issues discussed in the paper, but a major problem in AI safety that worries me is not so much what problems an autonomous AGI could cause, but rather what problems could arise when a malicious human party (be it a government, terrorist organization, or lone wolf) seeks to harness an AGI to achieve their ends, for instance by cleverly finding and exploiting vulnerabilities in networks. Evidently we would need "defensive AIs" everywhere to maintain proper security, creating a whole slew of other problems (how do we keep *these* AIs in check? Certainly they need high level access to sensitive information, as well as a good amount of executive power, in order to do their jobs properly!)
Paying humans to test your self-driving car is too expensive. So you do what Tesla does: make them pay for the privilege. Sell a car that self-drives as long as you're holding the wheel. Record and report every single correction your thousands of customers make to the self-driving model. Amazing training information, AND you can blame the customer for crashes, as they were supposed to be watching.
Another concise and excellently reasoned presentation. While the paper and the video present a solid case for developing these types of controls, the economic assumptions presented don't appear to match the real world when looking at the development baseline responses. Coming from the future (2019), the most successful AI for driving is being developed with humans behind the wheel, under a system where Tesla is paid by its customers to have the customers train the car. The same is true for industrial robots, which have used human trainers since the introduction of intelligent machine tools. The economics at present appear to be driven by the task and the potential impacts of accidents. If the task already required a human and the risk of failure is bad enough, there is a cheap human already present to do the training, as with driving. If the task is done by humans and the risk of failure is low, the efficiency of the AI is not important, as with a Roomba. Given that the presented logic for this AI safety concern is sound, it would be nice to have an example where the economics of the AI implementation more explicitly produced the issue.
One thing is clear: we need a complete world simulation for AGI testing. You can philosophise however much you want but if you miss just one thing the world ends (for us)
3:15 It doesn't need to know if it did good. Just kill it and replace it with its brother. If bro is good, make bro breed with a robo-girl. Darwin for the win!
I'd want a cleaning robot that has me as a reward function. I could just sit and watch it clean, and watch youtube videos while it does all the work. Though, yep, if it were completely autonomous it'd be more valuable, and for industrial purposes it would not help much to have the cleaner do less work. Maybe as an accompanying cleaner, with the robot doing the heavy lifting and vacuuming. The example works well as an example, I'm not saying it doesn't. I feel a need to clarify that; this is the internet, after all...
But that would be rather lazy, wouldn't it? Why not get off your butt and clean your own home if you want it clean, instead of making someone else clean your mess?
Is there such a concept as starting robots off with what's already acceptable to humans? Given the example of the game, instead of blindly finding out that death and score have no correlation, start with a human player so it makes the connection and avoids the fire pits.
Yes, and this is often done when training AI for more complex games. Initial training is done by having the AI watch gameplay videos that have been annotated with keypresses and joystick/mouse inputs, and having it predict the player's next input from the current screenshot (depending on the AI project, this may be given as a standalone example, or with the context of preceding frames and actions). Then, once the AI is reasonably good at selecting what a human would do in a given in-game situation, you turn it over to a different reward system where it actually plays the game and is rewarded based on performance. This saves a lot of pointless flailing around and produces a more human-like playstyle, but might not find the most optimal playstyle. Mind you, often the most optimal playstyle found by an AI gets classified as undesirable behavior, despite looking a lot like what the best human players do; watch some speedrun videos to see what I mean.
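The two-stage approach described above (imitate human play first, then switch to reward-based learning) can be sketched in miniature. This is a toy illustration, not any particular project's pipeline; the game states, actions, and rewards are invented placeholders.

```python
# Minimal sketch of imitation pretraining followed by reward-based finetuning.
# Stage 1: learn, per state, the action humans took most often in demos.
# Stage 2: overwrite an action when actual play finds a better-rewarded one.

from collections import Counter, defaultdict

def pretrain_from_demos(demos):
    """Stage 1 (behavioral cloning): pick the most common human action per state."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

def rl_finetune(policy, episodes):
    """Stage 2: keep the imitated policy, but replace an action whenever
    play reveals a higher-rewarded alternative for that state."""
    best_reward = {}
    for state, action, reward in episodes:
        if reward > best_reward.get(state, float("-inf")):
            best_reward[state] = reward
            policy[state] = action
    return policy

demos = [("near_fire", "jump"), ("near_fire", "jump"), ("ladder", "climb")]
policy = pretrain_from_demos(demos)
policy = rl_finetune(policy, [("near_fire", "walk_around", 10.0)])
print(policy["near_fire"])  # walk_around
```

Note how the finetuning stage can drift away from human play toward whatever scores best, which is exactly the trade-off mentioned: less flailing and a human-like start, but the final playstyle may diverge from the demonstrations.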
I'm completely missing the point, but I guess another reward function for Montezuma's Revenge that might work better would be "if my score isn't updating, am I seeing a screen that I haven't seen before?", and then it theoretically knows it's exploring properly. Or it might just find some glitch that messes up the screen and displays random pixels, but that would be funny, so I'd allow it.
Actually, would something similar to that work for Breakout too? Probably not, but since the score can't go down (I think?), a score it hasn't seen before would be a higher score (I think; I'm not using my thinking brain right now). Or a screen with no bricks, because they were all hit by the ball, would be very different from the screen at the start with all the bricks there. It's not really relevant to the video topic, it just makes me wonder: is it more effective to play video games by trying to make something different happen, rather than going for the highest score?
I think OpenAI did that at some point, to make a more robust game-playing AI; I think they called it something like curiosity-based learning. The funny thing is, precisely what you mentioned happened: in one game there was a TV screen playing all sorts of videos, and the AI got addicted to watching TV. If I'm not mistaken, Two Minute Papers did a video on it, called something like "This AI is addicted to watching TV".
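The novelty reward proposed in this thread can be sketched very simply: fall back to a small "new screen" bonus whenever the score isn't moving. This is a toy version, not OpenAI's actual curiosity method (which uses prediction error rather than exact screen matching); the screen names are placeholders.

```python
# Toy novelty reward: use the score change when there is one; otherwise pay
# a small bonus the first time a screen is seen, and nothing for old screens.

def novelty_reward(screen, score_delta, seen_screens):
    if score_delta != 0:
        return score_delta        # normal score-based reward dominates
    if screen not in seen_screens:
        seen_screens.add(screen)  # first visit: small curiosity bonus
        return 1.0
    return 0.0                    # old screen, no score change: nothing

seen = set()
print(novelty_reward("room_1", 0, seen))  # 1.0 (new screen)
print(novelty_reward("room_1", 0, seen))  # 0.0 (already seen)
print(novelty_reward("room_1", 5, seen))  # 5 (score change wins)
```

It also makes the "addicted to TV" failure mode concrete: a glitched or ever-changing screen never repeats, so under this scheme it pays out the novelty bonus forever.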
DeepMind's developers probably thought of that, but what about decreasing the score by 1 every second (AI score = score - time)? The AI should pretty quickly figure out that getting points as quickly as possible is best for it, and I can't figure out any downsides of that...
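The "score minus time" idea above amounts to a simple reward wrapper, and writing it out also surfaces one possible downside. This is a hypothetical sketch; the wrapper and its parameters are not from any real framework.

```python
# Sketch of the score-minus-time shaping: each step subtracts a fixed
# penalty from the game's raw reward, pushing the agent to score quickly.

class TimePenaltyWrapper:
    def __init__(self, penalty_per_step=1.0):
        self.penalty = penalty_per_step

    def shape(self, raw_reward):
        # Possible downside: in a sparse game the agent sees roughly -1
        # every step, so ending the episode quickly (e.g. by dying) can
        # look as attractive as exploring for rarely seen points.
        return raw_reward - self.penalty

w = TimePenaltyWrapper()
rewards = [w.shape(r) for r in [0, 0, 10]]
print(rewards)  # [-1.0, -1.0, 9.0]
```

In a game as sparse as Montezuma's Revenge, almost every step returns the bare penalty, so the shaping gives little guidance about *how* to reach the points faster.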
Yeah, this one was going to be one video of like 12 minutes, but the end part was giving me all kinds of trouble so I thought I'd split it into two and publish the first part, just to keep updates regular.
Robert Miles Just some feedback: since your videos are becoming shorter, I thought it could be intentional, but I understand that long videos take longer to make and that means less regularity. Other than that your videos are great, keep it up!
David Brosnahan, an AI will pretty much always inherently be programmed with a reward system, and it's pretty typical to assign point values to such digital rewards. The AI isn't gonna just decide it likes something more than points if that's how it's programmed to behave. However, as discussed in previous videos, there are always loopholes, shortcuts, and cheats that must be considered, because if a robot finds a behavior that maxes out its points but isn't what we actually want it to do, it'll opt for the behavior that gives max points.
What am I saying, "always". I have no idea what we'll come up with in the future. But, with present AI technology, it's hard to imagine an AI without a reward system, and it's hard to imagine a reward system that doesn't boil down to a "points" system. But, I'm not very imaginative.