Training AI Without Writing A Reward Function, with Reward Modelling

Robert Miles AI Safety

156K subscribers

239K views

Published: 22 Sep 2024

Comments: 925

@NoahTopper 4 years ago

"If you squint, the training process is sort of like a compiler." Totally brilliant statement.

@shouldb.studying4670 4 years ago

I had to squint AND tilt my head but I see what he means 🤣

@ZyTelevan 4 years ago

Code is data is code

@BlargVison 4 years ago

yeah that was a fantastic comparison that i won't forget

@kacperozieblowski3809 4 years ago

I agree

@filipgara3444 4 years ago

„No.”

@TheLifeInMotion 4 years ago

According to Strongbad: "Technology is anything that you don't understand how it works and if you break it you have to buy a new one."

@chrisdaley2852 4 years ago

So retractable pens are technology. Got it.

@CH-bd6jg 4 years ago

@chrisdaley2852 pen goes in, pen goes out. you can't explain that! just buy a new one!

@columbus8myhw 4 years ago

Chris Daley I mean, yes… but also I was willing to say scissors are technology so maybe I'm not a good judge of these things

@OrchidAlloy 4 years ago

@chrisdaley2852 Yes they are

@diablominero 4 years ago

So my desktop computer isn't technology because I built it and could replace a single broken component rather than the whole thing?

@atimholt 4 years ago

“Are scissors technology?” Me: yeah, of course. “Most people would say no.” ¯\_(ツ)_/¯

@totally_not_a_bot 4 years ago

Those of us who watch these videos don't really qualify as most people.

@pauljs75 4 years ago

Even sticks can count as technology, if implemented as tools in some way. (Combination of tools and methods to achieve some goals. Usually making a task easier, or doing something else that improves conditions for the tool user.) Obviously such is not the latest and greatest technology, which seems to be the definition this video is going for.

@lucar6897 4 years ago

I also think of calculators as artificial intelligence...

@shayneoneill1506 4 years ago

Yeah the part of my brain that did those anthropology units would never let me think scissors aren't technology

@NoahTopper 4 years ago

When I was a kid I definitely would have said no. But I remember at some point being taught that anything along the line of a pencil or chair was technology, and that sunk in. But I imagine a lot of people still have that initial instinct.

@discosteve 4 years ago

Your point still stands, but nevertheless the scissors have a butt load of tech in the background that us normies aren't aware of (material science). Just wanted to mention that the humble pair of scissors deserves some praise.

@DDvargas123 4 years ago

I was thinking the same thing. We take for granted a lot of the cool tech around us all the time. Levers and pulleys and other simple machines most of all. But Rob makes a good point that people don't commonly think of them as tech even though perhaps they should. Language is a cruel mistress.

@infinummjb 4 years ago

scissors are relatively low-tech, but a tech nonetheless.

@columbus8myhw 4 years ago

Would you consider a scissors company a "tech company" the same way you'd consider Apple and SpaceX tech companies? What about post-its? Is 3M a tech company?

@DDvargas123 4 years ago

@columbus8myhw 3M's company description is literally: "applies science and innovation to make a real impact by igniting progress and inspiring innovation in lives and communities across the globe." That sounds really tech company to me

@RobertMilesAI 4 years ago

I think if you took someone to a scissors factory and showed them all the machines and equipment of the production line, they'd call that technology. But not so much the scissors themselves

@columbus8myhw 4 years ago

"Like, there's no point asking for feedback if you're already pretty sure you know what the answer is, right?" …Do you want me to answer that question?

@FortoFight 4 years ago

If you think about it, this is a lot closer to how a human learns. A human won't constantly bug you for feedback every single time it does something, nor will it learn how to do something properly from a standardised function (e.g. exam mark schemes). A human will independently use its available knowledge, and occasionally ask for help when it's unsure what to do.

@dannygjk 1 year ago

Do you have any children? Kids seek approval.

@owenpawling3956 1 year ago

@dannygjk no, but he is right. Kids are just unsure more often.

@Nico-ur2po 1 year ago

@dannygjk You don't correct a kid every time they talk using improper grammar or mix up word order. You correct them every now and then, and they learn over time combined with observing how other humans talk.

@dannygjk 1 year ago

@Nico-ur2po I didn't, (I have two kids).

@Henrix1998 4 years ago

I can already imagine the Indian ML farms where thousands of people just evaluate learning

@TurkishLoserInc 4 years ago

Sounds a lot like the premise for The Matrix. "On a scale of 1-10, how real do you think this is?"

@Encypruon 4 years ago

It's called Amazon Mechanical Turk.

@Verrisin 4 years ago

Damn, that actually sounds likely... - Here is my idea: since AI will take all our jobs... There will be one job of the future: *Specifying preference.* - I actually don't hate it. :D

@Verrisin 4 years ago

... thinking about it: It kind of is the ideal job, isn't it? Do we, as humans, even want to do anything more than that? - Our job will be saying what we want in the world, and how we want things to work... It will even work as a voting mechanism for policies since they will be run by AI - that figures out how to best match our preferences... - I think this is the way... (or at least a good direction for now ^^)

@benalias5766 4 years ago

I can already imagine a complex AI which is surprisingly good at a wide variety of tasks... and turns out to have hired a load of people in India to do its work for it.

@riccardoorlando2262 4 years ago

So in a couple years captchas will be reward predictor training? "Which of these is the better shoe design"?

@toxicpsion 4 years ago

nah, i'd bet they do it already; just more subtly than that.

@LoveScreamTrue 4 years ago

@toxicpsion Like Google CAPTCHA? - "Select all traffic lights"

@johnnymellon7414 4 years ago

"Select all the pictures with Sarah Connor in them" ... wait what?

@z-beeblebrox 4 years ago

@LoveScreamTrue Except it'll become "Select your favorite traffic lights"

@stribika0 4 years ago

Which of these places do you prefer as a shelter during a robot uprising?

@Noxeus1996 4 years ago

Definitely one of the best educational channels on YouTube.

@zacharieetienne5784 4 years ago

hold on to your papers and i'll see you, next time!

@CynicatPro 4 years ago

@zacharieetienne5784 TwoMinutePapers is also super good X3

@hypebeastuchiha9229 1 year ago

@CynicatPro he sucks

@stefano8936 4 years ago

Robert Miles: "what is technology?"
Me: move the finger to calibrate the amount of video to skip
Robert Miles: "don't skip ahead"
Me: humbly obey

@GrixM 4 years ago

I feel betrayed because the next 5 minutes were just repetition of previous videos so I wish I had in fact skipped ahead.

@jnevercast 4 years ago

Yeah he got me too. I was about to skip just as he said don't skip. "Well okay!"

@Atariese 4 years ago

The thing is... the question he poses after that leads me down that rabbit hole and away from his video... definitely not the intent i would say

@riperian8954 2 years ago

@GrixM lol i did exactly what you and OP did, only I was like 'okay okay that's enough of that' after about 2 minutes. still a brilliant video overall though xd.

@TheMan83554 4 years ago

The thing about your channel is the little touches of 4th wall humour. Having "backflip you" say "wait, I don't have to do a backflip?" was brilliant.

@Macieks300 4 years ago

"in a later video" well... see you in 3 months then

4 years ago

This channel is always worth the wait :)

@griest5493 4 years ago

IKR, what a tease.

@MatthewStinar 4 years ago

You can't rush this kind of quality! Do you know how long it takes to read and digest all those research papers?

4 years ago

... almost there.

@Macieks300 4 years ago

@ to be fair Robert was on Computerphile in the meantime ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-31rU-VzF5ww.html

@sharkinahat 4 years ago

I wouldn't mind an ad. YT trained me how to skip paid promotion.

@jacoblysinger 4 years ago

YouTube Vanced vanced.app

@rr.studios 4 years ago

@jacoblysinger lol im using this app rn

@HansLemurson 4 years ago

What sort of reward function did you use?

@megajor232 4 years ago

Watching your videos makes me feel smart without actually having to be

@benalias5766 4 years ago

Sounds like you're gaming your reward metric.

@ephemeralvapor8064 4 years ago

Maybe your evaluation of his teaching is: Good teacher = true Because he brings understanding lesser teachers could not in the same time and effort on your part.

@zeikjt 4 years ago

8:50 That backflip part was super enjoyable :D

@MrCreeper20k 4 years ago

17:25 Don't worry Robert, at least I don't mind an ad at the end. And if anyone should get that bread, it's you.

@jessgold551 4 years ago

I have watched all of Robert's videos several times. They're perfectly paced, well considered and clearly communicated. There is so much there that it's interesting to watch, sleep on it, and watch again later to catch more. I also enjoy the presentation and the many interesting ways of presenting things, like word popups and cuts to screen as well as some graphics and clips. If it helps with demographics, I am a former software engineer and still work in I.T.

@amyshaw893 4 years ago

just replace the human with another ai, and get the human to rate that ai. not good enough? MOAR AI!!11!!

@DDvargas123 4 years ago

It's AIs all the way down!

@thehypnotoad5184 4 years ago

Just make an AI trained on footage of people doing back flips, no need for human input Even if the AI is "only" 99% accurate it should be enough

@DDvargas123 4 years ago

@thehypnotoad5184 "footage of people doing backflips" IS human input

@thehypnotoad5184 4 years ago

@DDvargas123 I mean the input already exists, it just needs to be collected. It's kinda going full circle, but it would be interesting to see if you can speed up the reward model that way

@rumplstiltztinkerstein 4 years ago

@thehypnotoad5184 but the AI will find ways to exploit it. Nothing stops us from giving it the footage and having a human check it from time to time, telling it to stop using its head as a catapult when the AI was supposed to be running

@briandoe5746 4 years ago

I am in a room by myself and I audibly cussed when I heard that OpenAI and DeepMind were working together on something. Google's apparent lack of concern with safety is one of the reasons I want your videos sir

@naturegirl1999 4 years ago

Brian Doe why is two AIs with different modes of thought working together a problem? Humans have different modes(parts of the brain specialized for different tasks) that combine the inputs from these disparate programs into a coherent idea of the world. Imagine trying to learn about your surroundings when the only sense you have is the ability to differentiate temperature and you will understand why certain AIs need others to help with things.

@briandoe5746 4 years ago

@naturegirl1999 my main concern with AI is not the speed at which it gets to general intelligence. My concern with AI is the safety mechanisms and their capabilities when it gets to general intelligence. Google has proven multiple times to be unconcerned about the safety question. This is highly concerning

@Varue 1 year ago

Humans being able to simulate problems in their head to predict different outcomes is one of their greatest strengths, it means they can be confronted with new experiences they haven’t evolved specifically for and come up with a solution from a list of possible solutions and stand a much greater chance of overcoming the problem without dying

@the1gip 4 years ago

You, sir, remain one of the most interesting educators on YouTube. The effort you've put in to making this video watchable and entertaining really shows. There aren't too many people I can watch for nearly 18 minutes in front of a beige backdrop and still be hooked.

@OrioPrisco 4 years ago

Hey it's really cool for the viewers that you turned down that sponsorship offer, thanks

@dontfeo 4 years ago

Nah he should've taken it. U can skip it anyway and it would help him bring more content.

@Alex2Buzz 4 years ago

Miles: "What is technology?" *VSauce music*

@ohokcool 4 years ago

Did u go to Palms Middle?

@FrotLopOfficial 4 years ago

That last few minutes of your video will go unnoticed but for those who do, we very much appreciate it.

@Felixkeeg 4 years ago

I am actually a bit disappointed that you didn't go for the backflip lol

@ruvimlashchuk6134 4 years ago

My disappointment is immeasurable, and my day is ruined.

@Suush 4 years ago

He forgot to program a reward function :P

@gus2747 4 years ago

"If you squint the training process is sort of like a compiler " --- great sentence!

@igordmitriev7211 4 years ago

>We'll talk about them in a later video //Gets hyped, realises that it's the latest video on the channel, gets reminded of Patreon, enlists to see the video a bit sooner

@DamianReloaded 4 years ago

I would define intelligence as "the ability to autonomously identify problems and search for solutions to achieve goals"

@brendanjackman3600 4 years ago

"Hmm, reward functions are a limiting factor on some ML capabilities. This is a problem. How do we solve problems? WITH ML"

@DDvargas123 4 years ago

Sometimes a solution is so good it can solve its own cons

@MichaelWBauer 4 years ago

It's definitely funny when you frame it this way, but it's also interesting to note the similarity here with the brain. The brain is a system of interconnected neural networks which each are responsible for certain aspects of our thinking capabilities. It's not too hard to imagine the connection between the logical extension of the results in this video and the architecture of the human brain.

@default632 4 years ago

@MichaelWBauer Remember where the word neural network came from. Duh

@MatthewStinar 4 years ago

I think you're describing a Generative Adversarial Network. en.m.wikipedia.org/wiki/Generative_adversarial_network

@weirdsciencetv4999 1 year ago

This channel is so underrated. I had to do just what he proposes in one of my experiments in college. The technique most definitely works!

@arthurguerra3832 4 years ago

I've been so long without your videos. Please upload more frequently so we can drink your intelligence and knowledge.

@bensonmiakoun7674 4 years ago

Highly interested for the next video! Thanks

@wilhem13 4 years ago

A video upload ?? My day's already better. Great content my friend, THIS is why I don't watch TV anymore.

@rosborr4330 4 years ago

I subbed because you knew I'd skip ahead the moment you said 'What is technology?'. You win this round, Robert.

@explogeek 4 years ago

Loving your videos, I understand it takes time to research and script and edit, but I wish they came out more often...

@dontyoufuckinguwume8201 4 years ago

The guy has a full time job, the only way to get him to make more videos is to donate ^^

@haldir108 4 years ago

I am EAGERLY awaiting that video about self-teaching or whatever it is.

@morkovija 4 years ago

Been a long time Rob! Hope you brought the sauce!

@non_complete 4 years ago

I agree wholeheartedly with your name.

@wilhem13 4 years ago

Most videos I MUST watch them on, at least x1.25.

@morkovija 4 years ago

@wilhem13 means that your content information density is quite high. No way I can speed up Mathologer, for example. But easily 2-3x some non-narrated restoration videos

@AsmageddonPrince 4 years ago

Your voice is so soothing, and videos so informative.

@NoahTopper 4 years ago

12:19 I approve very greatly of your use of "eachother" as one word. The world needs this change. I don't know if you and I talked about this at all at the EA Hotel, but I've been trying to convince everyone to write it like that.

@squirlmy 4 years ago

I started to do that, but "spell correct" too often comes on and I've gotten used to following automated corrections. I'm wondering if automated (or even AI writing assistants) will slow the evolution of language and grammar, and perhaps even pronunciation will remain in stasis not because of any changing dialect cues of social status, origin (or adopted location), or otherwise, but because of how our "correcting" algorithms are programmed in communication devices.

@qwertyTRiG 4 years ago

@squirlmy You've reminded me that I really need to create a dictionary with Oxford Spelling (en-GB-oed).

@discipleoferis549 4 years ago

I've been writing "eachother" for 15 years now. I've even told off some of my English teachers for trying to correct me. Heck... I remember back in 6th grade, I think, telling off my teacher for incorrectly correcting another student that had written "ain't". I was an opinionated 11-year-old, haha.

@NoahTopper 4 years ago

@discipleoferis549 I told my high school English teacher that I was attempting to turn "eachother" into one word, and asked if she'd be willing to not mark it wrong when I used it. She was super on board.

@qwertyTRiG 4 years ago

@NoahTopper It definitely makes sense. Similarly, I tend to distinguish between "alright" (acceptable) and "all right" (completely correct).

@augustinaslukauskas4433 4 years ago

I'm not surprised this result is amazing considering both OpenAI and DeepMind worked on it. I dream of working for one of them after uni. Thank you for explaining the paper so clearly and in an entertaining way!

@sam-you-is 1 year ago

did you make it sir

@crypticnomad 4 years ago

When people ask me what AI is I generally say that it is a universal function approximator.

@V1ctoria00 4 years ago

Damn. I don't usually find a new channel by its latest video. I was hoping I could binge this topic here.

@Telhias 4 years ago

With regards to puppeteering the robot to perform a backflip. There is a whole community of the Toribash game who do exactly that. It is a game in which every time period (measured in ms) you decide which joints to flex, extend, hold rigid and relax.

@cmoxiv 4 years ago

Mate, you are brilliant. Great content with a philosophical flavour. The last part about Patreon is probably the only thing that actually convinced me about supporting content creators on Patreon. Well done mate. Well done.

@wiktormigaszewski8684 4 years ago

This is what I always thought of making a good robot - you give a feedback to it, while it learns, just like parents to a child. Very good, that this concept has been put into practice. It is definitely going to be helpful for AI companies making robots for their clients, who do not know exactly, what they need. The guy from "two minute papers" would say "what a great time to be alive!" :-)

@reneko2126 4 years ago

Yeah, why not just raise AI like kids? ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eaYIU6YXr3w.html

@narita_i 1 year ago

what a time to be alive

@firefoxmetzger9063 4 years ago

hmm. If samples are chosen based on unusual examples where the ensemble disagrees, what happens if the exploiting strategy has high agreement among members of the ensemble? It would never show up to the human for "correction", right, because the ensemble is confident about it? So rather than having to trust the network that performs the task, we now have to trust the ensemble training the reward function?

@MatthewStinar 4 years ago

I was thinking you would still want to throw in some strong matches just to verify.
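
The concern in this exchange can be made concrete with a small sketch. It is not the paper's implementation: `select_queries`, `audit_fraction`, and the toy ensemble are hypothetical names chosen for illustration, and an ensemble member is assumed to be any callable that maps a clip to a predicted reward. Most queries go to the clips where the members disagree most, and a random slice of the remaining clips is also shown to the human, so behaviour the ensemble confidently agrees on still gets checked now and then.

```python
import random
import statistics


def select_queries(candidates, ensemble, n_queries, audit_fraction=0.25, seed=0):
    """Choose which candidate clips to show a human rater.

    Returns (uncertain, audit): indices picked for high ensemble
    disagreement, and indices picked at random from everything else.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_audit = int(n_queries * audit_fraction)
    n_uncertain = n_queries - n_audit

    # Disagreement = variance of the predicted reward across ensemble members.
    scored = [
        (statistics.pvariance([member(clip) for member in ensemble]), index)
        for index, clip in enumerate(candidates)
    ]
    scored.sort(reverse=True)

    uncertain = [index for _, index in scored[:n_uncertain]]
    remaining = [index for _, index in scored[n_uncertain:]]

    # Random audit: clips the ensemble agrees on can still reach the human.
    audit = rng.sample(remaining, min(n_audit, len(remaining)))
    return uncertain, audit


# Toy setup: the three "models" agree on even clips and disagree on odd ones.
clips = list(range(8))
toy_ensemble = [
    lambda c: c,
    lambda c: c + (c % 2),
    lambda c: c - (c % 2),
]
uncertain, audit = select_queries(clips, toy_ensemble, n_queries=4)
```

With `audit_fraction=0` this reduces to pure disagreement sampling, which is exactly the case the question worries about: a strategy that every ensemble member scores highly would never be shown to the human.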

@panstromek 4 years ago

This is really on point for a problem I am trying to solve now. I do some computer vision for which it is way too complicated to create training data and way too complicated to write reward function, but it's the "You know it, when you see it" type of thing. Thanks for making this video ;)

@StromyYTA 4 years ago

These videos are awesome. Feel almost like I can keep up to date with AI progress.

@tedstokes57 4 years ago

I like that there's a hint about the next video at the end

@xxThabaxx 4 years ago

This is something I've been thinking a lot about as it could work similarly to how we tend to train children. It seems like you could first train a machine learning algorithm to recognize social cues (lingual and physical responses) regarding its behavior and build a reward function based on that. I think you still run into some complicated reward hacking situations like the machine wanting to force certain reactions. But it seems like it would get us closer.

@eathonhowell7414 1 year ago

This way of thinking is exactly what's getting me interested in this field. I cannot help but feel there is a comparison to be made between the inexact nature of child raising, and trying to "teach" artificial intelligence. General or otherwise. Hell, think of an individual cell within the body as an AGI and the totality of what humans are seems like a miracle.

@gwen9939 1 year ago

@eathonhowell7414 You should probably watch the video called Why not just Raise AI like Kids.

@hypnotourist 4 years ago

Very clear presentation for a fascinating topic ! Your "patreon/human discussions" reward function has trained you well, so to speak :-)

@jayteegamble 4 years ago

meh, we don't mind a 60 second spiel if it gets us more of your awesome content (and we can skip forward anyway). Grab that bag imo

@diribigal 4 years ago

This is a tough problem since watching to the end is probably valued by YouTube's AI, and even though you and I wouldn't mind, some would. So how do the short term gains of the sponsorship compare to the long term dividends of the YouTube algorithm and extra subscribers, which increase visibility over time (perhaps by a minor amount)?

@sevret313 4 years ago

@diribigal That's why you don't put the sponsor at the end, but the start.

@DarkPrject 4 years ago

This continues to be one of the most interesting channels on YouTube. Fascinating video. Can't wait to see the next one.

@mrWade101 4 years ago

Scissors would be Old technology, whilst when most people say Technology they mean New technology.

@rerere284 4 years ago

9:00 There's a game called Toribash where you do exactly this, but with a more complex body. It lets you specify the states of all the joints in 1 second time segments, playing out like speed clocks from chess when playing multiplayer.

@Havermeijer 4 years ago

I remember that game! You could pull someone's head off and stuff. Pretty difficult to master though. Also, the game kept sending me happy birthday emails for years and years. I didn't get one last time :(

@EU_DHD 4 years ago

I like watching you talk about AI safety more than I like learning about AI safety. And I really like learning AI safety!

@unvergebeneid 4 years ago

Shade much? So you're not learning AI safety by watching him talk about it?

@EU_DHD 4 years ago

@unvergebeneid Those are two aspects of the same thing. I just like the one aspect more than the other.

@injinii4336 4 years ago

Surely scissors are an example of some of our most cutting-edge technology. Ba-dum-tss!

@dsdy1205 4 years ago

When you realise you've reinvented the parent-child relationship

@AugustusBohn0 3 years ago

nature wins again

@dsdy1205 3 years ago

God coming back to this comment a year later it sounds so stupid

@CyberAnalyzer 4 years ago

You are a hero. You democratise AI. I can't wait for the next video!

@Laborejo 4 years ago

"It is easier to write a program to evaluate a solution". This is also why artificial music composition does not produce even half-decent outcomes yet. Creating an artificial listener (or many of them) is still far down on the to-do list.

@postvideo97 4 years ago

There has been no research (that I know of) that uses human reward modeling for music generation. It could be the next breakthrough in music generation!

@Sceleri 4 years ago

this method could work for that tho you just tell it which beat is more fire

@ToriKo_ 4 years ago

Sceleri exactly

@dasc000 4 years ago

emily howell: hold my beer

@JsbWalker 4 years ago

Have none of you heard of Emily Howell?
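
The "just tell it which beat is more fire" idea in this thread is learning a reward from pairwise comparisons. Here is a minimal sketch of one way that could work, under strong simplifying assumptions: each item (a beat, a clip) gets a single scalar reward instead of a neural network's prediction, the probability that one item is preferred over another is a sigmoid of the reward difference (a Bradley-Terry style model), and `fit_rewards` with its parameters is a hypothetical name, not something from the video or the paper.

```python
import math


def fit_rewards(n_items, comparisons, lr=0.1, epochs=500):
    """Fit one scalar reward per item from (preferred, rejected) index pairs.

    Model assumption: P(a preferred over b) = sigmoid(r[a] - r[b]).
    Plain gradient ascent on the log-likelihood of the human's choices.
    """
    rewards = [0.0] * n_items
    for _ in range(epochs):
        grad = [0.0] * n_items
        for preferred, rejected in comparisons:
            diff = rewards[preferred] - rewards[rejected]
            p_preferred = 1.0 / (1.0 + math.exp(-diff))
            # d/dr log(p_preferred): push the preferred item up and the
            # rejected item down, more strongly when the model was surprised.
            grad[preferred] += 1.0 - p_preferred
            grad[rejected] -= 1.0 - p_preferred
        rewards = [r + lr * g for r, g in zip(rewards, grad)]
    return rewards


# Three beats rated by a slightly inconsistent listener:
# beat 0 usually beats beat 1, beat 1 usually beats beat 2.
judgements = [(0, 1), (0, 1), (1, 0), (1, 2), (1, 2), (2, 1), (0, 2)]
learned = fit_rewards(3, judgements)
```

The learned numbers only matter relative to each other; a generator could then be trained to maximise them in place of a hand-written "is this good music" function. The listener never has to score anything on an absolute scale, only say which of two options is better.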

@stefx5994 4 years ago

Hi Rob, firstly many thanks for the amazing videos you produce - as a fellow Dev and Techie I find your content and delivery style some of the best and most informative on YouTube. Could I request a future video in which you explain the coding side of developing a basic AI Agent? It would be great to learn how to explore some of the concepts and interesting problems your videos highlight. There are a lot of frameworks, open source projects and tutorials out there already, but they present a very black box, end result focused approach rather than explaining what components we have and how they are working together to reach the end result... the type of complexity you seem to be fantastic at explaining :)

@RobertMilesAI 4 years ago

I've been thinking about a "Write an AGI from scratch" series, but it would be a lot

@fergochan 4 years ago

Great video, but there's still one thing I'm confused about: how do I tell if that simulated robot is doing a back flip or a front flip?

@geronimomiles312 1 year ago

You choose to tackle issues which really clarify the meat of the process , and do fantastic. Really good stuff👍

@BinaryReader 4 years ago

Technology is just another word for "Tool". Everything created by humans of some utility is a tool, and is therefore technology. I wasn't aware there was confusion around the definition.

@oldvlognewtricks 4 years ago

Queueing was created by humans and is of some utility. Queueing is not technology. Stand-up comedy was created by humans, and is of some utility. Stand-up comedy is not technology. It is difficult (or perhaps impossible) to write a definition that doesn’t raise exceptions, which I suspect was the point Robert was trying to make. Your example only confirms the point.

@BinaryReader 4 years ago

Not to get into a huge discussion here, but both of those could be loosely defined as technologies. What are jokes if not tools of social interaction? What is queuing if not a tool for social order (assuming you mean standing in line and not the computer science definition, which is also a technology)

@oldvlognewtricks 4 years ago

@BinaryReader I continue to agree, and disagree. A joke and a queue might be tools, but 'technology' is more of a push.

technology /tɛkˈnɒlədʒi/ - noun
- the application of scientific knowledge for practical purposes, especially in industry. "advances in computer technology"
- machinery and equipment developed from the application of scientific knowledge. "it will reduce the industry's ability to spend money on new technology"
- the branch of knowledge dealing with engineering or applied sciences.

There is perhaps some science to comedy, but a social convention like queueing is hardly an application of science, so much as an emergent social expediency, or whatever. I'm not getting 'engineering' from either, except in the loosest sense. Alternatively, to take the definition to its logical conclusion, all human action is technology and the definition loses its usefulness. But you're right - no potential for confusion whatsoever ;) At best, there is comparative 'technology-ness' - a joke might be technology, but it's less technology than a smartphone. Maybe moreso than a punch to the face. Maybe it depends on context. Still works to make the 'this is not straightforward to define' point.

@squirlmy 4 years ago

@BinaryReader Perhaps it's an Americanism, but there's another definition of "tool", and you're well on your way towards demonstrating it. Both of you actually, because none of us need or want an in depth discussion of the definitions of either word. Rob's brief mention of it doesn't warrant further commentary.

@drdca8263 4 years ago

Rob’s definition kind of closely matches Strong Bad’s definition, of “anything that’s really cool and you don’t know how it works”. Ryan North’s definition includes language, and I think basically any technique which has been invented. But yeah, like Rob says, it isn’t a big deal how we define it. Slightly different definitions can be used in different social circles, or even in different conversations among the same people.

@stephen-torrence 4 years ago

Closest thing to a literal "bicycle for the mind" I've seen in AI research. Cool!

@xenoblad 4 years ago

You've been playing Raid: Shadow Legends for 10 years?!

@bscutajar 4 years ago

This is one of the best channels on YouTube. The guy's explanations are extremely well done.

@bencrossley647 4 years ago

This sounds like a method to solve NP problems. Easy to verify Hard to solve.

@4.0.4 4 years ago

The year is 2069. A computer is granted the prize for solving the P vs NP problem. Despite the judges being unable to confirm that the overly-complex thesis the computer came up with was correct or not, it looked quite correct to all experts. A mathematician was quoted saying: "...I mean, in the two new branches of mathematics that the computer invented, the math does check out." It is unknown what the computer will do with the prize, but several paperclip factories report being contacted shortly after the prize money was deposited.

@bencrossley647 4 years ago

Chrysippus +1 for paperclips (assuming you’re referencing the game). It will work its way to a galactic army at some point.

@Kevin________ 4 years ago

@4.0.4 Alright... you win this comment section.

@griest5493 4 years ago

I was thinking the same thing when he said that. Also, the halting problem is a thing. The catch is that NNs are just making approximations.

@default632 4 years ago

@4.0.4 Universal Paperclips, hours of wasted time for a reference on the interwebs. Worth it

@MrLuMax5 4 years ago

In my opinion you could have done the sponsorship. It helps you as you help us, 60 seconds is like not that much and you deserve it for all the work.

@sk8rdman 4 years ago

"Mattresses and VPNs." Someone watches SmarterEveryDay

@ZachAgape 4 years ago

The first videos I saw u in were the computerphile videos on AI which I enjoyed a lot, and thanks, this video was very interesting too! Also thank you for not wanting to waste 60 seconds of our time ^^

@BubbleManxx 4 years ago

I laughed at the Vsauce reference.

@Hexanitrobenzene 4 years ago

Could you provide a timestamp ? Looks like I missed it.

@BubbleManxx 4 years ago

@Hexanitrobenzene Lol, it's at the very start of the video. When he pops up from the lower half of the screen and asks "What is technology?".

@Hexanitrobenzene 4 years ago

@BubbleManxx Oh, that one :) Looks like I'm rusty on VSauce, haven't watched him in a while...

@andersenzheng 4 years ago

@Hexanitrobenzene Not your fault. There hasn't been one for a while

@Metrolonx 4 years ago

Love how the video quality grows with every video! Keep it up!

@Deez-Master 4 years ago

We are getting close to having P=NP

@governmentofficial1409 4 years ago

Silicon Valley spoiler

@MidnightSt 4 years ago

...i don't know much about this area of IT, but the first thing that came to my mind after reading the video title was: "oh, yeah, what's a better idea than creating a black box that nobody knows how and why it works, and what its boundary conditions actually are? why, yes, creating such a black box without even explaining to it what is good and what is bad! BRILLIANT!"

@DigitalicaEG 4 years ago

"Don't skip ahe..." Me: **skipping**

@esquilax5563 4 years ago

Good to see you on here again! You have some of the most fascinating content on YouTube

@realityChemist 4 years ago

"How do you learn when there's nobody who can teach you?" Read a textbook or a WikiHow article?

@Vode_ika 4 years ago

That is someone teaching you, via a book.

@realityChemist 4 years ago

@@Vode_ika True, I was thinking in the context of someone sitting there teaching you, like in this video. So I guess the answer is just unsupervised learning? Although I could have sworn Rob already did a video on that... Maybe it was someone else on Computerphile?

@drdca8263 4 years ago

Isn’t the answer “think very hard, write things down, and when you can do so safely, try many options, test your previous ideas both by the results of the options you took and by more thinking, repeat”?

@Biped 4 years ago

@@drdca8263 but that all requires some way of evaluating your results (aka having a reward function that teaches you)... It seems weird that there would be a way without that. I mean... the information has to come from somewhere...

@SimonBuchanNz 4 years ago

I would suspect the answer is, in fact, something like googling it, but this, of course, requires a pretty complete internal model of the world to start generating and testing against your own predictions. I'm struggling to think of alternatives that aren't just this in disguise though: the best I have is looking at a small set of successful examples and trying to break down from the solution used what the problem is, so you have something to test your own solutions against. If there's a decent way to describe that that isn't going to fall prey to small training data issues like overfitting, I'm excited: that's starting to really sound like the casual meaning of learning!

@travboat 4 years ago

I think your opening question (what is technology) exemplifies the difficulties in making an intelligent AI (and the statement I just made is another example). Humans have a good ability to interpret things, or simply put, we know it when we see it. We know what technology is, we understand what pollution is, but trying to put those terms in a definite box is very difficult, and unfortunately that's how computers pretty much operate. We give them specific instructions, and that's what they do. Your channel is excellent, thanks for the interesting content!

@roberthoople 4 years ago

"Training AI Without Writing A Reward Function..." *Capitalism Drools*

@MatthewStinar 4 years ago

Watching this video made me realize how much corporations are like poorly programmed artificial intelligence, like the stamp collecting AI that decided to "Kill all humans." We take our instrumental goal of maximizing profits and assign that as the corporation's terminal goal. In pursuing its terminal goal of maximising profits, the corporation decides to "Kill all humans." 😲

@Havermeijer 4 years ago

Your videos made AI an accessible topic for me. I love the pure logic and game-like thinking.

@Karpata1 4 years ago

Hey if I have to hit the "L" button a couple times so you can get a couple hundreds or even a couple thousands of pounds I'm fine with it.

@vincentguttmann2231 3 years ago

So really, this video was brought to you by... you, the patreons! You should make them decide on a sponsorship message!

@maloxi1472 4 years ago

Thank you for bringing this idea to my attention ! Holy cow ! This is such a simple, yet beautiful idea !

@n4th4ni3lmc5 4 years ago

Awesome explanation and sounds like great progress in the field! Thank you very much, sir.

@orcu 4 years ago

I liked this explanation very much. Great work!

@frib75 4 years ago

An amazing video. Never heard such a beautiful explanation of what reinforcement learning is. Thank you !

@Kobriks1 4 years ago

Excellent explanation! Thank you.

@daviddawkins 4 years ago

Incredibly well presented and articulate, thank you.

@LeanMeanLearningMachine 1 year ago

Refreshing approach to the actor-critic model :)

@fish_wizard618 1 year ago

It seems like this method of evaluation could also help AIs learn to do much more arbitrary things. Like if you wanted a “pretty” pattern, you could train it to make more patterns that you find pretty using this.

@harrisonfackrell 3 years ago

This makes me very, _very_ excited. You're kinda' blowing my mind.

@TheRealFaceyNeck 4 years ago

I wholeheartedly agree: it is MUCH easier to evaluate a solution than to generate a solution. You could pretty much define mathematics that way: trying to evaluate as many known solutions as possible, to get new information, and generate a solution, if-and-only-if previous solution evaluations proved unsuccessful.

@mygreenlama 3 years ago

Thank you for another great video! I am very much looking forward to the continuation ;)

@gabrote42 3 years ago

These are brilliantly designed! I want more!

@bibasniba1832 4 years ago

Priceless knowledge, swift explanation. Bravissimo!

@stephentaylor356 1 year ago

Not having a sponsor earned you a like and an extra comment from me... for what that's worth. Keep up your fantastic work.

@adryncharn1910 2 years ago

This was highly interesting, thank you!

@johnopalko5223 4 years ago

Thank you for not accepting sponsorship from a company that wanted you to do a 60-second spiel. There are companies who sponsor videos and are happy with just having their logo displayed in the corner once or twice. At most, they have the presenter start out with, "This video is sponsored by So-and-So. [One or two brief sentences.] Link in the description below." These are the companies that get it.