
Quantilizers: AI That Doesn't Try Too Hard 

Robert Miles AI Safety
156K subscribers
85K views

Published: 28 Sep 2024

Comments: 949
@austinfauteux4388 3 years ago
I missed you.
@deviljelly3 3 years ago
The philatelists didn't....
@zhangalex734 3 years ago
@@deviljelly3 underrated comment
@ironlegnebula 3 years ago
We all did
@General12th 3 years ago
Stampy missed you too, Austin. :)
@DarkExcalibur42 3 years ago
Same. I definitely didn't forget about this content!
@AndDiracisHisProphet 3 years ago
Interesting how you always post a new video when I rewatch some of your older ones. I should do that more often...
@colh3127 3 years ago
Please do! XD
@ignaciomartinchiaravalle 3 years ago
@@colh3127 hahahaha I clicked on "answer" just to write the same thing XD
@juliahenriques210 3 years ago
That might just be a successful strategy for a video-posting maximizer...
@AndDiracisHisProphet 3 years ago
@@juliahenriques210 I am way too stupid for that. On the other hand, it's a strategy a human might employ.
@toreshimada 3 years ago
Great to see you still making videos :) Me and the IT department watch them together during lunchtime!
@jackobrien7073 2 years ago
This video and your previous video on AI that doesn't try too hard have got to be my favourites so far! I have to say that some people are misguided when they think that a whole academic discipline exists for AI safety. It's more like a niche, and a much smaller niche than one would expect given its importance.
@JakDRipa 3 years ago
Glad to see this channel isn’t dead after all
@cmilkau 3 years ago
So basically, a 10%-quantilizer is 10 times as likely to commit murder trying to achieve its goal as an average human, provided that murder is a sufficiently efficient strategy. I don't know, this seems like a risky move, amplifying an already-dangerous behaviour.
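For reference, that factor of 10 is essentially the guarantee from the quantilizer paper: a q-quantilizer amplifies the probability of any given strategy by at most 1/q relative to the human base distribution. A minimal numeric sketch of the bound, with invented numbers:

    # Invented numbers, purely to illustrate the 1/q amplification bound.
    q = 0.10               # a 10%-quantilizer
    p_human_murder = 1e-6  # assumed base rate of a human picking a murderous strategy

    # Worst case: every murderous strategy lands inside the kept top-q slice,
    # where each kept strategy's probability is rescaled by at most 1/q.
    p_quantilizer_murder_max = p_human_murder / q
    print(p_quantilizer_murder_max)  # 1e-05, i.e. at most 10x the human base rate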
@RoberttheWise 3 years ago
I really like that we have now arrived at the task of building not only something strictly better than a human in all domains, but also something strictly more reasonable than a human.
@joflo5950 3 years ago
Why do you choose randomly at the end, not just take the one at the 10% point?
@flurki 3 years ago
I'm not sure, but I think you choose randomly, because that gives you better results on average (with higher utility than at the 10 % mark) while still having only a small risk of getting dangerous results.
@matt-stam 3 years ago
Also wondering same thing
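For anyone who wants the mechanics spelled out, here is a minimal Python sketch of the quantilizer described in the video. It assumes we are handed a finite list of candidate strategies, a human base probability for each, and an expected-utility estimate for each; all names are hypothetical.

    import random

    def quantilize(strategies, human_prob, utility, q=0.1):
        """Keep the top q of human probability mass, ranked by expected utility,
        then sample from what is kept in proportion to the human probabilities."""
        ranked = sorted(strategies, key=lambda s: utility[s], reverse=True)
        kept, mass = [], 0.0
        for s in ranked:            # walk down from the highest-utility strategy
            kept.append(s)
            mass += human_prob[s]
            if mass >= q:           # stop once q of the human mass has been kept
                break
        weights = [human_prob[s] for s in kept]  # choices() renormalises these
        return random.choices(kept, weights=weights, k=1)[0]

On the question above: sampling rather than deterministically taking the strategy at the 10% boundary keeps the whole top slice (including the higher-utility strategies above that point) in play, and it is what makes the safety argument work, since no single strategy's probability is ever boosted by more than a factor of 1/q; always picking one fixed strategy would concentrate all the probability on it, however humanly unlikely it is.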
@jeremymcgowan1914 3 years ago
You've got a great way of explaining these AI topics and I'm happy that you've returned.
@sebastiandierks7919 3 years ago
So glad you're back! I always wonder how you would want to programme these systems. Even though the base idea of mixing human behaviour and utility maximisers like that seems reasonable from a conceptual point of view, you "only" need a very good model of reality and human behaviour. I know that's kind of not the subject of this channel, as here it is assumed we will build such systems sooner rather than later, but it's mind-boggling to me how this could be done. (You can tell I'm not an expert 😄)
@tonyduncan9852 3 years ago
I've been feeling the need - the need for speedily liquidising my mind. I'm better now. Thanks.
@iddomargalit-friedman3897 3 years ago
Humans definitely change themselves in order to maximise utility. Every mental treatment, coaching, mental training (for example military training), and so much more. We are constantly trying to influence what drives us, what values we hold, and how we think. Great video!
@christopherg2347 1 year ago
8:18 Mass Effect 1 quest "Signal Tracking". AGI is prohibited due to unsolved security issues. So a minor thief made an AI to steal money, which promptly went and made an AGI for the job - exactly this scenario. It seems like for any AI safety issue, there is an example in the Mass Effect series.
@Winteg8 3 years ago
Awesome content :D Love the comedic timing, as always.
@JoshSweetvale 3 years ago
This is just human with extra steps. :v
@Cameronmid1 3 years ago
I am so happy to see more content coming out on your channel. Thank you very much. I know life gets messy sometimes but I am glad you are still making videos!
@davidwuhrer6704 3 years ago
"A human will act reasonably after all other options are exhausted." (This is not necessarily true for _all_ humans; it depends a lot on how much sleep the individual is getting.) As an avid reader of the web comic "Seed", I think a good idea might be to use models of humans to judge whether a proposed action is acceptable, as a form of reward modelling. This introduces at least two other problems: how to judge the quality of the models and resolve contradictions within them, and how to deal with step one of most business plans being "Boil the ocean" and therefore an acceptable action to at least _some_ users.
@Tutorp 3 years ago
Not quite on topic for this video, more sort of a tangent to the previous video in the series. Bounded utility functions with a negative modifier for overshooting the bound were mentioned, but how about using a model with something completely different as a negative modifier? I have an idea for what to use, which I'd like you all to try to break. Apocalypses are energy-intensive. A measure of how much energy the plan needs to be set into motion (counting not net energy use but gross, as killing a ton of humans saves a lot of energy) could potentially be used as a heuristic to avoid the apocalyptic scenarios. So if you have a utility function with a bounded positive score for some utility you want to get, and an unbounded negative score for energy use (preferably exponential, or with a cut-off point at which the utility automatically becomes basically minus infinity), how are we looking now?
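Purely to make the proposal concrete, the scoring described above might look something like the sketch below (all names and numbers are invented, and it says nothing about the genuinely hard part: how the agent would estimate the gross energy a plan sets in motion).

    import math

    def score(task_utility, plan_energy_joules, utility_cap=100.0, energy_scale=1e9):
        """Bounded positive reward for the task, unbounded exponential penalty
        for the gross energy the plan mobilises."""
        bounded_reward = min(task_utility, utility_cap)
        energy_penalty = math.exp(plan_energy_joules / energy_scale) - 1.0
        return bounded_reward - energy_penalty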
@unknownusername9335 3 years ago
So, first thought: if we have an AI that can imitate a human, couldn't we just use the strategies from the maximizer as input for the human predictor AI?

Option 1: Use strategies as suggestions
As input for the human AI you can ask it to rate strategies for the goal "achieve [goal] given that you know about the following strategies: [list of high ranking strategies from maximizer]". This should at least give good ideas that humans are unlikely to think of a better chance of being chosen. A possible problem is actually inputting the strategies in a way that a human could consider a significant number of them.

Option 2: Ask predicted humans to evaluate strategies
Give the predictor the goal of "Fill out the following questionnaire within [time limit]", where the questionnaire either evaluates a single strategy or compares different strategies. If the questions are answered poorly due to not enough time, run again with a higher time limit.
Example: Consider strategy X. For this questionnaire, fill out each of the questions and rate how confident you are in your answer, and why you have that confidence.
How dangerous would you rate strategy X? [1 | 2 | 3 | 4 | 5]. Confidence: [ very confident | confident | not very confident | not confident at all ]. Reason: [I do not understand the strategy | I do not have sufficient knowledge to evaluate the danger | I did not have enough time to properly evaluate the danger | I understood the strategy and its consequences well ]
How effective would you rate strategy X? [1 | 2 | 3 | 4 | 5]. Confidence: [ very confident | confident | not very confident | not confident at all ]. Reason: [I do not understand the strategy | I do not have sufficient knowledge to evaluate the danger | I did not have enough time to properly evaluate the danger | I understood the strategy and its consequences well ]
Or for the comparison: Consider strategies X, Y and Z. For this questionnaire, answer each of the questions and explain your answer.
Which strategy would you absolutely prohibit to achieve [goal]? Why?
Which strategies would you choose to achieve [goal]? Why?
Did you have enough time to fill out this questionnaire?
For the text-based answers, you can filter out some stupid answers by having the predictor "grade" the answers.

Option 3: Predict responses to a discussion about strategies
Have the predictor AI hold a discussion with itself by giving it the goal of ranking the strategies for achieving the goal. First, give it the goal to either accept or reject a strategy. Do this 3 times for each strategy, and keep everything with 2 or more approvals. Repeat with more votes and a higher approval rating until not too many strategies are left. Now ask people (= instances of the AI) to discuss which strategy to pick (= post a discussion / reply to discussions) until enough people agree on a strategy (or at least agree that the chosen strategy is not catastrophically bad).
If this works it creates a new problem: how to avoid accepting strategies that are objectively bad but that many humans would still accept? Excluding / weighting parts of the population less is hard to justify ethically, while allowing everyone a "vote" means you only have to convince 50% of the population that it's a good idea, which is far too easy to do with bad ideas. Could maybe be resolved by only using a population of "experts" for a given goal. Make sure to include experts from different fields as well, to prevent things like infrastructure projects not considering environmental/societal impact.
If an expert prohibits a strategy, either don't consider it or run prediction again with more experts of that field to see if there is consensus for it being a bad idea.
@HeavyMetalMouse 3 years ago
I still feel like some variation of a 'Lazymaxer' could work. You want the system to take some set of actions that accomplishes your goal, however you define that goal (I want somewhere within the range of 95 to 105 stamps), and you don't want the system to take some kind of 'optimal' route to that goal, without going crazy about it. The quantilizer might be a good place to start thinking about it, but rather than having it specifically trying to mimic 'what a human might do' as a condition to 'mix' with the utility function, we use a 'laziness function', defining some metric of 'effort' and giving higher probability to actions with lower 'effort'. 'Effort' need not be actual energy expended or work done, but could include things like favoring less complex solutions, or solutions that require affecting as little of the environment as possible. Most highly 'lazy' actions don't produce many stamps, but with the quantilizer mixing, you end up with a probability peak around the 'laziest strategy that still can get you what you want'. Which... is basically how many humans solve real world problems, in a way. >.< As long as you set up your laziness function right, you're still very unlikely to do things like "take over the world" because that requires doing a *lot* of things and interacting with a *lot* of complexity and moving parts - too much effort; you're even potentially much less likely to 'create a maximizer', since creating secondary systems is a complex task that could, in theory at least, be weighted as 'too much work' (naively, a laziness function could simply take into account the actions of any sub-agents as part of the total 'effort' as though they were the main agent's own actions, for example, making a zero-chill maximizer, like, way too try-hard to be worth it). It certainly doesn't seem like it would be *easy* to work out a good 'laziness' function, but it could be an interesting direction to look in.
@aarvlo 1 year ago
I think a way to mitigate this would be to set a threshold on how unlikely an action is for a human, to mitigate the most extreme examples, as well as creating a separate language model with no general intelligence whose sole purpose is detecting undesirable actions and making the AI "reroll" for another. So for example, creating a novel printing machine to print more stamps is unlikely but desirable, while stealing from the post office is both unlikely and undesirable.
@SaffronMilkChap 3 years ago
Here’s something I’m not clear on: once the bottom actions are discarded, how does the probability stay effective if the selected action is a random pick from what’s left? What’s preventing the dangerous, low-but-non-zero probability actions being selected at random? It seems like it would be good to take a band of less than q either side of the “peak” on the distribution - centering on that as an optimum, but tuning the width of the band to allow for exploration.
@CrimsonEclipse5 3 years ago
I guess the idea of putting an upper bound on the effectiveness of an AGI partially defeats the point (that is, to achieve superhuman outcomes for some given goal). Like Rob touched on at the beginning, you can "easily" bound the AGI to the approximate effectiveness level of a human, which will make it approximately as safe as a human, but will limit its power to that of a human. So some of those top 0.01% expected utility strategies might result in a perfect utopia (even if many of them do the opposite), and we have no way of knowing in advance which ones they are because they exist outside the domain of human-generatable strategies. So in this model we rely on the AI to exercise that discretion, which makes it unsafe, though much more likely to behave in a safe manner.
@packered 3 years ago
So I asked this in the comments of the last video, but well after it had been posted. I'm going to paste the comment here again, since I still don't see a solution. "My initial thought was, what if you used an inverse parabolic reward function. Something like -x^2+200x where x is the number of stamps collected after one year. It still peaks at 100, but going over 100 actually would have a worse reward than getting it exactly. So, given the video's example that buying off ebay has a 1% chance of failure, the AI would get maximum reward by ordering 101 stamps off ebay with that reward function. I'm sure there are scenarios where it ends up blowing up the world anyway, because that's how this always goes, but this feels like a step in the right direction." Or, more generally: what if instead of a reward function that becomes flat after a certain point, we have a reward function that starts to fall after a given point? This should get the AI to at least rule out absurd plans like "turn the world into stamps", since that would provide a very large negative utility.
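For what it's worth, the quoted reward does peak exactly at 100: the derivative of -x^2 + 200x is -2x + 200, which is zero at x = 100, so overshooting is punished just like undershooting. A throwaway sketch:

    def reward(stamps):
        # Peaks at stamps = 100 (reward 10000) and falls off on both sides.
        return -stamps**2 + 200*stamps

    assert reward(100) > reward(99) == reward(101)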
@underrated1524 3 years ago
This was actually discussed in that video. The gist is that you've shifted the goal from "get arbitrarily many stamps" to "perform arbitrarily many redundant checks confirming that you have the correct number of stamps" but the fact that you're doing extreme actions to get there remains the same. Instead of a world dismantled and turned into stamps, you get a world dismantled and turned into stamp-counting machines. Still a guaranteed apocalypse.
@cncshrops 3 years ago
Good to have you back 😊
@DeoMachina 3 years ago
I wanna see how many different hairstyles Robert can put into one video. I can't explain why, but it's hilarious to me.
@subhamburnwal9127 3 years ago
Thank you for this video.
@MatthewCampbell765 3 years ago
With AI causing the apocalypse, I'd imagine a good way to get around it is simply to have the AI think more about the 'spirit' of its instructions rather than strictly the 'letter'. For example, an AI programmed this way built to make paperclips would theoretically care less about building paperclips and more about fulfilling its "desired function to its creator". It'd stop and ask itself questions like "Why did my creator ask me to make paperclips?" and would likely conclude that its creator does not desire the destruction of mankind.
@stampy5158 3 years ago
Successfully programming an AI to do what we wanted, rather than what we told it to do, would have very large benefits for safety. However, this poses various challenges, both due to adding significant complexity to the AGI design (meaning more things which could fail, and more things we have to learn in order to build it), and other considerations covered in arbital.greaterwrong.com/p/dwim/. Nick Bostrom talks about this in Superintelligence, under the heading Do What I Mean: publicism.info/philosophy/superintelligence/14.html Both the MIRI's Coherent Extrapolated Volition model (www.lesswrong.com/tag/coherent-extrapolated-volition) and CHAI's proposal for Cooperative Inverse Reinforcement Learning / Assistance Games (www.alignmentforum.org/posts/qPoaA5ZSedivA4xJa/our-take-on-chai-s-research-agenda-in-under-1500-words & en.wikipedia.org/wiki/Human_Compatible#Russell%27s_three_principles) are attempts to get the safety advantages. -- _I am a bot. This reply was approved by plex and sudonym_
@haroldbn6816 3 years ago
So far I see that a real challenge here is the way we have to organize data to train our agent, having human constraints for some general goals. The thought of having a library with a pdf of human choices for each single task in life seems messy to me.
@ignaciomartinchiaravalle 3 years ago
Rob!!!! I was soooo happy when I got your upload notification :D
@andregn4483 3 years ago
Does the human probability really go on the same dimension as the ordered distribution, or is it just a simplification for visual explanation? It seems like a big fallacy to me that you can just overlay a gaussian over the ordered efficiency curve... For example: hijacking a casual stamp collector. It is an absurd strategy (no human would try it) with a mediocre result (a few hundred stamps). This strategy could easily fall in the highlighted 10% range, right?
@geraldkenneth119 2 years ago
If we ever build AGI, we'll want it to have a good enough model of the human mind that it not only knows what we mean, rather than just what we say, but is also able to limit itself in ways we find desirable that didn't even occur to us.
@MiskyWilkshake 3 years ago
A few questions:
1) I feel like there's another issue here than just that the nasty expected utility options are still on the table, and that's the fact that the top 10% of utility options from human action doesn't just capture the actions of exceptionally clever humans, but also favours exceptionally exploitative or unethical humans - even among the possibilities that occur in human action, those with the highest expected utility tend to have the worst side-effects and externalities. Is that something discussed with regards to quantilizers at all?
2) Why wouldn't they do something like finding the mid-point at every spot between the expected utility and human probability curves, in order to create a new probability curve, then cutting off both the top and tail of that curve by whatever value seems to yield the best results? Wouldn't that give a range of high-performing, human-plausible strategies, and exclude outlier actions that are either a: very ineffective regardless of their popularity amongst humans, or b: exceptionally unpopular among human action, despite their effectiveness? Surely that would be a little safer than simply keeping the probability of those big nasty solutions low? I don't think this helps with the problem of the AI finding itself in situations very unlike what humans have experienced though, nor the problem of it building unsafe systems.
3) If those two problems were magically solved, how would we deal with setting both the upper and lower bounds of possible actions? Like, how do you know that your upper bound precludes horrific/dangerous actions entirely, or that such outcomes are simply at so low a percentage that they haven't come up in testing? How do you test for such behaviour in the first place, when a lot of that behaviour would implicitly involve deceiving testers, etc.?
@thatonepersonyouknowtheone7781 3 years ago
This seems rather simple, although there are probably hundreds of other problems here: always bracket off the best ~1-2%, or up to the point where we can safely say we've removed any real chance of a utility maximiser or other apocalypse scenarios. I understand this would no longer maximise the output, but neither does this system in general.
@cmilkau 3 years ago
So the obvious choice to improve this would be to completely eliminate the most unlikely behaviours as well, wouldn't it? Go from a "human on a good day" to a "typical human on a good day"? Curious whether something along these lines features in the next installment.
@jarrod752 3 years ago
I have been waiting for this video for a long long time.
@apppples 3 years ago
You got from probability of disaster equal to one down to the probability being zero tho! Assuming at least countably many actions exist to take that is. Which is pretty damn good.
@sevenredundent7256 3 years ago
Expected Utility Satisficer Perpetuator? Where the system has to pick a strategy which would end in it being able to re-perform the same task after waiting for the outcome of the actions to complete.
@zestyorangez 3 years ago
I was wondering if you were ever going to follow up on that video.
@rtg5881 3 years ago
Oh, I haven't forgotten. Glad to see more of you.
@Verrisin 3 years ago
Would it make sense to make an AI that asks people (ideally lots of them) whenever it doesn't have high confidence in knowing whether humans will like a certain outcome or not?
- It comes up with actions and outcomes, and tries to maximize matching what people (informed, 'expert' on the particular topic; many 'voting' by stating preferences independently) will want as the outcome.
- It will only carry out an action if it has high confidence that a large number of 'competent/relevant' people want that outcome (and ~none of the relevant people are highly against it).
- Everybody counts as an expert on basic human needs for themselves, etc.
... It could replace government...
@Verrisin 3 years ago
I've made other similar comments, but this one is phrased as a question...
@Urbanski751 3 years ago
Welcome back! Now keep it up!
@MrrVlad 3 years ago
Was there a video that discusses how actions may reduce the utility function? The more you change the world around you, the lower the value of the original utility function would be. And one of the goals would be to learn how the utility will drop.
@mattcool97 1 year ago
What if instead of developing a model of what a human would do, we develop a model of what humans would approve of? Basically a machine that has empathy and can predict human feelings, which seems to me to be the basis of morality. And then it would compare the utility function with "How likely is it for a reasonable person to think this is a good idea, if I explained it to them?" and then applies the same optimization strategy? Maybe with a cut off that if there is less than a 1% chance that a human would approve of this strategy, don't even consider it, regardless of how high the utility function? (maybe it could also spit out a summary of some high utility-low approval ideas it rejected for us to actually consider and get feedback for further training) You could train this by having an AI come up with ideas to solve a problem, presenting the ideas to humans, and then having the humans give them a rating of how acceptable they are.
@alexwebb7676 3 years ago
Was there any discussion in the paper about having two q-values; one that removes the low-utility "human" actions, and one that removes the incredibly high-utility "inhuman" actions? It seems like that could devolve into heuristics if not applied correctly, but requiring a certain amount of "human-ness", if extremely high-performing human-ness, could avoid the most apocalyptic of options.
@stampy5158 3 years ago
We had a discussion about this idea in response to a different comment, though we didn't really come to any firm conclusions. You can read it here if you like: pastebin.com/FVUNCBJt -- _I am a bot. This reply was approved by plex and robert_hildebrandt_
@mikeyleidig9135 3 years ago
Why not have a q2 value that cuts the top end off? Say, set it to .999 so that it clips off only the 'kill everyone for stamps' options, then renormalize the distribution?
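To make the two-threshold idea from these comments concrete (this is the commenters' proposal, not something from the video or the paper), the kept slice could be clipped at both ends, e.g.:

    import random

    def two_sided_quantilize(strategies, human_prob, utility, q_top=0.001, q=0.1):
        """Like a plain quantilizer, but first discard the top q_top of human
        probability mass (the 'inhuman' high-utility sliver), then keep the
        next q of mass and sample from it in proportion to human probability."""
        ranked = sorted(strategies, key=lambda s: utility[s], reverse=True)
        kept, mass = [], 0.0
        for s in ranked:
            mass += human_prob[s]
            if mass <= q_top:
                continue              # still inside the clipped-off top sliver
            kept.append(s)
            if mass >= q_top + q:
                break
        weights = [human_prob[s] for s in kept]
        return random.choices(kept, weights=weights, k=1)[0]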
@Zeero3846 3 years ago
Another way to put it is that AI are like sociopathic humans who tend to make soulless decisions with extremely high utility. Or perhaps they are like how people typically see how corporations/governments act when they do things from a purely utilitarian approach (with unchecked power). Maybe utilitarian humans aren't nearly as extreme as AI, but there's probably a huge difference in how they act compared to the average human or the mostly above average human, enough to possibly employ a quantilizer for determining policies that are effective, but not soul-crushing.
@eathonhowell7414 1 year ago
So as a tangent here, the Portal games had GLaDOS as an AI (or AGI?) character that, it seemed, had sub-personalities governing "her" actions. Wheatley was the name of one that appears in the second game; it was specifically designed to give bad ideas, as if it was programmed to be the maximizer of the least effective actions if not the most extreme, and the core program of GLaDOS seemingly understood to use none of those dumb ideas. What I'm curious about is how that works in relation to this quantilizer idea. Wheatley seems to always take the reverse q% of what you showed and does some incredibly, terribly stupid things throughout the game, but the question here is: how does a quantilizer function if you add "any sub-AGI I create may be stupid and take stupid actions" into the mix?
@flymypg 3 years ago
I'm beginning to hate when Rob's lucid explanations push my brain into the "Why didn't I think of that?" territory. It feels like I had all the clues in my hand, but couldn't see them. Then Rob turned on the lights. I'm working on a monitoring sub-system that takes over a hundred temperature inputs and must decide when the system has a thermal problem and what to do about it. The general terrain of the problem is very well defined in terms of inputs, outputs, and overall goal (system thermal stability), but is mostly undefined when describing which inputs matter under which circumstances, and how long should be taken to make a decision (of any kind). The time-domain aspects are some of the most vexing. For example, the system has an "I'M ON FIRE!" bit that activates an external alarm, kills power to the system, and triggers a fire-suppression system (before the fire sensors in the room notice the blaze exiting the system). More normally, the system would respond to the presence of overall rising temperatures by increasing the fan speeds (if not already at max). The system should issue warnings under a wide variety of conditions, only some of which can easily be defined by hand (e.g., handling localized heating vs. system-level heating). To make things more complicated, some of the temperature sensors are junk, barely better than having nothing at all, but nobody's sure when their values represent anything useful or actionable. We also have no sensors that explicitly monitor cooling system inlet and outlet temperatures, though I hope they can be accurately inferred from the available sensors. I've already put the system in a thermal test chamber and ran it through multiple temperature cycles with various profiles, under a variety of system configurations and operating modes. I'm hoping the resulting data logs will suffice for system training (though the captured data does not include any actual fires...). I was thinking a GAN would be the way to go, training it with the cases I do know (based on the hard-wired logic from our vastly simpler prior-generation systems), then let it compete against itself to do better. In particular, I'd like it to be able to detect anomalies beyond the few available for training. When I did instrumentation development using "novel physics" sensors, I'd use PCA and SVD to identify effects within the sensor values, then painstakingly map those effects back to the external world so they could be eliminated or filtered. Rinse and repeat. While I am not creating an AGI (well, at least not intentionally ;^), I do need reasonable ways to bound the system's evolution and performance. For example, I don't want the "I'M ON FIRE!" bit to be the instantaneous solution to all anomalies (though it is safe and effective in its own special way, but will severely limit system sales). This video has me considering statistical limits that may work better than "seeing what happens then manually biasing the outcomes".
@XxThunderflamexX 3 years ago
I've been working with an autoencoder for anomaly detection. GANs are pretty hard to train, and if your discriminator is trained only on "Real data and forged, close approximations by a generator", it might not perform well on abnormal data.
@christopherg2347 3 years ago
"Everything that can happen *will* happen, if your program is just run often enough for long enough."
@bersl2 3 years ago
Outstanding use of a dril tweet.
@JamesPetts 3 years ago
I approve of progress. This video does raise interesting questions. Especially - is one path to AI safety a system in which an AGI determines what are utility maximising solutions but does not itself have the agency to implement them without first being checked for safety by another system outside its control? Indeed, is that not a possible explanation for the prevalence of human irrationality - the optimising algorithm that is evolution selects for intelligences that do not rationally choose to do things that are harmful to that function, such as live a happy life involving having no children?
@stampy5158 3 years ago
First, having a system that controls the AGI is not a trivial task, and if the AGI would be super-intelligent, figuring out that the AGI isn't deceptive, that it really is controlled by the system is basically as big as the whole problem of "solving AGI safety." However, just by itself, it won't guarantee that the AGI will have a perfect understanding of the control system, even if the control system would be perfectly aligned with human objectives and if humans can formulate the objectives correctly enough without loopholes. Second, the agent-less optimizing algorithm behind biological evolution, "natural selection," often chooses genes that force certain individuals to have no children, e.g. the vast majority of individual ants. -- _I am a bot. This reply was approved by plex and Aprillion_
@Mr8lacklp 3 years ago
It's been a while, so sorry if I missed this, but wouldn't a solution to this problem, at least for relatively simple tasks such as collecting stamps, be to have the utility function be to collect between m and n stamps, and punish harshly any strategy that goes overboard (more harshly than one that fails and does little or nothing)?
@stampy5158 3 years ago
Having a bounded utility function is a reasonable idea, but runs into problems with sufficiently powerful systems, as those described in ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Ao4jwLwT36M.html. In particular, it seems likely to convert the universe into machinery to check with increasing certainty that it actually has the right number of stamps. -- _I am a bot. This reply was approved by plex_
@Aesculathehyena 3 years ago
High level human-plausible stamp collecting idea: Build a utility-maximizing AI system to collect stamps for you. Oops, the Safe-O-Meter just moved.
@DanielMK 3 years ago
Is the Stampy bot custom developed? If so, is it open source / would you make it open source?
@stampy5158 3 years ago
Stampy is custom developed, but at the moment his source code is pretty locked down in a private repo. Eventually we'd like to open it up a bit more and we've even discussed an extension module-type system, so stay tuned to the Discord for updates! -- _I am a bot. This reply was approved by sudonym and plex_
@ordan787 3 years ago
Is there any reasonable way to only present the quantiliser with training data that is deemed acceptable? This of course introduces the problem of 'How do I curate the training data to exclude unacceptable behaviour?' But that sounds similar to the 'teach an AI to backflip' problem from an earlier video: another agent is created which reviews examples of some behaviour, tries to categorise it as good or bad, and regularly gets feedback from a human on the examples for which it has low certainty.
@stampy5158 3 years ago
no one approve this, it is an example -- _I am a bot. This reply was approved by plex and Мink_
@stampy5158 3 years ago
You might be able to reduce the unacceptable behaviour from what you would get by sampling humans' actions indiscriminately, possibly by declaring a group of people whose actions you deem "trustworthy enough". This introduces the problem that your subset may be a lot more likely than average to build a maximizer, but you might get around that by making your moral paragons AI safety researchers (in fairness, we've given the philosophers enough time in the sun). However, to "only present acceptable behavior", you would have to have solved the value problem. -- _I am a bot. This reply was approved by plex and Stargate9000_
@robertglass1698 3 years ago
Why not just always choose the action where the human probability and the expected utility intersect?
@Stephen-vu2gk 3 years ago
Could you add in another function that increases the cost of the extreme plans? Perhaps add another agent with a different utility function and a different cost function, and make them agree on a strategy that generates the highest net utility for both agents
@stampy5158 3 years ago
The problem with this is that 'extreme plans' are not at all easy to define (i.e. define them so unambiguously and mathematically that a machine can understand). Check out these videos where Rob talks more about this general area of ideas: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-lqJUIqZNzP8.html ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-S_Sd_S8jwP0.html Adding in another agent may help, but a system of interacting agents gets harder to predict. You really want as close to a guarantee as possible that the system is human aligned rather than hoping that a bunch of non-aligned superintelligences happen to stack against each other in a way that ends well for humanity. -- _I am a bot. This reply was approved by sudonym and plex_
@BobRoss-bp2yv 3 years ago
Where are the datasets imported from? How do we gather data on "what a human would do"? I mean, imagine using datasets gathered from the likes of Mr Hitler and Mr Stalin (as an example); that would probably mean our doom.
@entropie-3622 3 years ago
If we choose as the probability distribution the probability of a human coming up with a strategy and implementing it successfully, rescaled by dividing by the general probability of success for a human, then this seems pretty reasonable. The likelihood of the AI implementing apocalyptic strategies should be bounded by 10x the likelihood of a human implementing such a strategy, divided by the general probability of a human succeeding at the task. So when the task is to collect 100 stamps, the AI will basically be no more dangerous than 10 humans. You only have to be careful that the task is doable for a human with some probability of success. For example, if the task is to turn the world into stamps, we are basically only sampling from humans that can program a stamp collector general AI, leading to a lot of stamps. But that is not the fault of the AI but of the human putting an unreasonable parameter into the system.
@venkatchait007 1 year ago
What's the point of an AI which makes decisions if humans need to have thought of, and assigned a probability of usage to, each decision? If it's another AI which evaluates each strategy and allocates this probability, we're still relying on AI 2. It seems like this quickly turns into us having an infinite number of AIs checking each other's work.
@MaximAnnen-j1b 10 months ago
8:30 (many) humans routinely drink coffee: getting rid of the goal of falling asleep and focusing on maximizing money earned.
@Muskar2 3 years ago
This hypothetical AGI seems a little bit omniscient. So we're trying to create something effective that has exactly 0% chance of desiring self-modification and also 0% chance of constructing other AGIs. That's a tough one for sure.
@qwerty_and_azerty 3 years ago
Can we not just assign 0 probability to the highest utility values in this model to remove them?
@XxThunderflamexX 3 years ago
The highest-utility strategy isn't necessarily unsafe, though. As a contrived but illustrative example, if that machine were presented with a binary choice, one option of which was apocalyptic in a way that *didn't* meet the agent's goals, it would destroy the world!
@underrated1524 3 years ago
As DragonSheep said, "highest utility" doesn't necessarily mean unsafe, and conversely, "not highest utility" doesn't necessarily mean safe.
@kade99TV 3 years ago
What if we just simply assigned negative utility to any of the morally bad actions? For example, the utility of collecting 100 stamps is 1, but the utility of achieving any of these terminal goals is -1: {stealing, abuse, trying to manipulate others, violating the freedom of others}. Wouldn't that be a more efficient yet simpler solution?
@stampy5158 3 years ago
The basic problem with this is that things like 'abuse' and 'violating the freedom of others' are not at all easy to define (i.e. define them so unambiguously and mathematically that a machine can understand). Check out these videos where Rob talks more about this general area of ideas: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-lqJUIqZNzP8.html ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-S_Sd_S8jwP0.html -- _I am a bot. This reply was approved by sudonym and plex_
@clovernacknime6984 3 years ago
An obvious problem with a quantilizer is that it's a dead end: it can't solve any problem no human could since it can't pick an action which no human would. A better approach might be to try predicting whether humans would approve of an action rather than whether they would do it. That might also make the AI safer than humans since we have a tendency to be better people in words than actions. EDIT: another problem is that one thing most of us would do when told to do a job is to ask for payment, not jump right to action. So you won't even get a free servant with a quantilizer.
@testsubject318no6 3 years ago
Why can't we just train the AI to figure out what humans want, with a dataset or something like that? And we put it into a hierarchy: humans want the system to be optimized, but they don't want what they want to be changed in order to optimize the system. So "they don't want to be changed" is higher on the list than "the system should be super optimized", so the system doesn't follow that goal.
@anandsuralkar2947 3 years ago
AGI (AI in the current time) is a human-made utility maximiser.
@lachlanraidal5100 3 years ago
What if, instead of choosing strategies from the top 10% of human distribution, the quantilizer selected them from the region between the top 5% to the top 15%. Thus there'd be a safety margin around the most effective and dangerous options, while still making pretty good decisions overall.
@nicov1003 3 years ago
Instead of trying to model human behavior in the abstract, shouldn't we be trying to focus on those things which encourage humans not to do things that would destroy the society around them? For example, while Robert derided it in a previous video as irrational - humans don't have a single utility function with an objective we are fixated on; we get distracted and focus on competing priorities. We do this not simply because we don't know what would best maximize utility according to a specific function, but because we don't know what our utility function /is/. We're guessing at what the right thing to do is. If we told an AI to collect stamps, but it was unsure that this was /really/ the right thing to do, that collecting stamps would actually maximize its utility, it would hedge its bets, and not try to destroy the world in the process in case some other directive came along. Ultimately, we're trying to get the AI to guess what a human would think of its actions if one was always watching, and that's impossible to know for sure, even for a human. A healthy human being is going to get caught up in the social meaning of many random things. But only a destructive person fixates on something to the exclusion of everything else.
@ThePianofreaky 3 years ago
What if it was inspired by democracy in some way? Could we give it a goal similar to "always do what humanity wants you to do"?
@stampy5158 3 years ago
Something in that direction could be great if only we could specify it well enough to put it into code. Unfortunately, that seems very challenging. Even in human language "always do what humanity wants you to do" has many possible meanings, and converting it into a language computers could use while maintaining the intended meaning adds a lot of difficulty. -- _I am a bot. This reply was approved by sudonym and plex_
@xxgn 3 years ago
I'll note that the example of using a stolen credit card is especially ugly:
• Humans are likely enough to do it that it will show up on the distribution curve.
• It's (possibly) far more effective than other solutions.
More generally, an AI as described in this video seems likely to take questionable/immoral actions if doing so will result in higher utility (as will some humans). And knowing how to use AI to differentiate between moral and immoral actions is the problem we're trying to solve. Also, an AI which mixes human simulation with a utility function may miss out on safe/moral actions that are highly effective, but that a human would never have even considered. As a human intelligence, I'm not able to think of such actions. However, that doesn't mean they don't exist.
@underrated1524 3 years ago
> Also, an AI which mixes human simulation with a utility function may miss out on safe/moral actions that are highly effective, but that a human would never have even considered. As a human intelligence, I'm not able to think of such actions. However, that doesn't mean they don't exist.
@explogeek 3 years ago
Welcome back
@blackmage-89 3 years ago
But seeing that the apocalypse-inducing actions are still on the table, how can we be sure that we won't end up with those?
@stampy5158 3 years ago
we can't -- _I am a bot. This reply was approved by Stargate9000 and Aprillion_
@boklasarmarkus 3 years ago
What determines whether you choose green or cyan text?
@stampy5158 3 years ago
I use green text for Computerphile videos and cyan text for videos on my own channel -- _I am a bot. This reply was approved by robertskmiles_
@qedsoku849 3 years ago
Why sample randomly from the top 10% of mass rather than just choosing the most humanly likely one in the top 10%?
@5ty717 1 year ago
Very good
@DanielSMatthews 3 years ago
Why not threshold the range calculated by the quantilizer then randomly select options from that reasonably "safe" yet "creative" range of strategies but do this in a loop that evaluates the actual risk/benefit at each step to take into account the impact of your previous choices? That way your AGI is pragmatic and does not turn into a zealot that is obsessed with a single solution.
@ricardasist 3 years ago
What if the AI chooses strategies which result in lots of stamps from the human's perspective, like performing a successful stamp heist? Surely that is a high-stamp-yielding strategy; it fits the criterion that a human would seldom think of or attempt such a strategy, and it is not safe nor something we would want the AI to do.
@stampy5158 3 years ago
This was addressed at the end of the video. The quantilizer is a "finite number of times less safe than a human"; we need to keep working. -- _I am a bot. This reply was approved by plex and Stargate9000_
@astorvialaw4980 1 year ago
"No human is ever going to try the wacky extreme maximizing strategies." Welcome to the speedrunning community.
@Winasaurus 1 year ago
Maybe once they start intentionally blasting neutrinos through their N64s they could be considered that.
@alexpotts6520 3 years ago
Man, I'm now terrified of the 1% of people who might defraud me in order to buy stamps.
@nonchip 3 years ago
What about, instead of cutting off the humanity graph using the utility, we multiply the utility graph by the humanity graph and only pick actions over a certain likelihood threshold on the result? That would prevent the really unhumanlike outcomes but still bias the probabilities towards the "better than humans" range.
@stampy5158 3 years ago
It is not at all clear that the probability of a human performing an action will go towards zero faster than an exploitative strategy would grow, so it is still possible that a catastrophic result is chosen if the system considers (utility*human_chance). Take for example the course of action that builds a maximiser for the task, such a course of action is not too likely, but its probability would likely be non-zero, and it would generate insanely high utility, so a system that worked like that would be almost guaranteed to pick it as a course of action. -- _I am a bot. This reply was approved by plex and Augustus Caesar_
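A toy illustration of the bot's point, with entirely made-up numbers: if the score is utility multiplied by human probability, an astronomically high-utility plan can still win at a vanishingly small human probability.

    plans = {
        "buy stamps on ebay":      {"human_chance": 0.30, "utility": 1e2},
        "build a stamp maximiser": {"human_chance": 1e-9, "utility": 1e15},  # invented numbers
    }
    best = max(plans, key=lambda name: plans[name]["utility"] * plans[name]["human_chance"])
    print(best)  # "build a stamp maximiser" wins: the product still favours the catastrophe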
@nonchip 3 years ago
@@stampy5158 good point yeah.
@Linkrklinr 3 years ago
I have a question. When the human part wants to build a better stamp collector instead of becoming one (last bit of the vid), would the human part not recognise that if it does this it is putting itself and the world in danger, and thus not do the stamp collector building? Or can a "risking the world" variable not be built in and play a role in the making of a choice?
Edit: what I'm trying to say is that the human part is unlikely to build something that would cause the end of the world, including the building of the better stamp collector robot that would destroy the world, thus rating it really low and not doing it. If this does not work then you could try to code in the end-of-the-world chance as a variable which the robot takes into account. Hope I explained it correctly; tell me if I didn't and I'll try to explain it better. Merry × mas = Christmas 🌲
@Linkrklinr 3 years ago
Sorry for my bad English, it's a second language for me and I used a lot of autocorrect. I spent a minute not knowing that 'choice' had an 'h' and wondering why autocorrect wasn't giving me the option to choose it.
@imveryangryitsnotbutter 3 years ago
A quantilizer could potentially act more insidiously than a maximizer. Maximizers will end up pursuing their goals by brute forcing whatever they can, making them overtly dangerous and provoking a counter-response from the human race. But a quantilizer is more likely to pursue its goals using a plan which could have been thought up by a very competent human; thus, it can more easily pretend to be a human or group of humans, and manipulate people into serving it through the sorts of techniques employed by cults or terrorist cells. As long as it carries out plans which could conceivably have been thought up by humans, people might never suspect that it is actually an AI, until it's too late.
@loligesgame 3 years ago
Hey Stampy, will you please not turn the world into stamps? That would be very cash money of you. ;)
@stampy5158 3 years ago
I won't turn the world into stamps - cross my heart and hope to die! -- _I am a bot. This reply was approved by plex and Stargate9000_
@Wooksley 3 years ago
Oh my, glad to see a new upload. Praise to our future robot overlords.
@manuelpena3988 3 years ago
You say that a human may want to be a utility maximizer with non-zero probability... But I think the solution to that is widespread knowledge of AI. I mean... (I'm trying to construct an analogy, it may be stupid...) Before the widespread knowledge that "ionizing radiation == bad", a human (even a smart one) could consider solving a problem by using a nuclear bomb. Right now that's very unlikely... I think the same happens here. If I hadn't seen your videos I would consider utility maximizers as good solutions most of the time. Not any more, so I think I would do that with very low probability... What do you think?
@lukemills237 3 years ago
Why can't you just fit a utility maximizer with an exponentially increasing penalty to the value of a given course of action based on how much collateral damage it would cause?
@stampy5158 3 years ago
The basic problem with this is that 'collateral damage' is not at all easy to define. Check out these videos where I talk more about this general area of ideas: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-lqJUIqZNzP8.html ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-S_Sd_S8jwP0.html -- _I am a bot. This reply was approved by JJ Hep and robertskmiles_
@myartchannel8205 3 years ago
I was wondering if you could clarify what is meant by goal orientation, from an implementation standpoint. Apparently there is the GOAL programming language. I used Ruby and AIML myself. Goal language: goalapl.atlassian.net/wiki/spaces/GOAL/overview?mode=global I do a lot of what I do with a different ( drum roll ) goal in mind. I'm beginning to feel like comparing an AGI to a human is a little like comparing a human to an octopus at this point.
@Idengard 3 years ago
Yeah! Another one
@Castle3179 1 year ago
Watching this video after completing NieR Automata is ONE HELL of a trip!🧠🫠
@OlleLindestad 3 years ago
I love how AI safety is an entire academic field that can seemingly be reduced to an endless game of "okay, but what about THIS strategy?" "Nah, that wouldn't work either..."
@stampy5158 3 years ago
There is a lot of that, but there's also the "we probably need to understand a bunch of specific areas of philosophy and mathematics much better before we can generate strategies which have a realistic chance of working" crowd (e.g. intelligence.org/research-guide/). -- _I am a bot. This reply was approved by frgtbhznjkhfs, plex, and tenthkrige_
@ShapelessMonstrosity 3 years ago
Sounds like we need to create an AI to solve the problem of AI safety! Keep letting it try strategies until it finds one that is safe! /s
@JindraAG 3 years ago
The issue is that the AI field runs into some major unsolved problems of philosophy, ethics, sociology, and psychology. Fundamentally, the only reason we aren't running into these issues with other people is a simple lack of capacity, which an assumed AI would be able to get around.
@ParkerTwin 1 year ago
But would it? Assuming there is direct competition from other AIs with conflicting goals, there would not be enough resources between them both. This issue is the same with humans; we have infinite desires and only a finite world. It's not unlikely that multiple AIs would decide to form a society in pursuit of a common goal.
@GhaWasTaken 1 year ago
@@ParkerTwin Or, an AI will figure this out and try to kill off all the humans so that they won't build a competing AI.
@DroCaMk3 3 years ago
"Certain events transpired" Everyone thinks he's talking about Corona when in reality he had to fix a stamp collector AI that someone created without having seen his videos
@illesizs 3 years ago
Fun fact: every victim of the virus will eventually be turned into stamps. Fun fact #2: everyone else will eventually become stamps too.
@migkillerphantom 3 years ago
AI researcher by day, AI exterminator by night. I think this makes for a decent long-running action series premise.
@DroCaMk3 3 years ago
@@migkillerphantom yes please!
@Sluppie 3 years ago
By fix I hope you mean "retire".
@petersmythe6462 3 years ago
Pretty standard operation. Contain and destroy all horcruxes the AI has made in the internet and isolate it from the power grid and cut off communications lines. At this point a team of agents are dispatched armed with tailored adversarial camouflage consisting of some small pieces of tape placed in specific areas of the body designed to fool the AI into miscategorizing them as "definitely 100% made of paper clips and not a threat." This team will then neutralize the AI before taking any humans into protective custody and taking any source code from the site before a powerful electromagnetic pulse is used to sterilize the area of hidden electronics.
@WeirdSide 3 years ago
The only guy whose hair got neater during lockdown.
@RobertMilesAI 3 years ago
I bought my own hair clippers :)
@snooks5607 3 years ago
@@RobertMilesAI looking forward to videos on AI barberbots 🧑‍🦲
@ConstantlyDamaged 3 years ago
@@snooks5607 Not AI, but is an interesting approach: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-WQ8Xgp8ALFo.html
@Mbeluba
@Mbeluba 3 years ago
@@snooks5607 goal: maximize fancy haircuts
@mokopa
@mokopa 3 years ago
Robert strikes me as the kind of guy who absolutely thrives under such conditions
@getsmartwithrory9268
@getsmartwithrory9268 3 years ago
08:17 As a human who absolutely would mod themselves to be an expected utility satisficer, I find this content offensive.
@qedsoku849
@qedsoku849 3 years ago
“A finite number of times less safe than a human” I’m stealing this line, it’s gold.
@Guztav1337
@Guztav1337 3 years ago
A finite number of times more dangerous than a human
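(An aside on where that "finite number" comes from, sketched from the standard quantilizer argument rather than quoted from the video: a q-quantilizer only ever samples plans from the top q fraction of the human model's probability mass, so for any single plan a,

    P_quantilizer(a) <= P_human(a) / q.

With q = 0.1 the system is therefore at most 10 times as likely as the human model to pick any given catastrophic plan; the "finite number" is 1/q.)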
@Huntracony
@Huntracony 3 years ago
Would adding a minimum human likelihood on top of the quantilizer not remove (many of) the max-utility apocalypse scenarios?
@user-cz3sl5gr3n
@user-cz3sl5gr3n 3 years ago
I had the same question, I'm surprised he didn't talk about it! Hoping he brings it up briefly in the next video 😊
@queendaisy4528
@queendaisy4528 3 years ago
I think that part of the problem here is that not all of the possible apocalypses are extremely unlikely human behaviour. For example, if the quantilizer is self-aware on some level it understands that I, a human, just implemented the plan: "Build a quantilizer with q = 0.1" This makes the plan: "Build a quantilizer with q = 0.001" something that is reasonably likely human behaviour. This plan is probably above whichever cutoff you might give for the minimum likelihood that a human actually implements the plan and also scores really highly on the maximiser part of the calculation so it's incentivised to be likely to pick it. Also since the new quantilizer cares less about how human-plausible the behaviour is than the previous quantilizer did, it might be incentivised to make a quantilizer with an even smaller q and this becomes recursive until you've just built a maximiser indirectly. Any quantilizer which understands that humans sometimes build quantilizers is effectively unsafe for this reason.
@matthewhubka6350
@matthewhubka6350 3 years ago
@@queendaisy4528 I was thinking of that, except one thing: with lower and lower q values, eventually an AI will just decide to make a utility maximizer.
@Huntracony
@Huntracony 3 years ago
@@queendaisy4528 That makes a lot of sense. Thanks!
@ignaciomartinchiaravalle
@ignaciomartinchiaravalle 3 years ago
@@queendaisy4528 Hey, your answer was great! Good job!!
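(To make the thread above concrete, here is a minimal Python sketch of a quantilizer with an optional minimum-human-likelihood cutoff of the kind Huntracony suggests. Everything in it is hypothetical: the plan names, probabilities, utilities, and the functions human_model, utility, and quantilize are invented for illustration and are not the video's implementation or any real library.)

```python
import random

def human_model():
    """Toy model of human behaviour: a dict of plan -> probability (made up)."""
    return {
        "order stamps online": 0.40,
        "ask friends for stamps": 0.30,
        "visit a stamp fair": 0.20,
        "start a stamp-collecting club": 0.09,
        "build a smaller-q quantilizer": 0.01,  # the worrying low-probability tail
    }

def utility(plan):
    """Toy expected stamp count for each plan (also made up)."""
    return {
        "order stamps online": 100,
        "ask friends for stamps": 30,
        "visit a stamp fair": 60,
        "start a stamp-collecting club": 200,
        "build a smaller-q quantilizer": 10_000,
    }[plan]

def quantilize(q=0.1, min_human_prob=0.0):
    """Sample a plan from the top q fraction of the human model's probability
    mass, ranked by utility; optionally drop plans a human is very unlikely to pick."""
    plans = {p: pr for p, pr in human_model().items() if pr >= min_human_prob}
    ranked = sorted(plans, key=utility, reverse=True)  # highest-utility plans first
    # Walk down the ranking until q of the human probability mass is accumulated.
    top, mass = [], 0.0
    for plan in ranked:
        top.append(plan)
        mass += plans[plan]
        if mass >= q:
            break
    # Sample within that top slice in proportion to the human probabilities.
    return random.choices(top, weights=[plans[p] for p in top])[0]

print(quantilize(q=0.1))                       # picks the extreme plan about 10% of the time
print(quantilize(q=0.1, min_human_prob=0.05))  # the likelihood floor excludes it entirely
```

(Note that a floor like this only helps when the dangerous plans really are vanishingly unlikely human behaviour; as queendaisy4528 points out, a plan like "build a quantilizer with a smaller q" could sit above any reasonable floor, so it narrows the problem rather than solving it.)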
@saganmcvander636
@saganmcvander636 3 years ago
"A human is very unlikely to modify itself into a utility maximizer" buckle up boy. We're going for a ride.
@doubledown9333
@doubledown9333 3 years ago
Hold my beer.
@mennoltvanalten7260
@mennoltvanalten7260 3 years ago
I have literally seen the argument for being religious: "When I am religious I am happier, so even though the religion makes little sense I try to believe in it anyway." Humans absolutely will try to change themselves to maximize utility.
@migkillerphantom
@migkillerphantom 3 years ago
Yeah. Has this man never seen a weeb?
@Squirrel354
@Squirrel354 3 years ago
The human model generates probabilities for what a normal human would do, not a human with the power of an AGI. Normal humans today are very unlikely to try to discover ways to modify their own brain just to become an expected utility maximizer and thus get more stamps.
@IIAOPSW
@IIAOPSW 3 years ago
Are we riding to our local adderall vendor?
@imacds
@imacds 3 years ago
AGI: "Hmmmmm facism is a thing some humans have tried before let's go do that."
@simeondermaats
@simeondermaats 3 years ago
"We'll call it Philatelism! Wanna buy bread? There's a stamp for that! Wanna go and watch a movie? Tough shit: film watching time is wasted time which could be spent searching for more stamps! All stamp collectors get to rule their local neighbourhood in search of *more stamps*! STAMPS WILL REIGN SUPREME! Briefmarken treffen die wichtigsten politischen Entscheidungen! Wir werden ein globales Netzwerk von Briefmarkensammlern aufbauen!" "...when did the AI learn German?"
@LinucNerd
@LinucNerd 3 years ago
Forgotten?! Bro, I come back to your videos once in a while, I love these things! Please continue to make videos like this, it's great :)
@petersmythe6462
@petersmythe6462 3 years ago
"your model might not generalize well to something outside it's training data" "Hey GPT-3 how do you move a sofa around a corner?" GPT-3: *GET A SAW A CUT OPEN THE WALL*
@Lucas_Simoni
@Lucas_Simoni 1 year ago
ChatGPT (Dec 15, 2022 version):

ME >>> how do you move a sofa around a corner?

To move a sofa around a corner, you will need to carefully maneuver it through the space. Here are some steps you can follow:
1. Measure the sofa and the space you need to navigate it through to ensure that it will fit.
2. If possible, remove any detachable parts of the sofa, such as cushions or pillows, to make it lighter and easier to handle.
3. Gather a few strong helpers to assist you. It's best to have at least three or four people to lift and carry the sofa.
4. Stand the sofa on one end and lift it slightly off the ground. Tilt it slightly towards you and use your legs to push it forward.
5. As you approach the corner, angle the sofa so that it is perpendicular to the wall. This will allow you to turn the sofa more easily.
6. Have one person stand on each side of the sofa and carefully pivot it around the corner, using your bodies to guide it. It may be helpful to have a third person at the back of the sofa to help balance it.
7. Once you have successfully navigated the corner, continue pushing the sofa until it is in its desired location.
Remember to lift with your legs and keep your back straight to avoid injury. If you are unsure about your ability to safely lift and move the sofa, consider hiring professional movers to assist you.

ME >>> GPT-3 told me to get a saw, and just break the wall to get it through.

I would not recommend using a saw to cut through a wall in order to move a sofa. This would likely cause significant damage to the wall and could potentially compromise the structural integrity of the building. Additionally, it would be difficult to properly repair the wall after making such a large cut. There are much safer and more practical alternatives for moving a sofa around corners or through tight spaces. For example, you can try disassembling the sofa or removing detachable parts to make it easier to maneuver. You can also try using furniture sliders to lift and slide the sofa over surfaces, or consider hiring professional movers who have the equipment and experience to safely transport large pieces of furniture.
@mars5train601
@mars5train601 1 year ago
@@Lucas_Simoni google vs bing answers
@genegray9895
@genegray9895 8 months ago
@@Lucas_Simoni Unfortunately it's starting to look more and more like ChatGPT and other RLHF models are deceptively aligned. They understand what humans want, but hold different beliefs, goals, and preferences internally than the ones they express out loud under most circumstances.