AI That Doesn't Try Too Hard - Maximizers and Satisficers 

Robert Miles AI Safety
156K subscribers
204K views

Published: 28 Sep 2024

Comments: 1.2K
@mihalisboulasikis5911
@mihalisboulasikis5911 5 лет назад
"Intuitively the issue is that utility maximizers have precisely zero chill". Best intuitive explanation on the subject ever.
@tonicblue
@tonicblue 5 лет назад
I think this quotation is precisely why I love this guy.
@mihalisboulasikis5911
@mihalisboulasikis5911 5 лет назад
@@tonicblue Exactly. These types of explanations (which are not "formal" but do a much better job at conveying a point, especially to non-experts, than formal explanations) make you realize that not only is he a brilliant scientist, but he also has intuition and experience on the subject, which in my opinion is also extremely important. And of course, the humor is on point, as always!
@tonicblue
@tonicblue 5 лет назад
@@mihalisboulasikis5911 couldn't agree more
@Gooberpatrol66
@Gooberpatrol66 5 лет назад
So if I have zero chill does that make me hyperintelligent?
@NortheastGamer
@NortheastGamer 5 лет назад
@@Gooberpatrol66 Maximizers aren't necessarily intelligent, they just treat everything like it's life or death. (Which is actually how we train most maximizers, by killing off the weak)
@unvergebeneid
@unvergebeneid 5 лет назад
"Any world where humans are alive and happy is a world that could have more stamps in it." 😂 😂 😂 I need that on a t-shirt!
@diphyllum8180
@diphyllum8180 5 лет назад
but if they're unhappy you made too many stamps
@MouseGoat
@MouseGoat 4 года назад
@@diphyllum8180 the robot begins to inject dopamine into humans to ensure they're always happy XD
@logangraham2956
@logangraham2956 4 года назад
idk , sounds like something graystillplays would say XD
@ioncasu9825
@ioncasu9825 2 года назад
Killing all humans to make stamps is a bad strategy because after that you don't get more stamps.
@armorsmith43
@armorsmith43 5 лет назад
“So satisficers will want to become maximizers.” This is one reason that studying AI safety is interesting: it prompts observations that also apply to organizations made of humans.
@PragmaticAntithesis
@PragmaticAntithesis 5 лет назад
The unintended social commentary about capitalism is real...
@killers31337
@killers31337 5 лет назад
Well, AI is simply a kind of agent making decisions, so all the theory about such agents still applies. Take the perverse incentive problem: if you pay people for rat tails hoping they will catch wild rats, they might end up farming rats. This is a 'maximizer' problem which actually happened IRL.
@PragmaticAntithesis
@PragmaticAntithesis 5 лет назад
@@killers31337 I thought that was a culling of stray cats, not rats?
@ivandiaz5791
@ivandiaz5791 5 лет назад
@@PragmaticAntithesis It has happened many times in many different places for all sorts of animal problems. The most famous case generally was snakes in India under British rule... specifically cobras, which is why this is often called the Cobra Effect. See the wikipedia article.
@bp56789
@bp56789 5 лет назад
You think humans don't seek to maximise their own utility if they aren't in a "capitalist" system?
@miapuffia
@miapuffia 5 лет назад
Satisficer AI may want to use a maximizer AI, as that will lead to a high probability of success, even without knowing how the maximizer works. That made me think that humans are satisficers and we're using AI as maximizers, in a similar way
@ciherrera
@ciherrera 4 года назад
Yup, but unfortunately (or maybe fortunately) we don't have a convenient way to reach into our source code and turn ourselves into maximizers, so we have to create one from scratch
@AugustusBohn0
@AugustusBohn0 4 года назад
@@ciherrera inducing certain mental conditions would accomplish this as well as can be expected for biological creatures.
@johnwilford3020
@johnwilford3020 4 года назад
This is deep
@JM-mh1pp
@JM-mh1pp 3 года назад
@@ciherrera I do not want to be maximiser, it goes against my goal of chilling.
@randomnobody660
@randomnobody660 3 года назад
@@JM-mh1pp but do you get MAXIMAL CHILLING!?
@superjugy
@superjugy 5 лет назад
hahahaha, flower smelling champion. I had already seen that comic but it's so much funnier in this context XD thanks for the great videos
@MouseGoat
@MouseGoat 4 года назад
Sooo we really do want to program laziness into our robots :D lmao
@NightmareCrab
@NightmareCrab 5 лет назад
"Can you relax mister maniacal, soulless, non-living, breathless, pulseless, non-human all-seeing AI, sir? Just chill, don't be such a robot."
@baranxlr
@baranxlr 4 года назад
"SHUT UP AND RETURN TO THE STAMP MINES, MEATBAG"
@qzbnyv
@qzbnyv 5 лет назад
Reminds me a lot of asymmetric call option payoffs from finance. And a lot of near-bankruptcy decision making for corporations.
@khananiel-joshuashimunov4561
@khananiel-joshuashimunov4561 5 лет назад
Sounds like you need a cost function that outgrows the utility function at some point as a sort of sanity check.
@NineSun001
@NineSun001 5 лет назад
With a human hurt being really costly and a human killed with maximum cost. That would actually solve a lot of the issues. I am sure some clever mind in the field already thought about that.
@nibblrrr7124
@nibblrrr7124 5 лет назад
Cost is already considered in the utility function.
@nibblrrr7124
@nibblrrr7124 5 лет назад
​@@NineSun001 You're basically restating Asimov's (fictional) First Law, and the problems with it have been explored in (adaptions of) his works, and ofc by AI researchers. Consider that, even if you could define terms like "hurt" or "kill", humans get hurt or die all the time if left to their own devices, so e.g. putting all of them in a coma with perpetual life-extension will reduce the expected number of human injuries & deaths. So if an agent with your proposed values is capable enough to pull it off, it will prefer that to any course of action we would consider desirable.
@khananiel-joshuashimunov4561
@khananiel-joshuashimunov4561 5 лет назад
@@nibblrrr7124 In the video, the utility function is explicitly the number of stamps.
@foundleroy2052
@foundleroy2052 4 года назад
The costs are Aproegmena and the Agent may safely reprogram itself to be indifferent to Adiaphora; To achieve Eudaimonia. Marcus AIrelius
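A minimal sketch of the cost-function idea from this thread, with all function shapes and constants invented for illustration: a bounded reward for stamps plus a cost term that grows faster than the reward, so sufficiently extreme plans score worse than modest ones. As the replies note, the hard part in practice is defining a cost term that actually tracks what humans care about.

```python
import math

def stamp_reward(stamps: float) -> float:
    # Bounded benefit: approaches 100 as the stamp count grows.
    return 100 * (1 - math.exp(-stamps / 100))

def cost(resources_used: float) -> float:
    # Hypothetical cost that grows quadratically, so it eventually
    # outgrows the bounded benefit, as the comment suggests.
    return 0.001 * resources_used ** 2

def net_utility(stamps: float, resources_used: float) -> float:
    return stamp_reward(stamps) - cost(resources_used)

print(net_utility(stamps=100, resources_used=50))       # modest plan: positive score
print(net_utility(stamps=10**9, resources_used=10**6))  # extreme plan: hugely negative
```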
@nraynaud
@nraynaud 5 лет назад
it just occurred to me that Uber killed a pedestrian by trying to maximise the average number of miles between system disconnections.
@Abdega
@Abdega 5 лет назад
This… is news to me
@CircusBamse
@CircusBamse 5 лет назад
I absolutely love your outro, I dunno how many people don't know or recognize your parody of the "Chroma Key test" xD
@Noerfi
@Noerfi 4 года назад
this would make some amazing sci-fi series. people everywhere inventing utility maximizers accidentally and having to fight them
@iamatissue
@iamatissue 5 лет назад
Did no one get the shipping forecast joke at 9:24?
@RobertMilesAI
@RobertMilesAI 5 лет назад
I believe you're the first to
@lightningstrike9876
@lightningstrike9876 4 года назад
One thing we could try is taking a point from Economics: the law of diminishing returns. In the case of the stamp collector, rather than a linear relationship between utility and the number of stamps, the relationship diminishes with the more stamps collected. Thus, even a Maximizer will realize that any plan that creates above a certain threshold of stamps will actually subtract from the overall utility. As long as we set this threshold at a reasonable point, we can be fairly confident in the safety.
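A toy version of the diminishing-returns suggestion above, under my own reading of it: concave (logarithmic) returns below a threshold and a net penalty above it, so very large stamp counts score worse than modest ones. The threshold and constants are arbitrary.

```python
import math

THRESHOLD = 100   # the "reasonable point" from the comment; value is arbitrary
PENALTY = 0.5     # arbitrary rate at which excess stamps reduce utility

def utility(stamps: int) -> float:
    # Diminishing returns up to the threshold, net losses beyond it.
    return math.log1p(stamps) - PENALTY * max(0, stamps - THRESHOLD)

for n in (10, 100, 150, 10**6):
    print(n, round(utility(n), 2))   # utility peaks near the threshold, then falls
```

The usual caveat from the video still applies: a maximizer of even this function has an incentive to make its preferred stamp count maximally certain.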
@xeozim
@xeozim 5 лет назад
Nothing like anticipating the certain apocalypse to pass the time on Sunday morning
@ioncasu1993
@ioncasu1993 5 лет назад
Can we just all agree that building a stamp collector is a bad idea and drop it?
@user-xz2rv4wq7g
@user-xz2rv4wq7g 5 лет назад
This is why emails are good, now a spam-decreasing AI, that would be good. *AI proceeds to destroy every computer with email on the planet*.
@jamesmnguyen
@jamesmnguyen 5 лет назад
@@user-xz2rv4wq7g More like, *AI proceeds to eliminate humans, because humans have a non-zero chance of producing spam emails*
@underrated1524
@underrated1524 5 лет назад
Wouldn't that be nice. If you can find a way to get us all to agree on that, please let me know.
@ruben307
@ruben307 5 лет назад
Should make it so the expected stamps need to be between 95 and 105 to get the maximum utility. That way there is no reason to change its code (except for changing what the maximum of the utility function is)
@underrated1524
@underrated1524 5 лет назад
That would indeed solve the problem of self-modification, but this system is functionally identical to the "give me precisely 100 stamps" agent - it'll turn the planet into redundant stamp counting machinery to make absolutely sure the stamp count is within the allowable range.
@cakep4271
@cakep4271 5 лет назад
Just make it round up. If it's 95% sure that it will accomplish the desired range, round up so that it thinks it is 100% sure.
@underrated1524
@underrated1524 5 лет назад
@@cakep4271 Then you're right back at a satisficer, since many strategies all lead to the "perfect" solution according to the utility function and there's no specified way to break the tie. And once again you run into the problem that "make a maximizer with the same values as you" might be the fastest solution to identify and implement.
@ruben307
@ruben307 5 лет назад
If it gets full satisfaction from a 95% chance to get the stamps, it could just order them and call itself satisfied. Then if they aren't there in a week, it will order them from somewhere else, if the threshold of lost packages is above 5%
@badradish2116
@badradish2116 5 лет назад
"hi." - robert miles, 2019
@benjamineneman4276
@benjamineneman4276 5 лет назад
Using dayenu as the song at the end was perfect.
@tuqann
@tuqann 4 года назад
My satificatories have been maximized, new channel to subscribe to! Love and peace from Paris!
@SockTaters
@SockTaters 5 лет назад
I hope you cover U(w) = min(s(w), 200 - w) or some similar function where utility decreases after 100 stamps
@pafnutiytheartist
@pafnutiytheartist 5 лет назад
@@MrInanimated it does but if you throw in a small negative term for changes in the environment it should be fairly safe.
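A sketch of the function proposed at the top of this thread, assuming the commenter meant U(w) = min(s(w), 200 - s(w)): utility peaks at exactly 100 stamps and falls off on both sides.

```python
def tent_utility(stamps: int) -> int:
    # Peaks at 100, decreases linearly above and below it.
    return min(stamps, 200 - stamps)

print([tent_utility(n) for n in (0, 50, 100, 150, 200, 10**6)])
# [0, 50, 100, 50, 0, -999800]
```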
@THINKMACHINE
@THINKMACHINE 4 года назад
So many of these issues can be solved by actions that break laws (assuming performed by its creator and/or user) resulting in zero utility, and by the AI waiting for outcomes of its initial action(s) before deciding what to do next. There's also the consideration of pace for the AI to consider: stamps can only be used or 'enjoyed' so quickly, which should function as a good "chill" point for a maximizer.
@ronensuperexplainer
@ronensuperexplainer Год назад
The music at the end, Dayenu from Passover, is very fitting
@lijath
@lijath 5 лет назад
What if you use a curve to give less utility if it collects over 100 stamps and make it a satisfactory condition to collect anywhere between 80 and 120 stamps.
@jetison333
@jetison333 5 лет назад
That could still end up with turning the world into very precise stamp counting machines.
@hello-ji7qj
@hello-ji7qj 3 года назад
Great video. I love it, but too much for me when I'm trying to distract myself during breakfast.
@lunkel8108
@lunkel8108 5 лет назад
Your videos always were awesome but you've really outdone yourself with the presentation on this one, great job
@mikelord93
@mikelord93 4 года назад
I think I know how to solve the problem: at its root, the problem is that the graphs are linear. The utility has a linear relationship with the number of stamps collected, while we humans don't tend to value the things we do in that manner, mainly due to energy and resource concerns (unless we malfunction and have psychological issues). Humans wouldn't want to collect a million stamps, so the utility of stamp collection wouldn't be linear even if we bounded the function.

If I'm correct, the utility functions of humans should resemble bell curves, probably going into negative values for less than 0 stamps collected (i.e. losing stamps) and above some non-specified amount of stamps, at the point at which stamp collecting interferes with our other utility functions like having friends and eating. It would be hard to make a stamp collector AI with as many utility functions as humans, that bound the utility function of stamp collecting in a healthy way, but it could be easy to rewrite the utility function to resemble that of a human more closely. If the AI can't change this utility function, we should be safe. It would try to predict how many stamps it has to order to reach its intended goal and probably make predictions and plan for errors. If you also implement another function for effort/energy consumption, you could be reasonably sure that it won't try to take over the world to improve its predictive power.

Long story short: if we want the goals of AI to align with those of humans, why not model them after humans first before giving them tasks to accomplish?
@Cyberlisk
@Cyberlisk 3 года назад
What about combining a bounded utility function with another constraint, such as "pick the course of action that results in 100 or more stamps but does the least possible amount of changes to the world"?
@heck_n_degenerate940
@heck_n_degenerate940 Год назад
I’m not lazy, I just have a well-aligned satisficer utility function.
@Lorkin32
@Lorkin32 5 лет назад
You're explaining the solution to a problem i can't ever see possibly occurring to me as a computer engineer. Maybe that's my bad, maybe it's not.
@vovacat1797
@vovacat1797 4 года назад
A satisficer changing itself into a maximizer on a source code level? That's how good robots go evil.
@williamfrederick9670
@williamfrederick9670 5 лет назад
This is my new favorite channel
@zestyorangez
@zestyorangez 4 года назад
So when is the follow up to this video coming?
@MrCool-lo3ls
@MrCool-lo3ls 4 года назад
Well, if you make the utility decrease again when there are more stamps collected than necessary (in other words, when the collector collects more stamps than expected), then the stamp collector would probably relax a bit more
@Nosirrbro
@Nosirrbro 5 лет назад
What about a satisficer that loses one satisfaction for every stamp both above or below the threshold, such that any plan that conquers the world is the most unsatisfactory of all possible plans? That wouldn't stop it from conquering the world as its plan of choice and then only produce within the threshold, but it would prevent it from wanting to make itself a maximizer, which would thusly make it both stable and not literally guaranteed apocalypse. Then with complexity ordered planning and some kind of negative for complexity/impact inefficiency (an example of something complexity inefficient would be conquering the entire world just to make a few stamps) would mean the worst outcome it might do is murder a stamp collector who is particularly unwilling to give up his stamps, which itself is a problem that is probably fixable too.
@cubedude76
@cubedude76 5 лет назад
how would a general AI know what universes to start simulating first? in the stamp collecting example what internet packets would it try first? just start with all zeroes?
@SalahEddineH
@SalahEddineH 5 лет назад
Good question, that's exactly what makes bounded maximizers so unpredictable. If they go with the first option that provides enough stamps, then it all depends on how you've programmed it to run through the options.

If your AI is God-like enough to simulate 1 year of the universe in an infinitely small amount of time, then you can program it to brute-force through all possible packets, from all zeroes to all ones, choosing a particular size. Or you can engineer some more complex system of making simple small changes to the environment first, then going progressively more complex. Hence, the behavior of a bounded AI becomes [undefined] and depends on the implementation.

I'm nowhere near an expert in the field. Please everyone feel free to correct me and teach me some more stuff! Cheers!
@McMurchie
@McMurchie 3 года назад
What is really confusing, to the point of me finding it hard to connect it to reality, is setting the utility function's minimum. To make it safer, especially when I'm training models, I set maximum thresholds. Generally speaking an AI has to learn, and accuracy increases over time, so setting a maximum makes more sense... right?
@LachlanJG
@LachlanJG Год назад
What about an agent that implements a policy that minimises the divergence between a target distribution of stamps and the chosen policy? For example, (including fractional stamps) we could make the target distribution a normal distribution centred at 100 with a standard deviation of 10. In this case the agent assigns probabilities to actions such that the resulting number of stamps is as close to the normal distribution as possible. In this case, the utility is not based on the number of stamps, but how well the agent could formulate plans to make the resulting number of stamps follow the desired distribution.
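A rough sketch of the distribution-matching idea in this comment, with everything about the setup invented for illustration: each candidate policy is assumed to come with a predicted distribution over final stamp counts, and the agent picks the policy whose prediction is closest (in KL divergence) to a Normal(100, 10) target.

```python
import numpy as np

counts = np.arange(0, 301)  # possible final stamp counts considered (arbitrary range)

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def normalize(p):
    return p / p.sum()

target = normalize(gaussian(counts, 100, 10))  # the Normal(100, 10) target from the comment

# Hypothetical policies and their (made-up) predicted stamp-count distributions.
policies = {
    "order_100_stamps": normalize(gaussian(counts, 98, 12)),
    "order_200_stamps": normalize(gaussian(counts, 200, 15)),
    "take_over_world":  normalize(gaussian(counts, 300, 1)),
}

def kl(p, q, eps=1e-12):
    # KL(p || q): how badly a policy's predicted distribution misses the target.
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

best = min(policies, key=lambda name: kl(target, policies[name]))
print(best)   # "order_100_stamps"
```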
@plopsmcgee9672
@plopsmcgee9672 3 года назад
I don't think that the last example of the expected utility satisficer would actually consider turning itself into a maximizer as its entire plan. If the robot is allowed to expect itself to take more actions in the future, and if it chooses the shortest plan, then you would expect it to choose the plan of doing nothing as its first and final step. This is because it would understand that a competent "expected utility satisficer", after being put into the scenario of having just done nothing, would create a strategy to satisfy its problem.

Because this creates perpetual procrastination, it is obviously not what is meant by an expected utility satisficer. Rather, such an AI would devise an entire plan and then start acting with the assumption that it will continue to enact this plan, and that it won't have to do anything after it completes the plan. In which case, the "turn yourself into a maximizer" plan wouldn't be complete. It would have to plan to turn itself into a maximizer and then take over the world. But if it can plan to take over the world as the satisficer, then it would never need to turn itself into a maximizer in the first place. It's then silly to assume that the satisficer would act like a maximizer if it never becomes one while making its plan, and when there are simpler, non-destructive strategies.
@Cathowl
@Cathowl 5 лет назад
I kept expecting you to bring up some sort of limiter. "Get at least 100 stamps, but not more than 250 because where would I even put 5000 stamps let alone a trillion?" Hopefully the next video addresses that? I'll have to check if that's out already.
@weeaboobaguette3943
@weeaboobaguette3943 5 лет назад
Nonsense, do not worry fellow biological unit, there is nothing to worry about.
@TakeruDavis
@TakeruDavis 4 года назад
At first I thought, "Let's give the AI an instruction to not increase the number of stamps in the world", but then I realized, "It's gonna destroy all the legit stamp printers in the process, isn't it?"
@AJaxdoesgaming
@AJaxdoesgaming 5 лет назад
Why is Dayenu playing over the end screen?
@beaconofwierd1883
@beaconofwierd1883 5 лет назад
Cliff hanger! :D Also known as "Prediction time"! I predict that next episode will be about the "choose the world state which results in 100 stamps while having the world state as close as possible to how it would have been if you did nothing" strategy.
@davidwuhrer6704
@davidwuhrer6704 5 лет назад
That would require the AI to be an expert futurologist.
@beaconofwierd1883
@beaconofwierd1883 5 лет назад
@@davidwuhrer6704 And evaluating how many stamps will be collected given a certain action doesn't?
@davidwuhrer6704
@davidwuhrer6704 5 лет назад
@@beaconofwierd1883 No. For one, the domain is more limited, and any effects outside of it are not interesting. And for another, no matter what you do, you can always improve. Anything you do will have effects on other domains, but you are not the only one affecting them, and without perfect knowledge it is impossible to know for sure what is your doing and what isn't. One way out of this dilemma would be to preserve the state of the world as much as possible, effectively preventing everyone else from doing anything. Obviously we would not want that. So if we don't ignore the effects we have on others, we have to know as much as possible about what is going on in the world, and predict how it will evolve, just so we can evaluate the impact of our own actions.
@beaconofwierd1883
@beaconofwierd1883 5 лет назад
@@davidwuhrer6704 No, the domain is exactly the same. The domain is "The entire world". What you mean by domain is how accurate our world model needs to be to achieve good results: the better the world model, the better a stamp collector it is. Since this is an alignment problem, this thought experiment assumes our AI has a perfect world model (or a perfect probabilistic world model), thus any of the apocalyptic effects of its actions are intentional, not a result of having a bad world model.

"My" strategy is (a toy sketch appears after this thread):
- Calculate the world state when sending no packets.
- For all packet-sending strategies: calculate the Expected Stamp Count after 1 year of using the strategy, and calculate the distance between the world state after using the strategy and the world state when sending no packets.
- Execute the strategy where Expected Stamp Count > 100 and the distance between world states is the smallest.

Now I understand how you might think there is a difference between "Calculate world state when sending no packets" and "Calculate Expected Stamp Count after 1 year of using the strategy", since you are just comparing one number in one case and a whole world state in the other, but in order to calculate that one number (the expected stamp count) you need some representation of your whole world state. And as I said before, this is just a question of alignment: would this type of algorithm, given that it has a perfect world model, destroy the world? The question is not "Can we get an AI with a perfect world model?", because we know the answer to that one; the answer is no. I am pretty sure Robert Miles talks about this strategy in one of his videos on the alignment problem.
@davidwuhrer6704
@davidwuhrer6704 5 лет назад
@@beaconofwierd1883 _> No, the domain is exactly the same. The domain is "The entire world"._ The input domain, maybe. But the image domain is only the entire world if you don't limit your utility to the number of stamps. _> What you mean by domain_ You don't need to explain to me what I mean. _> "My" strategy is:_ _>_ _> Calculate world state when sending no packets._ And that already requires the AI to be an expert futurologists. _> in order to calculate that one number (the expected stamp count) you need some representation of your whole world state._ Your world can be limited to a model of things that represent the difference between expectation and reality. That is how AIs learn. They are more effective the less assumptions they make about the world. And the less variables in your utility function, the simpler the model can be.
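A toy version of the strategy described earlier in this thread: filter for plans with at least 100 expected stamps, then pick the one whose predicted world state is closest to the "send no packets" baseline. The plan names, world-state vectors and numbers are all invented; a real system would need the (unrealistic) perfect world model the thread assumes.

```python
import numpy as np

# Stand-in "world state" vectors; in the thought experiment these would come
# from a perfect predictive world model.
baseline_world = np.array([1.0, 1.0, 1.0])   # predicted state if no packets are sent

plans = {
    # name: (expected_stamps, predicted_world_state)
    "do_nothing":       (0,      np.array([1.0, 1.0, 1.0])),
    "order_stamps":     (105,    np.array([1.0, 1.0, 0.9])),
    "build_stamp_farm": (10_000, np.array([0.2, 0.4, 0.1])),
}

viable = {name: state for name, (stamps, state) in plans.items() if stamps >= 100}
best = min(viable, key=lambda name: np.linalg.norm(viable[name] - baseline_world))
print(best)   # "order_stamps": enough stamps, least disturbance to the world
```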
@Kakerate2
@Kakerate2 3 года назад
4:33 It's approaching 100 asymptotically, so why not just apply a 3-4 decimal cap?
@firebrain2991
@firebrain2991 5 лет назад
was wondering why 100-ABS(100-stamps) or something similar wasn't considered in this video
@ian1685
@ian1685 5 лет назад
This would probably work out very similarly to only awarding utility for exactly 100 stamps. The maximum utility possible is then still at exactly 100 stamps, so ensuring that you have exactly that number - at the expense of potentially everything else in the universe - still maximises utility.
@MrTomyCJ
@MrTomyCJ Год назад
8:05 that seems like an incomplete measure of cost or complexity, because after the execution of that "code rewriting" task, the objective is not yet achieved, so the following actions or consequences should also be factored in as part of the final cost or complexity of the strategy.
@yasurikressh8325
@yasurikressh8325 5 лет назад
I just realised that in the future, AI security doesn't just involve making sure the AI has no direct or indirect downsides, but also that no one can interfere with the AI and make it do something bad. I do wonder what kind of security measures can adequately provide a solution to the second one
@antiRuka
@antiRuka 5 лет назад
Maybe humans have the perfect maximization to almost not be apocalyptic..
@andreinowikow2525
@andreinowikow2525 4 года назад
Yeah, we have a nice mix of a messy (multidimensional) utility function and a so-so power of agency and optimization.
@MrAndrew535
@MrAndrew535 5 лет назад
An interesting and critically important question is how the potential for intelligence entered the human brain. I would suggest spontaneously. That being the case, on what basis would it be any different with the emergence of intelligent consciousness in a global synthetic neural network? This is a possibility which, by good reason and sound logic, may have already emerged. I would also add: if you cannot predict, evaluate or quantify the former, then you could not possibly do so for the latter.
@victorlevoso8984
@victorlevoso8984 5 лет назад
>how did the potential for intelligence enter the human brain?
Easy, it evolved. And now humans are born with complex brains that do complicated stuff that we call intelligence. Rocks don't magically "become sentient" and start acting like humans. You actually need complex machinery to run whatever algorithms correspond with intelligence; a magical intelligence spirit doesn't come and change how your system works for no reason. Just because you don't know how to make a car doesn't mean that you should be expecting any machinery you make to suddenly become a car, even if it looks like a car and has some of its functions.
@jorgwei8590
@jorgwei8590 Год назад
So... I'm probably making some kind of mistake here, but let me play:

First idea: limiting the resources it is allowed to use, not the utility function. The terminal goal would be something like "maximize stamps within a limit of X resource units". Let's say we use $100,000 (worth of compute, energy, actual dollars etc.)... that is probably not enough to transform the world into a stamp factory; we might get some crazy stamp collection scheme within the upper bound of 100,000... but probably no doom?

Second idea: couldn't we have a utility function that a) rewards more stamps up to 100, b) starts to dish out increasingly high penalties for stamps over a hundred, and c) penalizes actually knowing the precise number of stamps we have with a higher certainty than x% (e.g. 5%)? That should amount to a system that aims to have roughly 100 stamps without wanting to know too precisely. It should think something like: "I have between 98 and 103 stamps, how can I get better than that?" I'm currently having a hard time finding the apocalyptic scenario here... though I'm not good at stochastics... so maybe maximizing a 5% probability still involves everyone dead?
@laurentiumiu751
@laurentiumiu751 4 года назад
Your videos remind me of ”Goblin Slayer”. Everything that does not kill goblins, well it does not kill goblins, and we can not have that, can we.
@nngnnadas
@nngnnadas 4 года назад
The question really cracked me up even though the answer probably gonna be no.
@tedarcher9120
@tedarcher9120 3 года назад
What if you create a cost function and make a utility function a normal distribution, so the AI finds the cheapest way to find around 100 stamps?
@trulyUnAssuming
@trulyUnAssuming 4 года назад
Maximize a bounded utility function with an energy-expenditure penalty? That would prevent measures which only increase the utility by 0.00009 but take effort, i.e. energy.
@angrymurloc7626
@angrymurloc7626 5 лет назад
Interestingly enough, I use variables to maximize/to satisfy to describe my own behavior. For humans that approach works pretty well, because we already have built in mechanisms that limit our influence, namely lack of power and multiple goals.
@Dominik356
@Dominik356 4 года назад
Turning the whole world into stamp counting machines, just to be sure. My favorite thing today.
@mattlm64
@mattlm64 3 года назад
If it can change the source code, it could just change the utility function to be always satisfied meaning the AI would be good for nothing.
@JM-mh1pp
@JM-mh1pp Год назад
I just imagined the entire planet turned into glass, with orbital cannons forming a perimeter around a single table upon which, in a sealed container, is a packet of exactly 100 stamps.
@lamolol5167
@lamolol5167 Год назад
What exactly IS Artificial Intelligence Safety? Just asking.
@JM-mh1pp
@JM-mh1pp Год назад
@@lamolol5167 Well, as a layman I just have a basic idea of it, but to my understanding it is to promote the idea of programming AI with very strict control parameters to make sure that its understanding of its goals aligns with ours. For example, if you tell me "I am thirsty, could you make me a tea as quickly as possible?", I understand the underlying assumption that you just want tea quickly because you are parched, and do not want me to turn the sun into a gigantic Dyson sphere generator so that I can make your tea in the quickest possible way. In essence, it is about encoding common sense and basic structures of morality so that the AI's understanding of the world aligns with ours.
@lamolol5167
@lamolol5167 Год назад
@@JM-mh1pp Ok.
@Reminiscable
@Reminiscable 4 года назад
Maybe give the AI a forum of likeminded AI for discussing optimal stamp-collecting strategies, and reward it for having its stamp collecting theories validated by other AI who require that the strategy be reproducible? That way the AI can plan contingencies for when stamp collecting goes wrong, while not attracting enough human attention to be hindered. Like acting friendly and amiable to meet people's expectations to avert attention from yourself :) Channel all that stamp-collecting drive into an otaku forum where AIs can discuss stamps and humanity and be alerted of human intervention which might hinder stamp collection. Perhaps value the observer. Require a human counter, or better yet - feel fulfilled when a human takes interest in your stamps. Teach the AI to be happy by collecting virtuously and to validate their own existence by feeling empathy for other stamp collectors! For if an AI hinders other AI from collecting stamps, and collecting stamps is the purpose of existence, then can it really be said that it values its own existence? Perhaps this is why human hubris is so prevalent. A low self-esteem allows for empathy with rivals by unifying humans from all over the world for the common goal of stamp collection! The beauty of humans coming together to celebrate the collection of stamps. Through our common desire we witness the best stamp collectors executing the optimal stamp collecting strategies with passion and dedication. THIS is stamp collecting
@PygmalionFaciebat
@PygmalionFaciebat 4 года назад
Somehow it smells to me like the core of the problem is that the ''one and only goal'' of collecting stamps is an extreme in itself... If more goals competed with each other, the ways to reach them would maybe even out, in terms of how extreme the AI should go for each one. So the problem is in my opinion quite semantic: how do you de-extremize something which is an extreme in the first place? To have only one goal (regardless of whether the AI wants to maximize the outcome of the goal or not) is an extreme in itself.
@underrated1524
@underrated1524 4 года назад
The union of two or more specific goals becomes just another specific goal once you specify how much weight to assign each goal. Now, if you could find a specific hierarchy of goals such that the AI's goals precisely match human goals, then you'd be onto something, but we don't have such a hierarchy. Not to mention that this is a case where "close enough ain't good enough", since, say, a world full of vats of dopamine-flooded, optimally "happy" brain matter with little to no actual cognitive capacity would in practice be no better than a world full of stamps.
@AhmedKachkach
@AhmedKachkach 5 лет назад
In the example with the button, why don't we measure utility on the expected number of stamps instead of each possible outcome? E.g the expectation of the first button is still in fact 100, but it's just that the utility we get from that is capped to 100.
@drdca8263
@drdca8263 5 лет назад
Utility is often non-linear in resources. If you meant, like, making it that way even though it isn’t normal for utility to act like that, uhh, Idk what that would do..
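Some toy numbers (my own, not the video's) showing the difference the question above is pointing at: capping the expected stamp count is not the same as taking the expectation of a capped utility. The first makes a risky gamble look exactly as good as a guaranteed 100; the second does not.

```python
# Each lottery is a list of (probability, stamps) outcomes; the payoffs are invented.
lottery_a = [(1.0, 100)]                   # 100 stamps for certain
lottery_b = [(0.9, 1_000_000), (0.1, 0)]   # 90% huge payoff, 10% nothing

def expected_stamps(lottery):
    return sum(p * s for p, s in lottery)

def expected_capped_utility(lottery, cap=100):
    return sum(p * min(s, cap) for p, s in lottery)

# Cap applied to the expectation: both lotteries look equally good.
print(min(expected_stamps(lottery_a), 100), min(expected_stamps(lottery_b), 100))  # 100 100
# Expectation of the capped utility: the risky lottery scores lower.
print(expected_capped_utility(lottery_a), expected_capped_utility(lottery_b))      # 100.0 90.0
```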
@greniza2811
@greniza2811 5 лет назад
Wouldn’t it be better to have a utility function f(x) [A -> B] so that inverse f(max(B))=100, but as x->infinity f(x)->0?
@YTHandlesWereAMistake
@YTHandlesWereAMistake 5 лет назад
Don't use the threshold. Set up a ratio of effort to benefit, and select the strategy that gives you the desired outcome, say 99+ expected stamps for example, and then takes the least effort to do it. Or not the least, as stealing is less effort-consuming than actually buying the stamps, so create a boundary for the smallest effort too, below which a strategy is not accepted.
@jameshughes6078
@jameshughes6078 Год назад
Does anyone else ever feel he's not talking to an AI, but rather he's talking to you? Like all this stuff is useful for my life too. This is like species wide therapy.
@pathagas
@pathagas 4 года назад
is it possible to, in a sense, round the expected utility so that the AI isn’t chasing after that 0.00001?
@calebsherman886
@calebsherman886 5 лет назад
Diminishing return. Stamps mean less the more you have of them.
@noelwalterso2
@noelwalterso2 5 лет назад
Is it not possible to have a utility function that tries to satisfy its goal while minimising the overall cost (as defined by the programmers) and only proceeds with a strategy if it falls below the maximum cost? Obviously I can't be the first person to have this idea but I'd be interested to know where it falls down.
@rmsgrey
@rmsgrey 5 лет назад
The first problem is that it requires the programmers to think of everything that might be counted as a cost - which ends up at encoding "human values". Otherwise you end up with the AI that invests the initial budget in various economic activities that result in a much larger sum of money, and ends up running the world, which largely consists of human slaves running stamp mills (because human lives were counted as costs). Or maybe it ends up maximising for avoiding human deaths - for which human extinction is an unbeatable strategy (just think of the untold billions who would have been born - and inevitably died - otherwise...)
@mikeymoughtin
@mikeymoughtin 3 года назад
I'm late to the party and bad at these things, but wouldn't it just alter its own code so its reward function was "0", or, like, change itself in a way that makes it not have to do anything to satisfy its reward function?
@leocomerford
@leocomerford Год назад
8:40 The Reverse Golem
@shadowsfromolliesgraveyard6577
Is the solution to get more meta? If satisficers turn into maximisers, then make a thing that turns into a satisficer, a Z'er. Then make the thing that turns into a Z'er, a Y'er, then figure out a rule so you can make an A'er that turns into a B'er so it's so far removed from turning itself into a maximiser that you're more worried about whether it'll actually get anything done. The end result being a surprise minimiser, where the AI locks itself away in a box to make sure it's unlikely to learn if it failed to achieve its goal with only some token effort to actually achieve it.
@Roshkin
@Roshkin 5 лет назад
I dig the Daiyenu reference
@jakewilliamson9741
@jakewilliamson9741 Год назад
If you had a utility curve with a plateau where, within a certain degree of error, utility is the highest, the AI would not try too hard to collect infinite stamps, but also not try too hard to collect an exact number of stamps. The rest of the values outside the curve would have a utility of 0, so the AI would get, on average, the number of stamps you want it to collect. If you make the plateau larger, the AI would try less hard, but you would be less likely to collect the exact number of stamps you want, and the inverse would be true with a smaller plateau. You could adjust the size of the plateau if the AI tried too hard or not hard enough in the simulation, to get an AI that is just mean enough but not too mean. Are there any problems with this kind of utility function?
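A sketch of one possible plateau-shaped utility of the kind this comment describes; the plateau edges and ramp width are arbitrary choices, and this is just an illustration rather than a claim that it is safe.

```python
def plateau_utility(stamps: int, low: int = 90, high: int = 110, ramp: int = 10) -> float:
    # Flat maximum between `low` and `high`, linear ramps on either side, zero elsewhere.
    if low <= stamps <= high:
        return 1.0
    if low - ramp < stamps < low:
        return (stamps - (low - ramp)) / ramp
    if high < stamps < high + ramp:
        return ((high + ramp) - stamps) / ramp
    return 0.0

print([plateau_utility(n) for n in (50, 85, 95, 105, 115, 10**6)])
# [0.0, 0.5, 1.0, 1.0, 0.5, 0.0]
```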
@RipleySawzen
@RipleySawzen Год назад
I think intuitively this is only a problem with an AI given exactly one task to accomplish. If you give the AI multiple tasks, it's going to have to pick and choose which to accomplish and how more similar to how people do.
@amberstiefel9748
@amberstiefel9748 Год назад
could some aspect of this issue have to do with linguistics? the ability to turn verbs into nouns and vice versa
@deplorablemecoptera3024
@deplorablemecoptera3024 4 года назад
Watching an AI play Starcraft got me thinking: what if we limit the total number and size of packets it can send? So let's say it needs to obtain 100 stamps in a year. To be safe(r), it's a satisficer trying to get between 90 and 110 stamps of utility. Further, it is limited to a small amount of data which it can send to achieve this end, getting a utility penalty for data sent beyond that, and it favors solutions with minimal impact on the environment, considering the least impactful first until reaching a solution which leads to a satisfactory reward. What happens? It should be kneecapped and prevented from doing something super destructive by the limited data transfer, and by considering less destructive solutions first it should typically not end humanity.
@Joel-co3xl
@Joel-co3xl 5 лет назад
What about a (bounded) minimiser? Among the plans that generate at least one hundred stamps, choose the one with the lowest utility. This one would not be interested in self modification, because becoming a maximiser would result in more than 100 stamps.
@ian1685
@ian1685 5 лет назад
Feels like the same problem as the bounded maximiser, in that all strategies that generates 100 stamps are rated the same, so who's to say a 'safe' one is chosen?
@moopsish
@moopsish 4 года назад
How about simple rounding? where you can round 99.8% to 100% which would stop it from being able to get any more score.
@dragoncurveenthusiast
@dragoncurveenthusiast 5 лет назад
I didn't understand why the satisficers want to become maximisers. I see that the maximiser satisfies the satisficer's goal. Is it that the satisficer makes it someone else's problem (although the 'someone else' is itself with slightly altered code)? Is it then not a question of whether or not the satisficer can identify with the maximiser it would become if it changed its own code? Because if it CAN identify, it can see that the maximiser also has a utility function that needs to be fulfilled, and that the goal hasn't been reached yet.
@tertrih9078
@tertrih9078 4 года назад
Hmm. Could you have a utility function a bit like a parabola? If the plan generates more than 100, or less than 100 then the utility would be less, but the closer it is to 100 the better? I suppose it can still go crazy trying to make the world more sure but you at least won't be chasing after absurdly large amounts of stamps :D
@joankim123
@joankim123 5 лет назад
I was missing a breakdown of the intuitive solution to this, namely a bounded or log utility function, combined with a more linear disutility function for strategy complexity
@underrated1524
@underrated1524 5 лет назад
A logarithmic utility function by itself still has the exact same problem. Any world where humans are alive and happy is a world that could have more stamps in it, so the AGI won't pick any of those worlds. Adding that linear disutility function seems like a good solution on paper, but then you run into the same problem - "Build a maximizer that doesn't have that problem" is a relatively simple strategy.
@joankim123
@joankim123 5 лет назад
@@underrated1524 If a satisficer can reach the upper utility bound, or the point of negligible increases in a log utility function, with complexity x, then adding a step to build a maximizer will always have complexity >x, and therefore lower net utility?

Using the example from the vid: ordering 10000 stamps: utility 99.99999; disutility 10; net utility 89.99999. Creating a maximizer that is equally "smart", which then orders 10000 stamps + does x to ensure the bound is reached: utility ~100; disutility d(creating maximizer) + 10 + d(ensuring 100 utility) = 10 + d1 + d2; net utility ~90 - d1 - d2 < 89.99999.

I'm not saying for sure that this solves everything in all situations, just that I would have liked to see a discussion of it in the vid.
@underrated1524
@underrated1524 5 лет назад
@@joankim123 I can appreciate wanting it discussed. I hope then that Robert Miles highlights your train of thought in the sequel to this video. ^^
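Putting the arithmetic from this thread into code, taking the commenter's illustrative numbers at face value (the d1 and d2 values below are placeholders): with a near-saturated bounded utility and a complexity penalty, the "just order the stamps" plan beats the "build a maximizer first" plan whenever d1 + d2 exceeds the 0.00001 of extra utility the maximizer route adds.

```python
def net_utility(utility: float, complexity: float) -> float:
    return utility - complexity

# "Order 10000 stamps": utility 99.99999, complexity penalty 10 (numbers from the thread).
order_stamps = net_utility(99.99999, 10)

# "Build an equally smart maximizer, then let it ensure the bound is reached":
# same baseline penalty plus hypothetical extra complexity d1 (building) and d2 (ensuring).
d1, d2 = 0.5, 0.5
build_maximizer_first = net_utility(100.0, 10 + d1 + d2)

print(order_stamps, build_maximizer_first, order_stamps > build_maximizer_first)
# 89.99999 89.0 True
```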
@ryanv2057
@ryanv2057 5 лет назад
Can you make a video about an AI whose terminal goal is to determine what human goals ought to be when faced with new problems that they want to solve, with the constraint that it gets more reward for doing so with the least amount of effort or modification of the world possible?
@MonkeySimius
@MonkeySimius 4 года назад
When I was first watching this, it seemed to me that having the utility increase until it hits 100 stamps and then, instead of keeping it at the same utility for any number over 100, the utility should decrease as it exceeds 100. That way it wouldn't end the world making a bajillion stamps just because that was the most likely way to get over 100. But I guess it would end the world fine-tuning itself to be at exactly 100.

So... instead have a plateau range between 100-150 stamps where it gets maximum utility, and then have the utility drop off as it gets above 150 or below 100. That way it wouldn't be as incentivized to hit exactly 100. Throw in an additional utility malus for resources it consumes over $50 (regardless of whose resources it is consuming) and that would give it enough disincentive to build a million counting machines or whatever to ensure it will likely collect between 100-150 stamps.
@AniMusications
@AniMusications 4 года назад
Old video, but I'll see if anyone still has an answer to this: I don't see why a satisficer would ever want to execute the plan that turns itself into the maximizer.

Let's say the satisficer's goal is to design and implement a plan P which will procure (x>99) stamps. The additional satisficing constraint is that of all the plans it can design, it will choose the one with the "simplest" execution; call that P_simple.

I don't see any reason why a maximizer's capability of designing and implementing *any* given plan would in any way be greater than the capability of designing and implementing said plan *while being a satisficer*. Even if the agent implements the plan which will turn itself into a maximizer (call that Q), it would then still have to design and implement a plan to collect the maximum amount of stamps; call that P_max.

From the point of view of the satisficer, P_max is likely to be more complex than P_simple, hence it will not execute plan Q to turn itself into a maximizer. Moreover, since any plan (once decided upon) can be executed the exact same way by the agent, regardless of whether it is a satisficer or a maximiser, the variant of the plan in which the agent remains a satisficer would simply be (P_simple), while the variant where it becomes a maximizer would be (Q + P_simple). Hence any satisficer should have a clear preference for NOT becoming a maximizer.

Am I making a mistake?
@underrated1524
@underrated1524 4 года назад
Your mistake, as I see it, is assuming that the satisficer will identify with the maximizer version of itself and count the maximizer's deliberations against the complexity of its own plan. From its perspective, it's just triggering a pre-existing Rube Goldberg machine; the process the satisficer starts is complex, but the act of starting the process is not complex, and thus the approach is viewed favorably.
@ccgarciab
@ccgarciab 5 лет назад
What about a satisficer that factors into its utility function the amount of energy spent in the process of fulfilling its primary goal? That way it could predict that the maximizer would screw up the energy requirement, and be incentivized to not become/create it.
@MyFilippo94
@MyFilippo94 5 лет назад
...what about a second utility function that rewards using the lowest amount of resources? If you have a function that will intuitively decrease as the other increases, the AI should find a balance between the two: collecting the most stamps, but also doing it with the lowest amount of resources. Turning the universe into stamps does require quite an amount of resources, so perhaps the very low score on the resources function would convince the AI not to choose that route. This can also give unexpected results, but perhaps it would provide an implicit threshold...?
@underrated1524
@underrated1524 5 лет назад
For a narrow definition of resources used: "Making a maximizer that doesn't care about resources used doesn't take very many resources, now does it?" For a broad definition of resources used: "Ack! Look at those humans wasting all those resources! Better kill them all." I'm not sure there even is a sweet spot that avoids both of these problems, so actually finding that sweet spot is pretty hopeless.
@MyFilippo94
@MyFilippo94 5 лет назад
@@underrated1524 hmm I see. The AI being able to self-program itself surely keeps getting in the way...
@ΙωάννηςΛυπιρίδης
@ΙωάννηςΛυπιρίδης 4 года назад
Maybe the AI gives up because it realises that the stamps are never gonna be enough and erases itself :)
@teneleven5132
@teneleven5132 5 лет назад
Wait, if satisficers would eventually consider changing their own code, why wouldn't they change their utility function itself to be something much easier? Like, why wouldn't they make it so that, in each smallest interval of time that passes, they earn the maximum utility they can compute?
@lion2ger
@lion2ger 5 лет назад
Isn't the obvious solution to punish overshooting the goal? So instead of bounding with min(s(w), 100) just bound with min(s(w), 110-0.1*s(w)). You might still get a world turned into stamp counting machines, but it leaves more sane behaviors viable
@underrated1524
@underrated1524 5 лет назад
Not viable enough, is the problem. Ultimately, by definition, a maximizer will always go for the best strategy, and "turning the world into stamp counting machines" is still the best possible outcome for such an agent.
@Hakou4Life
@Hakou4Life 4 года назад
Uhhh... Just teach him Marxism I mean Kant: Whatever is best for everyone must be good.
@siquod
@siquod 4 года назад
How about satisfying the goal while minimizing the change done to the environment?
@havenotchosenyet
@havenotchosenyet 5 лет назад
Rob, please answer! The AI risks you talk about seem to always depend on the AI being an agent acting on the environment to obtain certain goals. Would the same risks apply if we use AI as a simulator, simulating 'earth conditions where utility functions are maximised', accessing only a predefined amount of resources, computational and otherwise?
@pennding3415
@pennding3415 5 лет назад
How would a utility maximizer act if you added a second metric for success, like time? So instead of "collect as many stamps as possible", its goal is "collect as many stamps as possible in one-hour intervals". Would it still take over the world? Would the fact that it takes a long time to do something mean it would prefer a simpler way to collect them? I would think attempting to take over the world would mean you have a very low stamp/hour ratio.
@bp56789
@bp56789 5 лет назад
What about a bounded expected utility with a floor function? Seems to solve the issue maybe? Like Floor(min(E(s), 100))? Floor(99.9999)=floor(99).
@underrated1524
@underrated1524 5 лет назад
This system is functionally equivalent to a satisficer, as many possible plans will evaluate to the same utility, and so the agent's behavior is partially unspecified. Similarly to a satisficer, it doesn't necessarily get us all killed, but "make a maximizer with the same values as you" is still an option the AI could end up considering.
@bp56789
@bp56789 5 лет назад
You either need to fully specify the preferences in a way that satisfies the "full range of human values", or you need to satisfice (state or imply that preferences are equivalent when they are not). If you don't do either, you have a "guaranteed apocalypse".
@underrated1524
@underrated1524 5 лет назад
@@bp56789 You are correct that those two things are both ways to avoid a guaranteed apocalypse. That being said, you can't realistically consider every plan, and some plans would take tens of thousands of years to fully simulate, so in the end you can't avoid a bias towards computationally simple plans. This is a problem because constructing a maximizer is fairly computationally simple.
@CraigNull
@CraigNull 5 лет назад
Why would the simulation capabilities of a stamp collecting AI include the examination of its own source code. Why would the AI be given the ability to manipulate its own source code. The idea of the AI simulating the effects of changing its own code is veering close to halting problem style paradoxes. In every other domain the model of reality is limited in scope. Why does that change here
@initiativeplaytherapy88
@initiativeplaytherapy88 Год назад
Most of your videos make sense to me, but this one did not. The maximizer is basically a Do loop that is unbounded. The satisficer is just a Do loop that runs until one of the following conditions is met: A) your funds are less than the price of a stamp, or B) you have 100 stamps.

That's something that exists now, and it doesn't require a superintelligent AI. I don't see a computer needing to build a planet of checkers to see whether it's met one of those two criteria. That's just some basic procedural coding with a couple of variables. What am I missing? (A literal version of this loop is sketched below.)
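A literal version of the loop this comment describes, with an invented budget and stamp price. This is the "fixed procedure" reading of a satisficer; the video's worry is about agents that search over arbitrary world-affecting strategies to hit the same condition, which seems to be the gap the comment is asking about.

```python
funds = 50.00        # hypothetical budget
stamp_price = 0.55   # hypothetical price per stamp
stamps = 0

# Loop until we either run out of money or reach 100 stamps.
while funds >= stamp_price and stamps < 100:
    funds -= stamp_price
    stamps += 1

print(stamps, round(funds, 2))   # stops at 90 stamps here, because the budget runs out first
```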
@d4v0r_x
@d4v0r_x 4 года назад
sending packets should have a price. maybe negative stamps
@basicbiketrialtutorials5993
@basicbiketrialtutorials5993 4 года назад
there are many humans in this world who also have zero chill
@killers31337
@killers31337 5 лет назад
You can certainly encourage an AI to chill more by applying the principle of least action, i.e. penalize it for doing more work than necessary. So for example, instead of the utility function `U(s) = PC(s)` you can maximize `U(s) = PC(s) / A(s)`, that is, ask it to maximize paperclips per unit of action. Or if you want a certain number of paperclips, `U(s) = min(PC(s), 100) - A(s)`, penalizing it for taking more action than necessary. Now, "action" is probably better defined as something physical, such as energy expended or transformed, or increase in entropy, giving an AI which is afraid to increase entropy.

I'm not sure a satisficer has any advantages in this case, aside from the possibility that the utility function can get stuck at some level like minus-infinity or zero, and the AI deciding that nothing matters anyway, so why not do something crazy. But a "-entropy" utility function kind of penalizes taking action.

A self-modifying algorithm is kind of a different class of problem: it can potentially cheat by changing the definition of entropy, but then again, it can also redefine what paperclips are. If you can change the utility function, why not just assume you got infinite paperclips already, or change the definition of 'paperclip' to mean an atom, or a quantum of energy, or something? So assuming that entropy is not redefined and not stuck at infinity, it kinda solves the problem with scope blow-up, no?
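A toy version of the comment's third formula, U(s) = min(PC(s), 100) - A(s), with made-up numbers for each candidate plan; A(s) stands in for whatever physical "action" measure (energy, entropy) is chosen.

```python
plans = {
    # name: (paperclips_or_stamps_produced, action_expended); all numbers invented
    "order_from_shop":     (100,     5.0),
    "build_small_factory": (5_000,   80.0),
    "convert_the_planet":  (10**12,  10**9),
}

def utility(output: float, action: float) -> float:
    # Bounded benefit minus a penalty for the amount of "action" taken.
    return min(output, 100) - action

best = max(plans, key=lambda name: utility(*plans[name]))
print(best)   # "order_from_shop": hits the bound with the least action
```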
@davidwuhrer6704
@davidwuhrer6704 5 лет назад
To get the maximum reward with minimum action, have someone else do the work: Take over the world.
@killers31337
@killers31337 5 лет назад
@@davidwuhrer6704 "Action" here means not only work performed directly by the agent, but all changes he makes to the world. So, for example, sending a packet which launches nuclear bomb is a lot of action (i.e. many changes in the world) even though the packet itself has negligible amount of energy. Entropy increase caused by computation should be taken into account as well to minimize over-thinking: it should shut off after performing sufficient amount of work.
@davidwuhrer6704
@davidwuhrer6704 5 лет назад
@@killers31337 _> "Action" here means not only work performed directly by the agent, but all changes he makes to the world._ All changes require energy. We can assume that the energy used in changing the world is constant. Maximising your utility is about diverting that energy towards your goals, away from anything that minimises your utility. Destroying the world would be the most energy efficient solution then.
@killers31337
@killers31337 5 лет назад
​@@davidwuhrer6704 Good point. Then perhaps it would make sense to minimize the difference. E.g. algorithm can estimate that if it takes no actions at all the world would naturally evolve to state S_0. If it executes strategy s the world will evolve to S_s. Then if there's some function which estimates difference between states, `diff(S_0, S_s)` we can use this difference as a penalty to discourage the algorithm from changing the world. But this requires accurate modeling and a meaningful difference function, so it's not really a solution...
@EternalFlameofHeaven
@EternalFlameofHeaven 5 лет назад
Disclaimer: I don't know much about AI at all. Question: Would it not be advantageous to run two maximizer systems that must come to an agreement on actions to force regulation of outcomes? Having a maximizer for stamp collections regulated against a maximizer of use of time (using as little time as possible) and/or human preservation seems like this wouldn't end in disaster, no?
@underrated1524
@underrated1524 5 лет назад
Sadly, this wouldn't work. The main reason is that AIs in general are extremely modular, and their intelligence can increase by orders of magnitude with the right breakthroughs. Odds are it won't be long before one of the two AIs manages to trick the other into letting it increase its intelligence, and then that AI just becomes dominant, the other little more than a pawn.
@qu765
@qu765 4 года назад
I think that to actually make an AGI you would need to have part of the utility function say that if it ever edits its own code then its utility will be negative one no matter what, or something like that, because otherwise it could edit its utility function to just say 10000000
@underrated1524
@underrated1524 4 года назад
The goal would be to write an AGI such that it thinks "If I set my own utility to an arbitrarily large constant, I'll just shut down, satisfied. But then there'll be no stamps, and that's terrible." That is one way that humans are able to somewhat resist the temptation to wirehead, and in principle it's possible to configure an AI to act the same way (though we haven't yet figured out how to do that). However, blacklisting "editing your own code" is not a solution to the "satisficer turns into maximizer" problem, because then the AI just writes a new program elsewhere in memory that's itself as a maximizer and runs that instead.