AXRP, pronounced axe-urp, is the AI X-risk Research Podcast. On this podcast, I (Daniel Filan) have conversations with researchers about their papers. We discuss the paper and hopefully get a sense of why it's been written and how it might reduce the risk of artificial intelligence causing an existential catastrophe: that is, permanently and drastically curtailing humanity's future potential. Patreon: www.patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast
Interesting topic, but could you avoid jumping between subjects so much? Sometimes I need to listen for minutes before I know where you're going with your train of thought.
Awesome interview, but I think your audio mixing is off on the Spotify version. I was trying to listen to this in my car and had to crank my volume to the max just to barely hear your voice.
Except in the unlikely event you actually have the 'optimal policy' properly defined correctly on the first try for the AGI, aren't you... *done*? You can just point to that. You probably won't actually have it properly defined, though, because how do you do that (without some risk of getting it wrong)? I feel like defining what we really want is one of the hardest issues in AI.
Any other interesting work to recommend on the idea that our senses/control mechanisms are both generative processes and predictive processes? I also had some more to add on the topic of why there isn't the convergent behavior that might seem obvious. Like Jan mentioned, there might be local environment state differences, prediction optimization on a longer timescale has potential errors, etc. But there are also potential 'hardware' and 'software' differences -- humans don't run on completely homogenized brains, and it's possible to imagine that the initialization of various weights in our brains is randomly distributed in a way that yields different outcomes.
This was great to hear, aside from the volume, which was too low. I had to max out my phone volume and it still wasn't quite high enough. I guess it's fine for me on closed-back headphones, but it wouldn't be on everything.
This one was pretty technical for those of us who haven't read some of the foundational work for SLT. I had to stop and look up some specific details later, and I still don't feel like I fully grasp what makes SLT different from other accounts of degeneracy and preference for simple functions, in terms of making predictions about neural network behavior. David's framing of fundamental structures in the data being what matters across training runs makes a lot of sense, but I still don't grok how this helps with alignment. I suppose understanding the stability of structure moves us closer, both on something similar to interpretability but also on capabilities.
When you think about it, if you sum up all the "like"s, they take 105 minutes in this video, the "yeah"s take 40 minutes, and the "um"s take about 15 minutes… Take them out and you'd have a much more digestible video.
I've been loving this new stream of content on Spotify during long drives! Daniel, you are pretty well up to date on papers generally; I am always impressed.
If a factory near my house produces toxic chemicals that have a 1%/year chance of killing me but doesn’t affect my health otherwise, is it possible for me to sue the factory and have a court order them to stop? (In Europe, in this situation you can probably sue the government if they don’t protect you against this, and have the ECHR or a national court order the government to stop the factory from doing it and possibly award a compensation.)
"Even if the stars should die in heaven Our sins can never be undone No single death will be forgiven When fades at last the last lit sun Then in the cold and silent black As light and matter end We'll have ourselves a last look back And toast an absent friend" Sorry. Feeling angsty about the world today. I had a friend in high school who I'd sometimes complain about my problems to. She'd always say the same line in reply, and I couldn't argue. "Well, do better." That meme stuck around in my head. "Do better." It's a weird place to be. "Oh, these lab leaders think there's a 20% chance of doom." "So they haven't ruled out doom?" "Well, no, they just think it's unlikely." "I wouldn't call 10-20% 'unlikely' when we're talking about 'literally everyone dies and/or nearly all value in the future is irrevocably lost,' but okay, why do they think it's possible but less likely than throwing heads in a coin flip?" "Well, they don't really explain why, but it's something like 'human extinction seems weird and extreme, and while they can imagine it, they feel much more compelled by other grand and wonderful things they can imagine' - at least, that's the vibe I get." "Annnnd we don't think there's some kind of motivated cognition going on here? I think people buying lottery tickets are also imagining very vividly the possibility of them winning, but that doesn't make them right to say whatever % they feel intuitively." "They'd say AI is more like the invention of agriculture than a lottery. Like, maybe you make some huge foreseeable mistake and cause a blight, but if you have some random list of virtues like 'common sense' or 'prudence' or 'caution' then you'll probably just make a bunch of value." "I think Powerball is a good metaphor. Let's take features of the universe we'd all want to see in the future and tag them with a number. We then play a million-number Powerball and hope each one of those numbers we chose shows up. What are the odds that will happen? 80%?"
"This sounds like a wonderful argument on how to reason about a specific kind of uncertainty, but people don't want to reason about uncertainty, they want to reason about how their most convenient and readily actionable policy is actually totally fine and probably not going to be an unrecoverable catastrophe." "Well, they should do better." "I appreciate the sentiment, though I would like to note that in this case this has to be nearly the largest understatement in the 13.8 billion years of the universe." "Here's another: I'm pretty bummed out about this."
"Is there any strategy these models can deploy that would allow them to cause problems?" Has anyone in the Black Dolphin prison ever managed to kill a guard or another prisoner? Not sure, but I'd guess probably 'yes.' And those would just be humans, not even particularly selected for their intelligence, just selected for their 'killing intent' and prior extreme bad behavior. An AI that's as smart as our civilization working for an entire year per hour of IRL computer runtime will find any medium-bandwidth channel of causal influence more than sufficient to destroy the world. Even if you give it a 1-bit output channel and iterate on asking it "yes/no" questions, that probably adds up to a lethally dangerous level of causal engagement with our universe eventually. Even if you reset it after every question, with zero online learning, it can probably guess its position in the sequence if the input contains enough deliberate or accidental shadows of the interventions the prior instances of the system have made. "Safe no matter what" sounds great, but it's like saying some product is "indestructible" - well, you're failing to imagine the maximum forces that can be brought to bear on the object. Specifically, a sheer and strict 'whitelist' policy is only as safe as your ability to predict the consequences of every action you whitelist, and if you could predict all of that, then the AI is no better than a tool-assisted-speedrun program or a manufacturing robot. It can precisely and quickly do only as much good as humans could do slowly and less precisely. As soon as you're getting "superhuman," you need something that does superhuman-level human-value alignment. Your merely human-level control/safety techniques will be insufficient to cover that wider space.
You've got a relay in a circuit that's meant to carefully switch off a power supply when there's a surge, and it looks super safe and reliable, since you can prove it successfully activates and breaks the connection even up to the level of the capacitor banks that fire inertial confinement lasers. And yet, in practice, the surge comes through, the relay flips, and there's a white-hot arc through open air as the electric field shreds air molecules into plasma, and the energy grounds out in the delicate machinery - now molten slag - you had downstream of that relay. That jump through open air is the problem. That's what "outside the box" is pointing to. Good luck constraining safe operation beyond the box when you can't see that region in advance; and if you aren't trying to go outside of the box, then why the hell are we even trying to build this stuff?
22:34 "roughly human level"... Ok, but even if this works, what are the odds that AI labs will only use "roughly human level" AI agents, rather than "the best AI agents available"? It seems likely that "roughly human level AI" will only be "roughly human level" for weeks or months, before being replaced with 2x or 10x versions. Even if you were able to contain a "roughly human level AI agent," this could be a very temporary solution? Would "roughly human level AI agents" be able to safely do any useful testing and alignment work on an ASI-level model? Doing alignment work (goal and intention testing, capability testing, red teaming, even containment) on an ASI would likely require greater than "roughly human level" AI agents?
Also, it seems relevant that bilateral anterior cingulate cortex destruction produces a psychopath, at any age. That seems a very important point. We don't really learn moral behavior from blank-slate exposure to our world. We learn morality the way we learn to walk: it's perfecting our strategy for doing something we basically already know how to do instinctively.
There's a lot more than heartbeat and such that we are born with. We expect 3 dimensions of space and 1 of time, and that we are agents with preferences over future states of the world. We expect other things that move, or that have what appear to be 'eyes,' to be other agents. We try to figure out what those agents 'want' soon after birth. We can exhibit jealousy by 3 months of age. We can recognize some facial expressions instinctively. People who never had a limb can still have phantom limb syndrome, so there is a mental map of a normal human body. Probably many, many more things.
The number of times "like" is said is painful, considering these are relatively intelligent people. It was very "like" distracting and "like" difficult to "like" take them "like" seriously.
I tend to find Leike's speaking style really difficult to listen to. Unfortunately, everything he says is pretty valuable, so you just kind of have to pull through.
What if we have an AI that does this for us? And an AI that interprets the interpreter, and so on. Maybe a wave of AIs, in order to give us a constant stream of interpretation of what is going on.
There are approaches that use this tactic for outer alignment. I highly recommend checking out the classics: Christiano IDA and debate, etc. It's definitely a common motif in this area of research. But then again, I've seen people raise concerns that automating interpretability tools may enable deceptively aligned policies/agents to further entrench themselves. Check out "AGI-Automated Interpretability is Suicide" by RicG
Try Google Podcasts - you can listen in your browser, and it lets you adjust the speed! podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5saWJzeW4uY29tLzQzODA4MS9yc3M And sorry - back when I uploaded this, only a small fraction of my AXRP listens were on YouTube. Now that more are, it might make sense to cross-post to YouTube (altho I haven't been as active on The Filan Cabinet recently).
I think part of the problem is that this is actually an explanation of why the neural network's method works, not an explanation of what its method actually is. After all, the network hasn't learned the concept of the trig functions; it's just learned how to embed the inputs (0-113) on a lossy version of the trig curves, etc. A mechanical description, I think, would also be clearer to a less math-y audience. It feels to me like the (quite excellent) authors saw the math they were familiar with and homed in on why it worked, rather than giving just a straight account of what the network is doing in a step-by-step fashion.
Not at the moment, but I could do so if there were sufficient interest. There are only 20 responses, so I'm not sure how interesting it will be for others.
The images are quite helpful, especially for a complete beginner to the field when it comes to terms like stochastic gradient descent. This channel is very underrated.
Yep - AXRP is primarily a podcast, available wherever good podcasts can be accessed (e.g. Spotify, Apple Podcasts, Google Podcasts). You can also read a lovingly-compiled transcript here: axrp.net/episode/2021/12/02/episode-12-ai-xrisk-paul-christiano.html
Just my two cents... I have a background in civil engineering and medicine, not computer science, but I am extremely good at connecting dots. The existential risk is not 20% in my opinion; I would guess the chance of shit NOT hitting the fan is less than 5%, based on everything I've heard about the black-box nature of current systems and how far behind alignment research is in terms of funding as well as progress. This guy is like 3 years past school and he considers himself an expert? Sorry to be a troll, but I am an extremely logical person, and the nature of code and human psychology, and the nascent exponential increase in the power of AI and its recent open-sourcing, make me very pessimistic. I really hope I'm wrong... Even if it doesn't kill us directly, there are a lot of malicious actors that will use open-source AI to create malicious code, bots, and viruses to totally disrupt our current digital society, not to mention the energy cost of using this technology. Where am I wrong? This is the big problem in my mind: even if AI itself is totally beneficent, if it's used for malevolent purposes by some and it's extremely powerful, are we not hosed? What if it allows bad actors to hack into the nuclear weapons sites? Or someone can easily create code which will shut down energy grids, which are all digital these days? Not to mention creating dangerous nanotech; the list goes on. Am I naive, or are you?
Yeah, you are being naive. First of all, please listen to my man correctly here. You are talking about probabilities, but the 20% he is talking about is not a chance or probability; it is the expected value of the human potential. Also, he was mainly saying that about alignment misses. Your worry about malicious actors is not an example of misalignment. And even in that case, you are being naive in thinking that these guys are not thinking about that. Also, if you look, this podcast is 1 year old, and ChatGPT and other generative models weren't even on anyone's radar, not even OpenAI's (they did not expect this much performance). It's easy to come in with hindsight here and say, look at all the exponential progress, AI is going to doom us, etc. You think that even without AI there would not be classical handcrafted algorithms able to create software and malware at a faster rate? Do you know how much better classical algorithms have become over the years? The malicious actor thing has always been on everyone's list.
So the bird's-eye-view hypothesis ought to make predictions about when and where the instances of the agent's source code ought to be occurring and what sort of inputs the agent ought to be subjectively perceiving, and if the inputs it is actually subjected to are in agreement with the hypothesized inputs, then there is a good match between the hypothesis and the objective?
Huh - this is being crossposted from libsyn, where the description has dot-points for the timestamps (and also links for the research being discussed). Lemme try to fix.