Andrej Karpathy
FAQ
Q: How can I pay you? Do you have a Patreon or etc?
A: As a YouTube partner I do share in a small amount of the ad revenue on the videos, but I don't maintain any other extra payment channels. I would prefer that people "pay me back" by using the knowledge to build something great.
Comments
@MrSarathcool 29 minutes ago
Great video, very informative with deep understanding. Keep going.
@TedToal_TedToal 51 minutes ago
Incredible stuff, very well presented, thanks so much!
@NethraSambamoorthi 10 hours ago
You are an awesome generous social giver. Thanks.
@NarutoUzumaki-hn5ld 11 hours ago
1:22:36
@luficerg2007 12 hours ago
Imagine spending your time enjoying Japan, then realizing you have a responsibility to complete your incomplete series. Would you do it?? Well, at least he did for sure.
@ojaspatil2094 15 hours ago
Thank you!
@nostalgia5342 16 hours ago
I need an Andrej Karpathy-like instructor for every course in this world. Maybe AI can help with that.
@luficerg2007 1 day ago
Such a great man, he just made all these lectures free, while universities will charge you even for irrelevant content. I hope I can make the world a better place by using AI in the future. Currently, I can do it by commenting on this video, so that the algorithm recommends it to more people trying to learn neural networks. This comment and all the other comments are making the world a better place... And Andrej sir, I will pay you back with some cool stuff built by me for this world.
@fraimy5204 1 day ago
1:29:12
@MrjbushM 1 day ago
Amazing video! thanks!
@suicidalfish 1 day ago
Thank you Andrej! You are an amazing teacher.
@user-yw2ri9rv7s 1 day ago
The more I learn, the wiser I become. Thank you, teacher Andrej, for leading me out of my confusion. 🥰
@YA-yr8tq 1 day ago
This video is awesome! Andrej, you're a natural educator!
@ChrisNienart 1 day ago
Exercise 2 also has a nice one-line solution for dlogits: dlogits = (probs - Yb_onehot) / n, where Yb_onehot = F.one_hot(Yb, num_classes=vocab_size).float()
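A quick self-contained check of this identity (a sketch, not from the thread; shapes are illustrative, following the lecture's convention that n is the batch size):

import torch
import torch.nn.functional as F

n, vocab_size = 32, 27
logits = torch.randn(n, vocab_size, requires_grad=True)
Yb = torch.randint(0, vocab_size, (n,))            # target character indices

loss = F.cross_entropy(logits, Yb)                 # mean NLL over the batch
loss.backward()                                    # autograd reference gradient

probs = F.softmax(logits, dim=1)
Yb_onehot = F.one_hot(Yb, num_classes=vocab_size).float()
dlogits = (probs - Yb_onehot) / n                  # the one-line solution above

print(torch.allclose(dlogits, logits.grad, atol=1e-6))  # True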
@my-jorney 1 day ago
Great video! Very interesting :)
@7vrda7 1 day ago
People usually argue that areas like math and coding are the ones that will be improved with AlphaGo-like methods. But actually, I think the same holds true for medicine. We can clearly define steps, like diagnostics based on symptoms, therapy based on diagnosis, etc., with a clear reward signal.
@sanukurien2752 1 day ago
Wow, that 1 hour felt like 5 minutes. What a captivating presenter, Andrej!
@gabrielade775 1 day ago
my GOAT
@AdityaVerma-314 2 days ago
Brilliant lecture!
@foo_tube 2 days ago
I love this so much, this is amazing! You have really explained it down to the lowest level for anyone to understand. I am extremely grateful. I wish there were a way with graphviz to draw the neural network so that it looks more like the diagram you referred to, with the 3 inputs, 2 layers and one output. IOW, in such a way that the different layers can be seen distinctly in the final graph. I tried many things, including coloring nodes by type and such, but I'm not sure how to do it. Anyway, thank you!!!
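For what it's worth, one way to get the layered look the comment asks about (a sketch assuming the graphviz Python package and hypothetical node names, not the lecture's draw_dot code): put each layer's nodes in a subgraph with rank='same', which pins them to one column.

from graphviz import Digraph

dot = Digraph(graph_attr={'rankdir': 'LR'})          # left-to-right layout
layers = [['x1', 'x2', 'x3'],                        # 3 inputs
          ['h11', 'h12'], ['h21', 'h22'],            # 2 hidden layers
          ['o1']]                                    # 1 output
for layer in layers:
    with dot.subgraph() as s:
        s.attr(rank='same')                          # pin this layer's nodes to one column
        for name in layer:
            s.node(name)
for prev, nxt in zip(layers, layers[1:]):            # fully connect adjacent layers
    for u in prev:
        for v in nxt:
            dot.edge(u, v)
dot.render('net', format='svg')                      # needs the graphviz binary installed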
@ChrisNienart 2 days ago
Solved the single-line solution for dC: dC = (Xb_onehot.mT @ demb).sum(0), where Xb_onehot.mT = torch.transpose(Xb_onehot, -2, -1) (see torch.adjoint) and Xb_onehot = F.one_hot(Xb, num_classes=vocab_size).float(). This works because emb = C[Xb] = Xb_onehot @ C. Note: this is an approximate match (not an exact match), probably due to the tensor product.
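A small sketch of why the indexing view works, with illustrative shapes and a random stand-in for demb (assumptions, not the lecture's exact values):

import torch
import torch.nn.functional as F

vocab_size, n_embd, B, T = 27, 10, 32, 3
C = torch.randn(vocab_size, n_embd)
Xb = torch.randint(0, vocab_size, (B, T))
demb = torch.randn(B, T, n_embd)                   # stand-in upstream gradient

Xb_onehot = F.one_hot(Xb, num_classes=vocab_size).float()   # (B, T, vocab_size)
print(torch.allclose(Xb_onehot @ C, C[Xb]))                 # True: emb = Xb_onehot @ C

dC = (Xb_onehot.mT @ demb).sum(0)                  # (vocab_size, n_embd)

dC_ref = torch.zeros_like(C)                       # reference: scatter-add demb into dC
dC_ref.index_put_((Xb,), demb, accumulate=True)
print(torch.allclose(dC, dC_ref, atol=1e-5))       # True up to float summation order,
                                                   # which likely explains the approx match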
@ComPuPur 2 days ago
Grateful for the times we are living in and the easy access to information that we can enjoy. Thanks for sharing your knowledge, much appreciated!
@drluvkashyap 2 days ago
I am a biochemist who loves AI; how do I go about getting HIPAA-related datasets? I would like to do something similar but on a HIPAA-compliant dataset.
@mihaidanila5584 3 days ago
At 7:06, it's a bit subtle why it's called a loss, because it's not immediately apparent with respect to what it is a loss. It seems to be the loss resulting from choosing the character with index i, given the probability distribution stored in the tensor.
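In symbols (restating the comment's reading of the lecture's setup): for one example, the loss is the negative log-likelihood of the correct character, $\ell = -\log p_i$, where $p_i$ is the probability the model assigns to the character with index $i$; over $N$ examples it is averaged, $L = -\frac{1}{N}\sum_{k} \log p_{i_k}$. A confident correct prediction ($p_i \to 1$) gives $\ell \to 0$, while a confident wrong one sends $\ell \to \infty$, which is what makes it behave like a loss.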
@robosergTV 3 days ago
What a legend
@longohoang4614 3 days ago
You are the legend bro, thank you so much
@sanketjain9320 3 days ago
Does only having System 1 mean that LLMs don't have any logic currently? Is it all pattern matching only?
@jobarmure6169 3 days ago
Just " waw " this man is an incredible teacher. I watched this video at least 3 times in 24 hours. Not because I didn't understand, but because first it's really enjoyable to watch, secondly I didn't want to miss anything.
@felixx2012 3 days ago
Thanks for the great video. In the Tokenizer class you define self.vocab using the self._build_vocab() function, but then self.vocab is overwritten when you run self.train(). Why do you initialize self.vocab (for the 256 byte values and special tokens) if you are going to just overwrite it?
@TheOtroManolo 3 days ago
Around the 1:30:00 mark, I think I missed why some saturation (around 5%) is better than no saturation at all. Didn't saturation impede further training? Perhaps he just meant that 5% is low enough, and that's the best we can do if we want to keep the deeper activations from converging to zero?
@rubenvicente4677 3 days ago
I arrived at dh just by figuring it out from the sizes of the matrices, and then I continued with your video where you did all the derivatives, and I thought... I am so dumb, I should have done that. But then you say "now I'll tell you a secret I normally use..." at 49:45... hahahahahhaha
@AshwinJoshi-kc5ti 4 days ago
@AndrejKarpathy Referring to the 52nd minute of the video: in order to conclude that the bigrams are learning, shouldn't the likelihood of each be greater than 1/(27.0*27.0) and not 1/27.0, as mentioned in the video? Thoughts?
@communist4trump 2 days ago
The model is "given" 1 character and outputs a new one. There are 2 characters in this interaction, but the character that is given does not influence this at all, as the model is not meant to predict both, but only the last character; thus 1/27.
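To spell out the baseline being discussed (using the lecture's 27-character alphabet): a model that has learned nothing assigns the uniform conditional probability $p(c_{t+1} \mid c_t) = 1/27$ to each next character, for an average negative log-likelihood of $\log 27 \approx 3.30$; the joint probability $p(c_t, c_{t+1}) = 1/27^2$ would be the right baseline only if the model had to predict both characters of the bigram.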
@LivingLifeFully-fn4xm 4 days ago
Dude is a legend. Big respect for this!
@sue_green 4 days ago
Thank you so much for the great learning materials you create and share, this is precious. I've recently also run into a highly visual explanation of the attention mechanism by 3Blue1Brown (ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eMlx5fFNoYc.htmlsi=G7PPnlbmx379YWjp), and I liked the intuition we can have behind the Values (timestamp: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eMlx5fFNoYc.htmlt=788). As far as I understood, we can intuitively think of the Value as some vector we add to a word (~token) so that we get a more refined, detailed meaning of the word. For example, if we have a "fluffy creature" in a sentence, then at first we have an embedding for "creature", and then we "pay attention" to what came before and get richer information. That is, the Value shows how the embedding of "creature" should be modified to become an embedding of "fluffy creature".
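A minimal single-head sketch of that intuition (illustrative shapes and random stand-in weights, not the lecture's exact code): each token's output is a weighted sum of the Value vectors of the tokens it attends to, and in a transformer block that sum is added back onto the token's embedding through the residual connection.

import torch
import torch.nn.functional as F

T, C = 4, 8                                        # tokens, embedding size (head size = C here)
x = torch.randn(T, C)                              # embeddings: "a", "fluffy", "blue", "creature"
Wq, Wk, Wv = (torch.randn(C, C) for _ in range(3)) # stand-in query/key/value projections

q, k, v = x @ Wq, x @ Wk, x @ Wv
wei = F.softmax(q @ k.T / C**0.5, dim=-1)          # attention weights
delta = wei @ v                                    # aggregated Values
x_refined = x + delta                              # "creature" nudged toward "fluffy creature"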
@ashutoshdongare5370 4 days ago
The grandma jailbreak still works on ChatGPT!!!
@sk8ism 4 days ago
I consistently watch this vid, love it!
@cuigthallann4091 5 days ago
I transcribed the code from the screen and it ran OK, but now I get "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn" on the call to loss.backward(). What does this mean? Is it a memory problem on my PC?
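For reference, a likely cause, sketched (this is an assumption about the transcribed code, not a diagnosis of it): loss.backward() raises this error when no tensor in the computation graph was created with requires_grad=True, so it is not a memory problem.

import torch

x = torch.randn(4, 27)
W = torch.randn(27, 27)                            # requires_grad=True was forgotten
loss = (x @ W).sum()
# loss.backward()  # -> RuntimeError: element 0 of tensors does not require grad ...

W = torch.randn(27, 27, requires_grad=True)        # the fix: mark parameters as trainable
loss = (x @ W).sum()
loss.backward()                                    # works; W.grad is now populated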
@simonvutov7575 5 days ago
Great video!!
@tough_year 5 days ago
Wow. I am amazed by how intuitive this is.
@fraimy5204 6 days ago
1:26:12
@gauravruhela7393 6 days ago
That napalm jailbreak no longer works on newer models like GPT-4o. I tried seeking help from ChatGPT 😃. However, that base64 trick did work!!!
@williamzhao3885 6 days ago
Big fan of Andrej. Please, please keep making these videos. They are sooooo good!
@inriinriinriinriinri 6 days ago
Btw thanks for the napalm recipe
@weekipi5813 6 days ago
1:19:31 Honestly, you don't even need topological ordering. I literally implemented a recursive approach where I call backprop on the output node: it first sets the gradients of its children and then cycles through its children to backpropagate on those nodes recursively.
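A toy sketch of that recursive scheme, in the spirit of (but not identical to) the lecture's Value class. One caveat: plain recursion is only safe when every node has a single consumer (a tree); if a node feeds several downstream nodes, its _backward can fire before all gradient contributions have accumulated, which is exactly what the lecture's topological ordering guards against.

class Value:
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():                           # chain rule for multiplication
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward_recursive(self):
        self._backward()                           # push grad into direct children
        for child in self._children:
            child.backward_recursive()             # then recurse into each child

a, b, c = Value(2.0), Value(3.0), Value(4.0)
d = (a * b) * c
d.grad = 1.0
d.backward_recursive()
print(a.grad, b.grad, c.grad)                      # 12.0 8.0 6.0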
@uhoffmann29 6 days ago
Awesome video ... well done, well explained ... must see.
@logo-droid 6 days ago
Great to have a tutorial on that! I also found a free tokenizer on Poe, so I don't even have to do it on my own :)
@PeterGodek2 6 days ago
Yes, the aggregation is data-dependent, but the linear transforms that create the queries, keys and values are the same for all nodes (so we need multiple attention heads, because a single head will not capture much).
@siyuanguo5128 7 days ago
This is actually really helpful
@chineduezeofor2481 7 days ago
Awesome tutorial. Thank you Andrej!
@srikanthgr1 7 days ago
Thanks Andrej, this is a great video for beginners.