
The spelled-out intro to language modeling: building makemore 

Andrej Karpathy
468K subscribers · 608K views

We implement a bigram character-level language model, which we will further complexify in followup videos into a modern Transformer language model, like GPT. In this video, the focus is on (1) introducing torch.Tensor and its subtleties and use in efficiently evaluating neural networks and (2) the overall framework of language modeling that includes model training, sampling, and the evaluation of a loss (e.g. the negative log likelihood for classification).
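For orientation, a minimal sketch of the counting half of that pipeline (train by counting, normalize rows into probabilities, then sample). The toy word list below stands in for the names.txt dataset used in the video, and the variable names are illustrative rather than copied from the notebook:

    import torch

    words = ["emma", "olivia", "ava"]  # toy stand-in for names.txt

    chars = sorted(set("".join(words)))
    stoi = {s: i + 1 for i, s in enumerate(chars)}
    stoi["."] = 0  # single '.' start/end token, as in the video
    itos = {i: s for s, i in stoi.items()}
    V = len(stoi)

    # "Training" the count model: tally bigram occurrences in a 2D tensor.
    N = torch.zeros((V, V), dtype=torch.int32)
    for w in words:
        chs = ["."] + list(w) + ["."]
        for ch1, ch2 in zip(chs, chs[1:]):
            N[stoi[ch1], stoi[ch2]] += 1

    # Normalize each row into a probability distribution.
    P = (N + 1).float()           # +1 fake counts = model smoothing
    P /= P.sum(1, keepdim=True)   # broadcasting: (V, V) / (V, 1)

    # Sampling: walk the chain from '.' until '.' is produced again.
    g = torch.Generator().manual_seed(2147483647)
    ix, out = 0, []
    while True:
        ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
        if ix == 0:
            break
        out.append(itos[ix])
    print("".join(out))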
Links:
- makemore on github: github.com/karpathy/makemore
- jupyter notebook I built in this video: github.com/karpathy/nn-zero-t...
- my website: karpathy.ai
- my twitter: @karpathy
- (new) Neural Networks: Zero to Hero series Discord channel: / discord, for people who'd like to chat more and go beyond YouTube comments
Useful links for practice:
- Python + Numpy tutorial from CS231n cs231n.github.io/python-numpy... We use torch.tensor instead of numpy.array in this video. Their design (e.g. broadcasting, data types, etc.) is so similar that practicing one is basically practicing the other; just be careful with some of the APIs (how various functions are named, what arguments they take, and so on), as these details can vary. A small broadcasting sketch follows these links.
- PyTorch tutorial on Tensor pytorch.org/tutorials/beginne...
- Another PyTorch intro to Tensor pytorch.org/tutorials/beginne...
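Since the CS231n pointer above mentions broadcasting, here is a small sketch of the shared numpy/torch semantics, including the keepdim pitfall the video walks through; the values are arbitrary and the snippet is mine, not from the lecture:

    import numpy as np
    import torch

    # Both libraries align shapes from the right and stretch size-1 dims.
    a = torch.arange(9.0).reshape(3, 3) + 1.0

    correct = a / a.sum(1, keepdim=True)  # (3,3) / (3,1): each row sums to 1
    buggy = a / a.sum(1)                  # (3,3) / (3,): the sums broadcast as a
                                          # (1,3) row, so element [i, j] is divided
                                          # by the sum of row j, not row i
    print(correct.sum(1))                 # tensor([1., 1., 1.])
    print(buggy.sum(1))                   # silently wrong, no error raised

    # numpy: same semantics, slightly different keyword spelling.
    b = np.arange(9.0).reshape(3, 3) + 1.0
    print(b / b.sum(axis=1, keepdims=True))  # axis/keepdims vs torch's dim/keepdim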
Exercises:
E01: train a trigram language model, i.e. take two characters as an input to predict the 3rd one. Feel free to use either counting or a neural net. Evaluate the loss. Did it improve over the bigram model?
E02: split up the dataset randomly into 80% train set, 10% dev set, 10% test set. Train the bigram and trigram models only on the training set. Evaluate them on dev and test splits. What can you see?
E03: use the dev set to tune the strength of smoothing (or regularization) for the trigram model - i.e. try many possibilities and see which one works best based on the dev set loss. What patterns can you see in the train and dev set loss as you tune this strength? Take the best setting of the smoothing and evaluate on the test set once and at the end. How good of a loss do you achieve?
E04: we saw that our 1-hot vectors merely select a row of W, so producing these vectors explicitly feels wasteful. Can you delete our use of F.one_hot in favor of simply indexing into rows of W? (A sketch of this equivalence appears after the exercise list.)
E05: look up and use F.cross_entropy instead. You should achieve the same result. Can you think of why we'd prefer to use F.cross_entropy instead?
E06: meta-exercise! Think of a fun/interesting exercise and complete it.
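For E04 and E05, a sketch of the two equivalences involved; W, xs, ys echo the video's naming, but the toy values are mine:

    import torch
    import torch.nn.functional as F

    g = torch.Generator().manual_seed(2147483647)
    W = torch.randn((27, 27), generator=g, requires_grad=True)
    xs = torch.tensor([0, 5, 13])   # toy inputs: bigram starts from ".emma"
    ys = torch.tensor([5, 13, 13])  # toy targets: the following characters

    # E04: a one-hot row times W just selects a row of W, so plain indexing
    # yields the same logits without materializing the one-hot vectors.
    xenc = F.one_hot(xs, num_classes=27).float()
    assert torch.allclose(xenc @ W, W[xs])

    # E05: the manual softmax + negative log likelihood ...
    logits = W[xs]
    counts = logits.exp()
    probs = counts / counts.sum(1, keepdim=True)
    loss_manual = -probs[torch.arange(3), ys].log().mean()

    # ... matches F.cross_entropy on the raw logits, which is preferable:
    # it fuses the steps, is more numerically stable for large logits, and
    # never materializes the intermediate probs.
    loss_ce = F.cross_entropy(logits, ys)
    assert torch.allclose(loss_manual, loss_ce)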
Chapters:
00:00:00 intro
00:03:03 reading and exploring the dataset
00:06:24 exploring the bigrams in the dataset
00:09:24 counting bigrams in a python dictionary
00:12:45 counting bigrams in a 2D torch tensor ("training the model")
00:18:19 visualizing the bigram tensor
00:20:54 deleting spurious <S> and <E> tokens in favor of a single . token
00:24:02 sampling from the model
00:36:17 efficiency! vectorized normalization of the rows, tensor broadcasting
00:50:14 loss function (the negative log likelihood of the data under our model)
01:00:50 model smoothing with fake counts
01:02:57 PART 2: the neural network approach: intro
01:05:26 creating the bigram dataset for the neural net
01:10:01 feeding integers into neural nets? one-hot encodings
01:13:53 the "neural net": one linear layer of neurons implemented with matrix multiplication
01:18:46 transforming neural net outputs into probabilities: the softmax
01:26:17 summary, preview to next steps, reference to micrograd
01:35:49 vectorized loss
01:38:36 backward and update, in PyTorch
01:42:55 putting everything together
01:47:49 note 1: one-hot encoding really just selects a row of the next Linear layer's weight matrix
01:50:18 note 2: model smoothing as regularization loss
01:54:31 sampling from the neural net
01:56:16 conclusion

Science

Published: Jun 8, 2024

Comments: 669
@minjunesong6667 · 1 year ago
I haven't commented on a youtube video since 2017. But I have to, in the slim case that you actually read this comment, Andrej! Please keep doing what you are doing! You are an absolute gem of an educator, and the millions of minds you are enlightening with each video will do great things that will compound and make the world a better place.
@AndrejKarpathy · 1 year ago
reminded of youtube.com/watch?v=B8C5sjjhsso :D
@dhatrimukkamalla · 1 year ago
@@AndrejKarpathy you sure made the YT comments section a better place lol.. Excellent videos, please keep them coming, or shall I say make more! Thank you!!
@NewGirlinCalgary · 1 year ago
@@AndrejKarpathy 🤣🤣🤣
@khalilsabri7978 · 1 year ago
thanks for writing this comment for all of us! please keep up these videos; as Minjune said, you're a gem of an educator!
@PatrickHoodDaniel · 1 year ago
@@AndrejKarpathy So lo no mo. Classic!
@vincentd2418 · 1 year ago
What a privilege to be learning from someone as accomplished as Andrej, all for free. The internet at its best 🙏
@RalphDratman · 1 year ago
Just what this is -- a privilege indeed! We don't even have to pay tuition, or travel to Stanford.
@barjeshbarjesh8215 · 11 months ago
I am not lucky; I am blessed!
@nickfruneaux5232 · 8 months ago
absolutely!!!
@kamikaze9271 · 7 months ago
So true!
@iNTERnazionaleNotizia589 · 5 months ago
Hopefully YouTube will be free FOREVER AND EVER, not like Medium or Towardsdatascience...
@AndrejKarpathy · 1 year ago
Update: I added some suggested exercises to the description of the video. imo learning requires in-person tinkering and work, watching a video is not enough. If you complete the exercises please feel free to link your work here. (+Feel free to suggest other good exercises!)
@AndrejKarpathy · 1 year ago
@@ibadrather oh. Please go to Discord then, linked in the description. Sorry :\
@ibadrather · 1 year ago
I don't know if this is a good place for Q&A, but there is something I need to ask that I can't wrap my head around. I was training the trigram language model and the loss was lower than for the bigram language model, but the model was worse. I tried to generate a few names and realised I had made a huge error in data preparation. The question I have is: how strong an indicator is the loss? Is loss the only thing that matters for model performance? I understand there are other metrics of model performance. I have actually faced something similar in my work. I am stabilizing a video using an IMU sensor, and I am training a NN for camera pose estimation. For different architectures, the lower-loss models do not necessarily perform better. When our team looks at the stabilized video, many times the model with higher loss generates a visually better stabilized video. I don't quite understand this. That's why I am asking how indicative loss is of model performance. I don't expect you to answer this here, but maybe you could talk about this in your future lectures or somewhere else.
@OwenKraweki · 1 year ago
My loss for trigrams, count model, using all data for training was around 2.0931, and I was able to get close with the NN approach. I'm not sure if the resulting names were better, but I wasn't able to generate exactly the same names with the count and NN approaches anymore (even using the same generator). Also I'm not sure how to best share/link my solution (I have the notebook on my local drive).
@sanesanyo · 1 year ago
I built the trigram model by concatenating the one-hot encoded vectors for the first two letters & feeding them through a neuron & the rest is the same. I think that is a fine way to train a trigram model. Any views on that? I did attain a lower loss compared to bigram, although the results are not significantly better.
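A minimal sketch of the concatenation idea described above (shapes and names illustrative only, assuming the video's 27-character vocabulary):

    import torch
    import torch.nn.functional as F

    g = torch.Generator().manual_seed(2147483647)
    W = torch.randn((54, 27), generator=g, requires_grad=True)  # 2x27 inputs -> 27 logits

    ix1 = torch.tensor([0, 5])    # first context character (toy indices)
    ix2 = torch.tensor([5, 13])   # second context character
    xenc = torch.cat([F.one_hot(ix1, 27).float(),
                      F.one_hot(ix2, 27).float()], dim=1)  # shape (N, 54)
    logits = xenc @ W                                      # shape (N, 27)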
@stanislavnikolov7423 · 1 year ago
@@ibadrather Not an expert myself, but here's how I would explain it: coming up with a loss function is like coming up with a target you optimise for. Apparently your perception of how good a result is (your human-brain loss function) differs from what you optimise your network toward. In that case you should come up with a better equation to match your gut feeling. Practical example: let's say you want to train a network that produces ice cream, and your loss function is the amount of sugar in the ice cream. The best network you train crushes the loss, but produces simple 100% sugar syrup. It does not have the texture and consistency of real ice cream. A different network may make great ice cream texture-wise, but put less sugar in it, thus having worse loss. So, adjust your loss function to score for texture as well.
@clray123 · 1 year ago
The reason why this is such excellent teaching is that it's constructed bottom-up. It builds more abstract concepts using more concrete ones; generalizations follow concrete examples. At no point is there a situation in which the learner has to "just assume" something that will "become clear later on" (in most instances when a teacher says that, it doesn't; it just causes people to desperately try to guess the required knowledge on their own to fill in the gaps, distracting from anything that follows and producing mental discomfort). The bottom-up approach produces pleasure from a series of little "a-ha" and "I agree" moments and a general trust in the teacher. I contrast this with the much worse fastai courses, in which there are lots of gaps and hand-waving because of their top-down approach.
@deyjishnu · 5 months ago
This is exactly my experience as well. Well said.
@RalphDratman · 1 year ago
I cannot imagine a better -- or kinder -- teacher. He is feeding his audience knowledge and understanding, in small delicious bites, without worrying about their level of prior knowledge. And he is smiling irrepressibly all the time! Such a good person.
@lukeanglin263 · 3 months ago
Never in my life have I found an educator like this. This is free gold.
@dx5nmnv · 1 year ago
You're literally the best, man. These lessons are brilliant, hope you keep doing them. Thank u so much
@talhakaraca · 1 year ago
Seeing him back to education is great. Hope to see some computer vision lessons 👍👌
@HarishNarayanan · 1 year ago
@@talhakaraca If you search on this very site you will find truly clear lessons on computer vision from Andrej (from like 2016 or so)!
@talhakaraca · 1 year ago
@@HarishNarayanan thanks a lot. i found it 🙏
@HarishNarayanan · 1 year ago
@@talhakaraca You are very welcome. It was that lecture series that first got me informed and interested in deep learning.
@amanjha9759 · 1 year ago
The scale of impact these lectures will have is going to be enormous. Please keep doing them and thanks a lot Andrej.
@samot1808 · 1 year ago
I am pretty confident that the impact of his CS231n course is bigger than even his work at Tesla. I know too many people working in machine learning who were introduced to the field by CS231n. It changed my life. Makes you wonder if he should just spend all his effort on teaching. The impact is truly exponential.
@XorAlex · 1 year ago
Too many people are working on AI performance and not enough people are working on AI alignment. If this trend continues, the impact might be enormously negative.
@samot1808 · 1 year ago
@@XorAlex please explain
@izybit · 1 year ago
@@samot1808 AI alignment and the AI control problem are aspects of how to build AI systems such that they will aid rather than harm their creators. It basically means that we are like kids playing with plutonium, and it won't take much for someone to turn it into a bomb (on purpose or by mistake) and make everyone's life a living hell. All that leads to a need for more regulation and oversight of the really advanced AI models, because otherwise we may end up with AI generators that can take a photo of you and create a video showing you killing babies, or worse, an AI that self-replicates and takes over entire systems, leading to collapsed economies and countries (or maybe even something like the Terminator).
@jakobpcoder · 1 year ago
Absolutely insane levels of detail you are going into. This series is invaluable for beginners in the field, as well as for people like me who are building our own models all the time but want to go back to basics from time to time, so as not to get stuck in wrong assumptions learned from fast success with Keras :D I really hope you will continue this series for quite a while! Thanks a lot, AI Grand Master Andrej!
@mrdbourke · 1 year ago
Another weekend watch! Epic to see these videos coming out! Thank you for all your efforts Andrej!
@MattRosinski · 1 year ago
I love how you make the connections between the counting and gradient-based approaches. Seeing that the predictions from the gradient descent method were identical to the predictions from the statistical probabilities from the counts was, for me, a big aha moment. Thank you so much for these videos Andrej. Seeing how you build things from the ground up to transformers will be fascinating!
@dansplain2393 · 1 year ago
I've literally never heard the "logits are counts, softmax turns them into probs" way of thinking before. Worth the ticket price alone!
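The framing being praised here, as a tiny sketch (random values, nothing from the video):

    import torch

    logits = torch.randn(27)        # raw neuron outputs, read as log-counts
    counts = logits.exp()           # exponentiate: positive, count-like numbers
    probs = counts / counts.sum()   # normalize; the whole pipeline is softmax
    assert torch.allclose(probs, torch.softmax(logits, dim=0))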
@krzysztofwos1856 · 1 year ago
Andrej, your videos are the clearest explanations of these topics I have ever seen. My hat off to you. I wish you had taught my ML and NLP classes in college. There's a huge difference between your ground-up, code-first approach and the usual dry, academic presentation of these topics. It also demonstrates the power of YouTube as an educational tool. Thank you for your efforts!
@Consural · 1 year ago
A teacher that explains complex concepts both clearly and accurately. I must be dreaming. Thank you Mr. Karpathy.
@mahdi_sc · 10 months ago
The video series featured on your channel undoubtedly stands as the most consequential and intuitive material I have encountered. The depth of knowledge gained from these instructional materials is significant, and the way you've presented complex topics with such clarity is commendable. I find myself consistently recommending them to both friends and colleagues, as I truly believe the value they offer to any learner is unparalleled. The gratitude I feel for the work you've put into these videos is immense, as the concepts I've absorbed from your content have undoubtedly expanded my understanding and competence. This invaluable contribution to the field has benefited me tremendously, and I am certain it has equally enriched others' learning experiences.
@FelipeKuhne-us4cl · 6 days ago
I've just finished the whole playlist and, for some reason, I started from the last one (GPT Tokenizer), went through the 'makemore' ones, and finally watched this one. Each one is better than the last. I couldn't appreciate more what you're doing for the community of 'homeless scientists' (those who want to become better at their crafts but are not associated with an academic institution) out there, Andrej. The way you teach says a lot about how you learn and how you think others should learn. I hope to find more videos like yours and more people like you. Cheers!! 👏👏👏
@akshaygulabrao4423 · 1 year ago
I love that he understood what a normal person wouldn’t understand and explained those parts.
@saintsfan8119 · 1 year ago
Lex guided me here. I loved your micrograd tutorial. It brought back my A level calculus and reminded me of my Python skills from years back - all whilst teaching me the basics of neural networks. This tutorial is now putting things into practise with a real-world example. Please do more of these, as you're sure to get more people into the world of AI and ML. Python is such a powerful language for manipulating data and you explain it really well by building things up from a basic foundation into something that ends up being fairly complex.
@niclaswustenbecker8902 · 1 year ago
I love your clear and practical way of explaining stuff, the code is so helpful in understanding the concepts. Thanks a lot Andrej!
@realquincyhill · 1 year ago
Your intro to neural networks video is amazing, especially how you focused on the raw mathematical fundamentals rather than just implementation. I can tell this is going to be another banger.
@hlinc2 · 1 year ago
The clarity from this video of all the fundamental concepts and how they connect blew my mind. Thank you!
@tusharkalyani4343 · 1 year ago
This video is a goldmine. It's so intuitive and easy to understand. Even my grad classes could not squeeze this much information into a semester-long course. Hats off; it's a privilege to be learning from such an accomplished AI master. Thank you for the efforts Andrej :).
@bluecup25 · 1 year ago
I love how you explain every step and function so your tutorial is accessible for non-python programmers as well. Thank you.
@myfolder4561 · 5 months ago
Thank you Andrej. I can't stress enough how much I have benefited from and felt inspired by this series. I'm a 40 yo father with a young kid. Work and being a parent have consumed lots of my time - I have always wanted to learn ML/neural networks from the ground up, but a lot of materials out there are just thick and dense and full of jargon. Coming from a math and actuarial background I had kind of expected myself to be able to pick up this knowledge without too much stumbling, but seriously, not until your videos did I finally feel so strongly interested and motivated in this subject. It's really fun learning from you and coding along with you - I'm leaving your lectures each time more energized than when I started. You're such a great educator, as many have said.
@Stravideo · 1 year ago
What a pleasure to watch! Love the fact that there is no shortcut, even for what may seem easy. Everything is well explained and easy to follow. It is very nice that you show us the little things to watch for.
@talis1063 · 1 year ago
You're such a good teacher. Nice and steady pace of gradual buildup to get to the end result. Very aware of points where student might get lost. Also respectful of viewers time, always on topic. Even if I paid for this, I wouldn't expect this quality, can't believe I get to watch it for free. Thank you.
@taylorius · 1 year ago
I think minimal, simple-as-possible code implementations, talked through, are just about the best possible way to learn new concepts. All power to you Andrej, and long live these videos.
@ax5344 · 1 year ago
OMG, I feel soooo grateful for the internet! I would have never met a teacher this clear and this suited to my needs in real life. I have watched the famous Stanford courses before; they have set a standard in ML courses. It is always the Stanford courses and then the rest. Likewise, this course is setting a new standard for hands-on courses. I'm only half an hour into the video and I'm already amazed by the sensitivity, clarity and organization of the course. Many many thanks for your generosity in stepping out and sharing your knowledge with numerous strangers in the world. Much much indebted! Thank you!
@cassie8324 · 1 year ago
No one on youtube is producing such granular explanations of neural network operations. You have an incredible talent for teaching! Please keep producing this series, it is so refreshing to get such clear, first-principles content on something I'm passionate about from someone with a towering knowledge of the subject.
@a9raag · 6 months ago
This is one of the best educational series I have stumbled upon on YT in years! Thank you so much Andrej
@vil9386 · 5 months ago
What a sense of "knowledge satisfaction" I have after watching your video and working out the details as taught. THANK YOU Andrej.
@steveseeger · 1 year ago
Andrej thanks for doing this. You can have a larger impact bringing ML to the masses and directly inspiring a new generation of engineers and developers than you could have managing work at Tesla.
@edz8659 · 1 year ago
This is the first of your lessons I have watched and you really are one of the best teachers I've ever seen. Thank you for your efforts.
@RaviAnnaswamy · 1 year ago
Mastery is the ability to stay with fundamentals. Andrej derives the neural architecture FROM the counts-based model! The log counts, counts and probs are wrapped around the idea of how to get to probs similar to the counts model. Thus he explains why you need the log counts and why you need to normalize, and only then introduces the name for it: softmax! What a way to teach. The brilliant master stroke is when he shows that the samples from the neural model exactly match the samples from the counts model. Wow, I would not have guessed it, and many teachers might not have checked it. The connection between 'smoothing' and 'regularization' was also a nice touch. Teaching the new concepts in terms of the known, so that there is always a way to think about new ideas rather than taking them as given. For instance, the expected optimal loss of the neural model is what one would see in the counts model. Thanks Andrej! By the way, one way to interpret the loss is perplexity: exp(2.47) ≈ 11.8, so on average the model is about as uncertain as a uniform choice among roughly 12 possible next characters.
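A quick check of the perplexity reading above, using the standard definition perplexity = exp(average NLL); the 2.47 is the loss value reported in the video:

    import math

    avg_nll = 2.47                  # average negative log likelihood from the video
    perplexity = math.exp(avg_nll)  # ~11.8: about as uncertain as a uniform
    print(perplexity)               # choice among ~12 next characters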
@adityay525125 · 1 year ago
I just want to say, thank you Dr Karpathy, the way you explain concepts is just brilliant, you are making me fall in love with neural nets all over again
@sebastianbitsch · 1 year ago
What an incredible resource - thank you Andrej. I especially enjoyed the intuitive explanation of regularization, what a smooth way of relating it to the simple count-matrix
@darielsurf · 1 year ago
Hi Andrej, I heard two days ago (on a Lex Fridman podcast) that you were thinking of pursuing something related to education. I was surprised and very excited, wishing that it was true and accessible. Today I ran into your YouTube channel and I couldn't be happier. Thanks a lot for doing this and for sharing your valuable knowledge! The lectures are incredibly detailed and interesting. It's also very nice to see how much you enjoy talking about these topics; that's very inspiring. Again, thank you!
@gabrielmoreiraassuncaogass8044
Andrej, I'm from Brazil and I love ML and coding. I have tried several different classes with various teachers, but yours was by far the best. Simplicity with quality. Congratulations! I loved the class! Looking forward to taking the next ones. The best!
@stephanewulc807 · 1 year ago
Brilliant, simple, complete, accurate; I have only compliments. Thank you very much for one of the best classes I have had in my life!
@muhannadobeidat · 1 year ago
Amazing delivery as always. The fact that he spent time explaining the broadcast rules and some of the quirks of keepdim shows how knowledgeable he is, and that he knows most people struggle with little things like that to get past what they need to do.
@Mutual_Information · 1 year ago
This is how I'm spending my time off from Thanksgiving break. Watching this whole series 🍿
@AntLabJPN · 1 year ago
Another absolutely fantastic tutorial. The detail is incredible and the explanations are so clear. For anyone watching this after me, I feel that the micrograd tutorial is absolutely essential to watch first if you want to really understand things from the ground up. Here, for example, when Andrej runs the loss.backward() function, you'll know exactly what's happening, because you do it manually in the first lesson. I feel that the transition from micrograd (where everything is built from first principles) to makemore (relying on the power of pytorch) leaves you with a surprisingly deep understanding of the fundamentals of language modeling. Really superb.
@baboothewonderspam · 10 months ago
Thank you for creating this - it's incredibly high-quality and I'm learning so so much from it! It's such a privilege to be able to learn from you.
@BT-te9vx · 11 months ago
10 mins into the video, I'm amazed, smiling and feeling as if I've cracked the code to life itself (with the dictionary of bigram counts). Of course, I can't build anything with the level of knowledge I have currently, but I sure can appreciate how it works in a much better manner. I always knew that things are predicted based on their occurrence in data, but somehow seeing those counts (e.g., for the first 10 words, `('a', ''): 7`) makes it so glaringly obvious, which no amount of imagination could've done for me. You are a scientist, researcher, highly paid exec, knowledgeable, an innovator, but more than anything, you are the best teacher, one who can elucidate complex things in simple terms which then all make sense and seem obvious. And that requires not just mastery but a passion for the subject.
@jeankunz5986 · 1 year ago
Andrej, the elegance and simplicity of your code is beautiful and an example of the right way to write python
@biswaprakashmishra398 · 1 year ago
The density of information in these tutorials is hugeeeee.
@jorgitozor · 1 month ago
Really incredible how you can clearly explain a complex subject with only raw material. Thanks a lot for the valuable knowledge
@log_of_1 · 7 months ago
Thank you for taking the time out of your busy days to teach others. This is a very kind act.
@lorisdeluca610 · 1 year ago
Wow. Just WOW. Andrej you are simply too good! Thank you for sharing such valuable content on YouTube, hands down the best one around.
@mahmoudabuzamel7038 · 1 year ago
You're leaving me speechless with the way you explain things and simplify concepts!!!!
@alternative1967 · 2 months ago
You're lifting the lid on the black box and it feels like I'm sitting on a perceptron and watching the algos make their changes, forward and back. It has provided such a deeper understanding of the topics in the video. I have recommended it to my cohort of AI students, of which I am one, as supplementary learning. But to be honest, this is the way it should be taught. Excellent job, Andrej.
@richardmthompson · 11 months ago
There's so much packed in there. I spent the whole day on this and got to the 20 minute mark, haha. Great teacher, thank you for this logical and practical approach.
@yangchenyun · 1 year ago
This lecture is well paced and introduces concepts one by one, with later, more complex ones built on top of previous ones.
@guitdude13 · 1 year ago
This is connecting so many dots for me. I really enjoy the combo of theory with practical tips for using the torch APIs.
@blaze-pn6fk · 1 year ago
Amazing videos! it's insane how detailed and intuitive these videos are. Thanks for making these.
@adarshsonare9049 · 1 year ago
I went through building micrograd 3~4 times. It took me a week to understand a good portion of that, and now I have started with this. I am really looking forward to going through this series. Thanks for doing this Andrej, you are amazing.
@rockapedra1130 · 1 year ago
Fantastic! Thank you for sharing your many years of experience! You are a truly gifted educator!
@radek_osmulski · 1 year ago
Unbelievable lecture, Andrej 🙏 So many wonderful parallels. Thanks a lot for recording this and sharing it so freely with the world 🙂
@noah8405 · 1 year ago
Taught me how to do the Rubik’s cube, now teaching me neural networks, truly an amazing teacher!
@filipcvetic6606 · 10 months ago
Andrej’s way of explaining is exactly how I want things to be explained to me. It’s actually insane these high quality videos are free.
@JuanuHaedo · 1 year ago
Please DON'T STOP doing this! The world is so lucky to have you sharing this knowledge!
@ronaldlegere · 11 months ago
There are so many fantastic nuggets in these videos even for those already with substantial pytorch experience!
@mcnica89 · 1 year ago
The level of pedagogy is so so good here; I love that you start small and build up, and I particularly love that you pointed out common pitfalls as you went. I am actually teaching a course where I was going to have to explain broadcasting this term, but I think I am just going to link my students to this video instead. Really excellent stuff! One small suggestion is to consider using Desmos instead of wolframalpha if you just want to show a simple function
@kindoblue · 1 year ago
Thank God you are pushing videos! Grateful 🤟
@FireFly969 · 1 month ago
I love how you take NNs and explain them to us not via the already built-in functions in PyTorch, but by how things work, then give us the equivalent of it in PyTorch
@yuanhu6031 · 4 months ago
I absolutely love this entire series, high quality content and very educational. Thanks a lot for doing this for the good of the general public.
@benjaminlai5638 · 1 year ago
Andrej, thank you for creating these videos. They are the perfect balance of theory and practical implementation.
@jimmy21584 · 1 year ago
I’m an old-school software developer, revising machine learning for the first time since my undergrad studies. Back in the day we called them Markov Chains instead of Bigram Models. Thanks for the fantastic refresher!
@candrewlee14 · 1 year ago
You are incredible! This makes learning about ML so fun, your passion and knowledge really shine here. I’m a college student studying CS, and you lecture better than many professors. Not a knock on them though, props to you.
@DigiTechAnimation-xk1tp · 3 months ago
The music in this video is perfect. It really sets the mood and creates a powerful emotional connection.
@jtl_1 · 3 months ago
Besides having the best explanation of LLMs from this great teacher, you get a free hands-on Python course, which also has better explanations than lots of others. Thx a lot Andrej!
@pablofernandez2671 · 1 year ago
Amazing explanations, Andrej. Thanks for sharing your knowledge in such a clear and enlightening way. Thank you soooo much! I'm really motivated thanks to you.
@jedi10101 · 7 months ago
new sub here. i started w/ "let's build gpt: from scratch, in code, spelled out". i learned lot, enjoyed coding along, appreciated the thoughtful explanations. i'm hooked & will be watching the makemore series. thank you very much sir for sharing your knowledge.
@pauek · 1 year ago
Two hours of pure clarity... soo addicting!!
@otter662 · 2 days ago
Really generous providing this walkthrough. Thank you.
@pastrop2003 · 1 year ago
Thank you Andrej, this is absolutely the best hands-on coding neural nets & PyTorch tutorial. Special thanks for decoding cryptic PyTorch docs. Very, very useful!
@fotonical · 1 year ago
Another awesome breakdown, once again love how you take the time to help transfer intuition up and above implementation details.
@oshaya · 1 year ago
Not exactly revolutionary but so damn well explained and resolved in PyTorch. This is ML pedagogy for the masses. I praise your efforts.
@oshaya · 1 year ago
However, getting people to understand the nitty-gritty of a Transformer Language Model (like GPT), that will prove truly revolutionary!
@raziberg · 1 year ago
Thanks a lot for the videos. I was familiar with the concepts of basic machine learning but not with the actual workings of it. You really helped me get to the next level of understanding.
@tycox9364 · 1 year ago
Holy shit, an actual starting point ❤️
@curiousnerd3444 · 1 year ago
Can’t believe how easily you demystify these things!!! Can’t wait for the next one and the next one!
@maxhansen5166 · 1 year ago
This channel is an amazing complement to Andrew Ng's DL Specialization!!!
@spazioLVGA · 1 year ago
You definitely have a talent for education. Use it and you'll do so much good for so many people. Thank you Andrej!
@NarendraBME · 5 months ago
Let me say it, THE best educational series. Sir, I don't have enough words to thank you.
@tolifeandlearning3919 · 6 months ago
This is so awesome. Thanks Andrej for being so nice and sharing your knowledge.
@camorimd · 11 months ago
I cannot stress enough how much these videos are helping me. Thank u. A lot.
@javidjamae · 9 months ago
Phenomenal tutorial, thanks so much! I went through the entire thing and built it up from scratch and learned a ton!
@snarkyboojum · 1 year ago
So cool to see the equivalence between the manually calculated model and the neural network model optimised with gradient descent. It's not quite the same output either; the regularization loss is required to get the two super close. Pretty neat.
@owendorsey5866 · 1 year ago
So excited to watch this! The tutorial about micrograd was great. Can’t wait to make my way through this one, loving the content :)
@anvayjain4100 · 3 months ago
The way he explained the zip method, even a beginner can understand. From very basic Python to an entire language model. I can't thank this man enough for teaching us.
@Pythoncode-daily · 6 months ago
Thank you for the unique opportunity to learn how to write code from the most advanced developer, Andrej! An almost priceless and irreplaceable opportunity! Extremely useful and efficient!
@mrmiyagi26 · 1 year ago
Amazing tutorial to explain the intricacies of various LM and ML concepts! Looking forward to the next LM video.
@vq8gef32 · 2 months ago
Thank you so much Andrej. Amazing; I am really enjoying every second of this series.
@chesstictacs3107 · 1 year ago
This is such valuable content! Keep it up, Andrej! Thank you for sharing your knowledge!
@sheikhshafayat6984 · 1 year ago
These videos are absolutely gold! What a time to be alive
@Nova-Rift · 1 year ago
You're amazing! So glad to have you in this world and industry.
@DocLow · 1 year ago
Thank you for posting these! It's extremely valuable. The end result of the neural net wasn't all that anticlimactic; at least the longer "names" did differ slightly, so it wasn't 100% the same weights as in the first case :)
@RodRigoGarciaCasado · 1 year ago
Andrej teaches a very complex subject in a pedagogical and simple way, and also gives away many invaluable tips on programming, Python, Torch, and how to approach solving a problem. I have spent several years learning ML, almost literally from zero (I am not an engineer, statistician, or programmer), and these lessons put many things in order in my head (aha moments, as someone commented in the thread); I came to understand much better concepts and processes that I could previously barely intuit. It really is like opening up AI and seeing what it is like inside. I recommend watching the videos in order; before this one I watched the micrograd one and it was incredible to understand everything. Truly, a thousand thanks for this contribution Andrej.
@djubadjuba · 1 year ago
Amazing, this baby-steps approach is so powerful for cementing the concepts.
@esaliya · 1 year ago
This is like listening to a lecture by Richard Feynman. Super clear!
@groundingtiming · 5 months ago
Andrej, you are simply amazing for doing this makemore series. I do not usually comment on videos, and have not commented in a very long time; I just want to say thanks for your work. The AI world is probably crazy now, and it is videos like these that help even trained engineers get a proper understanding of how the models are made and the thoughts behind them, and not just implement and run, or spend hours debugging because of a bug like broadcasting...