Hi all! I produce content about computer science and technology in general, along with some of my thoughts on life. I believe that in order to become the best version of yourself, one must find true passion and joy in what they work on and this channel is me sharing that with others.
I recently graduated college from the University of Wisconsin-Madison majoring in Computer Science, Data Science, & Economics. One thing that was really hard for me during this period was preparing for internships and getting offers. I also want to share some of my best tips and advice for current college students so hopefully their experiences are a little bit smoother than mine!
All content on this channel is produced by and is the intellectual property of Sourish Kundu LLC.
Am I the only one who doesn't understand the RMS propagation math formula? The gradient squared — is it per component, or is it the Hessian? How do you divide a vector by another vector? Could someone explain it to me please.
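In case it helps anyone with the same question: both operations are element-wise (per component), not a Hessian and not a true vector-by-vector division. A minimal NumPy sketch of the RMSProp update (hyperparameter values are illustrative, not prescribed):

```python
import numpy as np

def rmsprop_step(w, grad, v, lr=0.01, beta=0.9, eps=1e-8):
    # "gradient squared" is element-wise: each component is squared separately
    v = beta * v + (1 - beta) * grad**2
    # the "vector divided by a vector" is element-wise too: each parameter i
    # gets its own effective step size lr / (sqrt(v_i) + eps)
    w = w - lr * grad / (np.sqrt(v) + eps)
    return w, v

w = np.array([1.0, -2.0])
v = np.zeros_like(w)
g = np.array([0.5, -4.0])
w, v = rmsprop_step(w, g, v)
```

So a parameter with a history of large gradients gets a small effective step, and vice versa, with no second-derivative information involved.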
The intuition behind why the methods help with convergence is a bit misleading imo. The problem is not, in general, slow convergence close to the optimum point because of a small gradient; that can easily be fixed by letting the step size depend on the gradient size. The problem these methods solve is the iterations zig-zagging because of large gradient components in some directions and small components in the direction you actually want to move. By averaging (or similar use of past gradients) you effectively cancel out the components causing the zig-zag.
Hello! Thanks for the comment. Optimizers like RMSProp and Adam do make the step size dependent on the gradient size, which I showcase in the video, so while there are other techniques to deal with slow convergence close to the optimum due to small gradients, these optimizers still help there. Maybe I could've made this part clearer though. Also, from my understanding, learning rate decay is a pretty popular technique, so wouldn't that slow down convergence even more as the learning rate decays and the loss approaches the region with smaller gradients? However, I definitely agree with your bigger point about these optimizers preventing the loss from zig-zagging! In my RMSProp example, I do show how the loss is able to take a more direct route from the starting point to the minimum. Maybe I could've showcased a bigger example where SGD zig-zags more prominently to further illustrate the benefit that RMSProp & Adam bring to the table. I really appreciate you taking the time to give me feedback.
@@sourishk07 Yeah, I absolutely think the animations give good insight into the different strategies within "moment"-based optimizers. My point was more that even with "vanilla" gradient descent methods, the step sizes can be handled so they don't vanish as the gradient gets smaller, and that the real benefit of the other methods is in altering the _direction_ of descent to deal with situations where the eigenvalues of the (locally approximate) quadratic form differ by orders of magnitude. But I must also admit that (especially in the field of machine learning) the name SGD seems to be more or less _defined_ to include a fixed decay rate of step sizes, rather than just the method of finding a step direction (where finding step sizes would be a separate (sub-)problem), so your interpretation is probably more accurate than mine. Anyway, thanks for replying and I hope you continue making videos on the topic!
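The zig-zag point in this thread is easy to demonstrate numerically. Below is a small NumPy sketch (the quadratic, learning rate, and momentum value are my own illustrative choices, not from the video): on an ill-conditioned quadratic, plain gradient descent flips sign every step along the steep coordinate while barely moving along the flat one, and accumulating past gradients ends up much closer to the minimum in the same number of steps.

```python
import numpy as np

# Toy ill-conditioned quadratic: f(w) = 0.5 * (1*w0**2 + 100*w1**2)
curvatures = np.array([1.0, 100.0])

def grad(w):
    return curvatures * w

def descend(lr=0.019, beta=0.0, steps=100):
    w = np.array([10.0, 1.0])
    m = np.zeros_like(w)
    w1_signs = []
    for _ in range(steps):
        m = beta * m + grad(w)   # beta=0 recovers plain gradient descent
        w = w - lr * m
        w1_signs.append(np.sign(w[1]))
    return w, w1_signs

w_sgd, signs = descend(beta=0.0)   # plain GD: steep coordinate zig-zags
w_mom, _ = descend(beta=0.9)       # momentum: past gradients cancel the zig-zag
flips = sum(signs[i] != signs[i + 1] for i in range(len(signs) - 1))
```

With this learning rate the steep coordinate is multiplied by -0.9 each plain-GD step, so it changes sign on essentially every iteration, which is exactly the zig-zag described above.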
Absolutely loved the graphics and the intensive paper-based walkthrough of how the different optimizers work, all in the same video. You just earned a loyal viewer.
The “problem” the Adam algorithm is presented to solve in this case (the one with local and global minima) is simply wrong: in a small number of dimensions this is in fact a problem, but the condition for the existence of a local minimum grows more and more stringent with the number of dimensions. So in practice, when you have millions of parameters and therefore dimensions, local minima that aren't the global minimum will simply not exist; the probability of such a minimum existing is unfathomably small.
Hi! This is a fascinating point you bring up. I did say at the beginning that the scope of optimizers wasn't just limited to neural networks in high dimensions, but could also be applicable in lower dimensions. However, I probably should've added a section about saddle points to make this part of the video more thorough, so I really appreciate the feedback!
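A tiny numerical illustration of the saddle-point idea mentioned here (the function is a toy example of my own, not from the video): at a critical point, being a local minimum requires every Hessian eigenvalue to be positive, which is a stronger and stronger requirement as the dimension grows.

```python
import numpy as np

# Toy saddle: f(x, y) = x**2 - y**2 has a critical point at the origin where
# the gradient vanishes, yet it is not a minimum.
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])       # Hessian of f: one positive, one negative eigenvalue
eigvals = np.linalg.eigvalsh(H)   # returned sorted ascending
# For a critical point to be a local minimum in d dimensions, all d Hessian
# eigenvalues must be positive; mixed signs like these give a saddle instead.
```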
I used to have networks where the loss was fluctuating in a very periodic manner every 30 or so steps and I never knew why that happened. Now it makes sense! It just takes a number of steps for the direction of Adam weight updates to change. I really should have looked this up earlier.
Hmm, while this might be Adam's fault, I would encourage you to see if you can replicate the issue with SGD with momentum, or see if another optimizer without momentum solves it. I believe there is a wide array of reasons why this periodic behavior might emerge.
Hi! There seem to be many interesting papers about using metaheuristic approaches with machine learning, but I haven't seen too many applications of them in industry. However, this is a topic I haven't looked too deeply into! I simply wanted to discuss the strategies commonly used in modern-day deep learning, and maybe I'll make another video about metaheuristic approaches! Thanks for the idea!
@@sourishk07 thanks! There was some hard work behind them, so I’m happy to hear they’re appreciated. But I don’t need to tell you that. This video is a masterpiece!
Gemini 1.5 Pro: This video is about optimizers in machine learning. Optimizers are algorithms that are used to adjust the weights of a machine learning model during training. The goal is to find the optimal set of weights that will minimize the loss function. The video discusses four different optimizers: Stochastic Gradient Descent (SGD), SGD with Momentum, RMSprop, and Adam.

* Stochastic Gradient Descent (SGD) is the simplest optimizer. It takes a step in the direction of the negative gradient of the loss function. The size of the step is determined by the learning rate.
* SGD with Momentum is a variant of SGD that takes into account the history of the gradients. This can help the optimizer to converge more quickly.
* RMSprop is another variant of SGD that adapts the learning rate for each parameter of the model. This can help to prevent the optimizer from getting stuck in local minima.
* Adam is an optimizer that combines the ideas of momentum and adaptive learning rates. It is often considered to be a very effective optimizer.

The video also discusses the fact that different optimizers can be better suited for different tasks. For example, Adam is often a good choice for training deep neural networks.

Here are some of the key points from the video:
* Optimizers are algorithms that are used to adjust the weights of a machine learning model during training.
* The goal of an optimizer is to find the optimal set of weights that will minimize the loss function.
* There are many different optimizers available, each with its own strengths and weaknesses.
* The choice of optimizer can have a significant impact on the performance of a machine learning model.
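For reference, the four update rules summarized above can be sketched in a few lines of NumPy. This is an illustrative sketch with common default hyperparameters, not any particular library's implementation:

```python
import numpy as np

def sgd(w, g, lr=0.1):
    return w - lr * g

def sgd_momentum(w, g, m, lr=0.1, beta=0.9):
    m = beta * m + g                       # running accumulation of past gradients
    return w - lr * m, m

def rmsprop(w, g, v, lr=0.01, beta=0.9, eps=1e-8):
    v = beta * v + (1 - beta) * g**2       # per-parameter average of squared gradients
    return w - lr * g / (np.sqrt(v) + eps), v

def adam(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g              # first moment, like momentum
    v = b2 * v + (1 - b2) * g**2           # second moment, like RMSprop
    m_hat = m / (1 - b1**t)                # bias correction for the zero init
    v_hat = v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# one Adam step from zero state: the bias-corrected update is ~ -lr * sign(g)
g = np.array([3.0, -0.5])
w1, m1, v1 = adam(np.zeros(2), g, np.zeros(2), np.zeros(2), t=1)
```

The sketch also makes the summary's last point visible: Adam is literally momentum's first-moment estimate combined with RMSprop's second-moment scaling.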
Very nicely explained. Wish you brought up the relationship between these optimizers and numerical procedures though. Like how vanilla gradient descent is just Euler's method applied to a gradient rather than one derivative.
Thank you so much. And there were so many topics I wanted to cram into this video but couldn't in the interest of time. That is a very interesting topic to cover and I'll add it to my list! Hopefully we can visit it soon :) I appreciate the idea
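The connection asked about above is quick to show numerically: gradient descent is forward Euler applied to the gradient-flow ODE dw/dt = -∇f(w), with the learning rate playing the role of the step size h. A tiny sketch on a toy quadratic of my own choosing:

```python
import numpy as np

# Gradient flow ODE: dw/dt = -grad f(w). Forward Euler with step size h gives
#   w_{k+1} = w_k + h * (-grad f(w_k)),
# which is exactly gradient descent with learning rate h.
def grad_f(w):                    # toy choice: f(w) = 0.5 * ||w||**2, so grad f = w
    return w

def euler_gradient_flow(w0, h, steps):
    w = w0
    for _ in range(steps):
        w = w + h * (-grad_f(w))  # one Euler step == one gradient descent step
    return w

w = euler_gradient_flow(np.array([1.0, -1.0]), h=0.1, steps=50)
# the exact gradient-flow solution here is w(t) = w0 * exp(-t); Euler approximates it
```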
I don't know what I did for YouTube to randomly bless me with this gem of a channel, but keep your work up man. I love your content; it's nice to see people with similar passions.
This is so cool, I'm definitely gonna try this when I get my hands on some extra hardware. Amazing video. I can also imagine this must be pretty awesome if you're some sort of scientist/student at a university who needs a number-crunching machine, since you're not limited to being at your place or some PC lab.
I need help. I tried using the code and the trials are being saved somewhere, but I can't find them. Can you tell me where they're getting stored? Edit: I found it. It was stored in the C:\Users\(UserName)\AppData\Local\Temp folder.
If you're simply running main.py, then the checkpoints should be saved in the same directory as main.py under a folder titled 'output.' Let me know if that's what you were looking for!
@@simsimhaningan Are there any errors while running main.py? My guess is you're not in the same folder as main.py when you run it. Make sure you're in the root directory of the repository when you run main.py!
I remember when my teacher gave me an assignment on optimizers. I went through blogs, papers, and videos, but everywhere I looked I saw different formulas and I was so confused. You explained everything in one place very easily.
Nice vid, I'd mention MAS too, to explicitly say that Adam is weaker at the start and could fall into local minima (until it gets enough data), while SGD performs well early with its stochasticity and then more slowly, so both methods combined perform nearly like I mentioned (as in the MAS paper).
Hello! That's a good question. Unfortunately, 70b models struggle to run. Llama 13b works pretty well. I think for my next server, I definitely want to prioritize more VRAM
@@alirezahekmati7632 From my understanding, the WSL2 drivers come shipped with the NVIDIA drivers for Windows. I didn't have to do any additional setup. I just launched WSL2 and nvidia-smi worked flawlessly
Nice animations, nice explanations of the math behind them. I was curious about how different optimizers work but didn't want to spend an hour going through documentation; this video answered most of my questions! One that remains is about the AdamW optimizer: I read that it is practically just a better version of Adam, but didn't really find any intuitive explanations of how it affects training (ideally with graphics like these hahaha). There are not many videos on YouTube about it.
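On the AdamW question raised here: to my understanding, the difference is where weight decay is applied. Adam with L2 regularization folds the decay into the gradient, so the decay gets rescaled by the adaptive denominator; AdamW decouples it and applies the decay directly to the weights. A NumPy sketch of the two variants (my own illustrative code, with common default hyperparameters):

```python
import numpy as np

def adam_l2_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    g = g + wd * w                               # L2 penalty folded into the gradient
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
    # the decay term now passes through the adaptive rescaling below
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def adamw_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
    # decoupled: decay applied directly to the weights, outside the rescaling
    return w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w), m, v

# thought experiment: zero gradient, nonzero weight. The two variants shrink
# the weight by very different amounts because of the adaptive rescaling.
w0, g0 = np.array([1.0]), np.zeros(1)
w_l2, _, _ = adam_l2_step(w0, g0, np.zeros(1), np.zeros(1), t=1)
w_w, _, _ = adamw_step(w0, g0, np.zeros(1), np.zeros(1), t=1)
```

In the zero-gradient case the L2 variant's decay is blown up by the small adaptive denominator, while AdamW's decay stays a fixed fraction of the weight, which is the usual intuition for why AdamW regularizes more predictably.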
A lot of times in academia, people are just using SGD with momentum but playing around with learning rate scheduling a lot. You don't always want to get the deepest minimum since it can actually give you poor generalizability. That's why Adam isn't that popular when researchers are trying to push to SOTA.
Hi! I can only speak to the papers that I've read, but I still seem to see Adam being used a decent amount. Your point about overfitting is valid, but wouldn't the same thing be achieved by using Adam and just training for fewer iterations?
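For readers wondering what the learning rate scheduling mentioned above can look like, one common choice is cosine annealing. A minimal sketch (the endpoint values are illustrative):

```python
import math

def cosine_lr(step, total_steps, lr_max=0.1, lr_min=0.001):
    # smoothly anneal the learning rate from lr_max at step 0
    # down to lr_min at the final step, following a half cosine
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))
```

Paired with SGD + momentum, the schedule provides the "decaying step size" that adaptive methods otherwise handle implicitly, which is one way to read the trade-off discussed in this thread.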
You are incredibly intelligent to explain such a complex topic, drawn from tens of research papers' worth of knowledge, in a single 20-minute video... what the heck!