Machine Learning in C (Episode 1)

Tsoding Daily · 126K subscribers · 232K views

Published: 23 Sep 2024

Comments: 316
@alexgodson4176 · 1 year ago
"It takes a genius to make something complex sound so simple." Thank you for teaching so well.
@Acetyl53 · 1 year ago
I disagree. This notion is why everything has turned into edutainment. Terry had a lot to say about that too.
@Acetyl53 · 1 year ago
@Cody Rouse I agree with my disagreement and disagree with your disagreement with my disagreement.
@SimGunther · 1 year ago
The actual quote was "Every genius makes it more complicated. It takes a super-genius to make it simpler."
@samgould8567 · 1 year ago
@@Acetyl53 Given two people with identical knowledge of a subject, the person who can explain it more thoroughly and understandably to a layperson either has elevated communication abilities, or has a deeper understanding beyond what can easily be measured. In either case, they are demonstrably smarter in what we care about.
@mar4ko07 · 1 year ago
If you can explain a topic to a 5-year-old, that means you understand the topic.
@samwise1491 · 1 year ago
1:32:27 The bias is needed because otherwise all your inputs were zero, no matter what your weights were: y was being calculated as 0*w1 + 0*w2 and then passed through the sigmoid, and S(0) = 0.5. Adding the bias allowed it to provide a non-zero input to the sigmoid in that case. Of course this is an old stream, so I'm sure you figured that out later; just in case anyone watching was curious. Great stuff as always, Zozin!
@ashwinalagiri-rajan1180 · 1 year ago
A simpler explanation is that sometimes you want to move the entire curve rather than just change its slope.
@AD-ox4ng · 1 year ago
Another simple explanation: when our inputs are in a range of values between A and B, say BMI values, which in most standard cases are between 16 and 30 or so, it's helpful to standardize the range to something between 0 and 1. The inputs are multiplied/divided by the weights to scale them up or down. We could divide by the upper BMI boundary value of 30 to get 1. However, when we divide the lower BMI boundary by 30, we don't get 0. In fact, no matter what number we choose, we cannot bring the lower boundary to 0 by multiplication alone. This is because there is a "bias" (an offset, or **addition**) on the range. The bias term is that extra addition/subtraction needed to first make sure the range starts at 0; then we do the scaling to make it range from 0 to 1.
@ElGnomistico · 1 year ago
I remember from my machine learning classes that the bias term comes from the idea of having a threshold value for the activation. Instead of writing an inequality, you would subtract the threshold from the usual perceptron's linear function (W • x - threshold). The bias is just the negative threshold for mathematical convenience. In fact, the bias can also be thought of as a weight whose input is always 1 (helps understand why you also updated the bias the same as you do with the rest of the weights).
@liondovelearning1391 · 1 year ago
The 🎉 did I just read?
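[Editor's note: a minimal C sketch of the point this thread makes, assuming illustrative weights; with both inputs at zero, a bias-free neuron can only feed 0 to the sigmoid and is stuck at S(0) = 0.5, while folding the bias in as one more weight with a constant input of 1 frees the output. Not code from the stream.]

```c
#include <stdio.h>
#include <math.h>

// Logistic sigmoid: S(0) = 0.5, which is why a bias-free neuron
// outputs 0.5 whenever every input is 0.
float sigmoidf(float x) { return 1.0f / (1.0f + expf(-x)); }

// No bias: for x1 = x2 = 0 the weights are irrelevant.
float neuron_no_bias(float x1, float x2, float w1, float w2) {
    return sigmoidf(x1*w1 + x2*w2);
}

// Bias folded in as a third weight whose input is always 1,
// exactly as the comment above describes.
float neuron_bias(float x1, float x2, float w1, float w2, float b) {
    return sigmoidf(x1*w1 + x2*w2 + 1.0f*b);
}

int main(void) {
    printf("no bias, zero inputs: %f\n", neuron_no_bias(0, 0, 5.0f, -3.0f));    // 0.5, always
    printf("with bias b = -4:     %f\n", neuron_bias(0, 0, 5.0f, -3.0f, -4.0f)); // ~0.018
    return 0;
}
```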
@Mr4NiceOne · 1 year ago
Easily the best introduction to machine learning. Thank you for taking your time to make these!
@623-x7b · 1 year ago
It's much more interesting to learn machine learning like this than to just use some pre-made library. I'm far more interested in the underlying mathematics and algorithms than in some 'cat food' approach to learning where we just get a brief overview of how to use some preexisting technology. The mathematics and algorithms are interesting and worth learning, especially if you want to be innovative in any field. While it might not be an 'expert' example, it is an intuitive explanation which is as in-depth, if not more so, than the university-level AI course which I have taken. Thanks for the great content!
@alextret6787 · 1 year ago
The rarest kind of content on YouTube. Pure C, not even C++. Very cool.
@bossysmaxx3327 · 1 year ago
I was waiting for this tutorial for like 7 years; finally someone made it. Good dude, subscribed.
@potatopassingby · 1 year ago
Zozin, you are a wonderful teacher for anything Computer Science related :) You are teaching in a way that actually helps people understand things, so thank you for your videos. If only universities had people like you to teach.
@hc3d · 1 year ago
Indeed. This is the best ML explanation I have seen so far, finally things are making sense.
@rubyciide5542 · 11 months ago
Bro this dudes brain is definitely something else
@0ia · 10 months ago
Is he called Zozin? I thought that was him pronouncing Tsoding with an accent
@RoadToFuture007 · 10 months ago
@@0ia I've heard his name is Alexey.
@danv8718 · 1 year ago
As far as I understand it, the reason for using the square instead of, for example, the absolute value is that apart from always giving you positive values, so they won't cancel out when you add them up, the square function has some nice properties in terms of calculus: for example, the derivative exists everywhere (this is not the case for the absolute value), and this can be important for implementing algorithms like gradient descent. The reason can't be to amplify the error: if the error is close to zero and you square it, instead of amplifying it you'd make it even much smaller! But anyway, this was a thoroughly enjoyable intro to ML.
@artemfaduev6228 · 10 months ago
I am an ML engineer and you are absolutely correct. The main reason to use the square instead of the modulus is to be able to take derivatives at any given point in order to calculate and perform gradient descent, which is used for model optimization. But there are some downsides to it. For example, it really bumps up the error. Imagine you are calculating prices of apartments based on some features provided to you. If the error is $1,000, squaring will ramp it up to a whopping $1,000,000. That means your model will be affected more by the outliers in your training data, and the model will be trying to compensate for the damage of the outlying squared values. That is why ML engineers often have to make a choice between MSE (mean squared error) and MAE (mean absolute error). If you need more optimization and there are no obvious outliers, pick the first one. If there are a lot of outliers in the data, pick MAE to make your model less "emotional", if you could say so :)
@burarum1 · 10 months ago
@@artemfaduev6228 MSE and MAE are not the only loss functions that exist. MSE/L2 loss means that we assume Gaussian noise for the data. Instead of Gaussian noise we could use the Student-t distribution as the noise distribution and use its negative log likelihood (differentiable everywhere) as the loss. Student-t has heavier tails (with a controllable hyperparameter nu) -> more robust to outliers. There is also something like the Huber loss.
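[Editor's note: a small self-contained sketch of the MSE/MAE trade-off discussed in this thread, using a made-up error array with one outlier; illustrative only.]

```c
#include <stdio.h>
#include <math.h>

// Mean squared error: differentiable everywhere, but squaring
// lets a single large residual dominate the average.
float mse(const float *err, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) acc += err[i] * err[i];
    return acc / n;
}

// Mean absolute error: more robust to outliers, but its
// derivative does not exist at 0.
float mae(const float *err, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) acc += fabsf(err[i]);
    return acc / n;
}

int main(void) {
    float errs[] = {0.5f, -0.5f, 1000.0f}; // one outlier
    size_t n = sizeof(errs) / sizeof(errs[0]);
    printf("MSE = %f (outlier dominates)\n", mse(errs, n)); // ~333333.5
    printf("MAE = %f\n", mae(errs, n));                     // ~333.7
    return 0;
}
```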
@AnnasVirtual · 1 year ago
You should try to model continuous functions like sin, cos, or even Perlin noise and see if the neural network can act like them.
@patrickjdarrow · 1 year ago
I've done this with neural networks. In short, the common neural network with ReLU activations will look like a piecewise function with linear characteristics at the boundaries. In practice this is avoided with sinusoidal output encoding, which makes the issue of sinusoid approximation trivial.
@ashwinalagiri-rajan1180 · 1 year ago
@@patrickjdarrow so you approximated sine with a sine 🤣🤣🤣
@patrickjdarrow · 1 year ago
@@ashwinalagiri-rajan1180 No, I said that's what's done in practice
@ashwinalagiri-rajan1180 · 1 year ago
@@patrickjdarrow Yeah, I know, I was just joking.
@darkdevil905 · 1 year ago
I have a degree in Physics and I have a feeling you understand mathematics more deeply than I do lol. The best method for sure is the central difference method, but it doesn't matter; your way of teaching and problem solving absolutely rocks and is the best.
@klnnlk1078 · 1 year ago
The mathematics needed to understand what a neural network does is extremely elementary.
@darkdevil905 · 1 year ago
@@klnnlk1078 true
@HarishThotakura-n2w · 1 year ago
Like basic calculus and linear algebra.
@scarysticks66 · 1 year ago
Not really: if you go deep into convolutional NNs or other architectures, the math needed there is pretty advanced, like tensor calculus. @@klnnlk1078
@Amplefii · 10 months ago
@@klnnlk1078 Well, I'm bad at math so it seems complicated to me, but I still love to learn about it. I need to find the time to study some math; the American school system didn't do me any favors.
@chjayakrishnajk · 4 months ago
Generally I can't make myself sit and watch your videos in their entirety because I don't know what you're doing, especially in the C videos, but today I watched this entire video, mainly because of how simply you explained it.
@johanngambolputty5351 · 1 year ago
The nice thing about doing this in C is that you could use OpenCL (C99 syntax) to parallelise certain operations on GPU cores (like you were using a thread pool) without really changing much of the logic (so long as you're not using some language features that aren't available, like function pointers).
@llothar68 · 5 months ago
Madness, you want to use libraries for this.
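[Editor's note: a hypothetical kernel-only sketch of the parallelisation idea above, with one work-item per output neuron of a fully connected layer; the host-side setup is omitted and all names are made up for illustration.]

```c
// OpenCL C kernel (sketch): each work-item computes one output neuron.
__kernel void forward_layer(__global const float *in,  // inputs  [in_n]
                            __global const float *w,   // weights, row-major [out_n][in_n]
                            __global const float *b,   // biases  [out_n]
                            __global float *out,       // outputs [out_n]
                            const int in_n)
{
    int i = get_global_id(0);
    float acc = b[i];
    for (int j = 0; j < in_n; ++j)
        acc += w[i * in_n + j] * in[j];
    out[i] = 1.0f / (1.0f + exp(-acc)); // sigmoid activation
}
```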
@potatopassingby · 1 year ago
A more intuitive way of understanding why XOR "stagnates at 0.25" is pretty much that the neural network is able to model up to 3 of the 4 states that we want it to model. After modeling 3 of those states, it absolutely cannot model the 4th one due to the limitations of how it was built, so that last one takes up 25% of all the scenarios of an XOR gate that we want it to imitate :D So at best it will still have cost 25% (or accuracy 75%).
@teenspirit1 · 11 months ago
01:38:00 @Tsoding Daily The reason you couldn't model XOR with a single neuron is that XOR requires a non-linear classifier to separate the two cases. If you arrange the outputs in a 2x2 matrix you can see why:

AND: [0 0; 0 1] - you can draw a straight line separating the 0s and the 1.
OR: [0 1; 1 1] - again, the classifier only needs a straight line.
XOR: [0 1; 1 0] - you need some sort of oval shape; a line isn't enough to classify XOR.
@Anonymous-fr2op · 6 months ago
Yeah, equation of an ellipse maybe?
@sillymesilly · 3 months ago
Is it because the points don't exist on the same axis?
@gabrielmartinsdesouza9078 · 1 year ago
I've spent the last 7 hours writing code, "following" this video. Thanks a lot for this.
@SlinkyD · 1 year ago
Watching you made me realize that understanding the definitions and concepts is perhaps the most important part of programming. The second most important part, distilling a high-level concept down to its base components, is close behind. Third is typing. My knuckles hurt from all the vids I watched. Now I wanna watch parametric boxing (all technique while blindfolded).
@pauleagle97 · 1 year ago
The quality of this content is off the charts, thanks! Subscribed.
@The_Savolainen · 1 year ago
Well, this is cool! By just using mathematics and the power of a computer you built something that was able to predict the next number (even when it was just that the next number is 2 times the input), and also something that can recognise logic gates; it's just mind-boggling. And with only 1-3 neurons. I was very interested in this topic before this video, and now I am hooked!
@rogo7330 · 1 year ago
Every word on this channel I take with a cup of tea.
@arkadiymel5987 · 1 year ago
1:37:35 I think it is possible to model XOR with just one neuron if its activation function is non-monotonic, such as a sine function
@landsgevaer · 1 year ago
Yeah, essentially a xor b = (a+b) mod 2 suffices, if false=0 and true=1. There, a+b is the linear combination, and mod is the periodic function. It works similarly with sin and cos and, up to scaling, any activation function with multiple zeros (except the zero constant function).
@revimfadli4666 · 1 year ago
Or if you use a cascaded network with skip connections, or an Elman-Jordan network.
@daviskipchirchir1357 · 1 year ago
@@landsgevaer explain this more. I want to see something
@landsgevaer · 1 year ago
@@daviskipchirchir1357 I think I wrote it clearly enough, if I say so myself. It doesn't get more precise than when written in a formula. Not sure what the something is that you want to see, but thank you.
@sillymesilly · 3 months ago
yep that worked.
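[Editor's note: the claim in this thread is easy to check; a single "neuron" y = sin(w * (a + b)) with w = pi/2 reproduces XOR exactly, because sine has multiple zeros. A toy verification, not stream code.]

```c
#include <stdio.h>
#include <math.h>

#define PI 3.14159265358979f

// One "neuron" with a sine activation: y = sin((a + b) * pi/2).
//   a + b = 0 -> sin(0)    = 0
//   a + b = 1 -> sin(pi/2) = 1
//   a + b = 2 -> sin(pi)   = 0   (up to float rounding)
float xor_neuron(float a, float b) {
    return sinf((a + b) * PI / 2.0f);
}

int main(void) {
    for (int a = 0; a <= 1; ++a)
        for (int b = 0; b <= 1; ++b)
            printf("%d XOR %d ~= %f\n", a, b, xor_neuron(a, b));
    return 0;
}
```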
@torphedo6286 · 1 year ago
Thanks dude, I tried to write something for AI in C a while ago, but it was incredibly difficult because all of the information is in Python. Saving this video for later.
@eprst0 · 1 year ago
Try it using C++ 😄
@AD-ox4ng · 1 year ago
Couldn't you just follow along with one of the ML implementation videos in pure Python and just translate the code?
@torphedo6286 · 1 year ago
@@AD-ox4ng I wasn't aware of any pure Python implementation videos; all the info I saw used numpy. Re-implementing matrices while trying to understand the math at the same time sucked.
@alefratat4018 · 1 year ago
@@andrewdunbar828 There are a lot of deep learning libraries / frameworks in C that are relatively simple and less heavyweight than the mainstream frameworks in Python.
@filipposcaramuzza2953 · 1 year ago
About the XOR thing: the way I understood it is that with a single neuron you can only model a linear equation, i.e. a line. If you plot the inputs of the OR function on a 2D graph, putting a "0" at coordinates (0, 0) and a "1" at coordinates (0, 1), (1, 0) and (1, 1), you can clearly see that you can "linearly separate" the 0 and 1 outputs (meaning you can draw a line that separates them). If you try to plot the XOR function instead, you won't be able to linearly separate the 0s and 1s on a 2D plot with a single line; you will need a more complex model, e.g. two neurons. Moreover, the weights can be seen as the slope of the line and the bias as the intercept.
@sillymesilly · 3 months ago
We don't know that a neuron models a linear equation.
@sillymesilly · 3 months ago
You can do XOR with a single neuron using sine as the activation function.
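[Editor's note: the linear-separability claim in this thread can also be checked ooga-booga style, by brute-forcing a grid of candidate lines w1*x + w2*y + b and seeing whether any single one classifies all four rows of a gate's truth table. A line turns up for OR and AND but never for XOR. Illustrative sketch only.]

```c
#include <stdio.h>

// Does any line w1*x + w2*y + b = 0 (thresholded at > 0) fit the gate?
// Truth-table order: (0,0), (0,1), (1,0), (1,1).
static int separable(const int gate[4]) {
    const int xs[4] = {0, 0, 1, 1};
    const int ys[4] = {0, 1, 0, 1};
    for (float w1 = -2.0f; w1 <= 2.0f; w1 += 0.1f)
        for (float w2 = -2.0f; w2 <= 2.0f; w2 += 0.1f)
            for (float b = -2.0f; b <= 2.0f; b += 0.1f) {
                int ok = 1;
                for (int i = 0; i < 4; ++i) {
                    int out = (w1 * xs[i] + w2 * ys[i] + b) > 0.0f;
                    if (out != gate[i]) { ok = 0; break; }
                }
                if (ok) return 1;
            }
    return 0;
}

int main(void) {
    const int OR[4]  = {0, 1, 1, 1};
    const int AND[4] = {0, 0, 0, 1};
    const int XOR[4] = {0, 1, 1, 0};
    printf("OR  separable: %d\n", separable(OR));  // 1
    printf("AND separable: %d\n", separable(AND)); // 1
    printf("XOR separable: %d\n", separable(XOR)); // 0
    return 0;
}
```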
@blacklistnr1 · 1 year ago
1:09:17 Activation functions have the main purpose of being non-linear (having weird shapes), because if you add lines to lines you get more lines, so your 1-trillion-layer-deep neural network is just as effective as your last brain cell. With something like ReLU (which is a glorified if) you can have a neuron light up for specific inputs and then in turn trigger other specific neurons, building complexity with every layer.
@michaeldeakin9492 · 1 year ago
FP arithmetic for adding lines is also non-linear: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Ae9EKCyI1xU.html In case you haven't seen it, the whole talk is amazing and built on this.
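[Editor's note: a tiny numeric illustration of the "lines plus lines" point above; two stacked layers with no activation between them collapse algebraically into one layer, so depth buys nothing without a non-linearity. Toy numbers, not stream code.]

```c
#include <stdio.h>

int main(void) {
    // Two linear "layers": deep(x) = w2*(w1*x + b1) + b2 ...
    float w1 = 3.0f, b1 = 1.0f;
    float w2 = -2.0f, b2 = 0.5f;

    // ...collapse to one linear layer: shallow(x) = (w2*w1)*x + (w2*b1 + b2).
    float w = w2 * w1;      // -6.0
    float b = w2 * b1 + b2; // -1.5

    for (float x = 0.0f; x <= 2.0f; x += 1.0f) {
        float deep    = w2 * (w1 * x + b1) + b2;
        float shallow = w * x + b;
        printf("x = %.0f: deep = %f, shallow = %f\n", x, deep, shallow);
    }
    return 0;
}
```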
@meta_escaping · 1 year ago
I learned a lot of concepts watching this; it's a very detailed and awesome explanation, thank you.
@blackhaze3856 · 1 year ago
This man is the bible of the programming field.
@ammantesfaye · 1 year ago
This is pretty brilliant for understanding neural nets from first principles. Thanks, Tsoding.
@neq141 · 1 year ago
I think that he will be making us in C in the next episode, he is becoming too powerful!
@rngQ · 1 year ago
fool, he has already constructed this entire timeline in C
@neq141 · 1 year ago
@@rngQ bold of you to assume only _this_ timeline
@diegorocha2186 · 1 year ago
As a Brazilian, I'm glad to know English so I have access to this kind of content!!!! Amazing stuff as usual, Mr Uga Buga Developer!
@konigsberg72 · 1 year ago
The guy really is very good.
@antronixful · 1 year ago
Nice explanation... the square is used because the variance has units of "stuff we're measuring"²; besides, if your error is e.g. 0.8, it will actually not be amplified when squaring it.
@alexandersemionov5790 · 1 year ago
And some people pay for their degree without seeing the fun of exploration. This was really fun and degree-worthy material.
@kirkeasterson3019 · 11 months ago
I appreciate you using a fixed seed. It made it easy to follow along!
@LBCreateSpace · 1 month ago
I can't always catch your livestreams because of time zones, but I really enjoy getting to catch your videos here on YouTube. (I'm branchwag btw when I hop into Twitch! ^.^) I just love the way you explain your thought process.
@cr4zyg3n36 · 1 year ago
Love your work. I was looking for someone who is a die-hard C fan.
@TsodingDaily · 1 year ago
I absolutely hate C
@deniskhakimov · 10 months ago
@@TsodingDaily Dude, the more you hate something, the more you like it. I mean, you can't hate something you don't care about. So you were interested in it enough to start hating it 😂
@mihaicotin3261 · 1 year ago
Hey! Can you try to do your own 'build a kernel' series? It would be a cool thing! Maybe the community will love it. Keep up the good work!❤
@byterbrodTV · 1 year ago
Awesome video and very interesting topic! Can't wait to see the next episode. Next stream, a "4-bit adder": I see where we're going. We're slowly approaching building an entire CPU through a neural network :D I like the way you explain complicated things from the ground up. I love it. It's worth a lot. Thank you!
@hedlund · 1 year ago
Oh hell yeah, dat timing though. I started doing precisely this just yesterday evening, and in C rather than my go-to C++, at that. Got stuck with backpropagation so I decided to spend tonight doing some more reading. Said reading has now been moved to tomorrow evening :) Edit: Oh, and side-note; have you read The Martian? A quote came to mind. "So it's not just a desert. It's a desert so old it's literally rusting".
@grantpalmer5446 · 1 year ago
For some reason I've tried starting and am already stuck around the 20 minute mark; I can't get it to generate a random number 1-10. I am using the same code and am confused about why it isn't working. Has anyone encountered this problem?
@viacheslavprokopev8192 · 1 year ago
ML/AI is basically function approximation.
@TheSpec90 · 7 months ago
49:40 Bias is important to prevent the model from fitting the entire training set too closely, i.e. overfitting; it's important to have the bias term so we can avoid many classical overfitting problems. As far as I know the bias is just a parameter created to avoid these issues and, like you said, improve the model training. EDIT: The correct term for avoiding overfitting (and also underfitting) is the regularization term, which comes in when we split the data into a training set and a test set (to validate our model); with this term we can get to the correct model faster (for larger datasets and complex models).
@Czeckie · 1 year ago
the lore deepens. degree in ferrous metallurgy, really?
@TsodingDaily · 1 year ago
It was more of a CS in Ferrous Metallurgy. It was the closest to CS option near me lol.
@margarethill801 · 7 months ago
THANK YOU SO MUCH FOR THIS TUTORIAL!!! I have learnt so much; your explanation and reasoning are very insightful. Delivery on the subject matter EXCELLENT and humorous :)
@themakabrapl2248 · 1 year ago
For the 4-bit adder you would need 5 output neurons, because if you add 1111 and 1111 it's just going to overflow: the true sum is 1 1110, so 4 output bits won't work correctly.
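[Editor's note: a quick arithmetic check of the comment above; the widest 4-bit sum needs a fifth output bit.]

```c
#include <stdio.h>

int main(void) {
    int max = 15 + 15; // 1111 + 1111 = 30 = 11110 in binary
    int bits = 0;
    for (int v = max; v > 0; v >>= 1) bits++;
    printf("max sum = %d, output bits needed = %d\n", max, bits); // 30, 5
    return 0;
}
```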
@sillymesilly · 3 months ago
tanh is almost always preferable to sigmoid as the activation function. The tanh activation would give you 0 instead of 0.5. It would also enable your cost function to go to 0 fast.
@simonwagner1426 · 1 year ago
Damn, I just clicked on this video for fun, but got very much hooked! You're doing a great job at making these videos entertaining!🙌🏻
@mr_wormhole · 1 year ago
He is using clang to compile; he just earned a follower!
@hc3d · 1 year ago
"I'm ooga booga software developer (...)" 🤣🤣 23:24 But in all seriousness, great intro explanation at the beginning. Edit: After watching the whole thing, this was the best ML explanation I have ever seen. Looking forward to the next video.
@v8metal · 1 year ago
Absolutely genius, amazing stuff. Thanks for sharing! *subscribed*
@Cemonix · 1 year ago
A couple of days ago I was programming a feedforward neural network from scratch in Python, and I have to say that it was painful and interesting.
@arslanrozyjumayev8484 · 1 year ago
My man literally has a degree in rust! Must-watch content 💯!!!!!!
@hubstrangers3450 · 2 months ago
Thank you....wow!!! A chemist with exceptional "lower level computing" knowledge, skills and talent.....out of 8.1 billion folks, must be the rarest of rare species (0.01%)......wasn't aware of that; you should do some cheminformatics, molecular modelling, medicinal chemistry etc. topics with C.....Thx.....
@alexanderzikal7244 · 28 days ago
Yes, best explanation ever. Thank you!
@abi3135 · 6 months ago
2:23:58 That's cool: the model found this seemingly random configuration that, after the final boolean simplification, gives us x ^ y.

The "OR" neuron actually does -> x & y
The "NAND" neuron does -> x | y
The "AND" neuron does -> (~x) & y

So forward-feeding gives (~(x & y)) & (x | y), which actually simplifies to x ^ y in the end.
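[Editor's note: the simplification above can be verified exhaustively in a few lines of C; ! stands in for the NAND's logical NOT on 0/1 values. Not stream code.]

```c
#include <stdio.h>

int main(void) {
    // Verify (NOT (x & y)) & (x | y) == x ^ y for all four input pairs.
    for (int x = 0; x <= 1; ++x)
        for (int y = 0; y <= 1; ++y) {
            int learned = (!(x & y)) & (x | y); // NAND and OR fed into the AND-shaped output
            printf("%d ^ %d: learned = %d, expected = %d\n", x, y, learned, x ^ y);
        }
    return 0;
}
```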
@x_vye · 1 year ago
Just as I was wondering if you had any new uploads, not even 1 minute ago!
@valovanonym · 1 year ago
Wow, I commented about this a few weeks ago. I guess it was recorded before, but I'm happy to see this VOD; I can't wait to watch it and the next one too.
@bestechdeals4539 · 1 year ago
He "got a degree in Rust" 🤣😂😂 I can't. I subbed.
@TransformationPradeepBhardwaj
Keep adding videos, you're doing a great job for C lovers.
@BboyKeny · 1 year ago
Human neurons are also activated concurrently, doing different things in different parts of the brain. It's not a number-crunching machine only concerned with processing to get to 1 answer for 1 query.
@CodePhiles · 1 year ago
This was really amazing; I learn a lot from you. Thank you for all your insights.
@michalbotor · 1 year ago
The picture that you found was excellent: it showed that the distinction between weights and bias is artificial; the bias is just another weight. I mean, to be truthful, the bias has a different purpose than the weights: weights control the influence of the inputs, while the bias controls the output with the inputs fixed; it shifts the output. But as far as computation goes, y = w*x + b is the same as y = w*x + b*1 = [w; b] * [x; 1] = w' * x'. This is super important, as GPUs are optimized for matrix multiplication, and y = w*x is in the form of a matrix multiplication.
@michalbotor · 1 year ago
BTW, as far as I understand, the reason why we do y = f(w*x) instead of just y = w*x is that the latter is linear, i.e. the output is linearly proportional to the input, and not all systems that we want to model are linear; hence the input is funneled through a nonlinear function f to make it nonlinear.
@michalbotor · 1 year ago
BTW, don't understand me wrong, but you would be the best teacher for future engineers and scientists that I can imagine. I am quite old, but when I see you code, I have this sense of learning through curious discovery rather than learning by heart. You actually make me want to learn, and for that I am very much grateful.
@usptact · 1 year ago
As someone with degree in machine learning (before it was cool), I found this amusing. Keep it up.
@wagsman9999 · 1 year ago
If you ever find yourself in middle America, I would love to buy you lunch!
@KoshisigreOsage-t6t · 2 months ago
I wish you great success in your health, love and happiness!
@nofeah89 · 1 year ago
Such an interesting stream!
@dysfunc121 · 1 year ago
1:38:50 You can get any boolean function by just describing the inputs that result in a true output. For XOR with inputs a and b:

a b
0 0 -> 0
0 1 -> 1
1 0 -> 1
1 1 -> 0

For inputs [0 1] and [1 0] the results are true; now just describe these inputs in terms of a and b: (not a and b) or (a and not b), i.e. XOR = āb + aƀ. I believe this is called the "canonical SoP form", also known as the sum of minterms, if I remember correctly.
@geovanniportilla7159 · 1 year ago
Can you please describe this in a C algorithm? I think the purpose of these videos is exactly that: mainly educational. Everybody knows the theory about this!
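[Editor's note: a minimal sketch of what this reply asks for, XOR written directly from the canonical sum-of-products form given above; illustrative only.]

```c
#include <stdio.h>

// XOR from its canonical sum of minterms: the rows where the output
// is 1 are (!a && b) and (a && !b), OR-ed together.
static int xor_sop(int a, int b) {
    return (!a && b) || (a && !b);
}

int main(void) {
    for (int a = 0; a <= 1; ++a)
        for (int b = 0; b <= 1; ++b)
            printf("%d XOR %d = %d\n", a, b, xor_sop(a, b));
    return 0;
}
```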
@kuijaye · 1 year ago
1:29:05 I think you can write your result to a file, and gnuplot has an option that lets you plot in real time.
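[Editor's note: a sketch of the logging half of that suggestion, with a made-up file name; append one "iteration cost" pair per step and let gnuplot re-read the file. The gnuplot one-liner in the comment below is one plausible way to watch it live, not the only one.]

```c
#include <stdio.h>

// Call once per training iteration. Watch it live from another
// terminal with something like:
//   gnuplot -e "plot 'cost.dat' with lines; while (1) { pause 1; replot }"
void log_cost(size_t iter, float cost) {
    FILE *f = fopen("cost.dat", "a");
    if (f == NULL) return;
    fprintf(f, "%zu %f\n", iter, cost);
    fclose(f);
}
```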
@coddr4778 · 1 year ago
Wait, what's that folder name right at the bottom 😅
@lolcat69 · 7 months ago
24:49 Also, if the error is already small, like 0.5, squaring makes it smaller still (0.5*0.5 = 0.25), so the AI thinks it is doing better than it is.
@godsiom · 1 year ago
Yes, ML && tsoding! Let's go!!
@yairlevi8469 · 1 year ago
"Ooga Booga, Brute Force" - Tsoding, 2023
@monisprabu1174 · 1 year ago
Thank youuuuuu, always wanted a C/C++ machine learning tutorial since Python is slow at everything.
@gabrielmartins7642 · 1 year ago
Hey, student here. How are you typing so efficiently? I use a custom i3 setup for my desktop and Vim for my text editor, but I am not even at 10% of that efficiency; you can mirror text, do dark magic and everything.
@fabian9300 · 7 months ago
Quite a nice intro to Machine Learning in C, but there's something you missed during the explanation: one does not square the error to amplify it, but because we want to compute the Euclidean distance for the error. Otherwise, if our model's f(xᵢ) were greater than the actual observed value of f(xᵢ), the cost function for xᵢ would return a negative value.
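[Editor's note: the same point in symbols; with a squared per-sample error, every term of the cost is non-negative, so overshoots and undershoots cannot cancel. A sketch of the usual squared-error cost:]

```latex
C(w) \;=\; \frac{1}{n} \sum_{i=1}^{n} \bigl( f_w(x_i) - y_i \bigr)^2 \;\ge\; 0
```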
@NikolaNevenov86 · 1 year ago
I'm at 1:34 and I don't think saying "I didn't program it to do AND or OR" is right. I mean, in a way you are programming it by telling it what the result should be. What it is doing is finding the optimal values to get to the result you told it it should produce (or I might be understanding it wrongly here).
@Simon-cz1jg · 4 months ago
Kind of late, but MSE is used instead of the absolute value also because MSE is differentiable at x = 0, unlike the absolute value function.
@icephonex · 1 year ago
38:48 If you divide the dcost expression by eps and then you do w -= rate * dcost where rate = eps, isn't rate * dcost just f(x+eps) - f(x)?
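[Editor's note: a self-contained sketch of the finite-difference scheme this question refers to, with illustrative names and a toy cost; the stream's actual variables may differ. With rate == eps the constants do cancel as the commenter suspects, leaving w -= cost(w + eps) - cost(w).]

```c
#include <stdio.h>

// Toy cost: squared distance of w from the "true" parameter 2 (as in y = 2x).
static float cost(float w) { float d = w - 2.0f; return d * d; }

int main(void) {
    float w = 0.5f;
    float eps = 1e-3f, rate = 1e-1f;
    for (int i = 0; i < 500; ++i) {
        // Finite-difference estimate of dC/dw:
        float dcost = (cost(w + eps) - cost(w)) / eps;
        w -= rate * dcost; // if rate == eps, this is just cost(w+eps) - cost(w)
    }
    printf("w = %f, cost = %f\n", w, cost(w)); // w -> ~2
    return 0;
}
```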
@Avighna · 1 year ago
50:02 The bias is indeed an extremely necessary thing. Let's say your training data was not y = 2x but y = 2x + 1 (so 0 -> 1, 1 -> 3, 2 -> 5, ...); then no matter how close to 2 your weight was, it would never be enough to reach a near-zero error unless you added the bias (of 1 in this case). The bias is extremely useful in all ways! Great explanation though, and I'm not holding this against you since you openly admitted to not knowing much.
@TheChucknoxus · 1 year ago
Loved this video. We need more content like this, looking under the hood and removing all the hocus pocus.
@TheSpec90 · 7 months ago
You teach more in 2 hours, for free, than 99% of the people selling courses out there.
@hamzadlm6625 · 1 year ago
Nope, I didn't expect that either, but great work Zozin, thank you for the effort to explain these concepts.
@LeCockroach · 7 months ago
Super useful info, I learned a lot! Thanks for sharing
@StevenMartinGuitar · 1 year ago
Great! Looking forward to more in the series
@sohpol · 7 months ago
This is wonderful! More!
@hbibamrani9890 · 1 year ago
This guy is a real hero!
@sukina5066 · 9 months ago
43:07 relatable 1:26:11 loving the journey...
@arjob · 9 months ago
Thank you from my heart.
@noctavel · 1 year ago
tsoding: "im gonna create a neural network without all this matrixes bullshit" 12:38 - tsoding, line 3
@serial9457 · 1 year ago
Finally, I can put my last brain cell to good use! 😂
@yamantariq · 4 months ago
Could someone tell me what Vim motion he uses at 18:00? I just don't understand. I am talking about how he flips "rand_max" to "max_rand" in seemingly one click. Thanks in advance.
@dmytro.sereda · 1 year ago
Has the author looked into the Odin language? Maybe he mentioned it somewhere. I saw he used Jai before, which is kind of an elder brother of Odin. If not, how can one ask Alexey to take a look at it?
@aniketbose4360 · 7 months ago
Is it the Newton-Raphson method under the hood? Because what we are doing is just root finding in a probable neighborhood of the actual root.
@zedeleyici.1337 · 8 months ago
you are such an amazing teacher
@dimak76 · 10 months ago
Just curious: could we call the sigmoid function once, like sigmoid(forward()), within the cost() function, instead of calling it every time inside the forward() function?
@2dapoint424 · 1 year ago
@1:45:35, what scribble pen editor are you using?
@ericchan1026 · 1 year ago
The application name is in the top bar; it's called MyPaint.
@k4r4m310. · 1 month ago
!! wow !! You have the virtue of a professor; you just have to adjust your weights, and your age will adjust your weights. I like your channel; I am going to subscribe and follow you with great interest. I think you are a very capable person; when you find your way, it must be tremendous. I don't understand how you are in Siberia when you could be anywhere in the free world. A cordial greeting from Spain.
@RahulPawar-dl9wn · 1 year ago
Awesome video, superbly making all the concepts simple to understand. 👌🏻 I'm following along using Termux on my Android phone 😅
@codefast93 · 1 year ago
Amazing lesson. I get the intuition behind it, but what is the explanation for subtracting dcost from w?
@debajyatidey9468 · 1 year ago
We want more episodes on this topic
@zombie_engineer · 6 months ago
Thanks a lot for this content. Nice T-shirt in the video, btw; does it have to do with the МФТИ logo? Anyway, the part with adding the "eps" variable while having the derivative multiplied by the rate in parallel is not very clear to me. Shouldn't eps be assigned the value of derivative * rate * -1 each new iteration?
@zombie_engineer · 6 months ago
Ok, I have figured this out. W + eps is to measure the speed and direction of change of the cost function at point W, not the speed of change based on the change of W.
@ukrustacean · 1 year ago
Perfect title indeed 43:12 XD
@robehickmann · 1 year ago
Can you go through and reverse engineer all of the possible solutions it comes up with for your XOR gate? Like, how is the 'not valid' gate you found in one instance working in the whole system? All this really says is that the models we humans have come up with to describe this stuff are incomplete.
@Fnta_discovery · 1 year ago
Hi, I have a question: is it difficult to understand AI using C/C++? Which method do you recommend to me?